In the design of large scale systems, hardware cost plays an important role. Especially in the era of big data and an ever-increasing amount of data to process. One facet of scaling is scaling out (also known as horizontal scaling) which means distributing data on multiple machines to improve performance and reliability.
There are two common ways to distribute data across multiple nodes, replication and sharding.1
Replication is the concept of keeping the same data on multiple servers (each copy is of the data is known as replica).
There are two primary reasons for replicating data. One is to increase the reliability of a system, the other is to improve performance.2
- Reliability increases with the number of copies as the chance of losing data decreases as the number of copies increases.
- Performance is also proportional to the number of servers as using more servers usually means more processing capacity.
Sharding (also known as partitioning) is a way of dividing a large dataset into smaller ones (shards) so that it could be split between multiple machines. There are multiple reasons to do that, but we’ll focus here only on performance reasons.
For this article, I’ll define efficiency as a measure inverse proportional to the cost of the system (number of servers ) that can satisfy some pre-defined reliability and performance requirements ().
I have defined two dimensions for vertical scaling. Also, claiming that both may improve performance.
I want to focus on the following problem:How to choose the number of shards, and the number of replicas such that efficiency is optimal (satisfy performance requirements with the lowest number of servers)?
When a share-nothing system is close to its performance capacity I’ve seen the following approach to be used in practice:
- if better write performance is needed → increase the number of shards
- if better read performance is needed → increase the replication factor (number of replicas)
I want to argue that the choice isn’t necessarily binary, and propose a model for reasoning about this problem.
The goal of this post is to explore a way of thinking about this problem rather than explore the full problem space of horizontal scaling. To keep it simple we’ll model an abstract system with the following properties:
- shared-nothing architecture; ignores complexities related to cross-shard communication
- asynchronous circular replication; ignores replica synchronization cost on writes
- reads and writes have the same cost, and it is constant
- physical replication and sharding is considered (server level/no virtual sharding)
These properties aren’t necessarily of a fictional system. In-memory key-value databases come quite close to this description.
See 3 for a list of complexities that arise in a shared-nothing architecture.
- number of servers
- write throughput requirements
- read throughput requirements
- server processing capacity / operations per second it can handle
- , where write capacity and read capacity
- replication factor (number of copies)
- number of acceptable replica failures
- number of shards
Let’s imagine an initial system that is consists of 2 shards with a replication factor of 3.
A server is designed to handle 11 operations per second (), which are divided into 9 writes per second and 2 reads per second.
The total write capacity of the system is the number of writes an instance can handle multiplied by the number of shards. Replicas have to repeat the same write operations, they don’t contribute in any way to the total write capacity (hachured background). In other words, this is the sum of the heights of all red shaded rectangles.
Total read capacity is given by the number of replicas multiplied by the number of reads a single instance handles. Sum of all green shaded rectangles heights.
Imagine we have the following requirement now: the system should handle twice as many reads (). Probably doubling the number of replicas comes first to the mind.
What happened here is we double the sum of green shaded rectangles heights. Indeed, it gives us twice the read capacity.
By doubling the number of replicas we also doubled the amount of servers that are used.
Giving a total of 12 instances.
But are there other options? More efficient maybe? Compare the following drawing to the previous ones.
Instead of doubling the number of replicas, I added a new shard instead.
Let’s evaluate how this system would perform.
This deployment still has to handle the same amount of writes as the initial one, but now the writes are spread onto 3 shards instead of 2. This means that each shard has to do only 6 writes per second.
A server was handling 11 operations per second before. If now each instance processes only 6 writes per second it means that there is room for 5 reads per second per instance instead of just 2.
By adding a shard instead of doubling the number of replicas not only the system can handle more load it is also more efficient!
Only 9 servers are needed in this case. More performance for less money. 🧙♂️
Note: This result doesn’t mean that always adding more shards is the correct solution, I’ll show why later.
Hopefully, the above example convinced you that it is worth thinking about this problem and there is value to be extracted if it is approached correctly.
Let’s return to the initial problem and formalize it a bit.
How to choose the number of shards, and the number of replicas such that efficiency is optimal?
We want to optimize for smallest such that performance requirements are satisfied.
The maximum write throughput the system can handle is number shards multiplied by shard capacity. Since the number of replicas doesn’t influence write capacity shard capacity becomes processing server capacity.
Read throughput is limited by the total number of replicas (we ignore failure tolerance () for a moment).
Let’s recall some previous notations (, ) and do some substitutions.
Note that appears only in the numerator. This implies a linear relation to N. We would never want to increase it.
If failure tolerance is to be ignored, then RF=1 is what we want. It makes sense intuitively as well. Replicas don’t contribute to write capacity in any way. And replicas read capacity is limited by node capacity, minus the capacity used for writes. Adding shards, however, contributes to both. Write capacity is spread between more nodes, this frees up capacity from every single node that can be used to process reads.
Let’s introduce failure tolerance into the model. will denote the number of replicas failures we want to tolerate.
When , the at which the minimum is achieved varies depending on the mix of the write and read workload, and the desired !
All models are wrong, but some are useful. –George E. P. Box
Real-world systems are usually much more complicated compared to what I have tried to model here.3 However, I believe this simple model to be a strong pillar for reasoning about scaling shared-nothing systems when efficiency is a goal.
I wasn’t able to find a lot of related work, if you have any relevant references please send them @nvartolomei.