We have discussed Table partitioning and how it affects Scalability, Query efficiency and Load balancing. In this question, we will discuss how to get the most out of these benefits and how to choose the partition size that gives the best performance on all of these parameters in a given situation. We will look at different partition sizes and the advantages and disadvantages of each.
The size of the partition depends on the number of entities it contains.
What affects the size of a partition?
The direct answer is the partition key: the size of each partition varies with the partition key values you choose.
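A small simulation makes this concrete. The sketch below assumes a hypothetical list of product entities and three hypothetical key-assignment functions (one shared value, a unique value per entity, and one value per category). It only counts entities per PartitionKey value and does not call any Azure API.

```python
from collections import Counter

# Hypothetical product entities; only the fields used to derive keys are shown.
products = [
    {"id": "p001", "category": "books"},
    {"id": "p002", "category": "books"},
    {"id": "p003", "category": "games"},
    {"id": "p004", "category": "music"},
    {"id": "p005", "category": "books"},
]

# Three illustrative (hypothetical) ways to choose a PartitionKey per entity.
strategies = {
    "single": lambda p: "products",     # one value for the whole table
    "unique": lambda p: p["id"],        # a new value for every entity
    "multiple": lambda p: p["category"],  # a few shared values
}

def partition_sizes(entities, key_fn):
    """Return {PartitionKey: number of entities} for a key-assignment function."""
    return dict(Counter(key_fn(e) for e in entities))

for name, key_fn in strategies.items():
    print(name, partition_sizes(products, key_fn))
```

The same five entities produce one partition of five, five partitions of one, or three partitions of uneven size, depending only on the key function.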
Basically, there are three partition key strategies to choose from:
First, let us consider a table that uses a single partition key value for the entire table. All the entities in the table belong to the same partition and are served from a single partition server.
In some scenarios, choosing a single partition key value is convenient. The advantage of a single partition is that we can perform Entity Group Transactions (batches of up to 100 operations) on any of the entities in the table, and this gives very good performance when the table holds only a small number of entities.
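Because a single batch is capped at 100 operations, "any of the entities" in practice means the table can be written in same-partition chunks. The sketch below checks the two constraints described here: one shared PartitionKey, at most 100 operations, and each entity appearing only once. The helper names `validate_batch` and `chunk_for_batches` are hypothetical and are not part of any Azure SDK. Other service-side limits, such as payload size, are not modelled.

```python
MAX_BATCH_OPERATIONS = 100  # per entity group transaction in Table storage

def validate_batch(entities):
    """Raise ValueError if the entities cannot form one entity group transaction."""
    if not entities:
        raise ValueError("a batch needs at least one entity")
    if len(entities) > MAX_BATCH_OPERATIONS:
        raise ValueError(f"a batch holds at most {MAX_BATCH_OPERATIONS} operations")
    if len({e["PartitionKey"] for e in entities}) != 1:
        raise ValueError("all entities in a batch must share one PartitionKey")
    row_keys = [e["RowKey"] for e in entities]
    if len(row_keys) != len(set(row_keys)):
        raise ValueError("an entity may appear only once in a batch")

def chunk_for_batches(entities, size=MAX_BATCH_OPERATIONS):
    """Split same-partition entities into consecutive batch-sized chunks."""
    return [entities[i:i + size] for i in range(0, len(entities), size)]

# 250 entities in the single "products" partition fit into three batches.
entities = [{"PartitionKey": "products", "RowKey": f"{i:05d}"} for i in range(250)]
batches = chunk_for_batches(entities)
for batch in batches:
    validate_batch(batch)  # raises if a chunk breaks a batch rule
print([len(b) for b in batches])  # [100, 100, 50]
```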
As the number of entities grows, a single partition limits scalability, because a single partition can serve only about 500 requests per second. If the partition later needs to scale, we have to rethink the choice of partition key value. This option suits tables whose data does not need to scale.
For example, suppose an online store application stores its product details in a table that uses the single partition key value "products". If there are 20,000,000 products, all of them fall into the same partition, which can cause a drop in performance.
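A quick back-of-the-envelope calculation shows the scale of the problem. It uses the 500 requests per second per partition figure quoted above and assumes, hypothetically, one request per product. It also assumes requests spread perfectly evenly across partitions, and it ignores account-wide limits and latency, so treat the result as a rough lower bound rather than a prediction.

```python
def seconds_to_touch_all(entity_count, partitions, per_partition_rate=500):
    """Lower bound on the time to issue one request per entity, assuming the
    requests are spread evenly over `partitions` and each partition sustains
    `per_partition_rate` requests per second (account-wide limits ignored)."""
    return entity_count / (partitions * per_partition_rate)

for partitions in (1, 100):
    s = seconds_to_touch_all(20_000_000, partitions)
    print(f"{partitions:>3} partition(s): {s:,.0f} s ({s / 3600:.1f} h)")
# prints:
#   1 partition(s): 40,000 s (11.1 h)
# 100 partition(s): 400 s (0.1 h)
```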
Now, let us consider a table with unique partition key values, that is, a new PartitionKey value for every entity. Here, each entity belongs to a different partition, which is the finest granularity possible. In this scenario, scalability is very high, but batch transactions spanning multiple entities are not possible, because every entity in a batch must share the same PartitionKey.
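The loss of batching follows directly from the batch rule. The sketch below computes the largest entity group transaction a data set allows: the size of the biggest group of entities that share a PartitionKey, capped at 100 operations. The function name and the sample keys are hypothetical.

```python
from collections import Counter

MAX_BATCH_OPERATIONS = 100  # per entity group transaction in Table storage

def largest_possible_batch(entities):
    """Size of the biggest entity group transaction the data allows: the largest
    number of entities sharing one PartitionKey, capped at the batch limit."""
    counts = Counter(e["PartitionKey"] for e in entities)
    return min(max(counts.values()), MAX_BATCH_OPERATIONS)

# A new PartitionKey for every entity: each partition holds exactly one entity.
unique = [{"PartitionKey": f"order-{i}"} for i in range(1000)]
# The same 1,000 entities under one shared PartitionKey, for comparison.
shared = [{"PartitionKey": "orders"} for _ in range(1000)]

print(largest_possible_batch(unique))  # 1
print(largest_possible_batch(shared))  # 100
```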
One important thing to understand when using unique partition key values is that Azure creates range partitions if the partition key values are in ascending or descending order.
What is a range partition?
Range partitions group entities with adjacent partition key values together and are very useful when we perform range queries.
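Range queries rely on key order, and PartitionKey values are strings that compare lexicographically. The sketch below simulates a range query over sorted keys with `bisect`. It mirrors the OData-style filter `PartitionKey ge 'low' and PartitionKey lt 'high'`. It does not model how Azure assigns range partitions to servers. Zero-padding numeric keys is an assumed convention, shown here because unpadded numbers sort out of numeric order. The helpers `make_key` and `range_query` are hypothetical.

```python
from bisect import bisect_left

def make_key(n, width=8):
    """Zero-pad so string order matches numeric order (keys compare as strings)."""
    return str(n).zfill(width)

def range_query(sorted_keys, low, high):
    """Return keys with low <= key < high from a lexicographically sorted list."""
    start = bisect_left(sorted_keys, low)
    end = bisect_left(sorted_keys, high)
    return sorted_keys[start:end]

# Ascending keys, e.g. hypothetical order numbers 1..1000.
keys = sorted(make_key(n) for n in range(1, 1001))
print(range_query(keys, make_key(10), make_key(15)))
# ['00000010', '00000011', '00000012', '00000013', '00000014']

# Without padding, string order differs from numeric order.
unpadded = sorted(str(n) for n in range(1, 1001))
print(unpadded[:4])  # ['1', '10', '100', '1000']
```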
The third case is a table with multiple distinct partition key values. When there are multiple values, Azure creates multiple partitions, and each partition's size depends on how many entities it contains, so there may be both small and large partitions. If one partition becomes a hot partition, Azure load balances it. Range queries that span several partitions need to go to multiple servers to execute.
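To see why a range query can fan out, the sketch below groups hypothetical product entities by category into partitions. It then lists which partitions a key range must visit. Each distinct PartitionKey value is treated as a separate partition that may live on a different server. The actual server placement and load balancing are decided by Azure and are not modelled here.

```python
from collections import defaultdict

def build_partitions(entities):
    """Group entities by PartitionKey; each group is one partition."""
    partitions = defaultdict(list)
    for e in entities:
        partitions[e["PartitionKey"]].append(e["RowKey"])
    return dict(partitions)

def partitions_touched(partitions, low, high):
    """PartitionKey values a query for low <= PartitionKey < high must visit."""
    return sorted(pk for pk in partitions if low <= pk < high)

entities = [
    {"PartitionKey": "books", "RowKey": "p001"},
    {"PartitionKey": "books", "RowKey": "p002"},
    {"PartitionKey": "electronics", "RowKey": "p003"},
    {"PartitionKey": "games", "RowKey": "p004"},
    {"PartitionKey": "music", "RowKey": "p005"},
]
parts = build_partitions(entities)
print({pk: len(rows) for pk, rows in parts.items()})
# {'books': 2, 'electronics': 1, 'games': 1, 'music': 1}
print(partitions_touched(parts, "a", "h"))
# ['books', 'electronics', 'games']
```

Partitions of uneven size appear naturally, and a single range from "a" to "h" already touches three partitions.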