Q70. Explain how Partition Key is linked to Scalability?

The most important benefit of partitioning an Azure Table is scalability.

Let us first consider a Table without any partitioning, which is the same as a Table with a single partition. Under high traffic (a high volume of reads and writes), a single node has to serve all the requests from the application instances. This can cause a dip in performance.

Now consider a Table with a Partition Key.

As the data size increases, Azure splits the Table into partitions and spreads the partitions across different servers. This is done based on the Partition Key, so that data of the same kind stays together. Because the partitions are spread across many servers, scalability is achieved: the Table gains the capacity to store more data, and performance is not compromised because data of the same kind is stored together in a partition.

Figure: Employee table partitioned on department name
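
The grouping described above can be illustrated with a small simulation. This is only a sketch: the employee values, the server names and the round-robin placement are assumptions made for illustration. Azure decides the real placement internally and does not expose it.

```python
from collections import defaultdict

# Hypothetical employee entities; PartitionKey is the department name.
employees = [
    {"PartitionKey": "Sales", "RowKey": "E001", "Name": "Asha"},
    {"PartitionKey": "Sales", "RowKey": "E002", "Name": "Ben"},
    {"PartitionKey": "HR", "RowKey": "E003", "Name": "Chen"},
    {"PartitionKey": "IT", "RowKey": "E004", "Name": "Dina"},
    {"PartitionKey": "IT", "RowKey": "E005", "Name": "Evan"},
]


def group_by_partition(entities):
    """Group entities so that all rows sharing a PartitionKey stay together."""
    partitions = defaultdict(list)
    for entity in entities:
        partitions[entity["PartitionKey"]].append(entity)
    return dict(partitions)


def place_partitions(partitions, servers):
    """Assign each whole partition to one server (round-robin, an assumption)."""
    placement = {}
    for index, key in enumerate(sorted(partitions)):
        placement[key] = servers[index % len(servers)]
    return placement


partitions = group_by_partition(employees)
placement = place_partitions(partitions, ["server-A", "server-B"])

for key in sorted(partitions):
    row_keys = [e["RowKey"] for e in partitions[key]]
    print(key, "->", placement[key], row_keys)
```

Each department ends up as one partition, and a partition is never split between the two hypothetical servers.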

We do not have any control over the partitioning process, which also means we cannot specify which set of entities should reside on which node. Azure does this automatically. In practice, we do not need to worry about a single partition running out of space; Azure handles this on its own.

Note: Suppose a customer purchases goods from a company, and the table is partitioned by customer. All data pertaining to a particular customer, such as customer details, invoices raised and payments received, then resides in a single partition. For a single customer, all requests are routed to the same partition.
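
The customer example can be sketched as plain Python dictionaries. The customer IDs, the RowKey naming scheme and the helper functions below are hypothetical and chosen only to show the idea; they are not part of any Azure API.

```python
def make_entity(customer_id, kind, number, **properties):
    """Build an entity whose RowKey encodes the record type and a number."""
    entity = {"PartitionKey": customer_id, "RowKey": f"{kind}_{number:04d}"}
    entity.update(properties)
    return entity


# Hypothetical records; the PartitionKey is the customer ID.
entities = [
    make_entity("CUST-17", "details", 0, Name="Contoso Traders"),
    make_entity("CUST-17", "invoice", 1, Amount=250.0),
    make_entity("CUST-17", "payment", 1, Amount=250.0),
    make_entity("CUST-42", "details", 0, Name="Fabrikam"),
    make_entity("CUST-42", "invoice", 1, Amount=90.0),
]


def entities_for_customer(all_entities, customer_id):
    """Return every record of one customer; all share a single PartitionKey."""
    return [e for e in all_entities if e["PartitionKey"] == customer_id]


records = entities_for_customer(entities, "CUST-17")
print([e["RowKey"] for e in records])
```

Because details, invoices and payments of one customer share the same PartitionKey, a request for that customer only ever touches one partition.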

Azure implements storage scalability by distributing partitions across many storage nodes. Remember that the data of a single partition need not be stored on a single node; it can be stored across multiple storage nodes. However, requests for all entities belonging to a single partition are served by a single server. Azure manages all scalability issues with one goal: there should be no drop in performance during high-traffic periods. As we have discussed, entities in a Table are organized by partition. Each partition is identified by a key called the Partition Key, which is a string property.
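
The effect on request load can be sketched as follows. The placement table, server names and request list are assumptions for illustration only; the real assignment of partitions to servers is managed by Azure.

```python
from collections import Counter

# Assumed placement of partitions on servers (Azure chooses this itself).
placement = {"Sales": "server-A", "HR": "server-B", "IT": "server-B"}


def route(partition_key):
    """Every request for a partition goes to the one server that serves it."""
    return placement[partition_key]


# Hypothetical burst of requests, identified by the PartitionKey they touch.
requests = ["Sales", "HR", "Sales", "IT", "Sales", "HR", "Sales"]

load = Counter(route(key) for key in requests)
print(dict(load))
```

In this sketch one server handles two partitions, while each partition is still handled by exactly one server, so the load is shared only when requests are spread over several Partition Key values.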

The Partition Key and Row Key are required properties for each entity. The Partition Key determines how the entities are distributed across storage nodes, and an entity's Row Key is its unique identifier within a partition. So, the Row Key is the primary key within a partition, and the Partition Key combined with the Row Key uniquely identifies an entity in a table. Both keys play an important role in how the data is partitioned and scaled. They also determine the performance of queries.
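
A minimal in-memory sketch of the two keys working together is shown below. The `TinyTable` class and its method names are hypothetical stand-ins written for this example; they are not the Azure SDK and only mimic the uniqueness rule described above.

```python
class TinyTable:
    """In-memory stand-in for a table; not the Azure SDK."""

    def __init__(self):
        self._rows = {}  # (PartitionKey, RowKey) -> entity

    def insert(self, entity):
        """Store an entity; the (PartitionKey, RowKey) pair must be unique."""
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self._rows:
            raise ValueError(f"entity {key} already exists")
        self._rows[key] = entity

    def point_lookup(self, partition_key, row_key):
        """Both keys known: a direct lookup of at most one entity."""
        return self._rows.get((partition_key, row_key))

    def partition_scan(self, partition_key):
        """Only the PartitionKey known: every entity in that one partition."""
        return [e for (pk, _), e in self._rows.items() if pk == partition_key]


table = TinyTable()
table.insert({"PartitionKey": "Sales", "RowKey": "E001", "Name": "Asha"})
# The same RowKey is allowed in a different partition.
table.insert({"PartitionKey": "HR", "RowKey": "E001", "Name": "Chen"})
table.insert({"PartitionKey": "Sales", "RowKey": "E002", "Name": "Ben"})

print(table.point_lookup("Sales", "E001")["Name"])
print(len(table.partition_scan("Sales")))

try:
    table.insert({"PartitionKey": "Sales", "RowKey": "E001", "Name": "Duplicate"})
except ValueError as error:
    print("rejected:", error)
```

A lookup that supplies both keys identifies a single entity directly, whereas a lookup with only the Partition Key has to consider every entity in that partition, which is one reason the choice of keys affects query performance.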