Massive scaling of Azure Storage

Why do we need storage that scales massively? Consider an online application that gains popularity day by day. There are many examples of such applications: Facebook, Google and Instagram, to name a few. These applications accumulate data at a rapid pace, and the data could be text, numbers, pictures or videos.

Such applications require a hosting service which can provide additional storage space as and when required. Applications which grow rapidly require storage space which can increase exponentially, and traditional hosting companies will not be able to adapt to such requirements.

The feature that allows massive scaling is partitioning, which is not available in traditional hosting.

Partitioning is an integral feature of cloud storage. It is the physical separation of table data, and Azure handles it for us. In doing so, however, Azure takes into account the Partition Key that we (the application developers) specify when designing the tables.

Azure monitors the traffic, and if any partition becomes a hotspot, it moves that partition to a different node. Besides providing scalable storage space, partitioning therefore also gives us load balancing, a benefit as big as the automatically scalable storage itself. The first point to note here is how scalability is integrated into the storage design. Azure takes the partition key that you assign to an entity and uses this key, together with the traffic load, to decide which partition needs to be placed on a different partition server.
Observe the diagram given below:
[Figure: Azure partitions]
  1. Item Category is the Partition Key and Item Code is the Row Key.
  2. The Row Key uniquely identifies an entity within a partition. All entities in a table are ordered by Partition Key and Row Key.
  3. Both Partition Key and Row Key are strings.
  4. As shown in the figure, both partitions (Raw Material and Finished Product) are served by Partition Server A.
  5. Note that the number of partitions served by any single partition server depends on the partition size and the traffic.
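
The points in the list above can be sketched in a few lines of Python. This is only an in-memory illustration, not the Azure SDK: the item codes, item names and the helper functions sort_entities, group_by_partition and find_entity are made up for the example. It shows entities being ordered by Partition Key and then Row Key, and a single entity being identified by the pair of keys.

```python
from collections import defaultdict

# Hypothetical inventory entities. Item Category is the Partition Key and
# Item Code is the Row Key; both are strings. Codes and names are invented.
entities = [
    {"PartitionKey": "Raw Material", "RowKey": "RM-002", "Name": "Copper wire"},
    {"PartitionKey": "Finished Product", "RowKey": "FP-001", "Name": "Motor"},
    {"PartitionKey": "Raw Material", "RowKey": "RM-001", "Name": "Steel sheet"},
    {"PartitionKey": "Finished Product", "RowKey": "FP-002", "Name": "Pump"},
]


def sort_entities(rows):
    """Order entities by Partition Key first, then by Row Key."""
    return sorted(rows, key=lambda e: (e["PartitionKey"], e["RowKey"]))


def group_by_partition(rows):
    """Map each Partition Key to the ordered list of Row Keys it contains."""
    partitions = defaultdict(list)
    for entity in sort_entities(rows):
        partitions[entity["PartitionKey"]].append(entity["RowKey"])
    return dict(partitions)


def find_entity(rows, partition_key, row_key):
    """Return the entity identified by the two keys, or None if absent."""
    for entity in rows:
        if entity["PartitionKey"] == partition_key and entity["RowKey"] == row_key:
            return entity
    return None


if __name__ == "__main__":
    for partition_key, row_keys in group_by_partition(entities).items():
        print(partition_key, row_keys)
    print(find_entity(entities, "Raw Material", "RM-001"))
```

Because both keys are strings, the ordering here is plain string ordering, so "Finished Product" sorts before "Raw Material".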

[Figure: Azure partitions when traffic increases and load balancing is done]

As shown in the figure, Partition 2 has become a hot partition because of intense traffic, so Azure load balances that partition. All requests to this partition are now served by Server B.
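
The move of a hot partition to another server can be illustrated with a toy model. This is a sketch under stated assumptions and not Azure's actual algorithm, which this article does not describe: the threshold HOT_THRESHOLD, the request counts, the function rebalance and the choice of "Finished Product" as the hot partition are all hypothetical.

```python
HOT_THRESHOLD = 1000  # hypothetical requests-per-interval limit for one partition


def rebalance(assignment, traffic, servers, hot_threshold=HOT_THRESHOLD):
    """Return a new partition -> server mapping with hot partitions moved.

    A partition is treated as hot when its request count exceeds
    hot_threshold. Each hot partition is placed on the server that carries
    the least traffic from the other partitions.
    """
    new_assignment = dict(assignment)
    for partition in assignment:
        requests = traffic.get(partition, 0)
        if requests <= hot_threshold:
            continue  # not hot, leave it where it is

        # Traffic per server, not counting the hot partition itself.
        load = {server: 0 for server in servers}
        for other, server in new_assignment.items():
            if other != partition:
                load[server] += traffic.get(other, 0)

        # Pick the least loaded server (the first one wins on a tie).
        new_assignment[partition] = min(servers, key=lambda s: load[s])
    return new_assignment


# Both partitions start on Server A, as in the first figure.
assignment = {"Raw Material": "Server A", "Finished Product": "Server A"}
# Assumed request counts: the second partition has become hot.
traffic = {"Raw Material": 120, "Finished Product": 5000}
servers = ["Server A", "Server B"]

result = rebalance(assignment, traffic, servers)

if __name__ == "__main__":
    print(result)
```

With these assumed numbers, the hot partition ends up on Server B while the other partition stays on Server A, which mirrors the situation in the second figure.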

The next point to note is the load balancing benefit. Azure ensures that as requests increase, responses are handled in such a way that no one part of the storage is overloaded. This ensures that there is no drop in performance.

Compare this with the traditional system, in which developers had to handle both scaling and load balancing on their own by installing traffic monitoring software and making the necessary changes to the database and application design. Developers also had to interact with the hosting service provider and subscribe to additional space as and when required.

With one stroke, Azure has removed these two responsibilities from the developer.

Storage Design - I have used the phrase storage design in the paragraphs above to cover all three types of storage: tables, blobs and queues. However, the way Azure handles scalability differs for each of them. I will discuss this when we come to the individual articles on tables, blobs and queues.

It must be clear to you by now that it is important to consider scaling when we specify the Partition Key. I will discuss the specific guidelines in the next articles.