In Part 2, we discovered the principle of a disk pool and its ability to unify heterogeneous block storage devices regardless of make or model. We also covered the important concept of thin-provisioning and its operational and financial impacts. Now we will explore other advanced functions that emerge from the disk pool.
The Right Data At The Right Performance Level
Since disk pools can "pool" many different types of block storage devices, disks with different performance characteristics can end up in the same pool. In the old days (i.e., the last decade), it was a best practice to isolate storage with different performance characteristics. For example, you would never mix 10k RPM disks with 15k RPM disks, since the slower disks would drag down the performance of the whole group.
Although isolating disks into separate pools based on performance wasn't a terribly difficult task, it became obvious that it was inefficient. If you needed both high- and low-performance disk on the same system, you had to carve out two logical volumes, one from each disk pool. That meant being careful to place high-intensity data on the high-performance volume and low-intensity data on the other. This may have worked for a while, but data access patterns change. If the change was large enough, you might have to move large amounts of data to a different disk pool. Uggghhhhh. Too much babysitting. There had to be a better way.
It would be great if we could take multiple disk performance classes and place them into a single pool, designate which disks were faster and which were slower, and then implement an intelligence that shifts data to faster or slower disk based on access frequency. Amazing, right? This concept is widely known as Automated Storage Tiering.
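The tiering intelligence described above can be sketched in a few lines. This is a minimal, illustrative model only (real implementations track access heat per extent over time and migrate data in the background); the function name, extent IDs, and counts are all hypothetical:

```python
# Minimal sketch of automated storage tiering: extents are ranked by
# recent access frequency and the hottest ones are placed on the fast
# tier. Names and numbers are illustrative, not any vendor's API.

def assign_tiers(access_counts, fast_tier_slots):
    """Return {extent_id: "fast" or "slow"} from per-extent access counts.

    access_counts   -- dict mapping extent id -> number of recent I/Os
    fast_tier_slots -- how many extents fit on the high-speed tier
    """
    # Sort extents from hottest to coldest.
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    for i, extent in enumerate(ranked):
        placement[extent] = "fast" if i < fast_tier_slots else "slow"
    return placement

counts = {"e1": 500, "e2": 3, "e3": 120, "e4": 0}
print(assign_tiers(counts, fast_tier_slots=2))
# The two hottest extents (e1, e3) land on fast disk; e2 and e4 stay on slow disk.
```

Run periodically against fresh access counts, this is the "babysitting" loop from the previous section, automated.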
Automated storage tiering based on access frequency
A significant result of this new capability is that you no longer necessarily need to purchase high-speed disk for all your data. On average, only 10-15% of a typical dataset is "hot" (frequently accessed); the rest is "cold" (infrequently accessed). If you find that your entire dataset requires high-speed disk all the time, then so be it; at least you have options.
Thin-provisioning delivers major financial benefits in terms of storage utilization efficiency. Auto-tiering delivers additional financial benefits in terms of minimizing the amount of expensive high-speed disk that is required, since high intensity data will get access to high performance disk when needed. It is uneconomical to use expensive high-speed disk for data that is mostly dormant, right? This will be important to remember in the next section.
Disaster Avoidance Before Disaster Recovery
There is a lot of talk out there about disaster recovery methods, models, processes, etc. There is no doubt that a disaster recovery plan is super important. But wouldn't it be better to avoid the disaster altogether, or at least mitigate the chances of one occurring? This is where the concept of synchronous data mirroring comes in.
Synchronous mirroring is where production data is written to two separate, independent storage systems, and the write is acknowledged to the host only after both systems have it. The distance between the two mirror nodes is usually less than 100km, and because the mirroring is synchronous, the link between the nodes must be a very reliable high-speed connection.
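The key semantic is that the host never sees "success" until both copies exist. Here is a hedged sketch of that write path; the `Node` objects are simple in-memory stand-ins for the two storage systems, not a real storage stack:

```python
# Sketch of a synchronous mirror write path: the write is acknowledged
# to the host only after BOTH nodes have persisted it. Class and method
# names are illustrative assumptions, not a real product API.

class Node:
    def __init__(self, name):
        self.name = name
        self.blocks = {}      # stand-in for the node's persistent storage
        self.online = True

    def write(self, lba, data):
        if not self.online:
            raise IOError(f"{self.name} unreachable")
        self.blocks[lba] = data

def mirrored_write(primary, secondary, lba, data):
    # Synchronous semantics: both copies must land before we ack.
    primary.write(lba, data)
    secondary.write(lba, data)   # travels over the inter-node link
    return "ack"                 # the host sees success only now

a, b = Node("node-a"), Node("node-b")
mirrored_write(a, b, 0, b"payload")
print(a.blocks[0] == b.blocks[0])  # both copies identical after the ack
```

If either node is unreachable, the write fails (or, in a real system, the mirror degrades and resynchronizes later), which is exactly why the inter-node link must be so reliable.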
General synchronous mirror model
Since the data is contained within a disk pool which has consistent and uniform structures (SAUs), it shouldn’t be that difficult to mirror those structures to another storage virtualization node. But what is the compelling reason for doing this? Think about how much effort goes into achieving redundancy at layers above the data within the architecture (paths, switches, controllers, etc). It is all for nothing if the data itself is inaccessible, right?
The general assumption is that the data will always be there. But if you have been a systems engineer long enough, you know that no silo of storage is bulletproof. Think of how many times you have said to your boss, "That shouldn't have happened." In fact, given the current nature of storage systems, the highest risk of failure lies with the storage layer itself, not so much with the connectivity layer. So why do we put so much trust in the storage device when it is the most likely component to fail? To make matters worse, what happens when it is more than just an interruption in connectivity? What about a failure of the physical disk components that causes data corruption or loss? Without a disaster avoidance model such as synchronous mirroring, you are left relying on a disaster recovery model.
One of the most common arguments against this model is, "I can't afford to purchase two complete storage systems to mirror my data to!" First, I would ask, "Can you afford to lose access to your data, or to lose your data altogether?" Second (remember, I told you to remember something): between the cost savings of thin-provisioning and auto-tiering, and the fact that you no longer need to purchase expensive feature-rich gear (since you own the features separately), the cost model isn't as bad as you would initially expect. With all the enterprise features that storage virtualization brings to the table, don't buy those features again at the storage hardware layer. Focus on storage systems that are reliable and well supported, from vendors that don't force you to buy features you don't need. Keep in mind, roughly 65-70% of a storage platform's price is vendor software; the other 30-35% is commodity storage components. Why in the world would you continue to purchase the storage software over and over again?
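To make that cost argument concrete, here is the back-of-the-envelope arithmetic using the 65-70% split cited above. The $100k array price is a purely hypothetical example figure:

```python
# Illustrative arithmetic only: applying the article's 65-70% software /
# 30-35% hardware split to a hypothetical $100k storage array.
array_price = 100_000
software_share = 0.65                     # low end of the cited range
software_cost = array_price * software_share
hardware_cost = array_price - software_cost
print(software_cost, hardware_cost)       # 65000.0 35000.0

# Buying two mirrored silos the traditional way repurchases the vendor
# software twice. With virtualization, you pay for the feature layer
# once and buy commodity hardware twice (a simplified model).
traditional_pair = 2 * array_price                    # 200000
virtualized_pair = software_cost + 2 * hardware_cost  # 135000.0
print(traditional_pair, virtualized_pair)
```

Even in this simplified model, mirroring across two nodes costs far less than "two complete storage systems" would suggest, because the software is only bought once.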
When All Efforts Have Failed… Disaster Recovery Time
The day may come when all efforts to avoid a disaster have failed. The production site is completely down due to a major event. This would be a great time to have a remote site set up so that rapid recovery of all production systems can take place. This is where asynchronous data mirroring comes in.
Asynchronous mirroring, as the name implies, is asynchronous. This type of data mirroring can take place over any distance, since it is designed to handle latency and disruptions in the transmission line (such as over the internet or a private long-distance circuit). The circuit should be sized according to the average rate of data change (i.e., write I/O) the system experiences. This ensures that the remote site does not fall too far behind the production site.
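The circuit-sizing rule above reduces to simple arithmetic: the link must at least keep pace with the average write rate. A rough sketch, where the 25% overhead pad is my own illustrative assumption rather than a published rule:

```python
# Back-of-the-envelope circuit sizing for asynchronous mirroring:
# replication bandwidth must keep up with the average write rate.
# The overhead factor is an assumed pad for protocol framing and bursts.

def min_link_mbps(avg_write_mb_per_s, overhead=0.25):
    """Minimum circuit size in megabits/s for a given write rate."""
    # Convert MB/s to Mb/s (x8), then pad for overhead.
    return avg_write_mb_per_s * 8 * (1 + overhead)

print(min_link_mbps(50))   # 50 MB/s of writes -> 500.0 Mb/s circuit
```

If the circuit is sized below this, the replication backlog grows without bound during busy periods, and the remote site drifts further and further behind production.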
General asynchronous mirror model
The ability to perform test failovers without interfering with production is also desirable. As with any disaster recovery plan, you will want to test bringing your applications online at the remote site regularly. This will ensure that the data has full integrity, will help you perfect the recovery process (like practicing fire drills in school when you were a kid), and give you peace of mind knowing that if and when you need to perform a recovery, the system will work as designed.
There are other powerful features of storage virtualization as well, some of which have already been covered on this blog.
Over the course of this three-part series we have covered many important aspects of storage abstraction and virtualization. The advantages that abstraction provides are significant, which is why there is so much buzz in the industry surrounding Software-Defined Storage. This is a very interesting time indeed within the storage industry. It is changing fast, and having a solution that keeps you agile is key.
Software-Defined Storage is yet another term that has been greatly abused and misunderstood. My next article will be about the principles of Software-Defined Storage. Until next time.