In Part 1, I defined what abstraction is and what has resulted from that achievement. To briefly recap, compute abstraction ultimately resulted in compute virtualization, and storage abstraction ultimately resulted in storage virtualization. In Part 2 of this journey, we’ll explore the question, “Now that we have storage abstraction, what can we do with it?”
If you recall:
Compute abstraction should allow a piece of software (i.e., an operating system) to run and control disparate server hardware platforms without needing modification for each platform.
The same should also be true of storage abstraction:
Storage abstraction should allow a piece of software (i.e., a storage virtualization engine) to run and control disparate storage hardware platforms without needing modification for each platform.
One can’t help but notice the glaring parallel between both types of abstraction: compute virtualization leverages any x86-64 compute hardware and carves up its compute resources, while storage virtualization leverages any block-level storage device and carves up its storage resources.
Let’s Take A Dip In The Pool
So what is the unifying principle, the common denominator, the foundation of all of this? Good question. First, there needs to be a common, consistent, underlying construct through which all storage devices are unified. This common construct is generally referred to as a disk pool. Disk pools unify heterogeneous storage devices so they can be managed as one homogeneous storage device (albeit with different characteristics). The disk pool accomplishes this unification by laying down consistent and uniform structures on the physical disks, structures necessary to keep things in order once data is written to the pool and/or the pool structure changes (i.e., adding or removing a storage device).
An important concept to keep in mind for later is this idea of consistent and uniform structures. It is these structures that give rise to a whole host of advanced volume manipulation techniques, which in turn make the storage administrator’s life much easier. I will refer to these structures generally as storage allocation units (SAUs) from this point forward.
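To make the idea concrete, here is a minimal sketch of a disk pool that carves heterogeneous block devices into uniform SAUs. All names here (DiskPool, PhysicalDisk, SAU_SIZE) are invented for illustration; they are not any vendor’s API, and real engines use far richer on-disk metadata.

```python
# Illustrative sketch only: a pool lays uniform SAU structures over
# disparate physical disks so they can be managed as one device.

SAU_SIZE = 128 * 1024 * 1024  # 128 MiB per allocation unit (arbitrary choice)

class PhysicalDisk:
    def __init__(self, name, capacity_bytes):
        self.name = name
        self.capacity = capacity_bytes
        # Carve the raw capacity into uniform SAUs, identified by (disk, index).
        self.sau_count = capacity_bytes // SAU_SIZE

class DiskPool:
    def __init__(self):
        self.disks = []
        self.free_saus = []   # (disk_name, sau_index) tuples available for allocation
        self.used_saus = {}   # (disk_name, sau_index) -> owning volume name

    def add_disk(self, disk):
        """Join a disk to the pool: lay down uniform SAU structures over it."""
        self.disks.append(disk)
        self.free_saus.extend((disk.name, i) for i in range(disk.sau_count))

    def free_capacity(self):
        return len(self.free_saus) * SAU_SIZE

# Heterogeneous devices, one homogeneous pool:
pool = DiskPool()
pool.add_disk(PhysicalDisk("ssd-array-01", 4 * 1024**4))   # 4 TiB SSD array
pool.add_disk(PhysicalDisk("sas-shelf-02", 12 * 1024**4))  # 12 TiB SAS shelf
```

Because every device is sliced into the same SAU geometry, the pool can treat an SSD array and a SAS shelf identically when allocating space, which is exactly what makes the later volume tricks possible.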
Logical Disk Structure Once Joined To A Disk Pool
The importance of disk pools is readily apparent. Now that we have SAUs governing the physical disks within a disk pool, we have more power to manipulate these SAUs, the data, and even the physical disks contained therein. For instance, an administrator should now be able to easily modify the disk pool’s physical disk membership, conceivably without interrupting production workloads. As a result of this capability, administrators can move new storage technology into the pool (as it becomes commercially available) and remove end-of-life storage technology from the pool as needed. There are many more powerful use cases where this manipulation is beneficial, as we will soon see.
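One way to picture how a disk leaves the pool without disturbing workloads: every used SAU on the outgoing disk is copied to a free SAU on another disk, then the volume’s mapping is flipped to the new location. The sketch below is hypothetical; the function and data-structure names are invented, and a real engine would copy the actual data blocks where the comment indicates.

```python
# Hypothetical sketch: evacuate a disk from the pool by migrating its SAUs.

def evacuate_disk(sau_map, free_saus, outgoing_disk):
    """sau_map: volume name -> list of (disk, sau_index) extents backing it.
    free_saus: list of (disk, sau_index) not yet allocated.
    Moves every used SAU off `outgoing_disk` so it can be removed."""
    for volume, extents in sau_map.items():
        for i, (disk, idx) in enumerate(extents):
            if disk == outgoing_disk:
                # Pick a replacement SAU on any other disk in the pool.
                target = next(t for t in free_saus if t[0] != outgoing_disk)
                free_saus.remove(target)
                # (A real engine copies the data blocks here, then flips the map;
                #  the host keeps doing I/O against the unchanged virtual volume.)
                extents[i] = target
    # Remaining free SAUs on the outgoing disk simply leave the pool with it.
    free_saus[:] = [t for t in free_saus if t[0] != outgoing_disk]

sau_map = {"vol1": [("diskA", 0), ("diskB", 3)]}
free_saus = [("diskA", 1), ("diskB", 7)]
evacuate_disk(sau_map, free_saus, "diskA")
```

The key point is that the host-facing virtual volume never changes; only the back-end mapping table does, which is why the swap can happen mid-production.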
Now That We Have The Pool, How Do We Use It?
So we have a disk pool. The disk pool contains one or more block storage devices, the storage allocation units are in place, and some amount of capacity is available. At this point, the disk pool is technically operational and ready to go to work, but how do we use it? So far we have only spoken of the physical side of the storage virtualization engine. Let’s explore the virtual side.
Storage Virtualization Has Two Faces
The administrator simply creates a virtual volume from the existing disk pool. Once the virtual volume is created, it can be mapped to the application server. The storage virtualization engine, over iSCSI (encapsulation of SCSI storage commands within TCP/IP) or Fibre Channel (encapsulation of SCSI storage commands within the Fibre Channel Protocol), presents the virtual volume to the application server (or host) as a standard SCSI physical disk device. The virtual volume can be consumed by the application just like any other disk device.
Keep It Thin, Keep It Real
For the sake of brevity, I will only review one more benefit of disk pooling (which is partially out of the bag at this point anyway). There will be more posts furthering this discussion, so don’t worry.
Since we control the disk pool, the SAUs, the data, and the physical disks, let’s see if there is any optimization we can introduce across them all. Enter now the concept of thin-provisioning. In order to understand storage with thin-provisioning, you need to understand storage without it.
Before the days of thin-provisioning, an administrator would take a storage device and carve it up into large pieces, or logical volumes. This was necessary to ensure that enough contiguous space would be available down the road without having to perform volume surgery later to extend the logical volume (a major pain, not to mention risky). As a result, more often than not, a lot of space was assigned to a particular logical volume but never used. This, in turn, created the need to purchase more storage, since all of the available space had already been fully allocated. Notice what I said: it wasn’t that the storage device was literally out of space, but that it had all been fully allocated.
So the concept of thin-provisioning was born out of necessity, as all great inventions typically are. Instead of using an allocate-on-provision model, where large portions of the storage capacity were provisioned for a specific purpose and therefore no longer available to other applications, let’s consider an allocate-on-write model. In an allocate-on-write model, the physical storage is not allocated until data has been written to the volume. Additionally, when the data is actually written, instead of allocating large portions of the available storage capacity, only a relatively small chunk is allocated to accommodate the data written. This method is referred to as Just-In-Time provisioning, or simply thin-provisioning.
Just-In-Time Provisioning or Thin-provisioning
Remember, the virtual volume that is presented to the host appears as a fully allocated and provisioned physical disk. What the host does not know is that NOTHING has actually been allocated or provisioned whatsoever. But once the first write operation comes down from the host (which is usually a format command or some type of logical volume preparation command issued by the operating system), then and only then is a small chunk of space allocated on the back-end physical disk. As the application on the host writes more and more data, the chunks get filled and the storage virtualization engine allocates more and more chunks to accommodate the new data. Pretty cool, right?
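The allocate-on-write behavior described above can be sketched in a few lines. This is a toy model, assuming an invented ThinVolume class and an arbitrary 4 MiB chunk size; real engines track chunks in persistent metadata rather than an in-memory dict.

```python
# Minimal allocate-on-write (thin-provisioning) sketch: no backing chunks
# exist at creation; a chunk is allocated only when a write first touches it.

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB chunks (illustrative value)

class ThinVolume:
    def __init__(self, virtual_size):
        self.virtual_size = virtual_size  # what the host sees: "fully provisioned"
        self.chunks = {}                  # chunk_index -> bytearray, allocated lazily

    def write(self, offset, data):
        """Allocate backing chunks just-in-time as writes arrive."""
        for i, byte in enumerate(data):
            pos = offset + i
            idx, within = divmod(pos, CHUNK_SIZE)
            if idx not in self.chunks:            # first touch: allocate now
                self.chunks[idx] = bytearray(CHUNK_SIZE)
            self.chunks[idx][within] = byte

    def allocated_bytes(self):
        return len(self.chunks) * CHUNK_SIZE

vol = ThinVolume(virtual_size=2 * 1024**4)  # host sees a 2 TiB disk...
vol.write(0, b"format signature")           # ...but one write allocates one chunk
```

The host believes it owns 2 TiB, yet only a single 4 MiB chunk of back-end capacity is actually consumed after the first write, which is the whole trick.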
Statistics show that on average, without thin-provisioning, you will lose access to 65% of the usable volume space on your storage device. Meaning, when you seemingly “run out of space”, you will have only written data to 35% of the storage device. With thin-provisioning however, you can effectively reclaim the majority of the 65% and drive your data-to-capacity ratio to over 90%. That is a much better ROI than the previous model. And remember the contiguous space problem administrators had to account for? Well, that is no longer a problem when using thin-provisioning. This is because there is no penalty for creating very large (2TB+) logical virtual volumes for your applications since NOTHING is allocated until it is written.
Now you can see why I wanted to end this article with thin-provisioning (that’s if you are even still awake, lol). We have only just scratched the surface on what disk pools, and storage virtualization in general, can do to enhance the storage architecture. In the next article (you guessed it, Part 3), we will cover many of the other benefits of using storage virtualization. Stay tuned, much more to come.
Ready for even more? See Part 3 of this series here.