What is the big deal with Virtual Volumes, VMware?

June 30, 2014: mark the date, people. This is the day VMware announced the public beta of Virtual Volumes. Virtual Volumes, or VVOL as VMware likes to call them, put a Virtual Machine and its disks, rather than a LUN, into the storage management spotlight. Through a specific API, vSphere APIs for Storage Awareness (VASA), your storage array becomes aware of Virtual Machines and their Virtual Disks. VASA allows certain Virtual Machine operations, such as snapshotting and cloning, to be offloaded to the (physical) storage array.

Now, what is the big deal with Virtual Volumes, VMware? Open vStorage has been designed from day one to allow administrators to manage each disk of a Virtual Machine individually. We don’t call it Virtual Volumes; we call it VM-centric, just like everyone else in storageland. Don’t get me wrong, VMware: I applaud you for validating the VM-centric approach of software-defined storage solutions like Open vStorage. For over 4 years, the Open vStorage team has worked on creating a VM-centric storage solution that supports multiple hypervisors, such as VMware ESXi and KVM, as well as many backends. It is nice to see that the vision we had back then is now validated by a leader in the virtualization industry.

What confuses me a bit is that, while the whole world is shifting storage functionality into software, you take the bold, opposite approach and push VM-centric functionality towards the hardware. This is strange, as everyone else is taking functionality out of legacy storage arrays and increasingly treating storage as a bunch of disks managed by intelligent software. If I remember correctly, you declared the storage array a thing of the past at VMworld 2013 when you announced VSAN. That storage arrays are, according to most people, past their expiry date was recently confirmed by another IT behemoth, Dell, which is OEM-ing a well-known hyperconverged storage appliance.

As said before, Open vStorage has been designed with VM-centric functionality across hypervisor flavors in mind. This means that taking a snapshot of or cloning a single Virtual Machine is as easy as clicking a button. But being VM-centric doesn’t stop there. One of the most important features is replication on a per Virtual Machine basis. Before implementing this critical feature, the Open vStorage team had many discussions about where in the stack the replication functionality should live. We could have taken a shortcut and pushed replication down to the storage backend (or storage array, as VMware calls it). Swift and Ceph, for example, have replication as their middle name and can replicate data across multiple locations worldwide. But by moving replication into the storage backend, you lose VM-awareness: the backend only sees objects or blocks, not the Virtual Machines they belong to, so it cannot replicate one VM on its own schedule. Pushing functionality towards the storage array is not the solution; intelligent storage software is the only answer to a VM-centric future.

A distribution center for Virtual Machine data

At Open vStorage we quite often get the question: “OK, you are a Grid Storage Router, but what is that exactly?”

To explain what a Grid Storage Router is and why it is essential in a virtual environment, I’d like to draw an analogy with Walmart, the largest retailer in the world. You can compare Open vStorage to Walmart’s grid of distribution centers. Walmart has 42 regional U.S. distribution centers with over 1 million square feet. In total these distribution centers have more than 12 miles of conveyor belts to move 5.5 billion cases of merchandise.

Sam Walton, the founder of Walmart, realized very quickly that in order to sell a lot of goods the company had to develop a multilayered distribution system, and he identified logistics as its primary expertise. With multiple stores, they could have had goods shipped directly from the manufacturers to the stores, but Walmart’s management quickly realized that a central hub between the two made more sense. Instead of having their more than 100,000 suppliers drop off their goods at the doors of the 5,000 stores, they ask the suppliers to drop their goods off at one of the regional distribution centers. Having only a limited number of distribution centers allowed supply trucks to be loaded to the maximum, optimizing round-trips and making the best use of the available truckload capacity.

The distribution centers have grown from simple temporary storage locations into fully automated high-tech centers where every move is orchestrated and tracked. On top of temporarily storing goods, the distribution centers also offer additional services such as splitting large volumes into smaller parcels, repackaging and keeping track of the stock.

Typically one distribution center can cater to the needs of multiple stores, but there is a limit to this capacity. When needed, a new distribution center, typically built from a blueprint, opens to relieve the pressure and to prepare for yet more stores. This allows the number of stores to scale out without endangering the supply chain.

Just as Walmart considers its distribution centers to be as important as its stores, you should give Open vStorage the same importance. Open vStorage is the hub between one or more Storage Backends (Swiftstack, Ceph, …) and your Virtual Machines. Like a distribution center, Open vStorage doesn’t only connect suppliers (Storage Backends) and stores (Virtual Machines) in an optimized fashion to build scale-out Virtual Machine storage; it also brings additional value to the table. With its VM-centric architecture it allows efficient, unlimited snapshotting, VM replication*, and compression and encryption at the VM level*.

This grid of Storage Routers, where one router can take over from another, improves reliability. And as all metrics of the Storage Routers and the Virtual Machines are tracked, troubleshooting an issue becomes much easier.
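
To make the takeover idea concrete, here is a minimal Python sketch, assuming a grid in which each Storage Router serves a set of vDisks and the vDisks of a failed router are handed to the least-loaded healthy router. The `StorageRouter` class, the `fail_over` function and the least-loaded placement rule are hypothetical illustrations, not Open vStorage’s actual failover logic.

```python
from dataclasses import dataclass, field


@dataclass
class StorageRouter:
    """Hypothetical model of one Storage Router in the grid (illustrative only)."""
    name: str
    healthy: bool = True
    vdisks: set = field(default_factory=set)  # names of the vDisks this router serves


def fail_over(routers):
    """Reassign every vDisk of an unhealthy router to the least-loaded healthy router.

    Returns a dict mapping each moved vDisk to the name of its new owner.
    """
    healthy = [r for r in routers if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy Storage Router left to take over")
    moved = {}
    for router in routers:
        if router.healthy:
            continue
        # Sort for a deterministic order; pick the target serving the fewest vDisks.
        for vdisk in sorted(router.vdisks):
            target = min(healthy, key=lambda r: len(r.vdisks))
            target.vdisks.add(vdisk)
            moved[vdisk] = target.name
        router.vdisks.clear()
    return moved


sr1 = StorageRouter("sr1", vdisks={"web01"})
sr2 = StorageRouter("sr2")
sr3 = StorageRouter("sr3", healthy=False, vdisks={"db01", "db02"})
print(fail_over([sr1, sr2, sr3]))  # {'db01': 'sr2', 'db02': 'sr1'}
```

Spreading the orphaned vDisks over the healthy routers, rather than moving them all to one node, mirrors the analogy: a neighbouring distribution center absorbs extra load without being overwhelmed.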

Just as distribution centers serve different types of stores (Walmart Supercenters or Walmart Express), Open vStorage is hypervisor agnostic and can handle both VMware ESXi and KVM workloads. For both hypervisors, every write is accelerated on SSDs or PCIe flash cards, and reads are optimized by deduplicating identical content, making the best use of the available fast storage.
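
The read-side deduplication can be pictured as a content-addressed cache: every cached block is indexed by a fingerprint of its content, so identical blocks read by different VMs (for example clones of one template) occupy fast storage only once. The sketch below assumes SHA-1 fingerprints and a hypothetical `DedupReadCache` class; it leaves out eviction and the SSD write buffer, and it is not Open vStorage’s actual cache implementation.

```python
import hashlib
from typing import Dict, Optional, Tuple


class DedupReadCache:
    """Minimal content-addressed read cache: identical blocks are kept only once."""

    def __init__(self) -> None:
        # (vDisk name, block offset) -> fingerprint of the block stored there
        self.location_to_hash: Dict[Tuple[str, int], str] = {}
        # fingerprint -> block contents, stored once on the fast tier
        self.hash_to_block: Dict[str, bytes] = {}

    def put(self, vdisk: str, offset: int, block: bytes) -> None:
        digest = hashlib.sha1(block).hexdigest()
        self.location_to_hash[(vdisk, offset)] = digest
        # Only store the bytes if this content is not cached yet.
        self.hash_to_block.setdefault(digest, block)

    def get(self, vdisk: str, offset: int) -> Optional[bytes]:
        digest = self.location_to_hash.get((vdisk, offset))
        return None if digest is None else self.hash_to_block[digest]

    def cached_bytes(self) -> int:
        return sum(len(b) for b in self.hash_to_block.values())


# Two VMs cloned from the same template read the same 4 KiB block.
cache = DedupReadCache()
block = b"\x00" * 4096
cache.put("vm1-flat.vmdk", 0, block)
cache.put("vm2-flat.vmdk", 0, block)
print(cache.cached_bytes())  # 4096, not 8192
```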

Just as the distribution centers are key to the success of Walmart, Open vStorage is key to building scalable, high-performance, VM-centric storage.

* Planned for Q3 2014

VM-Centric Storage

We’ve designed Open vStorage to be VM-centric, and many of the features we will create over the next few months will leverage this VM-centric architecture. New-age storage companies such as Tintri, VMware (with vSAN) and we ourselves have implemented the idea of VM-centric storage when designing our products. This approach is a huge departure from the “traditional” LUN-based approach taken by SAN vendors. This raises the question: “What exactly do we mean by VM-centric architecture?”

Managing VMs and not storage
The key feature of VM-centric storage is that IT administrators manage virtual machines, not storage. All storage actions are taken on a per virtual machine basis, rather than requiring an understanding of LUNs, RAID groups, storage interfaces, etc. With Open vStorage, any IT administrator who knows how to manage virtual machines should be able to manage storage, without having to get into the complexities of storage and storage networking. A simple example is cloning. With Open vStorage, an administrator can use a VM template to create clones, while a traditional storage administrator would need to clone a LUN and then export a particular VM to achieve the same result.

Storage Policies per Virtual Machine
VM-centric design allows storage policies to be defined on a per virtual machine basis. A few examples follow:

  1. Replication from one storage cluster to another can be done on a per virtual machine basis. Different replication schedules can be used for different virtual machines: as an administrator you may choose to replicate an important VM every 30 minutes and a less important VM once a day. Different virtual machines can also be replicated to different clusters.
  2. Data retention policies can be defined on a per virtual machine basis. Open vStorage takes a snapshot of each virtual machine every hour, and we will soon add the ability for an administrator to set different retention policies for these snapshots per virtual machine. For example, an administrator may set a retention of 7 days for one virtual machine and 3 years for another.
  3. With Open vStorage one can use multiple back-ends, such as a NAS, a file system or an object store. We call these back-end storage systems vPools, and one can have multiple vPools per host. The administrator can then select, again on a per virtual machine basis, the vPool on which a particular virtual machine’s data is stored. For example, a VM that needs fast I/O could use a vPool that provides high throughput, while a VM with less demanding I/O could be stored on a vPool with lower throughput.
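
The three examples above boil down to a small policy record attached to each VM. The Python sketch below assumes a hypothetical `VMPolicy` record and two helper functions (`replication_due`, `expired_snapshots`); the field names, cluster and vPool names are made up for illustration and are not Open vStorage’s configuration format. It simply shows how one hourly snapshot stream and one replication loop can honour different per-VM settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class VMPolicy:
    """Hypothetical per-VM storage policy; the field names are illustrative."""
    replication_interval: timedelta  # how often the VM is replicated
    replication_target: str          # cluster the VM is replicated to
    snapshot_retention: timedelta    # how long hourly snapshots are kept
    vpool: str                       # vPool (backend) holding the VM's data


def replication_due(policy, last_run, now):
    """True when the VM's replication interval has elapsed since the last run."""
    return now - last_run >= policy.replication_interval


def expired_snapshots(policy, snapshots, now):
    """Return the snapshot timestamps older than the VM's retention period."""
    return [s for s in snapshots if now - s > policy.snapshot_retention]


policies = {
    "erp-db": VMPolicy(timedelta(minutes=30), "cluster-b", timedelta(days=7), "fast-pool"),
    # 3 years approximated as 3 * 365 days for this sketch
    "file-archive": VMPolicy(timedelta(days=1), "cluster-c", timedelta(days=3 * 365), "slow-pool"),
}

now = datetime(2014, 7, 1, 12, 0)
last_run = now - timedelta(minutes=45)
for name, policy in policies.items():
    print(name, policy.vpool, replication_due(policy, last_run, now))
# erp-db fast-pool True
# file-archive slow-pool False
```

Keeping the policy per VM, rather than per LUN or per backend, is what allows two VMs on the same host to have completely different schedules, targets and retention.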

We feel that VM-centric storage is here to stay, and that over time the idea of requiring separate storage management processes, software and people will become obsolete. Our engineers are not stopping at the VM-centric features mentioned above: our key R&D staff are working on capabilities such as encryption (per VM), dynamic QoS (tuned per VM) and many more exciting innovations.