vMotion, Storage Router Teamwork

Important note: this blog post talks about vMotion, a VMware feature. KVM fans should not be disappointed, as Live Migration, the KVM equivalent of vMotion, is also supported by Open vStorage. We use the term vMotion as it is the term the general IT public most commonly uses for this feature.

In a previous blog post we explained why Open vStorage is different. One thing we do differently is not implementing a distributed file system. This sparked the interest of a lot of people but also raised questions asking for more clarification. In particular, how we pulled off vMotion without a distributed file system or an expensive SAN fascinated many readers. Time for a blog post explaining how it all works under the hood.

Normal behavior

Under normal circumstances a volume, a disk of a Virtual Machine, can be seen by all hosts in the Open vStorage Cluster as it is a file on the vPool (a datastore in VMware), but the underlying, internal object (the Volume Driver volume) is owned by a single host and can only be accessed by that host. Each host can see the whole content of the datastore, as each NFS and FUSE instance shows all the files on the datastore. The hosts therefore believe they are using shared storage, but in reality only the metadata of the datastore is shared between all hosts; the actual data is not shared at all. The metadata is shared across hosts through a distributed database. To keep track of which host 'owns' a volume, and hence can access its data, we use an Object Registry implemented on top of that distributed database. The technology which tricks hosts into believing they are using shared storage, while only one host really has access to the data, is the core Open vStorage technology. This core technology consists of three components, available on every host with a Storage Router:
* The Object Router
* The Volume Driver
* The File Driver
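
To make the ownership idea concrete, here is a minimal, hypothetical sketch of an Object Registry kept in a shared key/value store. The class and method names are purely illustrative and not the actual Open vStorage code; the real registry lives in a distributed database so that every Storage Router sees the same ownership information.

```python
from typing import Optional


class DistributedKVStore:
    """Stand-in for the distributed database shared by all hosts."""
    def __init__(self):
        self._data = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class ObjectRegistry:
    """Tracks which Object Router owns which object (volume or file)."""
    def __init__(self, store: DistributedKVStore):
        self._store = store

    def owner_of(self, object_id: str) -> Optional[str]:
        return self._store.get("ownership/" + object_id)

    def set_owner(self, object_id: str, router_id: str) -> None:
        self._store.set("ownership/" + object_id, router_id)
```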

The Object Router
The Object Router is the component underneath the NFS (VMware) and FUSE (KVM) layer and dispatches requests for data to the correct core component. For each write the Object Router checks whether it is the owner of the file on the datastore. If the Object Router is the owner, it hands the data off to the underlying File or Volume Driver on the same Storage Router. Otherwise the Object Router looks up in the Object Registry, stored in the distributed database, which Object Router owns the file and forwards the data to that Object Router. The same process is followed for read requests.
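
That dispatch logic can be summarised in a few lines of illustrative Python. This is a hedged sketch, not the real implementation: the class names, the peers map and the is_volume flag are assumptions made for the example, and it reuses the ObjectRegistry sketch from above.

```python
class ObjectRouter:
    def __init__(self, router_id, registry, volume_driver, file_driver, peers):
        self.router_id = router_id
        self.registry = registry          # ObjectRegistry from the sketch above
        self.volume_driver = volume_driver
        self.file_driver = file_driver
        self.peers = peers                # router_id -> remote ObjectRouter proxy

    def write(self, object_id, offset, data, is_volume=True):
        owner = self.registry.owner_of(object_id)
        if owner == self.router_id:
            # We own the object: hand the data to the local driver.
            driver = self.volume_driver if is_volume else self.file_driver
            return driver.write(object_id, offset, data)
        # Another host owns it: forward the request to that Object Router.
        return self.peers[owner].write(object_id, offset, data, is_volume)

    def read(self, object_id, offset, length, is_volume=True):
        owner = self.registry.owner_of(object_id)
        if owner == self.router_id:
            driver = self.volume_driver if is_volume else self.file_driver
            return driver.read(object_id, offset, length)
        return self.peers[owner].read(object_id, offset, length, is_volume)
```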

The Volume Driver
All the read and write requests for an actual volume (a flat VMDK or raw file) are handled by the Volume Driver. This component is responsible for turning a Storage Backend into a block device. It is also the component which takes care of all the caching. Data which is no longer needed is sent to the backend to make room for new data in the cache. If data requested by the Virtual Machine is not in the cache, the Volume Driver fetches it from the backend. Note that a single volume is represented by a single bucket on the Storage Backend, and that only one Volume Driver communicates with the Storage Backend for a given volume.
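
Below is a deliberately simplified sketch of that idea: a block-level interface backed by a local cache, with evicted blocks pushed to a per-volume bucket on the backend. The LRU cache and the backend put/get interface are assumptions made for the example; the real Volume Driver caching is considerably more advanced.

```python
from collections import OrderedDict


class VolumeDriver:
    def __init__(self, backend, cache_capacity=1024):
        self.backend = backend            # e.g. an object store, one bucket per volume
        self.cache = OrderedDict()        # (volume_id, block_id) -> data, LRU order
        self.cache_capacity = cache_capacity

    def write(self, volume_id, block_id, data):
        key = (volume_id, block_id)
        self.cache[key] = data
        self.cache.move_to_end(key)
        self._evict_if_needed()

    def read(self, volume_id, block_id):
        key = (volume_id, block_id)
        if key in self.cache:             # cache hit: serve from local flash/SSD
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: fetch the block from the backend bucket for this volume.
        data = self.backend.get(bucket=volume_id, key=str(block_id))
        self.cache[key] = data
        self._evict_if_needed()
        return data

    def _evict_if_needed(self):
        while len(self.cache) > self.cache_capacity:
            (volume_id, block_id), data = self.cache.popitem(last=False)
            # Data that no longer fits in the cache is sent to the backend.
            self.backend.put(bucket=volume_id, key=str(block_id), value=data)
```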

The File Driver
The File Driver is responsible for all non-volume files (VM configuration files, …). The File Driver stores the actual content of these files on the Storage Backend. Each small file is represented by a single file or key/value pair on the Storage Backend. If a file is bigger than 1MB, it is split into smaller pieces to improve performance. All the non-volume files for a single datastore end up in a single, shared bucket, and only one File Driver communicates with the Storage Backend for a given file in the datastore.
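
A minimal sketch of the store path, again with made-up names and a hypothetical backend put call, could look like this:

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB


class FileDriver:
    def __init__(self, backend, datastore_bucket):
        self.backend = backend
        self.bucket = datastore_bucket    # one shared bucket per datastore

    def store(self, path, content: bytes):
        if len(content) <= CHUNK_SIZE:
            # Small files map to a single key/value pair on the backend.
            self.backend.put(bucket=self.bucket, key=path, value=content)
            return
        # Bigger files are split into 1 MB pieces to improve performance.
        for index in range(0, len(content), CHUNK_SIZE):
            chunk_key = f"{path}.chunk{index // CHUNK_SIZE:06d}"
            self.backend.put(bucket=self.bucket, key=chunk_key,
                             value=content[index:index + CHUNK_SIZE])
```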

[Diagram: Open vStorage, normal operation]

vMotion Step 1

When a Virtual Machine is moved between hosts, i.e. vMotioned, vCenter calls the shots. In a first step vCenter kicks off the vMotion process; none of the hosts involved will complain, as they believe they are using shared storage. As with normal vMotion behavior, the memory of the Virtual Machine is copied to the destination host while the source VM continues to run (so no interruption for end users there). Once the memory is almost completely copied, the Virtual Machine is quiesced, the Virtual Machine state is transferred, the missing pieces of memory are copied and the Virtual Machine is resumed on the destination. Since for vMotion both hosts appear to have access to the VMDK files, no special action is needed at the storage level. But with Open vStorage the volumes of the Virtual Machine are not really shared between the hosts; remember, the Object Router of the source host owns the volumes. Open vStorage must tackle this when read or write requests happen. When a write happens to a volume of the moved Virtual Machine, the Object Router on the destination host will see that it is not the owner of the volume. The destination Object Router will check in the Object Registry which Object Router owns the volumes and will forward the write request to that Object Router. The Object Router on the source then forwards the write to the Volume Driver on the source, as under normal behavior. The same happens for read requests. To summarize, in this first step only the Virtual Machine is moved to the destination while its volumes are still being served by the source Storage Router.

[Diagram: Open vStorage, vMotion step 1]

vMotion Step 2

After the first step of the vMotion process, the volumes of the Virtual Machine are still owned and served by the Object Router of the source. This is of course a situation which can't be sustained if a lot of IO occurs on the volumes of the Virtual Machine. Once an IO threshold is passed, the Object Router of the destination starts negotiating with the Object Router on the source to hand over the volumes. Just as with the memory, the metadata of the volumes gets assembled in the Volume Driver at the destination. Once this process is complete, a point in time is agreed to copy the last metadata. To complete the process the source Object Router marks the volumes as owned by the destination Object Router, and from then on the volumes are served by the destination Object Router.
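
As a rough, hypothetical illustration of this handover, meant to extend the ObjectRouter sketch above on the destination side. The threshold value and all method names are invented for the example, and error handling is omitted.

```python
FORWARDED_IO_THRESHOLD = 10_000   # illustrative value, not the real threshold


class HandoverLogic:
    """Destination-side handover sketch, reusing the registry/peers from above."""

    def __init__(self, router_id, registry, volume_driver, peers):
        self.router_id = router_id
        self.registry = registry
        self.volume_driver = volume_driver
        self.peers = peers
        self.forwarded_io_count = {}

    def on_forwarded_io(self, volume_id):
        # Count IO that had to be forwarded to the source Object Router.
        self.forwarded_io_count[volume_id] = self.forwarded_io_count.get(volume_id, 0) + 1
        if self.forwarded_io_count[volume_id] > FORWARDED_IO_THRESHOLD:
            self.request_handover(volume_id)

    def request_handover(self, volume_id):
        source = self.peers[self.registry.owner_of(volume_id)]
        # Assemble the volume metadata locally while the source keeps serving IO.
        for entry in source.stream_volume_metadata(volume_id):
            self.volume_driver.apply_metadata(volume_id, entry)
        # Agree on a point in time, copy the last metadata, then flip ownership.
        for entry in source.freeze_and_get_final_metadata(volume_id):
            self.volume_driver.apply_metadata(volume_id, entry)
        self.registry.set_owner(volume_id, self.router_id)
        self.forwarded_io_count.pop(volume_id, None)
```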

[Diagram: Open vStorage, vMotion step 2]

Summary

vMotion is supported by Open vStorage even though a volume can only be written and read by a single host. In a first step vCenter moves the Virtual Machine to the destination host, but the volumes of the Virtual Machine are still served by the source host. This means that communication between the Object Routers on the two hosts is required for all IO traffic to the volumes. In a second phase, after an IO threshold is passed, the Object Routers negotiate and agree to make the Object Router of the destination the owner of the volumes. Only after this second phase is the whole Virtual Machine, both compute and disks, running on the destination host.

What is the big deal with Virtual Volumes, VMware?

June 30 2014, mark the date, people. This is the day VMware announced the public beta of Virtual Volumes. Virtual Volumes, or VVOLs as VMware likes to call them, put a Virtual Machine and its disks, rather than a LUN, into the storage management spotlight. Through a specific API, the vSphere APIs for Storage Awareness (VASA), your storage array becomes aware of Virtual Machines and their Virtual Disks. VASA allows offloading certain Virtual Machine operations such as snapshotting and cloning to the (physical) storage array.

Now, what is the big deal with Virtual Volumes, VMware? Open vStorage has been designed from day one to allow administrators to manage each disk of a Virtual Machine individually. We don't call it Virtual Volumes; we call it VM-centric, just like everyone else in storageland does. VMware, don't get me wrong, I applaud that you are validating the VM-centric approach of software-defined storage solutions like Open vStorage. For over 4 years, the Open vStorage team has worked on creating a VM-centric storage solution which supports multiple hypervisors, such as VMware ESXi and KVM, as well as many backends. It is nice to see that the view we had back then is now validated by a leader in the virtualization industry.

What confuses me a bit is that while the whole world is shifting storage functionality into software, you take the bold, opposite approach and push VM-centric functionality towards the hardware. This is strange, as everyone else is taking functionality out of legacy storage arrays and increasingly treating storage as a bunch of disks managed by intelligent software. If I remember correctly, you declared at VMworld 2013 that the storage array is something of the past by announcing VSAN. The fact that most people consider storage arrays to be past their expiry date was recently confirmed by another IT behemoth, Dell, OEM-ing a well-known hyperconverged storage appliance.

As said before, Open vStorage has been designed with VM-centric functionality across hypervisor flavors in mind. This means that taking a snapshot or cloning a single Virtual Machine is as easy as clicking a button. Being a VM-centric solution doesn't stop there. One of the most important features is replication on a per-Virtual Machine basis. Before implementing this critical feature, the Open vStorage team had many discussions about where the replication functionality should sit in the stack. We could have taken a shortcut and pushed the replication down to the storage backend (or storage array, as VMware calls it). Swift and Ceph, for example, have replication as their middle name and can replicate data across multiple locations worldwide. But by moving the replication functionality towards the storage backend you lose your VM-awareness. Pushing functionality towards the storage array is not the solution; intelligent storage software is the only answer to a VM-centric future.

What the World Cup can teach us about storage

The World Cup 2014 started this weekend in sunny Brasil. The first victories have been celebrated, the first dreams smashed to smithereens. The Belgian team, the Red Devils, still has to play its first game and the Open vStorage team has very high expectations. With players like Thibaut Courtois, Eden Hazard and Vincent Kompany in your team, losing is not an option.

But the World Cup can also teach us something about storage. Just like in a football team, where the coach has to decide which player he puts at which position, a (storage) administrator has to decide on which storage he stores his Virtual Machines. In football you have different types of players, such as goalkeepers, defenders and strikers, yet somehow we have allowed storage for Virtual Machines to be uniform. Running all your Virtual Machines from local disks only, accelerated by SSDs, seems to be the new mantra. A single unified namespace for all! This makes no sense, as different Virtual Machines have different storage requirements. On top of that, enterprises and service providers have made huge investments in NAS, distributed file systems and object stores that are highly reliable and already in use.

The second problem arises when you, as an administrator, try to mix different storage types inside one environment. With current setups, mixing different backends is simply not done when running Virtual Machines, as this would be a management nightmare. Imagine having a Virtual Machine with one disk on a SAN and another on local disks. There are different GUIs to use and maybe even different storage interfaces. This would be like mixing football players and basketball players in a single team and forcing them all to follow the same football rules. Open vStorage gives you a single pane of glass across all your storage backends, removing the management nightmare. With Open vStorage there is one GUI which hides away the complexity of running part of a Virtual Machine on your old SAN and the other part on your new Object Store. Open vStorage acts as a frontend to these different backends and immediately turns them into fast storage for Virtual Machines.

PS: Go Belgium go!

Open vStorage 1.2

Another month flew by, which means the Open vStorage team is very proud to announce Open vStorage 1.2. The new big features for this release are:

  • apt-get install openvstorage: during May we improved our install process, making it much easier to install Open vStorage. The advised way to install Open vStorage is now through Debian packages. Using apt-get, the software can also be installed on Ubuntu Server 14.04 LTS. Stay tuned for more supported OSes in the future.
  • SNMP monitoring: in this new release all parameters, ranging from vDisks, vMachines and vPools to the actual Grid Storage Routers of the Open vStorage Cluster, can be monitored through SNMP. This means you can now use your favorite monitoring tool (Zenoss, Nagios, Cacti, …) to see real-time statistics and create historic graphs of IOPS, memory usage and many more parameters.
  • Improvement in the vStorage Driver to allow clones to be created in parallel from a vTemplate.

This release also contains a lot of bug fixes. The most important fixes are:

  • Various bugfixes to harden the rebooting of an Open vStorage Cluster.
  • Fix to the Add vPool wizard to make sure it doesn’t get stuck.
  • Making the deployOvs.py user messages clearer.
  • Fix for a create from vTemplate issue on VMware in case of multiple nodes.
  • Removing a vPool is now possible even when there are vMachines on another vPool.
  • Fixed an incorrect tooltip on the vPool Detail Management tab.
  • After deletion of a vTemplate, the corresponding VM is also deleted from the vSphere client and the vPool.

Enjoy!

There’s more than one way to change a tyre

During talks with people who are interested in the Open vStorage concept, we quite often get the following question:

How are you different compared to <insert name of software-defined storage product of the moment>?

Well, the answer is something which has been the adage within CloudFounders from day one: Keep IT Simple, Stupid. The KISS principle states that most systems work best if they are kept simple rather than made complicated. This is equally true in the storage industry. For decades we have been chasing the utopian dream of Virtual Machines that are always online. I must admit we are very close, but we should look back at the price we are paying: complexity. And don't be fooled, hand in hand with complexity comes its buddy, error.
Within our company we have an average of 15 years' experience in datacenter and storage environments. The most important thing we learned along the way is that complexity is your worst enemy when things go bad. The second thing we learned is that things will go bad eventually, no matter how prepared you are. Sure, when you add enough complexity like active-active SANs, firewalls, databases, etc., things will go bad less often, but when disaster strikes your downtime will be much longer than if you had kept things simple.
When we designed Open vStorage the KISS principle was always in the back of our minds. I'd like to give you two examples of where we keep things simple while others make them complicated, expensive and dangerous.

  • To support functionality such as vMotion and VMware High Availability (HA), you can use an expensive SAN. To decrease the hardware complexity and cost, architects nowadays turn towards distributed file systems. Don't get me wrong, a distributed file system such as GlusterFS, developed by more than 70 software engineers, is a great file system. But in the end it is a file system: it is designed to store files and was never designed to run Virtual Machines. Still, a clever engineer saw that GlusterFS solved his problem of a unified namespace, making all files on the file system available on every host. Let us take a step back and put the unified-namespace problem in perspective. What we want to achieve is that the volume of a Virtual Machine, a block device, can move from one host to another host without having to bring the VM down. Instead of using a distributed file system to store the virtual disk, Open vStorage uses a different, less complex approach to tackle this problem. We store the volumes on a shared backend. To make sure that VM data isn't accessed by two hosts at the same time, we have created a rule within Open vStorage that a volume can only be attached to one host. We store the 'owner' of the volume in a distributed database. When vMotion is triggered, the two hosts work together to make the transition. Once the volume is available on the second host, the two hosts sync the data which is not yet on the backend and the move is complete. To summarize: instead of making all Virtual Machine volumes available on all hosts all the time, Open vStorage makes sure that a volume is connected to a single host at the right time.
  • To support multiple hypervisors and multiple storage backends, we also took the simple approach. Instead of trying to convert each hypervisor type directly into different storage protocols, we created a frontend and a backend layer. This greatly reduces complexity: we only need to write one storage frontend per hypervisor, and that hypervisor can immediately talk to all storage backends. The same goes for a new backend: write a new backend extension and all hypervisors can use that backend. This flexibility not only minimizes the development effort, but on top of that different hypervisors can share the same storage pool. Also, as we use raw volumes, moving a Virtual Machine from VMware to KVM could (in the future) just be a reboot on the right host, no conversion needed. A minimal sketch of this front/backend split follows below.
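
As an illustration of the split (class names and the put/get interface are invented for this example; the real frontends are the NFS and FUSE layers mentioned earlier):

```python
class StorageFrontend:
    """Translates hypervisor IO (NFS, FUSE, ...) into Object Router calls."""
    def __init__(self, object_router):
        self.object_router = object_router


class NFSFrontend(StorageFrontend):   # VMware ESXi
    pass


class FUSEFrontend(StorageFrontend):  # KVM
    pass


class BackendExtension:
    """Translates generic put/get calls into a specific backend protocol."""
    def put(self, bucket, key, value):
        raise NotImplementedError

    def get(self, bucket, key):
        raise NotImplementedError


class SwiftBackend(BackendExtension):
    pass


class CephBackend(BackendExtension):
    pass

# Adding a new hypervisor means writing one new frontend; adding a new
# backend means writing one new BackendExtension. Every combination of
# the two then works without extra glue code.
```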

Now back to the title: you can change a flat tyre in different ways. One way is to simply replace the flat tyre. Or you could strip the whole car from around that flat tyre and reassemble it around a new one. The second way fixes the same issue, but the cost will be higher, the process will be more complex and you will probably need more than 70 engineers to fix your flat tyre.

The Open vStorage future looks bright

Jason Ader, analyst at William Blair & Company, released a new research document: The State of Storage – a 2014 Update. In this detailed report, which is a must-read for everyone in the storage industry, he discusses the biggest short- and long-term headwinds for traditional storage vendors. Some of these headwinds are caused by newcomers in the storage industry. He estimates that these newcomers will grow at a rate of around 40%, bringing their combined impact to almost 10% of industry sales in 2014. These headwinds cannot be ignored in terms of revenue or growth, so it is worthwhile to discuss them in more detail and explain where Open vStorage fits in.

  • Object-Based Storage gets a second wind as migration of data to the cloud (private or public) is on the rise. Use cases and applications adapted for Object Stores are still limited but growing. With Open vStorage you can turn your Object Store (Swift, Cloudian, …) into a block device for Virtual Machines and turn it into a high-performance, distributed, VM-centric storage platform.
  • Emergence of Flash-Centric Architectures: Open vStorage leverages flash inside the ESXi or KVM hosts to accelerate reads and writes. It brings together the benefits of storage local to the server and external scalable storage by leveraging SSDs and PCIe flash technology inside the server for acceleration.
  • Software-Defined Storage (SDS) will, according to the report, “have the largest impact on the storage market in the long term as it will drive ASPs down and push toward the commoditization of specialized storage hardware platforms.” This trend is clearly visible in the rise of open-source file systems such as GlusterFS and Ceph and of VMware’s vSAN. Open vStorage is a software-defined storage layer which brings storage intelligence into software that runs close to the compute, with the ability to use multiple storage backends in a uniform way.

Open vStorage 1.1

The Open vStorage team is on fire. We released a new version of the Open vStorage software (select Test as QualityLevel). The new big features for this release are:

  • Logging and log collecting: we have added logging to Open vStorage. The logs are also centrally gathered and stored in a distributed search engine (Elasticsearch). The logs can be viewed, browsed, searched and analyzed through Kibana, a very nicely designed GUI.
  • HA support: with this release Open vStorage supports not only vMotion but also the HA functionality of VMware. This means that when an ESXi host dies and vCenter starts the VMs on another host, the volumes are automatically migrated along.

Some small features that made it into this release:

  • A distributed file system can now be selected as Storage Backend in the GUI. Earlier you could select the file system but you could not extend it to more Grid Storage Routers (GSRs). Now you can extend it across more GSRs and do vMotion on top of e.g. GlusterFS.
  • The status of VMs on KVM is now updated quicker in the GUI.
  • manager.py now has an option to specify the version you want to install (run manager.py -v ‘version number’).
  • Under administration there is now an About Open vStorage page displaying the version of all installed components.

In addition, the team also fixed 35 bugs.

A distribution center for Virtual Machine data

At Open vStorage we quite often get the question: “Ok, you are a Grid Storage Router, but what is that exactly?”

To explain what a Grid Storage Router is and why it is essential in a virtual environment, I’d like to make the analogy with Walmart, the largest retailer in the world. You can compare Open vStorage to the grid of distribution centers of Walmart. Walmart has 42 regional U.S. distribution centers with over 1 million square feet. In total these distribution centers have more than 12 miles of conveyor belts to move 5.5 billion cases of merchandise.

Sam Walton, the founder of Walmart, realized very quickly that in order to sell a lot of goods the company had to develop a multilayered distribution system, and identified logistics as its primary expertise. With multiple stores, they could have opted to have goods go directly from the manufacturer to the stores, but the Walmart management quickly realized that having a central hub in between the two makes sense. Instead of having their more than 100,000 suppliers drop off their goods at the door of the 5,000 stores, they ask the suppliers to drop their goods off at one of the regional distribution centers. Having only a limited number of distribution centers allows the supply trucks to be stocked to the maximum level, optimizing round-trips and making the best use of the available truckload capacity.

The distribution centers have grown from being just temporary storage locations to fully automated high-tech centers where every move is orchestrated and tracked. On top of temporarily storing goods, the distribution centers also offer additional services such as splitting up large volumes into smaller parcels, repackaging and keeping track of the stock.
Typically one distribution center can cater to the needs of multiple stores, but there is a limit to this capacity. When needed, a new distribution center, which typically follows a blueprint, opens to relieve the pressure and to prepare for yet more stores. This allows the stores to scale out without endangering the supply chain.

Just as Walmart considers its distribution centers to be as important as its stores, you should attribute the same importance to Open vStorage. Open vStorage is the hub between one or more Storage Backends (SwiftStack, Ceph, …) and your Virtual Machines. Like the distribution centers, Open vStorage doesn’t just connect suppliers (Storage Backends) and stores (Virtual Machines) with each other in an optimized fashion to build scale-out Virtual Machine storage; it also brings additional value to the table. With its VM-centric architecture it allows efficient unlimited snapshotting, VM replication*, compression and encryption at the VM level*.

This grid of Storage Routers, where one can take over from another, improves reliability, and as all metrics of these Storage Routers and the Virtual Machines are tracked, troubleshooting becomes much easier when an issue occurs.

Where the distribution centers work with different types of stores (Walmart Supercenters or Walmart Express), Open vStorage is hypervisor agnostic and can handle both VMware ESXi and KVM workloads. For both hypervisors every write gets accelerated on SSDs or PCIe flash cards, and reads are optimized by deduplicating identical content, hence making the best use of the available fast storage.

Just as the distribution centers are key in the success of Walmart, Open vStorage is key in building out scalable, high performance, VM-centric storage.

* Planned for Q3 2014

White Paper: Turning Object Storage into Virtual Machine Storage

In the white paper “Turning Object Storage into Virtual Machine Storage” we discuss Object Storage as a storage solution for Virtual Machines. Due to technical hurdles it is impossible to run Virtual Machines directly from an Object Store. The main issues are eventual consistency, performance and latency, and different management paradigms. A distributed file system is also not a fit for the job.

To turn Object Storage into primary storage for hypervisors, the solution must be specifically designed for Virtual Machines and take a radically different approach compared to existing storage technologies. Open vStorage takes this different approach and is designed from the ground up with Virtual Machines and their performance requirements in mind. Open vStorage is the layer between the hypervisor and the Object Store and turns the Object Store into a high-performance, distributed, VM-centric storage platform.

You can download the full white paper here.