White Paper – Open vStorage + Swift, a killer VM storage combo

OpenStack Swift, probably the most used Object Storage for private cloud implementations, offers features which are very appealing for building a distributed storage solution. Alas, issues such as eventual consistency, together with the fact that hypervisors require block storage and can't work with Object Storage directly, make it unsuitable as primary Virtual Machine storage. Open vStorage is the solution to turn Swift into a block device for Virtual Machines.

The combination of Open vStorage + Swift offers great performance due to aggressive caching inside the Host, a unified namespace and many VM-centric features. Open vStorage + Swift is the right choice to build a high performance, distributed storage platform which lowers the management overhead and offers features such as zero-copy snapshots, thin provisioning, bulk provisioning and quick restores.

Download the full white paper here!

Open vStorage 1.3

July has arrived and a new Open vStorage release sees the light. We are happy to announce Open vStorage 1.3, a milestone release for Open vStorage: it is the first release where all content within a vPool is stored on the Storage Backend. On top of that, we now support the most used object store in the world, Swift.

  • The File Driver: In earlier versions we required the user to set up a distributed file system (which is very complex) to store the non-volume data, or to use a BitTorrent sync protocol to keep these non-volume files in sync between the different hosts. This release no longer requires any of those workarounds, as we now store this non-volume data ourselves in a highly available way on the Storage Backend.
  • Swift support: Swift is the most used object store for large public and private clouds worldwide and is implemented by PayPal, Wikipedia, MercadoLibre (Latin America's largest online marketplace) and Disney. We now fully support this Object Store as Storage Backend. This basically means we turn Swift into a block device. This is a new use case for Swift and something nobody else can currently do.

These 2 milestones are really important, but the team also added some smaller features such as a new Arakoon release and better validation when configuring a vPool. Fixing bugs was of course also on the TODO list. An overview of the most important fixes:

  • Fix so in-use mountpoints are not listed when creating a new vPool
  • Prevent unnecessary volume stealing
  • Adding a second vPool on VMware should not stop the first vPool
  • Issue where Logstash is consuming too much CPU
  • Fixed log rotation for OVS components
  • Fix for Arakoon not starting after a power failure in some cases
  • Fix for removing a vPool through the GUI
  • Support for /dev/vdx disks
  • Fix for sudo -s not working during package installation
  • Improved error handling in case of an S3 backend
  • GUI issue fixed for Firefox 30
  • Fix for issue with failing password update
  • Fix so you can remove an empty vPool while other vPools have vMachines
  • Fix for failing snapshots
  • Improved input validation when entering a vPool name

vMotion, Storage Router Teamwork

Important note: this blog post talks about vMotion, a VMware feature. KVM fans should not be disappointed, as Live Migration, the KVM equivalent of vMotion, is also supported by Open vStorage. We use the term vMotion as it is the term most commonly used for this feature by the general IT public.

In a previous blog post we explained why Open vStorage is different. One thing we do differently is not implementing a distributed file system. This sparked the interest of a lot of people but also raised questions asking for more clarification. Especially how we pulled off vMotion without a distributed file system or an expensive SAN generated a lot of interest. Time for a blog post explaining how it all works under the hood.

Normal behavior

Under normal circumstances a volume, a disk of a Virtual Machine, can be seen by all hosts in the Open vStorage Cluster as it is a file on the vPool (a datastore in VMware), but the underlying, internal object (the Volume Driver volume) is owned by a single host and can only be accessed by that host. Each host can see the whole content of the datastore as each NFS and FUSE instance shows all the files on the datastore. This means the hosts believe they are using shared storage, but in reality only the metadata of the datastore is shared between all hosts; the actual data is not shared at all. To share the metadata across hosts a distributed database is used. To keep track of which host is 'owning' a volume, and hence can access its data, we use an Object Registry which is implemented on top of the distributed database (a small illustrative sketch of this registry follows the component list below). The technology which tricks hosts into believing they are using shared storage, while only one host really has access to the data, is the core Open vStorage technology. This core technology consists of 3 components which are available on all hosts with a Storage Router:
  • The Object Router
  • The Volume Driver
  • The File Driver
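The sketch below illustrates the Object Registry idea in a few lines of Python. It is a toy stand-in only: the class name, the methods and the plain dict replacing the distributed database are assumptions for illustration, not Open vStorage code.

```python
# Toy sketch of an object registry that records which Storage Router
# "owns" each volume. All names here are hypothetical.
class ObjectRegistry:
    def __init__(self):
        # In Open vStorage this mapping lives in a distributed database;
        # a plain dict stands in for it in this sketch.
        self._owners = {}

    def register(self, volume_id: str, storage_router_id: str) -> None:
        """Record which Storage Router currently owns the volume."""
        self._owners[volume_id] = storage_router_id

    def owner_of(self, volume_id: str) -> str:
        """Return the owning Storage Router for a volume."""
        return self._owners[volume_id]
```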

The Object Router
The Object Router is the component underneath the NFS (VMware) and FUSE (KVM) layer and dispatches requests for data to the correct core component. For each write the Object Router checks whether it is the owner of the file on the datastore. If it is the owner, it hands off the data to the underlying File or Volume Driver on the same Storage Router. Otherwise the Object Router checks in the Object Registry, stored in the distributed database, which Object Router owns the file and forwards the data to that Object Router. The same process is followed for read requests.
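Below is a minimal, hypothetical sketch of this dispatch decision. The class, method and peer-lookup names are invented for illustration; the real Object Router obviously does far more than this.

```python
# Hypothetical sketch of the Object Router dispatch described above.
class ObjectRouter:
    def __init__(self, router_id, registry, volume_driver, file_driver, peers):
        self.router_id = router_id
        self.registry = registry          # shared Object Registry (see earlier sketch)
        self.volume_driver = volume_driver
        self.file_driver = file_driver
        self.peers = peers                # {router_id: proxy to a remote Object Router}

    def write(self, path, offset, data):
        owner = self.registry.owner_of(path)
        if owner == self.router_id:
            # We own the object: hand the data to the local Volume or File Driver.
            is_volume = path.endswith(('.vmdk', '.raw'))
            driver = self.volume_driver if is_volume else self.file_driver
            return driver.write(path, offset, data)
        # Another Storage Router owns it: forward the request to that Object Router.
        return self.peers[owner].write(path, offset, data)
```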

The Volume Driver
All read and write requests for an actual volume (a flat VMDK or raw file) are handled by the Volume Driver. This component is responsible for turning a Storage Backend into a block device. It is also the component which takes care of all the caching. Data which is no longer needed is sent to the backend to make room for new data in the cache. In case data is not in the cache but requested by the Virtual Machine, the Volume Driver will get the needed data from the backend. Note that a single volume is represented by a single bucket on the Storage Backend. It is important to see that only 1 Volume Driver will communicate with the Storage Backend for a single volume.
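A rough sketch of the read-side caching behaviour described above, assuming a simple bucket object with get/put calls; none of these names are the actual Volume Driver internals.

```python
# Hedged sketch: serve reads from a local cache, fall back to the backend
# bucket on a miss, and evict to the backend when the cache is full.
class VolumeDriver:
    def __init__(self, backend_bucket, cache_capacity):
        self.bucket = backend_bucket     # one bucket per volume on the Storage Backend
        self.cache = {}                  # block_id -> data
        self.cache_capacity = cache_capacity

    def read_block(self, block_id):
        if block_id in self.cache:                 # cache hit: serve locally
            return self.cache[block_id]
        data = self.bucket.get(block_id)           # cache miss: fetch from the backend
        self._cache(block_id, data)
        return data

    def _cache(self, block_id, data):
        if len(self.cache) >= self.cache_capacity:
            # Data which is no longer needed makes room for new data.
            evicted_id, evicted_data = self.cache.popitem()
            self.bucket.put(evicted_id, evicted_data)
        self.cache[block_id] = data
```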

The File Driver
The File Driver is responsible for all non-volume files (VM config files, …). The File Driver stores the actual content of these files on the Storage Backend. Each small file is represented by a single file or key/value pair on the Storage Backend. In case a file is bigger than 1 MB, it is split into smaller pieces to improve performance. All the non-volume files for a single datastore end up in a single, shared bucket. It is important to see that only 1 File Driver will communicate with the Storage Backend for a file in the datastore.
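The chunking idea can be sketched as follows; the 1 MB constant comes from the text above, while the key naming scheme and the bucket API are assumptions for illustration only.

```python
# Hedged sketch of storing non-volume files as key/value pairs, splitting
# anything larger than 1 MB into smaller pieces. Key names are invented.
CHUNK_SIZE = 1024 * 1024  # 1 MB

def store_file(bucket, file_path, payload: bytes):
    """Store a non-volume file in the shared bucket, chunked if needed."""
    if len(payload) <= CHUNK_SIZE:
        bucket.put(file_path, payload)            # small file: one key/value pair
        return
    for index in range(0, len(payload), CHUNK_SIZE):
        chunk = payload[index:index + CHUNK_SIZE]
        bucket.put(f"{file_path}.part{index // CHUNK_SIZE}", chunk)
```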

[Figure: Open vStorage - normal behavior]

vMotion Step 1

When a Virtual Machine is moved between hosts, vMotioned, vCenter calls the shots. In a first step vCenter kicks off the vMotion process; none of the hosts involved will complain, as they believe they are using shared storage. As with normal vMotion behavior, the memory of the Virtual Machine is copied to the destination host while the source VM continues to run (so no interruption for end users there). Once the memory is almost completely copied, the Virtual Machine is quiesced, the Virtual Machine state is transferred, the missing pieces of the memory are copied and the Virtual Machine is resumed on the destination. As both hosts have access to the VMDK files, vMotion requires no special action on the storage level. But with Open vStorage the volumes of the Virtual Machine are not really shared between the hosts; remember, the Object Router of the source host is the owner of the volumes. Open vStorage must tackle this when read or write requests happen. In case a write happens to the volume of the moved Virtual Machine, the Object Router on the destination host will see that it is not the owner of the volume. The destination Object Router will check in the Object Registry which Object Router owns the volumes and will forward the write requests to that Object Router. The Object Router on the source forwards the write to the Volume Driver on the source, as under normal behavior. The same happens for read requests. To summarize, in a first step only the Virtual Machine is moved to the destination while the volumes of the Virtual Machine are still being served by the source Storage Router.
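A toy walk-through of this first step, reusing the ownership idea from the earlier sketches; the router names, the volume name and the dict standing in for the Object Registry are purely illustrative.

```python
# Toy illustration of step 1: the VM now runs on the destination host,
# but the registry still points at the source, so I/O is forwarded there.
owners = {"vm01-flat.vmdk": "source-router"}   # stand-in for the Object Registry

def route_write(local_router_id, path):
    """Decide where a write for `path` must be handled."""
    owner = owners[path]
    return "handle locally" if owner == local_router_id else f"forward to {owner}"

print(route_write("destination-router", "vm01-flat.vmdk"))  # forward to source-router
```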

[Figure: Open vStorage - vMotion step 1]

vMotion Step 2

After the first step of the vMotion process, the volumes of the Virtual Machine are still owned and served by the Object Router of the source. This is of course a situation which can't be sustained in case a lot of IO occurs on the volumes of the Virtual Machine. Once an IO threshold is passed, the Object Router of the destination will start negotiating with the Object Router on the source to hand over the volumes. Just as with the memory, the metadata of the volumes gets assembled in the Volume Driver at the destination. Once this process is complete, a point in time is arranged to copy the last metadata. To complete the process, the source Object Router marks the volumes as owned by the destination Object Router, and from then on the volumes are served by the destination Object Router.
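A hedged sketch of this handover logic; the threshold value, the function name and the dict used as registry are assumptions, and the negotiation steps are summarized as comments.

```python
# Hypothetical sketch of the step-2 handover: once enough I/O has been
# forwarded, ownership moves to the destination Object Router.
FORWARDED_IO_THRESHOLD = 10_000   # assumed number of forwarded requests

def maybe_take_over(volume_id, forwarded_io_count, registry,
                    destination_id="destination-router"):
    """Transfer ownership to the destination once the IO threshold is passed."""
    if forwarded_io_count < FORWARDED_IO_THRESHOLD:
        return False                      # keep forwarding to the source
    # 1. Assemble the volume metadata in the Volume Driver at the destination.
    # 2. Agree on a point in time and copy the last metadata changes.
    # 3. Mark the destination as the new owner in the Object Registry.
    registry[volume_id] = destination_id
    return True
```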

[Figure: Open vStorage - vMotion step 2]

Summary

vMotion is supported by Open vStorage even though a volume can only be written and read by a single host. In a first step vCenter will move the Virtual Machine to the destination host, but the volumes of the Virtual Machine will still be served from the source host. This means that communication between the Object Routers on the 2 hosts is required for all IO traffic to the volumes. In a second phase, after an IO threshold is passed, the Object Routers will negotiate and agree to make the Object Router of the destination the owner of the volumes. Only after this second phase is the whole Virtual Machine, both compute and disks, running on the destination host.

What is the big deal with Virtual Volumes, VMware?

June 30 2014, mark the date, people. This is the day VMware announced the public beta of Virtual Volumes. Virtual Volumes, or VVOLs as VMware likes to call them, put a Virtual Machine and its disks, rather than a LUN, into the storage management spotlight. Through a specific API, the vSphere APIs for Storage Awareness (VASA), your storage array becomes aware of Virtual Machines and their Virtual Disks. VASA allows offloading certain Virtual Machine operations, such as snapshotting and cloning, to the (physical) storage array.

Now, what is the big deal with Virtual Volumes, VMware? Open vStorage has been designed from day one to allow administrators to manage each disk of a Virtual Machine individually. We don't call it Virtual Volumes; we call it VM-centric, just like everyone else in storage land does. VMware, don't get me wrong, I applaud that you are validating the VM-centric approach of software-defined storage solutions like Open vStorage. For over 4 years, the Open vStorage team has worked on creating a VM-centric storage solution which supports multiple hypervisors, such as VMware ESXi and KVM, as well as many backends. It is nice to see that the view we had back then is now validated by a leader in the virtualization industry.

What confuses me a bit is that while the whole world is moving towards shifting storage functionality into software, you take the bold, opposite approach and push VM-centric functionality towards the hardware. This is strange, as everyone else is taking functionality out of the legacy storage arrays and is more and more treating storage as a bunch of disks managed by intelligent software. If I remember correctly, you declared at VMworld 2013 that the storage array is a thing of the past by announcing VSAN. The fact that storage arrays are, according to most people, past their expiry date was recently confirmed by another IT behemoth, Dell, by OEM-ing a well-known hyperconverged storage appliance.

As said before, Open vStorage has been designed with VM-centric functionality across hypervisor flavors in mind. This means that taking a snapshot of or cloning a single Virtual Machine is as easy as clicking a button. Being a VM-centric solution doesn't stop there. One of the most important features is replication on a per-Virtual-Machine basis. Before implementing this critical feature, the Open vStorage team had a lot of discussion about where the replication functionality should sit in the stack. We could have taken a shortcut and pushed the replication down to the storage backend (or storage array, as VMware calls it). Swift and Ceph, for example, have replication as their middle name and can replicate data across multiple locations worldwide. But by moving the replication functionality towards the storage backend you lose your VM-awareness. Pushing functionality towards the storage array is not the solution; intelligent storage software is the only answer to a VM-centric future.