Open vStorage 2.2 alpha 1

Today we released a first version of our upcoming 2.2 release. From now on we will do more frequent releases covering the latest changes, but keep in mind that these releases have not gone through our full QA cycle. Releasing these ‘alpha’ versions gives you, the Open vStorage community, earlier access to new features and bugfixes, but on the other hand these releases are less stable than our ‘beta’ releases. Documentation for these releases will also not always be available. In case you need help, the Open vStorage Google Group is there to help.

What is new in 2.2 alpha 1:

  • Huge VMware performance improvement: we have reworked our NFS integration with VMware and made significant performance improvements (5-10x faster). Please note this is still experimental and, for example, cloning from template on VMware does not work in this version. But by all means, give it a go and let us know your experience!
  • Status of the physical devices (SSDs and SATA drives) of a Storage Router is now shown in the GUI on the Storage Router detail page. You can also see in detail which partitions are located on which device. At a later stage we plan to make the partitioning adjustable through the GUI.
  • We have improved the performance and reduced the CPU impact of the GUI.

Small feature improvements:

  • Added a check to the OVS setup which prevents rerunning the setup.
  • Cinder is automatically configured if you have configured OpenStack as the Hypervisor Management Center.
  • The ovs-snmp port is now configurable.
  • Option to set a password when a new user is created.
  • Rename of an OpenStack volume updates the vDisk name.
  • Added the possibility to install the Open vStorage Backend packages after configuring Open vStorage.
  • Improvements to the performance of the ASDs.
  • Option to remove an Open vStorage Backend.
  • Option to define the replication factor of an Open vStorage Backend.
  • Option to enable compression for a Storage Backend.
  • ASD nodes can now be collapsed in the Backend detail page.
  • ASDs to which an action applies are now highlighted on the Backend detail page.
  • The impact of removing an ASD is now made clear.

Bugfixes:

  • Fixed the issue where ASDs are labeled as dead under high load.
  • Fixed the failure when initializing a new disk (as a replacement for a broken disk).
  • Open vStorage port range 8870+ overlaps with c-api port 8876 causing n-api service on devstack to fail to restart with address already in use.
  • vDisk naming is now more consistent with reality.
  • Fix for multiple vPools using the same read cache path.
  • Hardening vPool creation.
  • Bugfixes for various issues with the Open vStorage Backend.
  • Fix for dmesg output not showing up in syslog or kern.log.
  • Fix for failing to create an ASD when a filesystem already exists on the disk.
  • Fix for timestamps not being added to the upstart logs.
  • Fix for ‘sync disk with reality’ sometimes failing.
  • Fix for incorrect permissions on the ovs user’s .ssh folder causing login using authorized_keys to fail.
  • Fix for issues with rabbitmqctl during install.
  • Fix for ‘ovs collect logs’ not collecting all logs through the GUI.
  • Fix for the metadata server quickly filling up the root partition.

How do you install this version:
When installing, add the alpha repo instead of the beta repo.

echo "deb http://apt-ovs.cloudfounders.com alpha/" > /etc/apt/sources.list.d/ovsaptrepo.list

For people using OpenStack:
Before creating a vPool, add the OpenStack controller node as the Hypervisor Management Center (Admin > Hypervisor Management Center) and select all hosts in the second part of the screen. When you create a vPool, Cinder will now be automatically deployed and configured. The nova and libvirtd changes listed in the documentation still need to be applied to the compute hosts, though.

vMotion, Storage Router Teamwork

Important note: this blog post talks about vMotion, a VMware feature. KVM fans should not be disappointed, as Live Migration, the KVM equivalent of vMotion, is also supported by Open vStorage. We use the term vMotion as it is the term most widely used for this feature by the general IT public.

In a previous blog post we explained why Open vStorage is different. One thing we do differently is not implementing a distributed file system. This sparked the interest of a lot of people but also raised questions asking for more clarification. Especially how we pulled off vMotion without a distributed file system or an expensive SAN fascinated many readers. Time for a blog post explaining how it all works under the hood.

Normal behavior

Under normal circumstances a volume, a disk of a Virtual Machine, can be seen by all hosts in the Open vStorage Cluster as it is a file on the vPool (a datastore in VMware), but the underlying, internal object (Volume Driver volume) is owned by a single host and can only be accessed by that single host. Each host can see the whole content of the datastore as each NFS and FUSE instance shows all the files on the datastore. This means the hosts believe they are using shared storage. In reality only the metadata of the datastore is shared between all hosts; the actual data is not shared at all. To share the metadata across hosts a distributed database is used. To keep track of which host ‘owns’ a volume, and hence can access its data, we use an Object Registry which is implemented on top of a distributed database (a minimal sketch of this registry follows the component list below). The technology which tricks hosts into believing they are using shared storage, while only one host really has access to the data, is the core Open vStorage technology. This core technology consists of 3 components which are available on all hosts with a Storage Router:
* The Object Router
* The Volume Driver
* The File Driver
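
To give a rough idea of the Object Registry, here is a minimal Python sketch of a registry that records which Storage Router owns which volume in a distributed key/value database. The class, method and key names are illustrative assumptions for this blog post, not the actual Open vStorage API.

# Minimal sketch (illustrative only): an Object Registry backed by a
# distributed key/value store. The kv_store interface is an assumption,
# standing in for the distributed database mentioned above.
class ObjectRegistry(object):
    def __init__(self, kv_store):
        self.kv = kv_store  # any client exposing get(key) and set(key, value)

    def owner_of(self, volume_id):
        # Return the id of the Storage Router that currently owns the volume.
        return self.kv.get('volume_owner/{0}'.format(volume_id))

    def set_owner(self, volume_id, storage_router_id):
        # Record, or transfer, ownership of a volume.
        self.kv.set('volume_owner/{0}'.format(volume_id), storage_router_id)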

The Object Router
The Object Router is the component underneath the NFS (VMware) and FUSE (KVM) layer and dispatches requests for data to the correct core component. For each write, the Object Router checks whether it is the owner of the file on the datastore. If the Object Router is the owner of the file, it hands the data off to the underlying File or Volume Driver on the same Storage Router. Otherwise the Object Router checks in the Object Registry, stored in the distributed database, which Object Router owns the file and forwards the data to that Object Router. The same process is followed for read requests.
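
To make the dispatching concrete, the sketch below shows in simplified Python how a write could be routed: check ownership in the Object Registry, forward to the owning Object Router if needed, otherwise hand the request to the local Volume Driver or File Driver. All names and the extension check are illustrative assumptions, not the real implementation.

# Simplified sketch (illustrative only) of the Object Router write path.
class ObjectRouter(object):
    def __init__(self, router_id, registry, volume_driver, file_driver, peers):
        self.router_id = router_id
        self.registry = registry            # Object Registry (distributed database)
        self.volume_driver = volume_driver  # local Volume Driver
        self.file_driver = file_driver      # local File Driver
        self.peers = peers                  # other Object Routers, keyed by id

    def write(self, path, offset, data):
        owner = self.registry.owner_of(path)
        if owner != self.router_id:
            # Not the owner: forward the request to the owning Object Router.
            return self.peers[owner].write(path, offset, data)
        if self._is_volume(path):
            # Volume data (flat-VMDK or raw file) goes to the Volume Driver.
            return self.volume_driver.write(path, offset, data)
        # Non-volume files (VM config files, ...) go to the File Driver.
        return self.file_driver.write(path, offset, data)

    def _is_volume(self, path):
        # Assumption for the sketch: volumes are recognized by their extension.
        return path.endswith(('-flat.vmdk', '.raw'))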

The Volume Driver
All read and write requests for an actual volume (a flat-VMDK or raw file) are handled by the Volume Driver. This component is responsible for turning a Storage Backend into a block device. It is also the component which takes care of all the caching. Data which is no longer needed is sent to the backend to make room for new data in the cache. In case data is not in the cache but is requested by the Virtual Machine, the Volume Driver fetches the needed data from the backend. Note that a single volume is represented by a single bucket on the Storage Backend. It is important to note that only one Volume Driver communicates with the Storage Backend for a given volume.
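
A minimal sketch of this read/write path, with a local cache in front of the per-volume bucket, might look as follows. The cache and backend interfaces are assumptions for the example (the cache is assumed to return evicted blocks from put()), not the real Volume Driver code.

# Minimal sketch (illustrative only) of a Volume Driver block cache.
class VolumeDriverSketch(object):
    def __init__(self, cache, backend_bucket):
        self.cache = cache             # fast local cache (SSD / PCI flash)
        self.backend = backend_bucket  # the single bucket representing this volume

    def read(self, block_id):
        data = self.cache.get(block_id)
        if data is None:
            # Cache miss: fetch the block from the Storage Backend and keep
            # it in the cache for subsequent reads.
            data = self.backend.get(block_id)
            self.cache.put(block_id, data)
        return data

    def write(self, block_id, data):
        # Writes land in the cache first; blocks that are no longer needed
        # are evicted to the backend to make room for new data.
        for old_id, old_data in self.cache.put(block_id, data):
            self.backend.put(old_id, old_data)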

The File Driver
The File Driver is responsible for all non-volume files (VM config files, …). The File Driver stores the actual content of these files on the Storage Backend. Each small file is represented by a single file or key/value pair on the Storage Backend. In case a file is bigger than 1MB, it is split into smaller pieces to improve performance. All the non-volume files for a single datastore end up in a single, shared bucket. It is important to note that only one File Driver communicates with the Storage Backend for a given file in the datastore.
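
As a rough illustration of this splitting behaviour, the sketch below stores small files as a single object and breaks larger files into 1MB pieces. The key layout and bucket interface are assumptions for the example, not the actual File Driver format.

# Sketch (illustrative only): storing non-volume files in the shared bucket.
CHUNK_SIZE = 1024 * 1024  # the 1MB threshold mentioned above

def store_file(bucket, path, content):
    if len(content) <= CHUNK_SIZE:
        # Small files map to a single file or key/value pair on the backend.
        bucket.put(path, content)
        return
    # Bigger files are split into smaller pieces to improve performance.
    for chunk_nr, start in enumerate(range(0, len(content), CHUNK_SIZE)):
        bucket.put('{0}/chunk-{1}'.format(path, chunk_nr),
                   content[start:start + CHUNK_SIZE])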

Open vStorage - normal

vMotion Step 1

When a Virtual Machine is moved between hosts (vMotioned), vCenter calls the shots. In a first step vCenter kicks off the vMotion process; none of the hosts involved will complain, as they believe they are using shared storage. As with normal vMotion behavior, the memory of the Virtual Machine is copied to the destination host while the source VM continues to run (so no interruption for end users there). Once the memory is almost completely copied, the Virtual Machine is quiesced, the Virtual Machine state is transferred, the missing pieces of memory are copied and the Virtual Machine is resumed on the destination. As both hosts have access to the VMDK files, no special action is needed on the storage level for vMotion itself. But with Open vStorage the volumes of the Virtual Machine are not really shared between the hosts; remember, the Object Router of the source host is the owner of the volumes. Open vStorage must tackle this when read or write requests happen. In case a write happens to a volume of the moved Virtual Machine, the Object Router on the destination host will see that it is not the owner of the volume. The destination Object Router will check in the Object Registry which Object Router owns the volume and will forward the write request to that Object Router. The Object Router on the source then forwards the write to the Volume Driver on the source, as under normal behavior. The same happens for read requests. To summarize, in a first step only the Virtual Machine is moved to the destination while the volumes of the Virtual Machine are still being served by the source Storage Router.

Open vStorage - vMotion 1

vMotion Step 2

After the first step of the vMotion process, the volumes of the Virtual Machine are still owned and served by the Object Router of the source. This is of course a situation which can’t be sustained when a lot of IO occurs on the volumes of the Virtual Machine. Once an IO threshold is passed, the Object Router of the destination starts negotiating with the Object Router on the source to hand over the volumes. Just as with the memory, the metadata of the volumes is assembled in the Volume Driver at the destination. Once this process is complete, a point in time is arranged to copy the last metadata. To complete the process, the source Object Router marks the volumes as owned by the destination Object Router and from then on the volumes are served by the destination Object Router.
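
Put as pseudo-Python, the handover could be summarized as below. The threshold value, method names and steps are a simplified assumption based on the description above, not the actual protocol.

# Sketch (illustrative only) of the ownership handover between Object Routers.
IO_FORWARD_THRESHOLD = 1000  # assumed number of forwarded requests

def maybe_hand_over(destination, source, registry, volume_id):
    if destination.forwarded_io_count(volume_id) < IO_FORWARD_THRESHOLD:
        return  # keep forwarding IO to the source for now
    # 1. Assemble the volume metadata on the destination while the source
    #    keeps serving IO (comparable to the memory pre-copy of vMotion).
    destination.prefetch_metadata(volume_id, from_router=source)
    # 2. Agree on a point in time and copy the last metadata changes.
    source.pause_io(volume_id)
    destination.sync_final_metadata(volume_id, from_router=source)
    # 3. Mark the destination as the new owner; from now on it serves the volume.
    registry.set_owner(volume_id, destination.router_id)
    source.resume_and_forward(volume_id, to_router=destination)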

Open vStorage - vMotion 2

Summary

vMotion is supported by Open vStorage even though a volume can only be written and read by a single host. In a first step vCenter moves the Virtual Machine to the destination host, but the volumes of the Virtual Machine are still served by the source host. This means that communication between the Object Routers on the 2 hosts is required for all IO traffic to the volumes. In a second phase, after an IO threshold is passed, the Object Routers negotiate and agree to make the Object Router of the destination the owner of the volumes. Only after this second phase is the whole Virtual Machine, both compute and disks, running on the destination host.

What is the big deal with Virtual Volumes, VMware?

June 30 2014, mark the date, people. This is the day VMware announced the public beta of Virtual Volumes. Virtual Volumes, or VVOLs as VMware likes to call them, put a Virtual Machine and its disks, rather than a LUN, into the storage management spotlight. Through a specific API, vSphere APIs for Storage Awareness (VASA), your storage array becomes aware of Virtual Machines and their Virtual Disks. VASA allows offloading certain Virtual Machine operations such as snapshotting and cloning to the (physical) storage array.

Now, what is the big deal with Virtual Volumes, VMware? Open vStorage has been designed from day one to allow administrators to manage each disk of a Virtual Machine individually. We don’t call it Virtual Volumes but VM-centric, just like everyone else in storage land does. VMware, don’t get me wrong, I applaud that you are validating the VM-centric approach of software-defined storage solutions like Open vStorage. For over 4 years the Open vStorage team has worked on creating a VM-centric storage solution which supports multiple hypervisors, such as VMware ESXi and KVM, but also many backends. It is nice to see that the view we had back then is now validated by a leader in the virtualization industry.

What confuses me a bit is that, while the whole world is shifting storage functionality into software, you take the bold, opposite approach and push VM-centric functionality towards the hardware. This is strange, as everyone else is taking functionality out of the legacy storage arrays and is more and more treating storage as a bunch of disks managed by intelligent software. If I remember correctly, at VMworld 2013 you declared the storage array to be something of the past by announcing VSAN. The fact that storage arrays are, according to most people, past their expiry date was recently confirmed by another IT behemoth, Dell, by OEM-ing a well-known hyperconverged storage appliance.

As said before, Open vStorage has been designed with VM-centric functionality across hypervisor flavors in mind. This means that taking a snapshot or cloning a single Virtual Machine is as easy as clicking a button. Being a VM-centric solution doesn’t stop there. One of the most important features is replication on a per Virtual Machine basis. Before implementing this critical feature, the Open vStorage team had a lot of discussion about where the replication functionality should sit in the stack. We could have taken a shortcut and pushed the replication down to the storage backend (or storage array, as VMware calls it). Swift and Ceph, for example, have replication as their middle name and can replicate data across multiple locations worldwide. But by moving the replication functionality towards the storage backend you lose your VM-awareness. Pushing functionality towards the storage array is not the solution; intelligent storage software is the only answer to a VM-centric future.

There’s more than one way to change a tyre

During talks with people who are interested in the Open vStorage concept, we quite often get the following question:

How are you different compared to <insert name of software-defined storage product of the moment>

Well, the answer is something which has been the adage within CloudFounders from day one: Keep IT Simple, Stupid. The KISS principle states that most systems work best if they are kept simple rather than made complicated. This is equally true in the storage industry. For decades we have been chasing the utopian dream of Virtual Machines that are always online. I must admit we are very close, but we should look back at the price we are paying: complexity. And don’t be fooled, hand in hand with complexity comes its buddy, error.
Within our company we have an average of 15 years’ experience in datacenter and storage environments. The most important thing we learned along the way is that complexity is your worst enemy when things go bad. The second thing we learned is that things will go bad eventually, no matter how prepared you are. Sure, when you add enough complexity, like active-active SANs, firewalls, databases, etc., things go bad less often, but when disaster strikes your downtime will be much longer than if you had kept things simple.
When we designed Open vStorage, the KISS principle was always in the back of our mind. I’d like to give you 2 examples of where we keep things simple while others make it complicated, expensive and dangerous.

  • To support functionality such as vMotion and VMware High Availability (HA), you can use an expensive SAN. To decrease the hardware complexity and cost, architects are nowadays turning towards distributed file systems. Don’t get me wrong, a distributed file system such as GlusterFS, developed by more than 70 software engineers, is a great file system. But in the end it is a file system: it is designed to store files and was never designed to run Virtual Machines. A clever engineer saw that GlusterFS solved his problem of a unified namespace, making all files on the file system available on every host. But let us take a step back and put the unified-namespace problem in perspective. What we want to achieve is that the volume of a Virtual Machine, a block device, can move from one host to another host without having to bring the VM down. Instead of using a distributed file system to store the virtual disk, Open vStorage uses a different, less complex approach to tackle this problem. We store the volumes on a shared backend. To make sure that VM data isn’t accessed by two hosts at the same time, we have created a rule within Open vStorage that a volume can only be attached to one host. We store the ‘owner’ of each volume in a distributed database. When vMotion is triggered, the 2 hosts work together to make the transition. Once the volume is available on the second host, the 2 hosts sync the data which is not yet on the backend and the move is complete. To summarize: instead of making all Virtual Machine volumes available on all hosts all the time, Open vStorage makes sure that a volume is connected to a single host at the right time.
  • To support multiple hypervisors and multiple storage backends, we also took a simple approach. Instead of trying to convert each hypervisor type directly into different storage protocols, we created a frontend and a backend. This greatly reduces the complexity: we only need to write one storage frontend per hypervisor and the new hypervisor can immediately talk with all storage backends. The same goes for a new backend: write a new backend extension and all hypervisors can use that backend (see the sketch after this list). This flexibility not only minimizes the development effort, but on top of that different hypervisors can share the same storage pool. Also, as we use raw volumes, moving a Virtual Machine from VMware to KVM could (in the future) be just a reboot on the right host, no conversion needed.
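
As a sketch of that front/back split (with hypothetical class names, not the real code), a frontend only has to translate hypervisor IO into calls on the Object Router, and a backend extension only has to implement a small object-store interface:

# Sketch (illustrative only) of the frontend/backend split.
class StorageFrontend(object):
    # Hypervisor-facing side, e.g. NFS for VMware or FUSE for KVM.
    def write(self, path, offset, data):
        raise NotImplementedError

class StorageBackendExtension(object):
    # Backend-facing side, e.g. an object store such as Swift or Ceph.
    def put(self, key, value):
        raise NotImplementedError

class FuseFrontend(StorageFrontend):
    def __init__(self, object_router):
        self.object_router = object_router

    def write(self, path, offset, data):
        # Every frontend translates hypervisor IO into Object Router calls,
        # so a new hypervisor automatically works with every backend.
        return self.object_router.write(path, offset, data)

class ObjectStoreBackend(StorageBackendExtension):
    def __init__(self, client):
        self.client = client  # generic object-store client (assumption)

    def put(self, key, value):
        return self.client.store(key, value)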

Now back to the title: you can change a flat tyre in different ways. One way is to just replace the flat tyre. Or you could instead strip the whole car from around that flat tyre and reassemble it around a new tyre. Either way you fix the same issue, but the cost will be higher, the process will be more complex and you will probably need more than 70 engineers to fix your flat tyre.

Open vStorage 1.1

The Open vStorage team is on fire. We released a new version of the Open vStorage software (select Test as QualityLevel). The big new features for this release are:

  • Logging and log collecting: we have added logging to Open vStorage. The logs are also centrally gathered and stored in a distributed search engine (Elasticsearch). The logs can be viewed, browsed, searched and analyzed through Kibana, a very nicely designed GUI.
  • HA support: with this release Open vStorage not only supports vMotion but also the HA functionality of VMware. This means that in case an ESXi host dies and vCenter starts the VMs on another host, the volumes will automatically be migrated along.

Some small features that made it into this release:

  • A distributed filesystem can now be selected as Storage Backend in the GUI. Earlier you could select the file system but you could not extend it to more Grid Storage Routers (GSRs). Now you can extend it across more GSRs and do vMotion on top of e.g. GlusterFS.
  • The status of VMs on KVM is now updated quicker in the GUI.
  • Manager.py now has an option to specify the version you want to install (run manager.py -v ‘version number’).
  • Under administration there is now an About Open vStorage page displaying the version of all installed components.

In addition, the team also fixed 35 bugs.

A distribution center for Virtual Machine data

At Open vStorage we quite often get the question: “Ok, you are a Grid Storage Router, but what is that exactly?”

To explain what a Grid Storage Router is and why it is essential in a virtual environment, I’d like to make an analogy with Walmart, the largest retailer in the world. You can compare Open vStorage to Walmart’s grid of distribution centers. Walmart has 42 regional U.S. distribution centers with over 1 million square feet. In total these distribution centers have more than 12 miles of conveyor belts to move 5.5 billion cases of merchandise.

Sam Walton, the founder of Walmart, realized very quickly that in order to sell a lot of goods the company had to develop a multilayered distribution system, and he identified logistics as its primary expertise. With multiple stores, they could have opted to have goods go directly from manufacturer to the end shops, but the Walmart management quickly realized that having a central hub in between the two makes sense. Instead of having their more than 100,000 suppliers drop off their goods at the door of the 5,000 stores, they ask the suppliers to drop their goods off at one of the regional distribution centers. Having only a limited number of distribution centers allows the trucks with supplies to be stocked to the maximum level, optimizing round trips and making the best use of the available truckload capacity.

The distribution centers have grown from being just a temporary storage location to fully automated high-tech centers where every move is orchestrated and tracked. On top of temporarily storing goods, the distribution centers also offer additional services such as splitting up large volumes into smaller parcels, repackaging and keeping track of the stock.
Typically one distribution center can cater to the needs of multiple stores, but there is a limit to this capacity. When needed, a new distribution center, which typically follows a blueprint, opens to relieve the pressure and to prepare for yet more stores. This allows the stores to scale out without endangering the supply chain.

Just as Walmart considers its distribution centers to be as important as its stores, you should attribute the same importance to Open vStorage. Open vStorage is the hub between one or more Storage Backends (Swiftstack, Ceph, …) and your Virtual Machines. Like the distribution centers, Open vStorage doesn’t only connect suppliers (Storage Backends) and stores (Virtual Machines) with each other in an optimized fashion to build scale-out Virtual Machine storage, it also brings additional value to the table. With its VM-centric architecture it allows efficient unlimited snapshotting, VM replication*, compression and encryption at the VM level*.

Having this grid of Storage Routers, where one can take over from another, improves reliability, and as all metrics of these Storage Routers and the Virtual Machines are tracked, troubleshooting in case of an issue becomes much easier.

Where distribution centers work with different types of stores (Walmart Supercenters or Walmart Express), Open vStorage is hypervisor agnostic and can handle both VMware ESXi and KVM workloads. For both hypervisors every write is accelerated on SSDs or PCI flash cards and reads are optimized by deduplicating identical content, making the best use of the available fast storage.

Just as the distribution centers are key in the success of Walmart, Open vStorage is key in building out scalable, high performance, VM-centric storage.

* Planned for Q3 2014