Important note: this blog post talks about vMotion, a VMware feature. KVM fans need not be disappointed: Live Migration, the KVM equivalent of vMotion, is also supported by Open vStorage. We use the term vMotion because it is the most widely used term for this feature among the general IT public.
In a previous blog post we explained why Open vStorage is different. One thing we do differently is that we do not implement a distributed file system. This sparked a lot of interest but also prompted requests for clarification. In particular, many people wanted to know how we pulled off vMotion without a distributed file system or an expensive SAN. Time for a blog post to explain how it all works under the hood.
Under normal circumstances a volume (a disk of a Virtual Machine) can be seen by all hosts in the Open vStorage Cluster, as it is a file on the vPool (a datastore in VMware). The underlying internal object (the Volume Driver volume), however, is owned by a single host and can only be accessed by that host. Each host can see the whole content of the datastore because each NFS and FUSE instance shows all the files on the datastore. This means the hosts believe they are using shared storage. In reality only the metadata of the datastore is shared between all hosts; the actual data is not shared at all. To share the metadata across hosts a distributed database is used. To keep track of which host ‘owns’ a volume and hence can access its data, we use an Object Registry, which is implemented on top of the distributed database. The technology that tricks hosts into believing they are using shared storage, while only one host really has access to the data, is the core Open vStorage technology. This core technology consists of 3 components which are available on all hosts with a Storage Router:
* The Object Router
* The Volume Driver
* The File Driver
The Object Router
The Object Router is the component underneath the NFS (VMware) and the FUSE (KVM) layer, and it dispatches requests for data to the correct core component. For each write the Object Router checks whether it is the owner of the file on the datastore. If it is the owner, it hands off the data to the underlying File Driver or Volume Driver on the same Storage Router. Otherwise the Object Router looks up in the Object Registry, stored in the distributed database, which Object Router owns the file and forwards the data to that Object Router. The same process is followed for read requests.
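To make the dispatch logic a bit more tangible, here is a minimal sketch in Python. It is not the actual Open vStorage code: the class, the method names and the plain dictionaries that stand in for the Object Registry, the network and the local drivers are all hypothetical. The rule that the first writer becomes the owner is also an assumption made only to keep the example short.

```python
class ObjectRouter:
    """Toy model of an Object Router; every name here is hypothetical."""

    def __init__(self, host, registry, routers):
        self.host = host
        self.registry = registry  # shared dict: file -> owning host (stand-in for the Object Registry)
        self.routers = routers    # shared dict: host -> ObjectRouter (stand-in for the network)
        self.local_data = {}      # stand-in for the local File/Volume Driver
        routers[host] = self

    def write(self, path, offset, data):
        # Assumption: a file nobody owns yet is claimed by the first writer.
        owner = self.registry.setdefault(path, self.host)
        if owner == self.host:
            # We own the file: hand the data to the local driver.
            self.local_data.setdefault(path, {})[offset] = data
            return self.host
        # Someone else owns the file: forward the write to that Object Router.
        return self.routers[owner].write(path, offset, data)

    def read(self, path, offset):
        owner = self.registry[path]
        if owner == self.host:
            return self.local_data[path][offset]
        # Reads follow the same forwarding path as writes.
        return self.routers[owner].read(path, offset)


registry, routers = {}, {}
host_a = ObjectRouter("host-a", registry, routers)
host_b = ObjectRouter("host-b", registry, routers)

host_a.write("vm1-flat.vmdk", 0, b"hello")                  # host-a becomes the owner
served_by = host_b.write("vm1-flat.vmdk", 4096, b"world")   # forwarded to host-a
```

In this sketch both hosts can address the same file, yet the data only ever lands in the driver of the owning host, which is the "shared storage" illusion described above.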
The Volume Driver
All the read and write requests for an actual volume (a flat-VMDK or raw file) are handled by the Volume Driver. This component is responsible for turning a Storage Backend into a block device. It is also the component which takes care of all the caching. Data which is no longer needed is sent to the backend to make room for new data in the cache. In case data is requested by the Virtual Machine but is not in the cache, the Volume Driver fetches the needed data from the backend. Note that a single volume is represented by a single bucket on the Storage Backend. It is important to see that only one Volume Driver communicates with the Storage Backend for a given volume.
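The caching behaviour can be sketched as follows. This is an illustration only: the `VolumeCache` class is hypothetical, a plain dictionary stands in for the volume's bucket on the Storage Backend, and least-recently-used eviction is an assumption about what "no longer needed" means, not a statement about the real Volume Driver.

```python
from collections import OrderedDict


class VolumeCache:
    """Toy write-back cache in front of a backend bucket (hypothetical names)."""

    def __init__(self, capacity, backend):
        self.capacity = capacity    # maximum number of blocks kept in the cache
        self.backend = backend      # dict standing in for the volume's bucket
        self.cache = OrderedDict()  # block number -> data, oldest entry first

    def write(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        self._evict()

    def read(self, block):
        if block not in self.cache:
            # Cache miss: fetch the block from the backend.
            self.cache[block] = self.backend[block]
        self.cache.move_to_end(block)
        data = self.cache[block]
        self._evict()
        return data

    def _evict(self):
        # Assumption: the least recently used block is "no longer needed".
        while len(self.cache) > self.capacity:
            old_block, old_data = self.cache.popitem(last=False)
            self.backend[old_block] = old_data


bucket = {}
vol = VolumeCache(capacity=2, backend=bucket)
for block, data in enumerate([b"a", b"b", b"c"]):
    vol.write(block, data)
evicted_after_writes = dict(bucket)  # block 0 was pushed to the backend
first = vol.read(0)                  # miss: fetched back, block 1 is evicted
```

Because only one `VolumeCache` instance talks to the bucket in this sketch, there is never a second writer that could make the cached blocks stale, which mirrors the single-owner rule above.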
The File Driver
The File Driver is responsible for all non-volume files (vm config files, …). The File Driver stores the actual content of these files on the Storage Backend. Each small file is represented by a single file or key/value pair on the Storage Backend. In case a file is bigger than 1MB, it is split into smaller pieces to improve performance. All the non-volume files for a single datastore end up in a single, shared bucket. It is important to see that only one File Driver communicates with the Storage Backend for a given file in the datastore.
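As a rough sketch of the splitting idea, assume a dictionary as the shared bucket. The 1MB threshold comes from the text above; the piece size (also 1MB here), the key naming scheme and the function names are assumptions made for illustration.

```python
CHUNK_SIZE = 1024 * 1024  # 1MB threshold from the text; using it as piece size is an assumption


def store_file(bucket, name, content):
    """Store content in the bucket and return the list of keys that were used."""
    if len(content) <= CHUNK_SIZE:
        # Small file: a single key/value pair.
        bucket[name] = content
        return [name]
    keys = []
    for index, start in enumerate(range(0, len(content), CHUNK_SIZE)):
        key = f"{name}.part{index}"  # hypothetical naming scheme
        bucket[key] = content[start:start + CHUNK_SIZE]
        keys.append(key)
    return keys


def load_file(bucket, keys):
    """Reassemble a file from its pieces, in key order."""
    return b"".join(bucket[key] for key in keys)


shared_bucket = {}
small_keys = store_file(shared_bucket, "vm1.vmx", b"x" * 2048)
large_content = b"y" * (2 * CHUNK_SIZE + 10)
large_keys = store_file(shared_bucket, "vm1.nvram", large_content)
```

Both files end up in the same bucket: the small one under a single key, the large one under three keys that `load_file` can stitch back together.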
vMotion Step 1
When a Virtual Machine is moved between hosts (vMotioned), vCenter calls the shots. In a first step vCenter kicks off the vMotion process, and none of the hosts involved will complain because they believe they are using shared storage. As with normal vMotion behavior, the memory of the Virtual Machine is copied to the destination host while the source VM continues to run (so no interruption for end-users there). Once the memory is almost completely copied, the Virtual Machine is quiesced, the Virtual Machine state is transferred, the missing pieces of the memory are copied and the Virtual Machine is resumed on the destination. With regular vMotion both hosts have access to the VMDK files, so no special action is needed on the storage level. But with Open vStorage the volumes of the Virtual Machine are not really shared between the hosts: remember that the Object Router of the source host is the owner of the volumes. Open vStorage must deal with this when read or write requests happen. When a write happens to a volume of the moved Virtual Machine, the Object Router on the destination host sees that it is not the owner of the volume. It checks in the Object Registry which Object Router owns the volume and forwards the write request to that Object Router. The Object Router on the source then passes the write to the Volume Driver on the source, as under normal behavior. The same happens for read requests. To summarize, in a first step only the Virtual Machine is moved to the destination while the volumes of the Virtual Machine are still being served by the source Storage Router.
vMotion Step 2
After the first step of the vMotion process, the volumes of the Virtual Machine are still owned and served by the Object Router of the source. This situation of course can’t be sustained when a lot of IO occurs on the volumes of the Virtual Machine. Once an IO threshold is passed, the Object Router of the destination starts negotiating with the Object Router on the source to hand over the volumes. Just as with the memory, the metadata of the volumes gets assembled in the Volume Driver at the destination. Once this process is complete, a point in time is arranged to copy the last metadata. To complete the process the source Object Router marks the volumes as owned by the destination Object Router, and from then on the volumes are served by the destination Object Router.
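The handover can be modelled in a few lines of Python. Treat this as a heavily simplified sketch: the `Router` class, the threshold value of 3 and the metadata layout are all hypothetical, and the real negotiation between the Object Routers is reduced to two method calls.

```python
IO_THRESHOLD = 3  # hypothetical value; the real threshold is not stated here


class Router:
    """Toy Object Router that takes over a volume after too many forwarded IOs."""

    def __init__(self, host, registry):
        self.host = host
        self.registry = registry  # shared dict: volume -> owning host
        self.metadata = {}        # volume -> metadata held by the local Volume Driver
        self.forwarded = {}       # volume -> number of IOs forwarded to a remote owner

    def handle_io(self, volume, peers):
        """Return the host that served this IO request."""
        owner = self.registry[volume]
        if owner == self.host:
            return self.host
        self.forwarded[volume] = self.forwarded.get(volume, 0) + 1
        if self.forwarded[volume] > IO_THRESHOLD:
            self._take_over(volume, peers[owner])
        return owner  # this request was still served by the previous owner

    def _take_over(self, volume, source):
        # Assemble the bulk of the metadata while the source still serves IO.
        self.metadata[volume] = dict(source.metadata[volume])
        # Agreed point in time: copy the last metadata and switch ownership.
        self.metadata[volume].update(source.release(volume, self.host))
        del self.forwarded[volume]

    def release(self, volume, new_owner):
        # The source marks the volume as owned by the destination.
        final_metadata = self.metadata.pop(volume)
        self.registry[volume] = new_owner
        return final_metadata


registry = {"vm1-disk": "source"}
source = Router("source", registry)
destination = Router("destination", registry)
source.metadata["vm1-disk"] = {"lba0": "chunk-1"}
peers = {"source": source, "destination": destination}

served = [destination.handle_io("vm1-disk", peers) for _ in range(5)]
```

In this run the first four requests are still served by the source; the fourth one crosses the threshold and triggers the handover, so the fifth request is served locally by the destination.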
vMotion is supported by Open vStorage although a volume can only be written and read by a single host. In a first step vCenter moves the Virtual Machine to the destination host, but the volumes of the Virtual Machine are still served by the source host. This means that communication between the Object Routers on the 2 hosts is required for all IO traffic to the volumes. In a second phase, after an IO threshold is passed, the Object Routers negotiate and agree to make the Object Router of the destination the owner of the volumes. Only after this second phase is the whole Virtual Machine, both compute and disks, running on the destination host.