2017, the Open vStorage predictions

2017 promises to be an interesting year for the storage industry. New technology is knocking at the door and incumbent technology will not surrender without a fight. Not only will new technology influence the market; the storage market itself is morphing:

Further Storage consolidation

Let’s say that December 2015 was an appetizer, with NetApp buying SolidFire. But in 2016 the storage market went through the first wave of consolidation: Docker storage start-up ClusterHQ shut its doors, Violin Memory filed for Chapter 11, Nutanix bought PernixData, NexGen was acquired by Pivot3, Broadcom acquired Brocade, and Samsung acquired Joyent. Lastly, there was also the mega merger between storage mogul EMC and Dell. This consolidation trend will continue in 2017, as the environment for hyper-converged, flash and object storage startups is getting tougher now that all the traditional vendors offer their own flavor. As the hardware powering these solutions is commodity, the only differentiator is software.

Some interesting names to keep an eye on for M&A action or closure: Cloudian, Minio, Scality, Scale Computing, Stratoscale, Atlantis Computing, HyperGrid/Gridstore, Pure Storage, Tegile, Kaminario, Tintri, Nimble Storage, SimpliVity, Primary Data, … We are pretty sure some of these names will not make it past 2017.

Open vStorage already has a couple of large projects lined up. 2017 sure looks promising for us.

The Hybrid cloud

Back from the dead like a phoenix. I expect a new life for the hybrid cloud. Enterprises increasingly migrated to the public cloud in 2016 and this will only accelerate, both in speed and numbers. There are now 5 big clouds: Amazon AWS, Microsoft Azure, IBM, Google and Oracle.
But connecting these public clouds with in-house datacenter assets will be key. The gap between public and private clouds has never been smaller. AWS and VMware, 2 front runners, are already offering products to migrate between both solutions. Network infrastructure (performance, latency) is now finally also capable of turning the hybrid cloud into reality. Numerous enterprises will realise that going to the public cloud isn’t the only option for future infrastructure. I believe migration of storage and workloads will be one of the hottest features of Open vStorage in 2017. Hand in hand with the migration of workloads, we will see the birth of various new storage-as-a-service providers offering S3, secondary but also primary storage out of the public cloud.

On a side note, HPE (Helion), Cisco (Intercloud) and telecom giant Verizon closed their public clouds in 2016. It will be good to keep an eye on these players to see what they are up to in 2017.

The end of Hyper-Convergence hype

In the storage market predictions for 2015 I forecast the rise of hyper-convergence. Hyper-converged solutions have lived up to their expectations and have become a mature software solution. I believe 2017 will mark a turning point for the hyper-convergence hype. Let’s sum up some reasons for the end of the hype cycle:

  • The hyper-converged market is mature and the top use cases have been identified: SMB environments, VDI and Remote Office/Branch Office (ROBO).
  • Private and public clouds are becoming more and more centralised and large scale. More enterprises will come to understand that the one-size-fits-all and everything-in-a-single-box approach of hyper-converged systems doesn’t scale to a datacenter level. This is typically an area where hyper-converged solutions reach their limits.
  • The IT world works like a pendulum. Hyper-convergence brought flash as cache into the server because the latency to fetch data over the network was too high. With RDMA and round trip times of 10 µs and below, the latency of the network is no longer the bottleneck. The pendulum is now changing its direction as the web-scalers, the companies on which the hyper-convergence hype is centered, want to disaggregate storage by moving flash out of each individual server into more flexible, centralized repositories.
  • Flash, Flash, Flash, everything is becoming flash. As stated earlier, the local flash device was used to accelerate slow SATA drives. With all-flash versions, these hyper-converged solutions go head to head with all-flash arrays.

One of the leaders of the hyper-converged pack has already started to move in the converged infrastructure direction by releasing a storage-only appliance. It will be interesting to see who else follows.

With the new Fargo architecture, which is designed for large scale, multi-petabyte, multi-datacenter environments, we already capture the next trend: meshed, hyper-aggregated architectures. The Fargo release supports RDMA, allows you to build all-flash storage pools and incorporates a distributed cache across all flash in the datacenter. 100% future proof and ready to kickstart 2017.

PS. If you want to run Open vStorage hyper-converged, feel free to do so. We have componentized Open vStorage so you can optimize it for your use case: run everything in a single box or spread the components across different servers or even datacenters!

IoT storage lakes

More and more devices are connected to the internet. This Internet of Things (IoT) is poised to generate a tremendous amount of data. Not convinced? Intel research, for example, estimated that autonomous cars will produce 4 terabytes of data daily per car. These Big Data lakes need a new type of storage: storage which is ultra-scalable. Traditional storage is simply not suited to process this amount of data. On top of that, in 2017 we will see artificial intelligence increasingly being used to mine data in these lakes. This means the performance of the storage needs to be able to serve real-time analytics. Since IoT devices can be located anywhere in the world, geo-redundancy and geo-distribution are also required. Basically, IoT use cases are a perfect match for the Open vStorage technology.

Some interesting fields and industries to follow are consumer goods (smart thermostats, IP cameras, toys, …), automotive and healthcare.

Seagate Kinetic Open Storage Project Plugfest

Open vStorage was invited to host a session during the Seagate Kinetic plugfest on Tuesday, September 20 to demo and discuss advances in Ethernet-connected storage. Kinetic is a drive architecture in which the drive is a key/value server with Ethernet connectivity. With Open vStorage we have created ALBA ASD software that mimics this key/value behaviour for normal SATA drives. Kinetic drives can of course also be used as archiving backend for an Open vStorage cluster.

Read more about the Kinetic Open Storage Project here.

Edge: HA, failure and the moving of volumes explained

Open vStorage is designed to be rock solid and survive failures. These failures can come in many forms and shapes: nodes might die, network connections might get interrupted, … Let’s give an overview of the different tactics that are used by Open vStorage when disaster strikes by going over some possible use cases where the new edge plays a role.

Use case 1: A hypervisor fails

In case the hypervisor fails, the hypervisor management (OpenStack, vCenter, …) will detect the failure and restart the VM on another hypervisor. Since the VM is started on another hypervisor, the VM will talk to the edge client on the new hypervisor. The edge client will connect to a volume driver in the vPool and inquire which volume driver owns the disks of the VM. That volume driver responds with the owner, and the edge connects to the volume driver owning the volume. This all happens almost instantaneously and in the background, so the IO of the VM isn’t affected.

Use case 2: A Storage Router fails

In case a Storage Router, and hence the volume driver on it, dies, the edge client automatically detects that the connection to the volume driver is lost. Luckily the edge keeps a list of volume drivers which also serve the vPool, and it connects to one of the remaining volume drivers in the vPool. It is clear that the edge prefers to fail over to a volume driver which is close by, e.g. within the same datacenter. The new volume driver to which the edge connects detects that it isn’t the owner of the volume. As the old volume driver is no longer online, the new volume driver steals the ownership of the VM’s volume. Stealing is allowed in this case as the old volume driver is down. Once the new volume driver becomes the owner of the volumes, the edge client can start serving IO. This whole process happens in the background and halts the IO of the VM for only a fraction of a second.

Use case 3: Network issues

In some exceptional cases it isn’t the hypervisor or the storage router that fails but the network in between. This is an administrator’s worst nightmare as it might lead to split-brain scenarios. Even in this case the edge is able to outlive the disaster. As the network connection between the edge and the volume driver is lost, the edge will assume the volume driver is dead. Hence, as in use case 2, the edge connects to another volume driver in the same vPool. The new volume driver first tries to contact the old volume driver.

Now there are 2 options:

  • The new volume driver can contact the old volume driver. After some IO is exchanged the new volume driver asks the old volume driver to hand over the volume. This handover doesn’t impact the edge.
  • The new volume driver cannot contact the old volume driver. In that case the new volume driver steals the volume from the old volume driver. It does this by updating the ownership of the volume in the distributed DB and by uploading a new key to the backend. As the ALBA backend uses a conditional write approach (it only writes the IO to the disks of the backend if the accompanying key is valid), it can ensure that only the new volume driver is allowed to write to the backend. If the old volume driver were still online (split brain) and tried to update the backend, the write would fail as it is using an outdated key.

IT Administrator – Fargo – Hot and fresh

The new Fargo release of Open vStorage is featured in the December issue of the German IT Administrator magazine. The IT Administrator team did a deep dive of almost 5 pages into the new Fargo architecture and gave the new release a test drive through Ansible and Docker.

With the new Fargo release, Open vStorage gets a completely reworked architecture. The open-source storage environment takes advantage of the shared-memory approach and promises even more capability, better performance and a maximum of security. This makes Open vStorage the ideal solution for multi-petabyte, multi-datacenter storage clusters; at least, that's how the developers see it.

Open vStorage opens up its API kimono

With the Fargo release Open vStorage opens up its API kimono. In earlier versions of Open vStorage the API was something that was well hidden in the documentation section. As a result many of our integration partners had questions on how to use the API, what exactly was possible with the API or for example what the required parameters were to take a snapshot. It was clear for everyone that we had to give the API some more spotlight.

Why an API?

An API is especially important because it dictates how the developers of these integrators can create new apps, websites and services on top of the Open vStorage storage solution. For example, a hosting provider has built an OpenStack-like GUI for its KVM + Open vStorage cluster. They create vDisks on Open vStorage directly from their GUI, take snapshots and even scrub the vDisks on demand. They are consuming every aspect of our API. During this integration it became clear that keeping our API documentation up to date was a challenge. The idea grew to make the API self-describing and browsable.
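
To give an idea of what such an integration looks like, here is a hypothetical curl call that takes a snapshot of a vDisk. The endpoint, parameter names and token handling are illustrative only; the browsable API described below is the place to verify the exact contract:

    # Hypothetical example: take a snapshot of a vDisk through the REST API.
    # Endpoint, parameters and the bearer token are assumptions for illustration.
    OVS_GUI="https://[ip of the GUI]"
    VDISK_GUID="replace-with-your-vdisk-guid"
    TOKEN="replace-with-a-valid-oauth2-token"

    curl -k -X POST \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"name": "nightly-snapshot"}' \
      "$OVS_GUI/api/vdisks/$VDISK_GUID/create_snapshot"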

Open API

APIs come in many forms but some standards are crystallizing. Open vStorage follows the Open API specification (OAI). This specification is supported by some of the big names in the IT industry such as Google, Microsoft, IBM and PayPal. It also means some great open-source tools can be leveraged such as NSwag and Swagger UI. NSwag is a Swagger API toolchain for .NET, Web API and TypeScript (jQuery, AngularJS, Angular 2, Aurelia, KnockoutJS, and more). Swagger UI is a tool that dynamically generates beautiful documentation and a sandbox to play with straight from the browser.

Browsable API

To explore the Open vStorage API, download the Swagger UI, unzip the archive and serve the dist folder from either your file system or a web server.

Next, enter https://[ip of the GUI]/api/swagger.json in the textbox and press enter.
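
As a minimal sketch, assuming Python 3 is available to serve static files, getting the UI up and checking that the API description is reachable could look like this (the Swagger UI download URL and archive layout are assumptions, grab whatever release fits):

    # Download and unpack Swagger UI (release URL/version is an assumption).
    wget https://github.com/swagger-api/swagger-ui/archive/master.zip
    unzip master.zip
    # Serve the dist folder on port 8000 with Python's built-in web server.
    cd swagger-ui-master/dist
    python3 -m http.server 8000 &
    # Sanity check: the API description should be reachable (replace the IP
    # placeholder; -k skips certificate validation for self-signed certs).
    curl -k "https://[ip of the GUI]/api/swagger.json" | head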

[Screenshot: the Open vStorage API rendered in Swagger UI]

You can now browse through the API. As an example you can verify which parameters are required to move a vDisk between Storage Routers.

[Screenshot: the move vDisk call in Swagger UI]

One small but important remark: Swagger UI doesn’t support OAuth2 yet. This means you can browse the API but you can’t execute API requests, as these need to be authenticated.

Open vStorage Releases

Since Open vStorage is running in production at customers, we need to carefully plan our releases, as a small glitch might cause a disaster. For storage software there is a golden rule:

If it ain’t broken, don’t fix it!

With the release of Fargo RC1 we are entering a new cycle of intermediate releases and bugfixes. Once Fargo is GA we will push out a new update at regular intervals. Before installing an update, customers like to know exactly what is fixed in a certain update. That is why for each release, even an intermediate release, the release notes are documented. Let’s take as an example the Fargo Release Candidate 1. This release consists of the following packages:

The content of each package, e.g. the webapps package, can be found on the appropriate repository (or you can click the link in the release notes). The release notes of the package contain a summary of all fixed issues in that exact package. In case you want to be kept up to date on new releases, add the release page as an RSS feed (https://github.com/openvstorage/home/releases.atom) to your favourite RSS feed reader. If you prefer to be kept up to date by email, you can use Sibbell, Blogtrottr or a similar service.
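
If you prefer the command line over a feed reader, the same feed can be polled with curl. A quick sketch that prints the most recent release titles:

    # Fetch the release feed and strip the XML tags from the title lines.
    curl -s https://github.com/openvstorage/home/releases.atom \
      | grep '<title>' \
      | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' \
      | head -n 5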

Moving block storage between datacenters: the Demo

Probably the coolest feature of the new Fargo release is the GeoScale capability: spreading data across multiple datacenters. With this feature Open vStorage can offer distributed block storage across multiple locations. In the demo below, storage is spread across 3 datacenters in la douce France (Roubaix, Strasbourg and Gravelines). The demo also explains how the storage is spread across these datacenters and shows the live migration of a running VM and its storage between 2 datacenters. The whole migration process completes within a few seconds. The GeoScale functionality can be compared with solving a Sudoku puzzle: the data gets chopped up in chunks which are distributed across all the nodes and datacenters in the cluster. As long as you have enough chunks (disks, nodes or datacenters) left, you can always recover the data. The demo even shows the loss of an entire datacenter being survived.

GeoScale FAQ

Can I survive a datacenter outage?

Yes, in a GeoScale cluster, the data is spread over multiple datacenters and is available from each location. If one of these datacenters goes offline, the GeoScale cluster stays up and running and continues to serve data. Virtual Machines running in the datacenter that went down can be migrated to one of the other datacenters in just seconds without having to copy all of the data.

Will storing data across multiple datacenters not be too slow for my database, VMs, … ?

No. Within each datacenter, Open vStorage aggregates all flash (SSDs, NVMe, PCIe) into a global cache and uses these local cache pools to accelerate incoming reads and writes.

How far can the datacenters be apart?

Open vStorage supports metroscale clusters where the datacenters are only a couple of miles apart, such as in the greater New York region, but even clusters where the datacenters are a couple of thousand miles apart are supported.

Support for Ubuntu 16.04

Last Friday, November 4th, the Open vStorage team released the first RC of the new Fargo version. We are really excited about Fargo as a lot of new features are being added to it. To name some of the new features:

  • Support for Ubuntu 16.04.
  • HA for the Edge which allows automatic failover in case the host running the VolumeDriver goes down.
  • Support for Arakoon as distributed config management.
  • 64TB volumes.

Earlier versions of Open vStorage supported Ubuntu 14.04. With the release of Ubuntu 16.04, which is an Ubuntu LTS version and hence will have updates and support for the next 5 years, it was essential for us to also update the Open vStorage software to work on Ubuntu 16.04.

Get started with Ubuntu 16.04:

Installing Open vStorage on Ubuntu 16.04 is almost as easy as installing on 14.04. One change is that the software packages are now signed. Signing the packages allows you, the installer, to verify that no modifications occurred after the packages were signed. The steps to get the latest packages are as simple as:

  • Download and install Ubuntu 16.04 on the host.
  • Add the Open vStorage repo to the host:
    echo "deb http://apt.openvstorage.com unstable main" > /etc/apt/sources.list.d/ovsaptrepo.list
  • Add the key:
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 4EFFB1E7
  • Make sure the Open vStorage packages have a higher preference so our packages are installed:
    cat << EOF > /etc/apt/preferences
    Package: *
    Pin: origin apt.openvstorage.com
    Pin-Priority: 1000
    EOF
  • Run apt-get update to get the latest packages (see the verification sketch below).
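
To double-check that the pin actually took effect before installing anything, apt can print the effective priorities. A quick sanity check (the package name in the last line is just an example):

    # Refresh the package lists.
    apt-get update
    # The policy output should list apt.openvstorage.com with priority 1000.
    apt-cache policy
    # Inspect a specific package (name is an example, adjust to your setup):
    apt-cache policy openvstorage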

To install the Open vStorage software you can follow the normal flow as described here.

Distributed Config Management

When you are managing large clusters, keeping the configuration of every system up to date can be quite a challenge: new nodes are joining the cluster, old nodes need to be replaced, vPools are created and removed, … In Eugene and earlier versions we relied on simple config files which were located on each node. It should not come as a surprise that in large clusters it proved to be a challenge to keep the config files in sync. Sometimes a cluster-wide config parameter was updated while one of the nodes was being rebooted. As a consequence, the update didn’t make it to that node, and after the reboot it kept running with an old config.
For Fargo we decided to tackle this problem. The answer: Distributed Config Management.

Distributed Config Management

All config files are now stored in a distributed config management system. When a component starts, it retrieves the latest configuration settings from the management system. Let’s have a look at how this works in practice. Say a node is down and we remove the vPool from that node. As the vPool was shrunk, the config for that VolumeDriver is removed from the config management system. When the node restarts, it will try to get the latest configuration settings for the vPool from the config management system. As there is no config for the removed vPool, the VolumeDriver will no longer serve the vPool. In a first phase we have added support for Arakoon, our beloved and in-house developed distributed key/value store, as the distributed config management system. As an alternative to Arakoon, ETCD has been incorporated, but do know that in our own deployments we always use Arakoon (hint).

How to change a config parameter:

Changing parameters in the config management system is very easy through the Open vStorage CLI:

  • ovs config list some: List all keys with the given prefix.
  • ovs config edit some-key: Edit that key in your configured editor. If the key doesn’t exist, it will get created.
  • ovs config get some-key: Print the content of the given key.

The distributed config management also contains a key for all scheduled tasks and jobs. To update the default schedule, edit the key /ovs/framework/scheduling/celery and plan the tasks by adding a crontab-style schedule.
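
As an illustration, inspecting and then editing the scheduling key could look like the sketch below; the task name and timing values in the comment are made up for the example:

    # Print the current scheduling configuration.
    ovs config get /ovs/framework/scheduling/celery

    # Open the key in your configured editor and add a crontab-style entry.
    # The task name and timing below are illustrative only:
    #   "ovs.generic.execute_scrub": {"minute": "0", "hour": "3"}
    ovs config edit /ovs/framework/scheduling/celery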