The Open vStorage iSCSI integration

Since Open vStorage released the Edge, our own lightweight block API, it has been possible to expose vDisks as iSCSI volumes. As this was a first version, we didn’t put a lot of effort into making it user friendly, so it unfortunately took quite a few steps and some manual tweaking to get it working. On the other hand, it gave us valuable feedback on how well it worked and what an automated flow should look like. Hence it is with great pleasure that I can announce we have completed the integration of the Linux SCSI target framework (TGT) with Open vStorage. Do note that the integration is only available as part of the Open vStorage Enterprise Edition.

The iSCSI Manager

In order to easily turn vDisks into iSCSI disks, we have developed an iSCSI Manager package. This package can be installed on any server where the Edge is already installed. The package deploys a small API for managing iSCSI servers and targets and registers the iSCSI target server with the Open vStorage framework. Once the target server is registered, vDisks can be exposed as iSCSI disks by that server. Optionally, ACLs (access control lists) can be set to allow only certain IPs or network segments access to the vDisk.

Exposing a vDisk as an iSCSI disk

Exposing a vDisk as an iSCSI disk is really simple. Open the Open vStorage GUI and go to the vDisk page. If the iSCSI framework plugin is installed, an additional option to expose the vDisk becomes available on its detail page.

In the expose wizard you can indicate on which iSCSI target server the disk should be exposed. You also need to provide the Edge username and password, and optionally an ACL. That is it!

Final Notes

Currently only TGT is supported as the iSCSI target server, but the iSCSI Manager was designed so that, for example, LIO could later easily be plugged in as a replacement for TGT. Another item planned for the next iteration of the iSCSI Manager is CHAP authentication to restrict iSCSI access to vDisks. Let us know if you think we missed any features; we are always open to suggestions.

SCOs, chunks & fragments

For frequent readers it is stating the obvious to say that ALBA is a complex piece of software. One of the darkest caves of the ALBA OCaml code is the one where SCOs, the objects coming from the Volume Driver, are split into smaller objects. These objects are subsequently stored on the ASDs in the ALBA backend. It is time to clear up the mist around policies, SCOs, chunks and fragments, as careless settings for these values might result in performance loss or an explosion of the backend metadata.

The fragment basics

Open vStorage uses an append-only strategy for data written to a volume. Once enough data is accumulated, the Volume Driver hands the log-file, a SCO (Storage Container Object), over to the ALBA proxy. This ALBA proxy is responsible for encrypting, compressing and erasure coding or replicating the SCOs based upon the selected preset. One important part of the preset is the policy (k, m, c, x). These 4 numbers can have a great influence on the performance of your Open vStorage cluster. But for starters, let’s first recap the meaning of these 4 numbers:

  • k: the amount of data fragments
  • m: the amount of parity fragments
  • c: the minimum number of fragments that must be written before the write is acknowledged
  • x: the maximum number of fragments per storage node

When c is lower than k+m, one or more slowly responding ASDs won’t have an impact on the write performance to the backend. The fragments which should have been stored on the slow ASD(s) will simply be rewritten at a later point in time by the maintenance process.

This was the easy part of how these numbers can influence performance. Now comes the hard part. When you have a SCO of, let’s say, 64MB, it is split according to the policy into k data fragments and m parity fragments. Assume k is set to 8; we should then end up with 8 fragments of 8MB. There is however another (hidden) value which plays a role: the maximum fragment size. The fragment size has an impact on the write performance, as larger fragments tend to provide higher write bandwidth to the underlying hard disk. It is no secret that traditional SATA disks love writing large pieces of consecutive data. On the other hand, the bigger the fragments, the less suitable they are for the fragment cache and the longer it takes to read them from the backend in case of cache misses. To summarize, fragments should be big, but not too big.

So to make sure fragments are not too big, you can set a maximum fragment size. The default maximum fragment size is 4MB. As the fragment size in the example above was 8MB and the maximum fragment size for the backend is only 4MB, something needs to happen: chunking. Chunking splits large SCOs into smaller chunks so that the fragments of these chunks stay below the maximum fragment size. So in our example the SCO will be split into smaller chunks. To calculate the number of chunks needed, a simple formula can be used:

Number of chunks = ROUNDUP(SCO size / min(k * maximum fragment size, SCO size))

In our example we end up with 2 chunks: ROUNDUP(64 / min(8*4, 64)) = 2. These 2 chunks are then erasure coded using the (k, m, c, x) policy. Basically you end up with 2 chunks of 8 data fragments of 4MB each, plus an additional m parity fragments per chunk.
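The calculation is easy to verify with a small script. Below is a minimal sketch (Python, with illustrative function and parameter names of our own choosing) that reproduces the formula for the example above: a 64MB SCO, an (8, 2, c, x) policy and a 4MB maximum fragment size.

import math

def chunk_layout(sco_size_mb, k, m, max_fragment_mb):
    # A chunk can hold at most k * max_fragment_mb of data, but never more than the SCO itself.
    chunk_size_mb = min(k * max_fragment_mb, sco_size_mb)
    chunks = math.ceil(sco_size_mb / chunk_size_mb)
    fragment_size_mb = chunk_size_mb / k
    return chunks, fragment_size_mb, k + m

chunks, frag_mb, frags_per_chunk = chunk_layout(sco_size_mb=64, k=8, m=2, max_fragment_mb=4)
print(chunks, frag_mb, frags_per_chunk)  # 2 chunks, 4.0MB data fragments, 10 fragments per chunk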

Global Backends

So far we have only covered the fragment basics, so let’s make it a bit more complex by introducing stacked backends. Open vStorage allows multiple local backends to be combined into a global backend. This means there are 2 sets of fragments: the fragments at the global level and those at the local level. Let’s continue with our previous example where we had 64MB SCOs and a 4MB fragment size. This means that the fragments which serve as input for the local backends are only 4MB. Assume that we also configure erasure coding with policy (k’, m’, c’, x’) at the local backend level. In that case each 4MB fragment will be split into another k’ data fragments and m’ parity fragments. If k’ is for example set to 8, you will end up with 512KB fragments. There are 2 issues with this relatively small fragment size. The first issue was already outlined above: traditional SATA drives are optimized for large chunks of consecutive data and 512KB is probably too small to reach the hard disks’ write bandwidth limit, which means suboptimal write performance. The second issue is related to the metadata size. Each object in the ALBA backend is referenced by metadata, and in order to optimize performance all metadata should be kept in RAM. Hence it is essential to keep the data/metadata ratio as high as possible in order to keep the RAM required to address the whole backend under control.

In the above example, with an (8, 2, c, x) policy for both the global and the local backend, we would end up with around 10KB of metadata for every 64MB SCO. With an optimal selection of the global policy (4, 1, c, x) and a maximum fragment size of 16MB on the global backend, the metadata for the same SCO is only 5KB. This means that with the same amount of RAM reserved for the metadata, twice the amount of backend storage can be addressed. Next to being kept in RAM, the metadata is also persistently stored on disk (NVMe, SSD) in an Arakoon cluster. By default Arakoon uses a 3-way replication scheme, so with the optimized settings the metadata will occupy 6 times less disk space. The optimal global policy of (4, 1, c, x) will, next to a lower memory footprint for the metadata, also provide better performance as 4MB fragments are written to the SATA drives instead of the smaller 512KB fragments.
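To make those numbers a bit more tangible, here is a sketch (Python, illustrative only; the real bookkeeping inside ALBA is more involved) that extends the chunk calculation to a stacked backend and prints the fragment counts and sizes at both levels for the 64MB example with an (8, 2, c, x) policy and a 4MB maximum fragment size on both levels:

import math

def layout(input_mb, k, m, max_fragment_mb):
    # Chunk count, data fragment size and total fragment count for one backend level.
    chunk_mb = min(k * max_fragment_mb, input_mb)
    chunks = math.ceil(input_mb / chunk_mb)
    return chunks, chunk_mb / k, chunks * (k + m)

# Global level: a 64MB SCO with an (8, 2) policy and a 4MB maximum fragment size.
g_chunks, g_frag_mb, g_frags = layout(64, k=8, m=2, max_fragment_mb=4)
# Local level: every 4MB global fragment is erasure coded again with an (8, 2) policy.
l_chunks, l_frag_mb, l_frags = layout(g_frag_mb, k=8, m=2, max_fragment_mb=4)

print(g_frags, "global fragments of", g_frag_mb, "MB")           # 20 global fragments of 4MB
print(g_frags * l_frags, "local fragments of", l_frag_mb, "MB")  # 200 local fragments of 0.5MB

Every one of those fragments needs a metadata entry, which is why a careless combination of policies and fragment sizes can blow up the metadata footprint.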

Conclusion

Whatever you decide for the ALBA backend policy, SCO size and maximum fragment size, choose wisely, as these values have an impact on various aspects of the Open vStorage cluster, ranging from performance to Total Cost of Ownership (TCO).

The Storage and Networking DNA

Today I want to discuss a less technical but more visionary assertion: the concept that storage and networking share the same DNA. This is especially the case for large scale deployments. This insight surfaced as the rumor of Cisco buying NetApp flared up again. Allow me to explain why I believe exabyte storage clusters and large scale networks have a lot in common.

The parallels between storage and networking

The first feature networks and exabyte storage clusters share is that they are highly scalable. Both typically start small and grow over time. Capacity can be added seamlessly by adding more hardware to the cluster, allowing higher bandwidth, higher capacity and more users to be served.

Downtime is typically unacceptable for both, and SLAs guaranteeing multi-nine availability are common. To achieve this level of availability both rely on hyper-meshed, shared-nothing architectures. These highly redundant architectures ensure that if one component fails, another component takes over. To illustrate, switches are typically used in a redundant fashion: a single server is connected to 2 independent switches, so if one switch fails the other takes over. The same holds for storage. Data is also stored redundantly, which can be achieved with replication or erasure coding across multiple disks and servers. If a disk or server fails, the data can still be retrieved from other disks and servers in the storage cluster.

These days you can check your Facebook timeline or Twitter account from almost anywhere in the world. Large scale networks give users access from anywhere: one global network spans the globe and interlinks many smaller networks. The same holds for storage, as we are moving to a world where data is stored in geographically dispersed places and even in different clouds.

With new technologies like Software-Defined Networking (SDN), network management has moved towards a single point of governance. The physical network can be configured at a high level while the detailed network topology is pushed down to the physical and virtual devices that make up the network. The same trend is happening in the storage industry with Software-Defined Storage (SDS): such software makes it possible to configure and manage the physical hardware in the storage cluster, across multiple devices, sites and even different clouds, through a single high-level management view.

A last point I’d like to touch on is that for both networking and storage, hardware brands and models hardly matter. In networking, any brand of equipment can interoperate thanks to network standards; likewise, different brands of disks, controllers and servers can all be used to build an exabyte storage cluster. Users of a network are not aware of its exact topology (brands, links, routing, …). The same holds for storage: users shouldn’t have to know on which disk their data is stored exactly. The only thing they care about is that they get the right data on time when needed and that it is safely stored.

Open vStorage, taking the network analogy to the next step

Let’s have a look at the components of a typical network. On the left we have the consumer of the network, in this case a server. This server is physically connected to the network through a Network Interface Controller (NIC). A NIC driver provides the necessary interfaces for the applications on the server to use the network. Data sent over the network traverses the TCP/IP stack down to the NIC, where it is converted into individual packets. Within the network various components play a specific role: a VPN provides encrypted tunnels, WAN accelerators provide caching and compression features, DNS services store the hierarchy of the network and switches/routers route and forward the packets to the right destination. The core routers form the backbone of the network and connect multiple data centers and clouds.

Each of the above network components can be mapped to an equivalent in the Open vStorage architecture. The Open vStorage Edge offers a block interface to the applications (hypervisors, Docker, …) on the server. Just like the TCP/IP stack converts data into network packets, the Volume Driver converts the data received through the block interface into objects (Storage Container Objects). Next we have the proxy, which takes up many roles: it encrypts the data for security, provides compression and routes the SCOs, after chopping them into fragments, to the right backend. For reads the proxy also plays an important caching role by fetching the data from the correct cache backend. Lastly we have Arakoon, our own distributed key-value store, which stores the metadata of all data in the backend of the storage cluster. A backend consists of SSDs and HDDs in JBODs or traditional x86 servers. There can of course be multiple backends and they can even be spread across multiple data centers.

Conclusion

When reading the first paragraph of this blog post it might have crossed your mind that I was crazy. I do hope that after reading the whole post you realize that networking and storage have a lot in common. As a Product Manager I keep the path that networking has already travelled in mind when thinking about the future of storage. How do you see the future of storage? Let me know!

Open vStorage High Availability (HA)

Last week I received an interesting question from a customer:

What about High-Availability (HA)? How does Open vStorage protect against failures?

This customer was right to ask that question. If you run a large scale, multi-petabyte storage cluster, HA should be one of your key concerns. Downtime in such a cluster doesn’t only lead to production loss but can also be a real PR disaster or even lead to foreclosure. When end-customers start leaving your service, it can become a slippery slope and before you know it there are no customers left on your cluster. Hence, asking the HA question beforehand is a best practice for every storage engineer challenged with doing due diligence on a new storage technology. Over the past few years we have already devoted a lot of words to Open vStorage HA, so I thought it was time for a summary.

In this blog post I will discuss the different HA scenarios, starting from the top (the Edge) down to the bottom (the ASD).

The Edge

To start an Edge block device, you need to pass the IP and port of a Storage Router together with the vPool of the vDisk. On the initial connection the Storage Router returns a list of fail-over Storage Routers to the Edge. The Edge caches this information and automatically switches to another Storage Router if it can’t communicate with its current Storage Router for 15 seconds.
Periodically the Edge also asks the Storage Router to which Storage Router it should connect. This way the Storage Router can instruct the Edge to connect to another Storage Router, for example because the original Storage Router will be shut down.
For more details, check the following blog post about Edge HA.
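Conceptually the fail-over behaviour boils down to a simple decision, sketched below in Python (purely illustrative; the timeout value comes from the description above, all other names are made up and this is not the actual Edge implementation):

import time

FAILOVER_TIMEOUT = 15  # seconds without contact before the Edge switches Storage Routers

def pick_storage_router(current, failover_list, last_contact, now=None):
    # current: (ip, port) of the Storage Router currently in use
    # failover_list: list of (ip, port) tuples cached from the initial connection
    # last_contact: timestamp of the last successful reply from the current router
    now = now if now is not None else time.time()
    if now - last_contact <= FAILOVER_TIMEOUT:
        return current                    # current Storage Router is still responsive
    for candidate in failover_list:
        if candidate != current:
            return candidate              # switch to a cached fail-over Storage Router
    return current                        # nothing better available, keep trying

# Example: no reply for 20 seconds, so the Edge fails over to 10.0.0.2.
routers = [("10.0.0.1", 26203), ("10.0.0.2", 26203)]
print(pick_storage_router(("10.0.0.1", 26203), routers, last_contact=time.time() - 20))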

The Storage Router

The Storage Router also has multiple HA features for the data path. A vDisk can only be active on and owned by a single Volume Driver, the block-to-object conversion process of the Storage Router. Therefore a mechanism is in place to make sure the ownership of a vDisk can be handed over (happy path) to or stolen (unhappy path) by another Storage Router. Once the ownership is transferred, the volume is started on the new Storage Router and IO requests can be processed. Should the old Storage Router still try to write to the backend, fencing kicks in and prevents that data from being stored on the backend.
The ALBA proxy is responsible for encrypting, compressing and erasure coding the Storage Container Objects (SCOs) coming from the Volume Driver and for sending the fragments to the ASD processes on the SSD/SATA disks. Each Storage Router has multiple proxies and can switch between them in case of issues or timeouts.

The ALBA Backend

An ALBA backend typically consists of multiple physical disks across multiple servers. The proxies generate redundant parity fragments via erasure coding, which are stored across all devices of the backend. As a result, a device failure or even a complete server failure doesn’t lead to data loss. On top of that, backends can be composed recursively. Take as an example the case where you have 3 data centers. One could create a (local) backend containing the disks of each data center and create a (global) backend on top of these (local) backends. Data could for example be replicated 3 times, one copy in each data center, and erasure coded within each data center for storage efficiency. Using this approach a data center outage wouldn’t cause any data loss.

The management path HA

The previous sections of this blog post discussed the HA features of the data path. The management path is also highly available. The GUI and API can be reached from all master nodes in the cluster. The metadata is stored redundantly as well and is spread across multiple nodes or even data centers. Open vStorage has 2 types of metadata: the volume metadata and the backend metadata. The volume metadata is stored in a networked RocksDB using a master-slave concept. More information about that can be found here and in a video here.
The backend metadata is stored in our own, in-house developed, always consistent key-value store named Arakoon. More info on Arakoon can be found here.

That’s in a nutshell how Open vStorage makes sure a disk, server or data center disaster doesn’t lead to storage downtime.

Jobs, Jobs, Jobs, …

iNuron, the company behind Open vStorage, is growing rapidly. With more and more customers selecting Open vStorage and more and more multi-petabyte storage clusters being deployed, we are looking for more hands to help out. Currently we are looking for 2 profiles:

OPERATIONS ENGINEER

If you are not afraid of large storage pools and you like assisting international, high-profile customers with Proofs of Concept, then look no further as we have the ideal job for you: Operations Engineer. As part of our brilliant OPS team you will be responsible for keeping our large scale storage clusters up and running. Our engineering teams aren’t perfect, so their code may lead to actual issues and customer support requests. As a preventive measure our OPS engineers are also responsible for developing, executing and maintaining software test plans, and they take part in the acceptance process of each new Open vStorage version.

Read the complete job post here.

SOFTWARE ENGINEER

Do you eat Python for breakfast and breathe JavaScript? In that case we have the ideal job for you, as we are looking for additional software engineers for our framework team. Being part of the framework team means you will be responsible for the GUI, the API and the management of the different components that make up the whole Open vStorage cluster.

Read the complete job post here.

Docker and persistent Open vStorage volumes

Docker, the open-source container platform, is currently one of the hottest projects in the IT infrastructure business. With the support of some of the world’s leading companies such as PayPal, eBay, General Electric and many more, it is quickly becoming a cornerstone of any large deployment. It also introduces a paradigm shift in how administrators see servers and applications.

Pets vs. cattle

In the past, servers were treated like dogs, cats or any family pet: you give it a cute name, make sure it is in optimal condition, take care of it when it is sick, … With VMs a shift already occurred: names became more generic, like WebServer012, but keeping the VM healthy was still a priority for administrators. With Docker, VMs are decomposed into a sprawl of individual, clearly defined applications, and sometimes there are even multiple instances of the same application running at the same time. With thousands of containerized applications running on a single platform, it becomes impossible to treat them as pets; instead they are treated as cattle: they get an ID, and when they have issues they are taken offline, terminated and replaced.

Docker Storage

The original idea behind Docker was that containers would be stateless and hence wouldn’t need persistent storage. But over the years the insight has grown that some applications, and hence containers, do require persistent storage. Since the Docker platforms at large companies house thousands of containers, the required storage is also significant. Typically these platforms also span multiple locations or even clouds. Storage across locations and clouds is the sweet spot of the Open vStorage feature set. In order to offer distributed, persistent storage to containers, the Open vStorage team created a Docker plugin on top of the Open vStorage Edge, our lightweight block device. Note that the Docker plugin is part of the Open vStorage Enterprise Edition.

Open vStorage and Docker

Using Open vStorage to provision volumes for Docker is easy and straightforward thanks to Docker’s volume plugin system. To show how easy it is to create a volume for a container, I will give you the steps to run Minio, a minimal, open-source object store, on top of a vDisk.

First install the Open vStorage Docker plugin and the necessary packages on the compute host running Docker:
apt-get install libovsvolumedriver-ee blktap-openvstorage-ee-utils blktap-dkms volumedriver-ee-docker-plugin

Configure the plugin by updating /etc/volumedriver-ee-docker-plugin/config.toml:

[volumedriver]
hostname="IP"
port=26203
protocol="tcp"
username="root"
password="rooter"

Change the IP and port to those on which the vPool is exposed on the Storage Router you want to connect to (see the Storage Router detail page).

Start the plugin service
systemctl start volumedriver-ee-docker-plugin.service

Create the Minio container and attach a disk for the data (minio_export) and one for the config (minio_config)

docker run --volume-driver=ovs-volumedriver-ee -p 9000:9000 --name minio \
-v minio_export:/export \
-v minio_config:/root/.minio \
minio/minio server /export

That is it. You now have a Minio object store running which stores its data on Open vStorage.

PS: Want to see more? Check the “Docker fun across Amazon Google and Packet.net” video.

The Open Core Model

It was Paul Dix, Founder and CTO of InfluxDB, who rocked the boat with his opening keynote at the last PerconaLive conference. His talk, titled “The Open Source Business Model is Under Siege”, discussed the existential struggle that open source software companies are facing. The talk is based on his experience building a viable business around open source over the last three and a half years with InfluxDB. You can see the full video here.

Infrastructure Software, a tough market …

Paul is right: building a viable open source company around infrastructure software is hard. Building a company around infrastructure tout court is hard these days. Need some examples? HPE buying storage unicorn SimpliVity well below its recent valuation, Tintri doing an IPO as a last resort, Nutanix piling up losses quarter after quarter, RethinkDB and Basho shutting down, and there are many more.

Open Core Model

I can offer only one piece of advice for the above companies: it’s never too late to do the next right thing. For Open vStorage, that next right thing was moving away from a pure-play open source business model. Currently Open vStorage follows the open core model. This means that we have a core distributed block storage project which is open source and free to use, but on the other hand we also have a closed source, commercial Enterprise Edition which adds more functionality to the core.
Maybe the term open core sounds a bit too pejorative. What we release as the core is a fully functional distributed block storage platform. Deciding which feature ends up in the core and which in the Enterprise Edition is a difficult assessment. As a rule of thumb, the core version should allow small clusters to be set up and operated without data loss and with decent performance. Even block storage clusters which span multiple data centers can be set up with the core version. Enterprises which are looking to build their company (or part of it) on a service which couldn’t exist without the Open vStorage technology are gently steered towards the Enterprise Edition. These are typically well-established, large enterprises which are looking to offer a new or better service to their customers. They also understand that one size doesn’t fit all, and they want to be able to fiddle with all the bells and whistles of Open vStorage. They want, for example, full control over which vDisk uses which part of the distributed cache. Or they want best-in-class performance, and to achieve this they need features like the High Performance Read Mesh. Over time the list of ‘Enterprise Edition only’ features will grow. On the other hand, nothing prevents us from moving features from the Enterprise Edition to the open source version down the line.

A final note

The open core model might offend some people. Yet we aren’t the only ones operating under an open core model: it is for example also used by Docker, MySQL, InfluxDB, MongoDB, Puppet, Midokura and many, many other software companies. It isn’t an easy business model, as there is always discussion about which features to release as part of the open source project and which as part of the Enterprise Edition. But we are confident that the open core model is the path forward, not only for us but for the whole software infrastructure market.

PS: Keep following our blog, as over the next few weeks we will demonstrate the success of our open core business model with some extensive, multi-petabyte, multi-data center implementations.

Cache Policy Management: A Closer Look

Don’t you hate a noisy neighbour? Someone who blasts his favourite music just loud enough for you to hear it while trying to get some sleep or enjoy a relaxing commute. Well, the same goes for noisy neighbours in storage. There it is not deafening music that is annoying but the fact that other volumes can’t reach their desired performance because one volume gobbles up all the IOPS.

Setting cache quota

This situation typically occurs when a single volume takes up the whole cache. In order to give every vDisk a fair share of the cache, the Open vStorage Enterprise Edition allows you to put a quota on cache usage. When creating a vPool you can set a default quota per vDisk so that each vDisk gets a fair share of the cache. Do note that the quota system is flexible: it is for example possible to set a larger value than the default for a specific vDisk that would benefit from more caching, and it is even possible to oversubscribe the cache. This way the cache space can be used optimally.
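As a back-of-the-envelope illustration (Python, with made-up numbers rather than actual defaults), oversubscribing simply means that the sum of the per-vDisk quotas may exceed the physical cache size:

cache_size_gb = 400                                        # physical cache space on the node
quotas_gb = {"vdisk-%02d" % i: 50 for i in range(1, 10)}   # default quota of 50GB for 9 vDisks
quotas_gb["vdisk-10"] = 100                                # one vDisk that benefits from extra caching

subscribed_gb = sum(quotas_gb.values())
print("subscribed: %dGB, physical: %dGB, oversubscription: %.2fx"
      % (subscribed_gb, cache_size_gb, subscribed_gb / cache_size_gb))
# subscribed: 550GB, physical: 400GB, oversubscription: 1.38x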

Block and Fragment cache

One more point about cache management in Open vStorage: there are actually 2 types of cache which can be configured. The first one caches complete fragments, the result of erasure coding a Storage Container Object (SCO). Hence it is called the fragment cache and it is typically used for newly written data. The stored fragments are typically large in size so as to limit the amount of metadata, and consequently they aren’t ideal for (read) caching: under normal circumstances the cache hit ratio is inversely proportional to the size of the fragments. For that reason another cache, specifically tuned for read caching, was added. This block cache gets filled on reads and limits the size of the blocks in the cache to a few hundred KB at most (e.g. 32-256KB). This means a more granular approach can be taken during cache eviction, eventually leading to a higher cache hit ratio.

The Open vStorage High Performance Read Mesh (HPRM)

When you are developing a storage solution, your biggest worry is data loss. As an Open vStorage platform can lose a server or even a complete data center without actual data loss, we are pretty sure we have that base covered. The next challenge is to make sure that safely stored data can be accessed quickly when needed. On this blog we have already discussed a lot of the performance improvements made over the past releases: we introduced the Edge component for guaranteed performance, accelerated ALBA as a read cache, multiple proxies per Volume Driver and various performance tuning options.

Today it is time to introduce the latest performance improvement: the High Performance Read Mesh (HPRM). HPRM is an optimization of the read path which allows the compute host to fetch data directly from the drives where it is located. Previously the read path always had to go through the Volume Driver before the data was fetched from the ASDs. This newly introduced short read path can only be taken if the Edge has the necessary metadata describing where (SCO, fragment, disk) each LBA’s data is stored. If the Edge doesn’t have the needed metadata, for example because the cached metadata is outdated, the slow path through the Volume Driver is taken. Nothing changes for the write path, as all writes go through the Volume Driver.

The short read path which bypasses the Volume Driver has 2 direct advantages: lower read latency and less network traffic, as data goes over the network only once. The introduction of HPRM also allows for a cost reduction on the hardware front. Since the hosts running the Volume Driver are in many cases no longer in the read path, they are freed up and can focus on processing incoming writes. This means the ratio between compute hosts running the Edge and hosts running the Volume Driver can be increased. Since the Volume Driver hosts are typically beefy servers with expensive NVMe devices for the write buffer and the distributed databases, a significant change in the compute/Volume Driver ratio means a significant reduction of the hardware cost.

HPRM, the technical details

Let’s have a look under the hood at how HPRM works, starting with the write path. The application, e.g. the hypervisor, writes to the block device exposed by the Edge client. The Edge client connects to its server part, which in turn writes the data to the write buffer of the Volume Driver. Once enough writes are accumulated in the buffer, a SCO (Storage Container Object) is created and dispatched to the ALBA backend through the proxy. The proxy makes sure the data is spread across different ASDs according to the specified ALBA preset. Which ASDs contain the fragments of the SCO is stored in a manifest.
Once a read comes in for an LBA, the Edge client checks its local metadata cache for the SCO info and the manifest of the SCO. If the info is available, the Edge gets the LBA data through the PRACC (Partial Read ACCelerator) client, which can fetch the data directly from the ASDs. If the info isn’t available in the cache or is outdated, the manifest and SCO info are retrieved by the Edge client from the Volume Driver and stored in the Edge metadata cache.
The Edge also pushes IO statistics to the Volume Driver so they can be queried by the Framework or the monitoring components. Gathering IO statistics is done by the Edge as it is the only component that has a view of both the fast path, through the PRACC, and the slow path, through the Volume Driver.
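In pseudo-code the read path decision looks roughly as follows (a Python sketch; the Edge and PRACC are native components and every name below is invented for illustration, so treat this only as a model of the fast/slow path logic described above):

def read_lba(lba, metadata_cache, volume_driver, pracc):
    # Illustrative fast-path/slow-path decision for a single LBA read.
    manifest = metadata_cache.get(lba)
    if manifest is not None and not manifest.get("outdated", False):
        # Fast path: fetch the fragments straight from the ASDs listed in the manifest.
        return pracc.fetch(manifest)
    # Slow path: read through the Volume Driver, which also returns a fresh manifest
    # so that the next read of this LBA can take the fast path again.
    data, manifest = volume_driver.read(lba)
    metadata_cache[lba] = manifest
    return data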


Note that the High Performance Read Mesh is part of the Open vStorage Enterprise Edition. Contact us for more info on the Open vStorage Enterprise Edition.

Connecting Open vStorage with Amazon

In an earlier blog post we already discussed that Open vStorage is the storage solution to implement a hybrid cloud. In this blog post we will explain the technical details of how Open vStorage can be used in a hybrid cloud context.

The components

For frequent readers of this blog the different Open vStorage components should not hold any secrets anymore. For newcomers we will give a short overview of the different components:

  • The Edge: a lightweight software component which exposes a block device API and connects across the network to the Volume Driver.
  • The Volume Driver: a log structured volume manager which converts blocks into objects.
  • The ALBA Backend: an object store optimized as backend for the Volume Driver.

Let’s see how these components fit together in a hybrid cloud context.

The architecture

The 2 main parts of any hybrid cloud are an on-site, private part and a public part. Key to a hybrid cloud is that data and compute can move between the private and the public part as needed. As part of this thought exercise we take the example where we want to store data on premises in our private cloud and burst into the public cloud for compute when needed. To achieve this we need to install the components as follows:

The Private Cloud part
In the private cloud we install the ALBA backend components to create one or more storage pools. All SATA disks are gathered in a capacity backend while the SSD devices are gathered in a performance backend which accelerates the capacity backend. On top of these storage pools we will deploy one or more vPools. To achieve this we run a couple of Volume Driver instances inside our private cloud. On-site compute nodes with the Edge component installed can use these Volume Drivers to store data on the capacity backend.

The Public Cloud part
For the public cloud part, let’s assume we use Amazon AWS; there are multiple options depending on the desired performance. If we don’t require a lot of performance we can use an Amazon EC2 instance with KVM and the Edge installed. To bring a vDisk live in Amazon, a connection is made across the internet with the Volume Driver in the private cloud. Alternatively an AWS Direct Connect link can be used for a lower latency connection. Writes to a vDisk which is exposed in Amazon are sent by the Edge to the write buffer of the Volume Driver. This means that writes are only acknowledged to the application using the vDisk once the write buffer, located on premises, has received the data. Since the Edge and the Volume Driver connect over a rather high latency link, the write performance isn’t optimal in this case.
If more performance is required, we need an additional storage-optimized EC2 instance with one or more NVMe SSDs. On this second EC2 instance a Volume Driver instance is installed and the vPool is extended from the on-site, private cloud into Amazon. The NVMe devices of the EC2 instance are used to store the write buffer and the metadata DBs. It is of course possible to add some EBS Provisioned IOPS SSDs to the EC2 instance as a read cache. For even better performance, use dedicated Open vStorage powered cache nodes in Amazon. Since the write buffer is now located in Amazon, the latency will be substantially lower than in the first setup.

Use cases

As the last part of this blog post, we want to discuss some use cases which can be deployed on top of this hybrid cloud.

Analytics
Note that based upon the above architecture, a vDisk in the private cloud can be cloned into Amazon. The cloned vDisk can be used for business analytics inside Amazon without impacting the live workloads. When the analytics query is finished, the clone can be removed. The other way around is of course also possible. In that case the application data is stored in Amazon while the business analytics run on on-site compute hardware.

Disaster Recovery
Another use case is disaster recovery. As disaster recovery requires the data to be both on premises and in the cloud, additional instances with a large number of HDDs need to be added. Replication or erasure coding can be used to spread the data across the private and the public cloud. In case of a disaster where the private cloud is destroyed, one can simply add more compute instances running the Edge to bring the workloads live in the public cloud.

Data Safety
A last use case we want to highlight is for users that want to use public clouds but don’t trust any single public cloud provider with all of their data. In that case you need some instances in each public cloud which are optimized for storing data. Erasure coding is used to chop the data into encrypted fragments. These fragments are spread across the public clouds in such a way that none of the public clouds stores the complete data set, while the Edges and the Volume Drivers can still see the whole data set.
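To illustrate that last guarantee with numbers (a Python sketch with an example policy, not the actual ALBA placement logic): with k = 6 data fragments, m = 3 parity fragments and at most x = 3 fragments per cloud, no single cloud ever holds the k fragments needed to reconstruct an object, while the loss of one entire cloud still leaves exactly k fragments to rebuild from.

k, m, x = 6, 3, 3                      # 6 data + 3 parity fragments, at most 3 per cloud
clouds = ["cloud-a", "cloud-b", "cloud-c"]

# Round-robin the k + m fragments across the clouds, never exceeding x per cloud.
placement = {cloud: [] for cloud in clouds}
for fragment in range(k + m):
    placement[clouds[fragment % len(clouds)]].append(fragment)

for cloud, fragments in placement.items():
    # Reconstructing the data requires at least k fragments.
    print(cloud, "holds", len(fragments), "fragments; can reconstruct alone:", len(fragments) >= k)
print("fragments left after losing one cloud:", (k + m) - x)   # 6, still enough to rebuild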