Fargo GA

After 3 Release Candidates and extensive testing, the Open vStorage team is proud to announce the GA (General Availability) release of Fargo. This release is packed with new features. Allow us to give a small overview:

NC-ECC presets (global and local policies)

NC-ECC (Network Connected-Error Correction Code) is an algorithm to store Storage Container Objects (SCOs) safely in multiple data centers. It consists of a global preset, which applies across data centers, and multiple local presets, each of which applies within a single data center. The NC-ECC algorithm is based on forward error correction codes and is further optimized for use in a multi data center setup. When there is a disk or node failure, additional chunks are created using only data from within the same data center. This ensures the bandwidth between data centers isn’t stressed in case of a simple disk failure.
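To make the local repair idea concrete, here is a minimal sketch. It is not the NC-ECC algorithm itself: it swaps in a plain XOR parity (the simplest forward error correction code) and all names and chunk contents are made up for illustration. It only shows how a chunk lost inside one data center can be rebuilt from the chunks and parity stored in that same data center, without reading from another site.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def local_parity(chunks: list[bytes]) -> bytes:
    """Compute one XOR parity chunk over the chunks held in a single data center."""
    return reduce(xor_bytes, chunks)


def repair_locally(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing chunk from local survivors plus the local parity."""
    return reduce(xor_bytes, surviving, parity)


# Hypothetical chunks stored in one data center (illustrative data only).
dc_chunks = [b"AAAA", b"BBBB", b"CCCC"]
parity = local_parity(dc_chunks)

# A disk holding the second chunk fails; only local data is used for the repair.
rebuilt = repair_locally([dc_chunks[0], dc_chunks[2]], parity)
```

A single XOR parity can only repair one lost chunk per group; a real preset would use a stronger code so that several failures can be tolerated.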

Multi-level ALBA

The ALBA backend now supports different levels. An all-SSD ALBA backend can be used as a performance layer in front of the capacity tier. Data is removed from the cache layer using either a random eviction or a Least Recently Used (LRU) strategy.
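As an illustration of the LRU strategy mentioned above, the following is a small generic sketch in Python. It is not the ALBA implementation; the class name, the capacity counted in items and the keys are assumptions chosen for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that evicts the least recently used entry when it is full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read marks the entry as recently used
        return self._items[key]

    def put(self, key, value):
        """Store a value and return the evicted key, or None if nothing was evicted."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            evicted_key, _ = self._items.popitem(last=False)
            return evicted_key
        return None


cache = LRUCache(capacity=2)
cache.put("sco-1", b"data-1")
cache.put("sco-2", b"data-2")
cache.get("sco-1")                        # "sco-1" is now the most recently used
evicted = cache.put("sco-3", b"data-3")   # evicts "sco-2"
```

Random eviction would simply pick any key to drop instead of the oldest one, which is cheaper to track but ignores access patterns.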

Open vStorage Edge

The Open vStorage Edge is a lightweight block driver which can be installed on Linux hosts and connects to the Volume Driver over the network (TCP/IP). By splitting the Volume Driver and the Edge into separate components, compute and storage can scale independently.

Performance optimized Volume Driver

By limiting the size of a volume’s metadata, the metadata now fits completely in RAM. To keep the metadata at an absolute minimum, deduplication was removed. You can read more about why we removed deduplication here. Other optimizations include multiple proxies per Volume Driver (the default is 2), bypassing the proxy and going straight from the Volume Driver to the ASD for partial reads, and local read preference for global backends (reads are first attempted from ASDs in the same data center instead of going over the network to another data center).
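The local read preference can be pictured with a short sketch. This is only an assumption about how such an ordering could look, not the Volume Driver's code; the dictionary fields and ASD names are hypothetical.

```python
def order_by_locality(asds: list[dict], local_datacenter: str) -> list[dict]:
    """Return the ASDs with those in the local data center first.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(asds, key=lambda asd: asd["datacenter"] != local_datacenter)


# Hypothetical ASDs that each hold a copy of the requested data.
candidates = [
    {"id": "asd-7", "datacenter": "dc-b"},
    {"id": "asd-2", "datacenter": "dc-a"},
    {"id": "asd-9", "datacenter": "dc-c"},
    {"id": "asd-4", "datacenter": "dc-a"},
]

read_order = [asd["id"] for asd in order_by_locality(candidates, "dc-a")]
```

A reader would try the ASDs in this order and only cross to another data center when the local ones cannot serve the request.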

Multiple ASDs per device

For low-latency devices, adding multiple ASDs per device provides a higher bandwidth to the device.

Distributed Config Management

When you are managing large clusters, keeping the configuration of every system up to date can be quite a challenge. With Fargo all config files are now stored in a distributed config management system on top of our distributed database, Arakoon. More info can be found here.

Ubuntu 16.04

Open vStorage is now supported on Ubuntu 16.04, the latest Long Term Support (LTS) version of Ubuntu.

Smaller features in Fargo:

  • Improved the speed of the non-cached API and GUI queries by a factor of 10 to 30.
  • Hardened the node removal procedure.
  • The GUI is adjusted to better highlight clusters which are spread across multiple sites.
  • The failure domain concept has been replaced by tag-based domains. ASD nodes and storage routers can now be tagged with one or more tags. Tags can be used to identify a rack, site, power feed, etc.
  • 64TB volumes.
  • Browsable API with Swagger.
  • ‘asd-manager collect logs’ now works the same way as ‘ovs collect logs’.
  • Support for the removal of the asd-manager packages.
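To illustrate the tag-based domains from the list above, here is a minimal sketch of how tags could be used to find which nodes share a rack, site or power feed. The node names, tag strings and helper function are hypothetical and do not reflect the Open vStorage API.

```python
def nodes_sharing_tag(node_tags: dict[str, set[str]], node: str) -> set[str]:
    """Return all other nodes that have at least one tag in common with `node`."""
    own_tags = node_tags[node]
    return {
        other
        for other, tags in node_tags.items()
        if other != node and tags & own_tags
    }


# Hypothetical tagging of storage routers and ASD nodes.
node_tags = {
    "storagerouter-1": {"site-brussels", "rack-1", "power-feed-a"},
    "storagerouter-2": {"site-brussels", "rack-2", "power-feed-b"},
    "asd-node-1": {"site-paris", "rack-1-paris", "power-feed-a"},
}

same_domain = nodes_sharing_tag(node_tags, "storagerouter-1")
```

Because a node can carry several tags, one node can belong to several overlapping domains at once, which a single failure domain per node could not express.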

Since this Fargo release introduces a completely new architecture (you can read more about it here), there is no upgrade path from Eugene to Fargo. The full release notes can be found here.

Fargo: the updated Open vStorage Architecture

With the Fargo release of Open vStorage we are focussing even more on the Open vStorage sweet spot: multi-petabyte, multi-datacenter storage clusters which offer super-fast block storage.
In order to achieve this we had to significantly change the architecture for the Fargo release. Eugene, the version before Fargo, already had the Shared Memory Server (SHM) in its code base but it wasn’t activated by default. The Fargo release now primarily uses the SHM approach. To make even more use of it, we created the Open vStorage Edge. The Edge is a lightweight block storage driver which can be installed on Linux servers (hosts running the hypervisor or inside the VM) and talks across the network to the Shared Memory of a remote Volume Driver. Both TCP/IP and the low-latency RDMA protocol can be used to connect the Edge with the Volume Driver. Northbound, the Edge offers iSCSI, Blktap and QEMU interfaces. Additional interfaces such as iSER and FCoE are planned. Next to the new Edge interface, the slower Virtual Machine interface, which exposes a Virtual File System (NFS, FUSE), is still supported.

Architecture

The Volume Driver has also been optimized for performance. The locks in the write path have been revised in order to minimize their impact. More radical is the decision to remove the deduplication functionality from the Volume Driver in order to keep the size of the metadata of the volumes to a strict minimum. By removing the bytes reserved for the hash, we can keep all the metadata in RAM and push the performance beyond 1 million IOPS per host on decent hardware. For those who absolutely need deduplication, a version of the Volume Driver with deduplication support is still available.

With the breakthrough of RDMA, the network bottleneck is removed and network latency is brought down to a couple of microseconds. Open vStorage makes use of the possibilities RDMA offers to implement a shared cache layer. To achieve this it is now possible to create an ALBA backend out of NVMe or SSD devices. This layer acts as a local cache layer, within a single datacenter, in front of a SATA ALBA backend, the capacity tier, which is spread across multiple datacenters.
This means all SSDs in a single datacenter form a shared cache for the data of that datacenter. This minimizes the impact of an SSD failure and removes the cold cache effect when moving a volume between hosts.

In order to minimize the impact of a single disk failure we introduced the NC-ECC (Network and Clustered Error Correction Codes) algorithm. This algorithm can be compared with solving a Sudoku puzzle. Each SCO, a collection of consecutive writes, is chopped up into chunks. All these chunks are distributed across all the nodes and datacenters in the cluster. The total number of chunks is configurable and can be chosen so that the data survives, for example, a multi-node failure or a complete datacenter loss. A failure, whether it is a disk, node or datacenter, will cross out some numbers from the complete Sudoku puzzle, but as long as you have enough numbers left, you can still solve the puzzle. The same goes for data stored with Open vStorage: as long as you have enough chunks (disks, nodes or datacenters) left, you can always recover the data. The NC-ECC algorithm is based on forward error correction codes and is further optimized for use in a multi-datacenter setup. When there is a disk or node failure, additional chunks are created using only data from within the same datacenter. This ensures the bandwidth between datacenters isn’t stressed in case of a simple disk failure.
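The "enough chunks left" rule can be expressed as a small check. The sketch below assumes an idealized code in which any k surviving chunks are sufficient to rebuild an SCO; the placement, the value of k and all names are hypothetical and are not taken from an actual Open vStorage preset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Location of one chunk of an SCO."""
    datacenter: str
    node: str


def is_recoverable(placement, k, failed_datacenters=(), failed_nodes=()):
    """Return True if at least k chunks survive the given failures.

    Assumption: any k chunks are enough to rebuild the SCO.
    """
    surviving = [
        chunk
        for chunk in placement
        if chunk.datacenter not in failed_datacenters
        and chunk.node not in failed_nodes
    ]
    return len(surviving) >= k


# Hypothetical layout: 3 datacenters, 2 nodes each, one chunk per node.
placement = [
    Chunk("dc-1", "node-1"), Chunk("dc-1", "node-2"),
    Chunk("dc-2", "node-3"), Chunk("dc-2", "node-4"),
    Chunk("dc-3", "node-5"), Chunk("dc-3", "node-6"),
]

# With k = 4, losing a whole datacenter leaves 4 chunks: still recoverable.
survives_dc_loss = is_recoverable(placement, k=4, failed_datacenters={"dc-1"})

# Losing a datacenter plus one more node leaves 3 chunks: not recoverable.
survives_dc_and_node_loss = is_recoverable(
    placement, k=4, failed_datacenters={"dc-1"}, failed_nodes={"node-3"}
)
```

Raising the total number of chunks while keeping k fixed buys more tolerated failures at the price of more stored data, which is the trade-off a preset has to settle.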

By splitting up the Edge, the Volume Driver, the cache layer and the capacity tier, you have the ultimate flexibility to build the storage cluster of your needs. You can run everything on the same server, hyperconverged, or you can install each component on a dedicated server to maximize scalability and performance.

The first alpha version of Fargo is now available on the repo.