Fargo GA

After 3 Release Candidates and extensive testing, the Open vStorage team is proud to announce the GA (General Availability) release of Fargo. This release is packed with new features. Allow us to give a brief overview:

NC-ECC presets (global and local policies)

NC-ECC (Network Connected-Error Correction Code) is an algorithm to store Storage Container Objects (SCOs) safely across multiple data centers. It consists of a global preset that spans data centers and multiple local presets, each scoped to a single data center. The NC-ECC algorithm is based on forward error correction codes and is optimized for multi-data-center deployments. When there is a disk or node failure, additional chunks are created using only data from within the same data center. This ensures the bandwidth between data centers isn’t stressed in case of a simple disk failure.
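
For readers who like something concrete, below is a minimal Python sketch of the idea behind layered presets. The field names and policy values are assumptions for illustration only, not the actual Open vStorage preset schema.

```python
# Hypothetical illustration of layered NC-ECC presets; all field names and
# values below are assumptions, not the real Open vStorage preset format.

# Global preset: how SCO fragments are spread across data centers.
global_preset = {
    "name": "global-3dc",
    "policy": {"data_fragments": 2, "parity_fragments": 1},  # survives losing 1 data center
    "scope": "across-data-centers",
}

# Local preset: how each data center protects its own share of the data.
local_preset = {
    "name": "local-dc",
    "policy": {"data_fragments": 5, "parity_fragments": 3},  # survives 3 local disk failures
    "scope": "within-data-center",
}

def repair_after_disk_failure(data_center: str) -> str:
    """On a single disk failure, missing chunks are rebuilt from fragments
    stored in the same data center, so no inter-DC bandwidth is consumed."""
    return f"rebuilding lost fragments using only peers inside {data_center}"

print(repair_after_disk_failure("dc1"))
```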

Multi-level ALBA

The ALBA backend now supports different levels. An all-SSD ALBA backend can be used as a performance layer in front of the capacity tier. Data is removed from the cache layer using either a random eviction or a Least Recently Used (LRU) strategy.
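
As a rough illustration of the two eviction strategies mentioned above, here is a small Python sketch of a cache tier that evicts either the least recently used entry or a random one. It is a toy model, not the actual ALBA cache implementation.

```python
import random
from collections import OrderedDict

class CacheTier:
    """Toy model of the two eviction strategies mentioned above (LRU and
    random eviction); not the actual ALBA cache implementation."""

    def __init__(self, capacity: int, strategy: str = "lru"):
        self.capacity = capacity
        self.strategy = strategy
        self.entries = OrderedDict()  # key -> cached object

    def get(self, key):
        if key in self.entries and self.strategy == "lru":
            self.entries.move_to_end(key)  # mark as most recently used
        return self.entries.get(key)

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            if self.strategy == "lru":
                self.entries.popitem(last=False)      # evict least recently used
            else:
                victim = random.choice(list(self.entries))
                del self.entries[victim]              # evict a random entry
```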

Open vStorage Edge

The Open vStorage Edge is a lightweight block driver which can be installed on Linux hosts and connects to the Volume Driver over the network (TCP/IP). By splitting the Volume Driver and the Edge into separate components, compute and storage can scale independently.

Performance optimized Volume Driver

By limiting the size of a volume’s metadata, the metadata now fits completely in RAM. To keep the metadata at an absolute minimum, deduplication was removed. You can read more about why we removed deduplication here. Other optimizations are multiple proxies per Volume Driver (the default is 2), bypassing the proxy and going straight from the Volume Driver to the ASD in case of partial reads, and local read preference in case of global backends (try to read from ASDs in the same data center instead of going over the network to another data center).
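
To make the local read preference optimization a bit more tangible, here is a small Python sketch of how a reader could prefer ASDs in its own data center and only fall back to a remote one when needed. The ASD records and the selection logic are illustrative assumptions, not the Volume Driver’s actual code.

```python
# Sketch of the "local read preference" idea for global backends; the ASD
# records and the selection logic below are illustrative assumptions only.
import random
from dataclasses import dataclass

@dataclass
class Asd:
    asd_id: str
    data_center: str

def pick_read_source(asds, local_data_center):
    """Prefer an ASD in the local data center; only fall back to a remote
    data center when no local copy of the fragment is available."""
    local = [a for a in asds if a.data_center == local_data_center]
    return random.choice(local) if local else random.choice(asds)

asds = [Asd("asd-1", "dc1"), Asd("asd-2", "dc2"), Asd("asd-3", "dc1")]
print(pick_read_source(asds, "dc1").asd_id)  # stays inside dc1 when possible
```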

Multiple ASDs per device

For low-latency devices, adding multiple ASDs per device provides higher bandwidth to the device.

Distributed Config Management

When you are managing large clusters, keeping the configuration of every system up to date can be quite a challenge. With Fargo all config files are now stored in a distributed config management system on top of our distributed database, Arakoon. More info can be found here.
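
As a rough sketch of what working with such a distributed configuration store could look like, the snippet below wraps a generic key-value client with JSON (de)serialization. The client interface and the key layout in the usage comment are assumptions for illustration, not the actual Open vStorage configuration API.

```python
# Minimal sketch of reading and writing configuration through a distributed
# key-value store; the client interface and key layout are assumptions, not
# the actual Open vStorage configuration API.
import json

class DistributedConfig:
    def __init__(self, kv_client):
        # kv_client is assumed to be a client connected to the Arakoon cluster,
        # exposing simple get/set calls on string keys.
        self.kv = kv_client

    def get(self, key):
        return json.loads(self.kv.get(key))

    def set(self, key, value):
        self.kv.set(key, json.dumps(value))  # replicated, consistent write

# Hypothetical usage:
# cfg = DistributedConfig(arakoon_client)
# cfg.set("/ovs/framework/hosts/node1/ports", {"api": 8002})
# print(cfg.get("/ovs/framework/hosts/node1/ports"))
```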

Ubuntu 16.04

Open vStorage is now supported on Ubuntu 16.04, the latest Long Term Support (LTS) version of Ubuntu.

Smaller features in Fargo:

  • Improved the speed of the non-cached API and GUI queries by a factor of 10 to 30.
  • Hardening the remove node procedure.
  • The GUI is adjusted to better highlight clusters which are spread across multiple sites.
  • The failure domain concept has been replaced by tag based domains. ASD nodes and storage routers can now be tagged with one or more tags. Tags can be used to identify a rack, site, power feed, etc.
  • 64TB volumes.
  • Browsable API with Swagger.
  • ‘asd-manager collect logs’, identical to ‘ovs collect logs’.
  • Support for the removal of the asd-manager packages.

Since this Fargo release introduces a completely new architecture (you can read more about it here), there is no upgrade possible between Eugene and Fargo. The full release notes can be found here.

Fargo RC3

We released Fargo RC3. This release focuses on bugfixing (13 bugs fixed) and stability.

Some items were also added to improve the supportability of an Open vStorage cluster:

  • Improved the speed of the non-cached API and GUI queries by a factor of 10 to 30.
  • It is now possible to add more NSM clusters to store the data for a backend through an API instead of doing it manually.
  • Blocked setting a clone as a template.
  • Hardening the remove node procedure.
  • Removed ETCD support for the config management as it was no longer maintained.
  • Added an indicator in the GUI which displays when a domain is set as recovery domain and not as primary anywhere in the cluster.
  • Support for the removal of the ASD manager.
  • Added a call to list the manually started jobs (e.g. verify namespace) on ALBA.
  • Added a timestamp to list-asds so it can be tracked how long an ASD has already been part of the backend.
  • Removed the Volume Driver test that created a new volume in the Health Check, as it produced too many false positives to be used reliably.

Fargo RC2

We released Fargo RC2. The biggest new items in this release:

  • Multiple performance improvements such as multiple proxies per Volume Driver (the default is 2), bypassing the proxy and going straight from the Volume Driver to the ASD in case of partial reads, and local read preference in case of global backends (try to read from ASDs in the same data center instead of going over the network to another data center).
  • API to limit the amount of data that gets loaded into the memory of the Volume Driver host. Instead of loading all metadata of a vDisk into RAM, you can now specify the percentage of RAM it may consume.
  • Counter which keeps track of the number of invalid checksums per ASD so we can flag bad ASDs faster.
  • Configuring the scrub proxy to cache on write.
  • Implemented timeouts for the volume driver calls.

The team also solved 110 issues between RC1 and RC2. An overview of the complete content can be found here: Added Features | Added Improvements | Solved Bugs

Open vStorage Releases

Since Open vStorage is running in production at customers, we need to carefully plan our releases, as a small glitch might cause a disaster. For storage software there is a golden rule:

If it ain’t broken, don’t fix it!

With the release of Fargo RC1 we are entering a new cycle of intermediate releases and bugfixes. Once Fargo is GA we will push out a new update at regular intervals. Before installing an update, customers like to know what exactly is fixed in a certain update. That is why for each release, even an intermediate release, the release notes are documented. Let’s take the Fargo Release Candidate 1 as an example. This release consists of the following packages:

The content of each package, e.g. the webapps package, can be found in the appropriate repository (or you can click the link in the release notes). The release notes of the package contain a summary of all fixed issues in that exact package. In case you want to be kept up to date on new releases, add the release page as an RSS feed (https://github.com/openvstorage/home/releases.atom) to your favourite RSS feed reader. If you prefer to be kept up to date by email, you can use Sibbell, Blogtrottr or a similar service.

Eugene Release

To start the new year with a bang, the Open vStorage Team is proud to release Eugene.

The highlights of this release are:

Policy Update
Open vStorage enables you to actively add, remove and update policies for specific ALBA backend presets. Updating active policies might result in Open vStorage automatically rewriting data fragments.

ALBA Backend Encryption
When configuring a backend preset, an AES-256 encryption algorithm can be selected.

Failure Domain
A Failure Domain is a logical grouping of Storage Routers. The Distributed Transaction Log (DTL) and MetaDataServer (MDS) for Storage Router groups can be defined in the same Failure Domain or in a Backup Failure Domain. When the DTL and MDS are defined in a Backup Failure Domain, data loss in case of a non-functioning Failure Domain is prevented. Defining the DTL and MDS in a Backup Failure Domain requires low-latency network connections.

Distributed Scrubber
Snapshots which are outside the retention period are marked as garbage and removed by the Scrubber. With the Distributed Scrubber functionality you can now decide to run the actual scrubbing process away from the host that holds the volume. This way, hosts that are running Virtual Machines do not experience any performance hit when the snapshots of those Virtual Machines are scrubbed.

Scrubbing Parent vDisks
Open vStorage allows you to create clones of vDisks. The maximum depth of the clone tree is limited to 255. When a clone is created, scrubbing is still applied to the actual parent of the clone.

New API calls
The following API calls have been added:

  • vDisk templates (set and create from template)
  • Create a vDisk (name, size)
  • Clone a vMachine from a vMachine Snapshot
  • Delete a vDisk
  • Delete a vMachine snapshot

These API calls are not exposed in the GUI.
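
To give an idea of how such calls could be consumed, here is a hedged example of creating a vDisk over the REST API with Python. The endpoint path, payload fields and authentication header are assumptions for illustration; consult the API documentation for the exact contract.

```python
# Hedged example: creating a vDisk through the REST API. The endpoint path,
# payload fields and authentication header are assumptions for illustration.
import requests

BASE = "https://ovs-node/api"
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

response = requests.post(
    BASE + "/vdisks/",
    json={"name": "disk01", "size": 10 * 1024 ** 3, "vpool_guid": "<vpool-guid>"},
    headers=HEADERS,
    verify=False,  # assumption: self-signed certificate on the management node
)
response.raise_for_status()
print(response.json())
```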

Removal of the community restrictions
The ALBA backend is no longer restricted and you are no longer required to apply for a community license to use ALBA. The cluster needs to be registered within 30 days; otherwise the GUI will stop working until the cluster is registered.

Remove Node
Open vStorage allows for nodes to be removed from the Open vStorage Cluster. With this functionality you can remove any node and scale your storage cluster along with your changing storage requirements. Both active and broken nodes can be consistently removed from the cluster.

Some smaller Feature Requests were added also:

  • Removal of the GCC dependency.
  • Option to label a manual snapshot as ‘sticky’ so it doesn’t get removed by the automated snapshot cleanup.
  • Allow stealing of a volume when no Hypervisor Management Center is configured and the node owning the volume is down.
  • Set_config_params for vDisk no longer requires the old config.
  • Automatically reconfigure the DTL when DTL is degraded.
  • Automatic triggering of a repair job when an ASD is down for 15 minutes.
  • ALBA is independent of broadcasting.
  • Encryption of the ALBA socket communication.
  • New Arakoon client (pyarakoon).

Following are the most important bug fixes in the Eugene release:

  • Fix for various issues when the first node is down.
  • “An error occurred while configuring the partition” while trying to assign DB role to a partition.
  • Cached list not updated correctly.
  • Celery workers are unable to start.
  • NVMe drives are not correctly detected.
  • Volume restart fails due to failure while clearing the DTL.
  • Arakoon configs not correct on 4th node.
  • Bad MDS Slave placement.
  • DB role is required on every node running a vPool but isn’t mandatory.
  • Exception in tick crashes the ovs-scheduled-task service.
  • OVS-extensions is very chatty.
  • Voldrv python client hangs if node1 is down.
  • Bad logic to decide when to create extra ALBA namespace hosts.
  • Timeout of CHAINED ensure single decorator is not high enough.
  • Possibly wrong master selected during “ovs setup demote”.
  • Possible race condition when adding nodes to cluster.

Welcome Chicago, the windy release

It has been a while since the Open vStorage team released an Open vStorage version. From now on, Open vStorage releases will carry a name instead of a number. The latest release is called Chicago. The nickname of Chicago, the windy city, is appropriate for this release, as with the new cache tuning features Open vStorage is velocious. Suffice it to say that this release has some long-anticipated features:

Flex SSD layout:
We have been working towards this feature since May but finally it is ready. You now have the flexibility to configure how you want to use your SATA drives, SSDs and PCIe cards through the GUI. For example you can use the SATA drives in your ALBA Backend, the SSDs as read cache devices and the PCIe card to accelerate the writes. To achieve this, every Storage Router displays the detected drives (SATA, SSD, …) and allows you to assign roles to the drives. Currently there are 4 roles:

  • DB: The DB role stores the distributed database and metadata of the volumes. The DB role must be assigned to an SSD. This will reserve 10% of the SSD for the distributed database. All Storage Routers must have at least one disk with the DB role.
  • Scrub: The scrubber is the application which does the garbage collection of snapshot data which is outside the retention period. This will reserve 300 GB of space. All Storage Routers must have at least one disk with the scrubbing role.
  • Read: This will allow to use the disk as read cache.
  • Write: This will allow to use the disk as write cache.

You can now assign a part of an SSD to be used as read and/or write cache by a specific vPool. To assign a role to a disk, click the gear icon and select the appropriate role from the dropdown.

Tune the cache:
Open vStorage now exposes a whole set of caching parameters per vDisk. You can configure whether you want to use the Distributed Transaction Log (DTL), whether to cache on read or write, whether a volume should use the deduped cache or not, and what the SCO size and the amount of outstanding data in the write buffer can be before throttling occurs. You can even set how much each individual vDisk may consume of the read cache. These options make Open vStorage the most configurable storage solution in the field.
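
As an illustration of what such a per-vDisk tuning profile could look like, here is a small Python sketch. The parameter names and values are assumptions to make the idea concrete, not the exact Open vStorage API.

```python
# Illustrative per-vDisk cache tuning profile; parameter names and values are
# assumptions to make the idea concrete, not the exact Open vStorage API.
vdisk_cache_settings = {
    "dtl_enabled": True,          # keep a copy of incoming writes on another host
    "cache_strategy": "on_read",  # or "on_write" / "none"
    "deduped_cache": False,       # whether this volume uses the deduped read cache
    "sco_size_mib": 64,           # size of a Storage Container Object
    "write_buffer_mib": 512,      # outstanding write data allowed before throttling
    "read_cache_limit_gib": 10,   # read cache this single vDisk may consume
}

def apply_cache_settings(vdisk, settings):
    """Push the tuning parameters to one vDisk (hypothetical client call)."""
    vdisk.set_config_params(settings)
```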

The DTL over RDMA:
The Distributed Transaction Log (DTL) makes sure that you don’t have data loss when a host goes down by also storing incoming writes on another host. With the Chicago release you can use low-latency RDMA technology to increase the performance of the DTL.

Move to GitHub:
The source of Open vStorage can from now on be found on GitHub. In a nutshell, Open vStorage has had 3,623 commits made by 33 contributors representing 113,455 lines of code and took an estimated 29 years of effort (COCOMO model). How cool is that!

Open vStorage 2.2 alpha 6

Today we released Open vStorage 2.2 alpha 6. Like alpha 4, this is a bugfix release.

Bugs that were fixed as part of alpha 6:

  • If required, Open vStorage updates the Cinder driver to be compatible with the installed Open vStorage version.
  • Various issues where the ALBA proxy might get stuck.
  • Issue where the ALBA proxy fails to add fragments to the fragment cache.
  • Arakoon cluster remains masterless in some cases.
  • Various issues with the updater.
  • Remove vPool leaves MDS files.
  • An XML file visible through FUSE isn’t accessible even though it is available on the backend.
  • While the extending vPool task is running, the vPool can’t be added or removed from other Storage Routers in the GUI.
  • Timeout for backend syncs and migration is added to ensure more graceful handling of live migration failures.

Open vStorage 2.2 alpha 4

We released Open vStorage 2.2 Alpha 4, which contains the following bugfixes:

  • Update of the About section under Administration.
  • Open vStorage Backend detail page hangs in some cases.
  • Various bugfixes for the use case when adding a vPool with a vPool name which was previously used.
  • Hardening the vPool removal.
  • Fix daily scrubbing not running.
  • No log output from the scrubber.
  • Failing to create a vDisk from a snapshot tries to delete the snapshot.
  • ALBA discovery starts spinning if network is not available.
  • ASD is no longer used by the proxy even after it has been requalified.
  • Type checking through Descriptor doesn’t work consistently.

Open vStorage 2.2 alpha 3

Today we released Open vStorage 2.2 alpha 3. The only new features are on the Open vStorage Backend (ALBA) front:

  • Metadata is now stored with a higher protection level.
  • The protocol of the ASD is now more flexible in the light of future changes.

Bugfixes:

  • Make it mandatory to configure both the read and write cache during the partitioner step of the ovs setup.
  • During add_vpool on devstack, the cinder.conf is updated with notification_driver which is incorrectly set as “nova.openstack.common.notifier.rpc_notifier” for Juno.
  • Added support for more physical disk configuration layouts.
  • ClusterNotReachableException during vPool changes.
  • Cannot extend vPool with volumes running.
  • Update button clickable when an update is ongoing.
  • Already configured storage nodes are now removed from the discovered ones.
  • Fix for ASDs which don’t start.
  • Issue where a slow long-running task could fail because of a timeout.
  • Message delivery from albamgr to nsm_host can get stuck.
  • Fix for “ALBA namespace doesn’t exist” being reported while the namespace does exist.