Don’t you hate a noisy neighbour? Someone who blasts his preferred music just loud enough so you can hear it when trying to get some sleep or having a relaxing commute. Well the same goes for noisy neighbours in storage. It is not their deafening music that is annoying but the fact that other volumes can’t meet their desired performance as one volume gobbles up all IOPS.
Setting cache quota
This situation typically occurs when a single volume takes up the whole cache. In order to allow every vDisk to get a fair share of the cache, the Open vStorage Enterprise Edition allows to put a quota on the cache usage. When creating a vPool you can set a default quota per vDisk allowing each vDisk to get a fair share of the cache. Do note that the quota system is flexible. It is for example possible to set a larger value than the default for a specific vDisk in case it would benefit from more caching. It is even possible to oversubscribe the cache. This way the cache space can be optimally used.
Block and Fragment cache
One more point about cache management in Open vStorage. There are actually 2 types of cache which can be configured in Open vStorage. The first one caches complete fragments, the result of erasure coding a Storage Container Object (SCO). Hence it is called the fragment cache and it is typically used for newly written data. The stored fragments are typically large in size as to limit the amount of metadata and consequently these aren’t ideal to be used for (read) caching. The cache hit ratio is under normal circumstances inversely proportional to the size of the fragments. For that reason another cache, specifically tuned for read caching, was added. This block cache gets filled on reads and limits the size of the blocks in the cache to a couple of KB (f.e. 32-256KB). This means a more granular approach can be taken during cache eviction, eventually leading to a higher cache hit ratio.
When you are developing a storage solution your biggest worry is data loss. As an Open vStorage platform can lose a server or even a complete data center without actual data loss, we are pretty sure we have that base covered. The next challenge is to make sure that safely stored data can be quickly accessed when needed. In this blog section we already discussed a lot of the performance improvements we made over the past releases. We introduced the Edge component for guaranteed performance, the accelerated ALBA as read cache, multiple proxies per volume driver and various performance tuning options.
Today it is time to introduce the latest performance improvement: High Performance Read Mesh (HPRM). This HPRM is an optimization of the read path and allows the compute host to directly fetch the data from the drives where the data is located. Earlier the read path always had to go through the Volume Driver before the data was fetched from the ASD. This newly introduced short read path can only be taken in case the Edge has the necessary metadata of where (SCO, fragment, disk) each LBA’s data is stored. In case the Edge doesn’t have the needed metadata, for example because the cached metadata is outdated, the slow path is taken through the Volume Driver. For the write path nothing is changed as all writes go through the Volume Driver.
The short read path which bypasses the Volume Driver has 2 direct advantages: lower latency on reads and less network traffic as data only goes once over the network. Next, the introduction of the HPRM also allows for a cost reduction on the hardware front. Since the hosts running the Volume Driver are no longer in the read path in many cases, they are freed up and can focus on processing incoming writes. This means the ratio between compute hosts running the Edge and the Volume Driver can be increased. Since the Volume Driver hosts are typically beefy servers with expensive NVMe devices for the write buffer and the distributed databases, a significant change in the Compute/Volume Driver ratio means a significant reduction of the hardware cost.
HPRM, the technical details
Let’s have a look under the hood on how the HPRM works. First we will have a look at the write path. The application, f.e. the hypervisor, writes to the block device exposed by the Edge client. The Edge client will connect to its server part which in its turn, writes the data to the write buffer of the Volume Driver. Once enough writes are accumulated in the buffer, a SCO (Storage Container Object) is created and dispatched to the ALBA backend through the proxy. The proxy makes sure the data is spread across different ASDs according to the specified ALBA preset. Which ASDs contain the fragments of the SCO is stored in a manifest.
Once a read comes for the LBA, the Edge client will check its local metadata cache for the SCO info and manifest of the SCO. If the info is available the Edge will get the LBA data through the PRACC (Partial Read ACCelerator) client which can directly fetch the data from the ASDs. If the info isn’t available in the cache or if it is outdated, the manifest and SCO info are retrieved by the Edge client from the Volume Driver and stored in the Edge metadata cache.
The Edge also pushes the IO statistics to the Volume Driver so these can be queried by the Framework or the monitoring components. Gathering IO statistics is done by the Edge as it is the only component that has a view on both the fast path, through the PRACC, and the slow path through the Volume Driver.
Note that the High Performance Read Mesh is part of the Open vStorage Enterprise Edition. Contact us for more info on the Open vStorage Enterprise Edition.