Hamburgers, french fries and hyperconvergence

During the first Open vStorage roadshow in the US, I noticed that people have a lot of questions about convergence and hyperconvergence:

Can you help me with the term “hyperconverged”? I believe it is a marketing buzzword, but it is something that my executives have glommed onto.

While waiting for my flight back home, I was eating a burger and french fries. Let’s be honest, the US has the best places to eat burgers. While eating and staring at the planes, I suddenly had an epiphany on how to explain convergence, hyperconvergence and where Open vStorage fits in: burgers and french fries.

Let’s say that hamburgers are the compute (RAM, CPU, the host where the VMs are running), french fries are the storage and barbecue sauce is the storage performance. In that case a converged solution is like ordering a hamburger menu. One SKU will get you a plate with a hamburger and french fries on the side. You even have different menus with smaller or bigger hamburgers and more or fewer french fries. When you order a ‘converged burger’ the barbecue sauce will be on the french fries (SSDs inside the SAN). It works but it is not ideal. With a ‘hyperconverged burger’, instead of receiving the french fries separately, you will receive a single hamburger with french fries and barbecue sauce as toppings on the burger. Allow me to explain. With a hyperconverged appliance the compute (hamburger), the Tier 1 storage (barbecue sauce) and the Tier 2 storage (french fries) are all inside the same appliance. Open vStorage is neither of these. With Open vStorage, the hamburger will be topped with barbecue sauce (compute and Tier 1 inside the same host) but you get the french fries (Tier 2) on the side.

As said, Open vStorage should not be used as a hyperconverged solution like Nutanix or SimpliVity. The Open vStorage software can be used that way, but we at CloudFounders don’t believe hyperconverged is the right way to build scalable solutions. We believe a converged solution with Tier 1 inside the compute, let’s call it flexi-converged, is a much better fit for multiple reasons:

  • Storage growth: typically, storage needs grow three times faster than CPU needs. So adding more compute (CPU & RAM, hypervisors) just because you need more Tier 2 backend storage is throwing away money. If you go to a hamburger restaurant and you want more french fries, you just order another portion of fries. It doesn’t make sense to order another hamburger (with french fries as topping) if you only want french fries.
  • Storage performance: since a hyperconverged appliance only has a limited number of bays, you have to decide between adding an SSD or a SATA drive in each bay. You need the SSDs for performance, so that limits the bays available for capacity-optimized SATA disks. A hyperconverged appliance makes a trade-off between storage performance (more flash) and storage capacity (more SATA). As a result you end up with appliances costing $180,000 which can run 100 Virtual Machines but can only store a total of 5TB (15TB raw) of data. Due to the 3-way replication, storing all data 3 times for redundancy reasons, the balance is completely off: each Virtual Machine can only have 50GB of data (the first sketch after this list works out these numbers)! What you want is to be able to scale both storage capacity and storage performance independently. Let’s make it clearer. When you order the ‘hyperconverged burger’ you get a burger with barbecue sauce and french fries on top of the burger. Since every burger has a certain size, there is a limit to the amount of french fries and barbecue sauce you can add as topping. If you want more french fries, you will have to cut back on the barbecue sauce. It is as simple as that. With Open vStorage, the french fries are on the side so you can order as many additional portions as needed. With the Seagate Kinetic integration you simply add the additional drives to your pool of Tier 2 backend storage, et voilà, you have more space for Virtual Machine data without having to sacrifice storage performance.
  • Performance of the backend: when implementing a tiered architecture, you don’t want your Tier 2 storage layer (‘the cold storage’) to limit the performance of your Tier 1 layer (‘the caching layer’). The Tier 1 is expensive and optimized for storage performance by using SSDs or PCIe flash, so it is a big issue if the speed of the Tier 2 storage becomes a bottleneck for digesting data coming from Tier 1. The performance of the Tier 2 storage is determined by the number of disks and their speed. This is why you see hyperconverged models using two 1TB disks instead of a single 2TB disk: they need the spindles in the backend to make sure the Tier 1 caching layer isn’t impacted by a choking backend. This is a real issue. At CloudFounders we have had situations in the past where we had to add disks to the backend just to make sure it could digest what was coming from the cache. Let’s do the math to explain the issue in more detail (the second sketch after this list walks through the same numbers). Your Tier 1 can easily do 50-70K IOPS of 4K blocks. Let’s assume that this is a mix: 20K write IOPS and 50K read IOPS. The SSD/PCIe flash card will take the first hit for these 20K write IOPS (which is a piece of cake for flash technology) but once data is evicted from that SSD it needs to go to the backend. Storage solutions will typically aggregate those 4K writes into bigger chunks (Nutanix creates 1MB (4K*250) chunks, Open vStorage accumulates 1,000 writes into objects of 4MB) to minimize the backend traffic. So in the optimal scenario Nutanix needs to store 80 IOPS (20K/250) to the backend. This is the optimal scenario as they don’t work with an append-style log, but we will devote another blog post to this. Nutanix uses 3-way replication so 80 IOPS become 240 IOPS across multiple disks. These disks contain a file system, so there is additional IO overhead as each hop in the directory structure is another IO. Let’s assume for the sake of simplicity that we only have to go down 1 directory, but it could be more hops. So in total, to store the 80 IOPS coming out of the cache, you need at least 480 backend IOPS to put the data on disk. A normal SATA disk does 90 IOPS, so you see that these backend disks become a bottleneck real quickly. In our simple use case we would need at least 6 drives to make sure we can accommodate the data coming from the cache. If among the read IO, which we didn’t take into account, there are also cache misses, those 6 drives will not be enough. It is really painful and costly to add additional SATA disks to a backend which is only 20% full just to make sure you have the spindles to accommodate the data coming from your Tier 1. This is also why Open vStorage likes the Seagate Kinetic drives. The Kinetic drives don’t have a file system, so for Open vStorage they are an IOPS saver. If you take the same number of SATA drives and Seagate Kinetic drives, the Kinetic drives will outperform the SATA drives in our use case. Open vStorage also supports Ceph and Swift, which use a file system on their OSDs, but this is why we prefer the Kinetic drives: they provide better performance for the same number of drives. The Seagate Kinetic drives really are a valuable asset to our portfolio of supported backends.
  • Replace broken disks: the trend is to replace bigger chunks of hardware when they fail. Google, a company hyperconverged solutions like to refer to, has been doing it for years. They no longer care about a broken disk and replace complete servers. Those big storage arrays are made to leave dead disks behind, add new nodes and only replace a node once X% of its disks have failed. You don’t want to go to the datacenter every time a disk fails, but with hyperconverged appliances you simply can’t risk leaving a dead disk behind as you need the spindles for backend performance. Storage maintenance also means you need to move VMs off that host, which is always a risk.
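
To make the capacity trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. The raw capacity, replication factor and VM count are the figures from the $180,000 appliance example above; everything else is plain arithmetic.

    # Back-of-the-envelope capacity math for the appliance example in the text.
    RAW_CAPACITY_TB = 15     # raw disk capacity inside the hyperconverged appliance
    REPLICATION_FACTOR = 3   # every block is stored 3 times for redundancy
    VM_COUNT = 100           # Virtual Machines the appliance can run

    usable_tb = RAW_CAPACITY_TB / REPLICATION_FACTOR   # 5 TB of usable capacity
    per_vm_gb = usable_tb * 1000 / VM_COUNT            # roughly 50 GB per VM

    print(f"Usable capacity: {usable_tb:.0f} TB")
    print(f"Capacity per VM: {per_vm_gb:.0f} GB")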
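
And here is the same kind of sketch for the backend spindle math. The write rate, chunk size, replication factor and per-disk IOPS are the figures used in the text; the single directory hop is the simplifying assumption made there.

    import math

    # Backend spindle math for the 20K write IOPS example from the text.
    WRITE_IOPS = 20_000       # 4K write IOPS coming out of the Tier 1 cache
    WRITES_PER_CHUNK = 250    # a 1MB chunk aggregates roughly 250 4K writes
    REPLICATION_FACTOR = 3    # 3-way replication on the backend
    FS_OVERHEAD_IOS = 1       # one extra IO per write for a single directory hop
    SATA_DISK_IOPS = 90       # sustained IOPS of a normal SATA disk

    chunk_iops = WRITE_IOPS / WRITES_PER_CHUNK               # 80 chunk writes per second
    replicated_iops = chunk_iops * REPLICATION_FACTOR        # 240 IOPS across the replicas
    backend_iops = replicated_iops * (1 + FS_OVERHEAD_IOS)   # 480 IOPS incl. file system overhead
    disks_needed = math.ceil(backend_iops / SATA_DISK_IOPS)  # 6 SATA spindles

    print(f"Chunk writes to the backend: {chunk_iops:.0f} IOPS")
    print(f"After 3-way replication:     {replicated_iops:.0f} IOPS")
    print(f"With file system overhead:   {backend_iops:.0f} IOPS")
    print(f"SATA disks needed:           {disks_needed}")

In this simplified model, dropping the file system overhead (as the Kinetic drives do) leaves 240 IOPS, which fits on 3 spindles instead of 6; that is why the Kinetic drives are an IOPS saver in this scenario.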

So let’s look back at what we learned:

  • The lesson learned from converged solutions is that a single SKU makes sense. You have a single point of contact to praise or blame.
  • The lesson learned from hyperconverged solutions is that having your caching layer inside the host is the best solution. Keeping the compute and the read and write IO as close together as possible makes sense. Having your cold storage inside the same appliance isn’t a good idea for reasons of scalability and performance.
  • Open vStorage keeps these lessons in mind: it keeps the Tier 1 inside the compute host but allows you to scale storage performance and capacity independently. Using the Seagate Kinetic drives as Tier 2 storage makes sense as it is an easy way to increase the backend storage performance.

To summarize, a converged solution with Tier 1 in the host and a scalable backend on top of Kinetic drives is in every aspect a much better solution than a traditional converged or hyperconverged solution if you want to build a cost-effective, scalable solution. The world has been making hamburgers for more than 100 years and we came to the conclusion that having the french fries on the side is the best option. Put the fries as topping on the burger and you are in for a mess, so in that spirit let’s not do it with our compute and (cold) storage either.

Open vStorage by CloudFounders

In a recent conference call, an attendee expressed the following:

There is a real company behind Open vStorage? I thought this was a project done by 2 guys in their basement.

There is a big misconception about open-source projects. Some of these projects are indeed started and maintained by 2 guys in their basement. But on the other hand you see more and more projects to which a couple of hundred people contribute. Take OpenStack as an example: companies such as Red Hat, IBM, HP, Rackspace, SwiftStack, Mirantis, Intel and many more contribute code to this open-source project and actually pay people to work on it.

Open vStorage is a similar project, backed by a real company: CloudFounders. At CloudFounders we love to build technology. The people working for CloudFounders have done this for companies such as Oracle/Sun, Symantec, Didigate/Verizon, Amplidata and many more leading technology companies. We have also been active in the open-source community with projects such as Arakoon, our distributed key-value store.

The technology behind Open vStorage is not something we threw together over the last 6 months by gluing some open-source components together and coating them with a nice management layer. The core technology, which basically turns a bucket on your favorite object store into a raw device, was developed from scratch by the CloudFounders R&D and engineering team. We have been working on the core for more than 4 years. We have used the technology in our commercial product, vRun, but decided the best way forward is to open-source it. We believe software-defined storage is too important a piece of the virtualization stack for a proprietary solution that is hypervisor-specific, hardware-specific, management-stack-specific or storage-backend-specific. With Open vStorage we want to build an open and non-proprietary storage layer, but above all something modular enough to allow developers to innovate on top of Open vStorage.

PS. According to Ohloh, Open vStorage has had 1,384 commits made by 14 contributors representing 55,404 lines of code!

Open vStorage Roadshow: last chance to register

To start 2015 in style, the Open vStorage team is doing a small roadshow in the US and Canada. During these presentations we will discuss the upcoming features in Open vStorage and how we will commercially launch solutions based on Open vStorage. The Toronto and Boston meetups will be joint sessions with Midokura. During the San Francisco session, James Hughes, Principal Technologist at Seagate, will also join us and provide an update on the Kinetic Open Storage platform.

Open vStorage Roadshow:

Note that registration is required and there is an attendee limit.

PS. In case you organize an OpenStack User Group and would like to host an Open vStorage session, contact us by email at sales@openvstorage.com.