Today I want to discuss a less technical but more visionary assertion: storage and networking share the same DNA, especially in large-scale deployments. This insight surfaced when the rumor of Cisco buying NetApp flared up again. Allow me to explain why I believe exabyte storage clusters and large-scale networks have a lot in common.
The parallels in storage and networking:
The first feature that both networks and exabyte storage share is that they are highly scalable. Both typically start small and grow over time. Capacity can be added seamlessly by adding more hardware to the cluster, which provides higher bandwidth, more capacity and room to serve more users.
Downtime is typically unacceptable for both, and SLAs that guarantee multi-nine availability are common. To achieve this level of availability both rely on hyper-meshed, shared-nothing architectures. These highly redundant architectures ensure that if one component fails, another takes over. To illustrate, switches are typically deployed redundantly: a single server is connected to two independent switches, and if one switch fails the other takes over. The same holds for storage, where data is also stored redundantly. This can be achieved with replication or erasure coding across multiple disks and servers. If a disk or server fails, the data can still be retrieved from other disks and servers in the storage cluster.
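To make the erasure coding idea concrete, here is a minimal Python sketch of the simplest possible scheme: the data is split into k fragments and a single XOR parity fragment is added, so that any one lost fragment can be rebuilt from the survivors. This is only an illustration. Real storage clusters typically use stronger codes that survive several simultaneous failures, and the function names (`encode`, `recover`, `decode`) are made up for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # pad so all fragments have the same size
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can only repair one loss")
    repaired = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(fragments: List[bytes], length: int) -> bytes:
    """Join the data fragments (all but the parity) and strip the padding."""
    return b"".join(fragments[:-1])[:length]


if __name__ == "__main__":
    data = b"exabyte storage cluster"
    stored = encode(data, k=4)      # 4 data fragments + 1 parity, one per disk
    stored[2] = None                # simulate a failed disk
    assert decode(recover(stored), len(data)) == data
```

Each fragment would live on a different disk or server, so the storage overhead here is one extra fragment for every four, compared with a full extra copy for two-way replication.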
These days you can check your Facebook timeline or Twitter account from almost anywhere in the world. Large-scale networks make this possible: a global network spans the planet and interlinks many smaller networks. The same holds for storage, as we are moving to a world where data is stored in geographically dispersed locations and even in different clouds.
With new technologies like Software-Defined Networking (SDN), network management has moved towards a single point of governance. The network is configured at a high level, and the detailed configuration is pushed down to the physical and virtual devices that make up the network. The same trend is happening in the storage industry with Software-Defined Storage (SDS). These software applications make it possible to configure and manage the physical hardware in the storage cluster across multiple devices, sites and even different clouds through a single high-level management view.
A last point I’d like to touch on is that for both networking and storage, hardware brands and models hardly matter. Network devices from different vendors can all work together thanks to network standards. The same goes for storage hardware: different brands of disks, controllers and servers can all be used to build an exabyte storage cluster. Users of the network are not aware of its exact topology (brands, links, routing, …). The same holds for storage. Users shouldn’t need to know on which disk their data is stored; the only thing they care about is that they get the right data on time and that it is stored safely.
Open vStorage, taking the network analogy to the next step
Let’s have a look at the components of a typical network. On the left we have the consumer of the network, in this case a server. This server is physically connected to the network through a Network Interface Controller (NIC). A NIC driver provides the necessary interfaces for the applications on the server to use the network. Data that is sent over the network traverses the TCP/IP stack down to the NIC, where it is converted into individual packets. Within the network, various components each play a specific role. A VPN provides encrypted tunnels, WAN accelerators provide caching and compression, DNS services store the naming hierarchy of the network, and switches and routers forward the packets to the right destination. The core routers form the backbone of the network and connect multiple data centers and clouds.
Each of the above network components can be mapped to an equivalent in the Open vStorage architecture. The Open vStorage Edge offers a block interface to the applications (Hypervisors, Docker, …) on the server. Just like the TCP/IP stack converts data into network packets, the Volume Driver converts the data received through the block interface into objects (Storage Container Objects, or SCOs). Next we have the proxy, which takes on many roles: it encrypts the data for security, provides compression, chops the SCOs into fragments and routes those fragments to the right backend. For reads the proxy also plays an important caching role by fetching the data from the correct cache backend. Lastly we have Arakoon, our own distributed key-value store, which stores the metadata of all data in the backend of the storage cluster. A backend consists of SSDs and HDDs in JBODs or traditional x86 servers. There can of course be multiple backends, and they can even be spread across multiple data centers.
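To make this write path a bit more tangible, here is a toy Python sketch of the flow described above. It is not Open vStorage code. The sizes, the hash-based routing rule and all function names are assumptions made up for illustration, plain dictionaries stand in for the backends and for the metadata store, and the encryption step is left out to keep the example dependency-free.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096            # hypothetical block size of the block interface
SCO_SIZE = 4 * BLOCK_SIZE    # hypothetical: four blocks per SCO
FRAGMENTS_PER_SCO = 4        # hypothetical fragment count

FragmentKey = Tuple[int, int]  # (SCO id, fragment index)


def pack_into_scos(blocks: List[bytes]) -> List[bytes]:
    """Volume Driver step: pack incoming block writes into fixed-size objects."""
    stream = b"".join(blocks)
    return [stream[i:i + SCO_SIZE] for i in range(0, len(stream), SCO_SIZE)]


def fragment(payload: bytes, count: int) -> List[bytes]:
    """Chop a payload into `count` consecutive pieces (the last may be shorter)."""
    size = max(1, -(-len(payload) // count))  # ceiling division, at least 1
    return [payload[i * size:(i + 1) * size] for i in range(count)]


def route(key: FragmentKey, backends: List[str]) -> str:
    """Pick a backend for a fragment by hashing its key (assumed routing rule)."""
    digest = hashlib.sha256(f"{key[0]}:{key[1]}".encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]


def proxy_write(scos: List[bytes], backends: List[str]):
    """Proxy step: compress each SCO, fragment it and route the fragments."""
    store: Dict[str, Dict[FragmentKey, bytes]] = {b: {} for b in backends}
    metadata: Dict[FragmentKey, str] = {}  # stand-in for the metadata store
    for sco_id, sco in enumerate(scos):
        compressed = zlib.compress(sco)
        for index, piece in enumerate(fragment(compressed, FRAGMENTS_PER_SCO)):
            key = (sco_id, index)
            backend = route(key, backends)
            store[backend][key] = piece
            metadata[key] = backend
    return store, metadata


def proxy_read(store, metadata, sco_id: int) -> bytes:
    """Look up where each fragment lives, reassemble and decompress the SCO."""
    keys = sorted(k for k in metadata if k[0] == sco_id)
    compressed = b"".join(store[metadata[k]][k] for k in keys)
    return zlib.decompress(compressed)


if __name__ == "__main__":
    blocks = [bytes([i]) * BLOCK_SIZE for i in range(8)]
    scos = pack_into_scos(blocks)
    store, metadata = proxy_write(scos, ["backend-a", "backend-b", "backend-c"])
    assert proxy_read(store, metadata, 0) == scos[0]
```

The `metadata` dictionary plays the role the text assigns to the key-value store: without it, a reader would not know which backend holds which fragment.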
When reading the first paragraph of this blog post it might have crossed your mind that I was crazy. I do hope that after reading through the whole post you realize that networking and storage have a lot in common. As a Product Manager I keep the path that networking has already covered in mind when thinking about the future of storage. How do you see the future of storage? Let me know!