
User functions in Arakoon

  • 03 June 2016
  • by: Wim Provoost
  • in: arakoon
  • Tags: ALBA, Arakoon, key value store, metadata, OCaml, Open vStorage Backend, user functions

Originally published on the incubaid.com blog on 2013/02/01

Mahomet cald the Hill to come to him. And when the Hill stood still, he was neuer a whit abashed, but said;
If the Hill will not come to Mahomet, Mahomet wil go to the hill.

Francis Bacon

Introduction

Arakoon tries to be a simple distributed key value store that favours consistency over availability.
From time to time, we get feature requests for additional commands like:

  • assert_exists: assert a value exists for the key without caring what the value actually is
  • increment_counter: increment (or create) a counter and return the new value.
  • queue operations: add an element to the front/back of a double-ended queue, or pop an element
  • set_and_update_X: insert a key-value pair and update some non-trivial counter X (think averages, variances, …)
  • …

The list is semi-infinite, and the common thread is that these operations are too complex/specific/weird/… to do in one step using the provided interface. Of course, you can do all of this on the client side, but it costs extra network round-trips. In distributed systems you really want to keep the number of round-trips low, which pushes you towards this kind of feature request.
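
To make the round-trip argument concrete, here is a minimal sketch (not from the original post) of a client-side increment_counter, assuming the standard get and set calls of the Python client; the user function name at the end is purely hypothetical.

def increment_counter_client_side(client, key):
    # Round trip 1: read the current value. The exact "not found" exception
    # depends on the client version, so we keep the handler generic here.
    try:
        current = int(client.get(key))
    except Exception:
        current = 0
    new_value = current + 1
    # Round trip 2: write the new value. Another client may have incremented
    # the counter in between, so without a retry loop this is also racy.
    client.set(key, str(new_value))
    return new_value

# With a user function, the same operation is a single round trip:
# new_value = client.userFunction("Demo.increment_counter", key)

With a user function the read, the increment and the write all happen on the master, inside one transaction.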

Once you have decided (probably because of a performance bottleneck) that you need extra functionality, there are two things you can do. You can try to force or entice us into adding it to the core interface, or alternatively you can get by using Arakoon’s “user functions”. For some reason people fear them, but there’s no real technical reason to do so.

This blog post covers two things. First we’ll go into the nitty-gritty of coding and deploying user functions, and then we’ll look at some of the strategic/architectural challenges they bring.

How do user functions work?

The high-level view is this: you build a user function and register it to an Arakoon cluster before you start it. Then, at runtime, you can call it, using any client, with a parameter (a string option) and get back a result (a string option). On the server side, the master will log the call in its transaction log and try to reach consensus with the slave(s); once that is the case, the user function is executed inside a transaction. The result of that call is sent back to the client. If an exception occurs, the transaction is aborted. Since Arakoon logs transactions, it can replay them in case of calamities. This has a very important consequence: because Arakoon needs to be able to replay the execution of a user function, your function cannot have side effects, use random values or read the system clock.

Running Example

We’re going to try to build a simple queue API.
It will offer named queues with 2 operations: push and pop. Also, it’s a first-in-first-out thingy.

Arakoon 1

Client side API

Arakoon 1 offers the following API for user functions.

def userFunction(self, name, argument):
    '''Call a user-defined function on the server

    @param name: Name of user function
    @type name: string
    @param argument: Optional function argument
    @type argument: string option

    @return: Function result
    @rtype: string option
    '''

Let’s take a look at it. A userFunction call needs a name, which is a string, and an argument, which is a string option, and it returns a result of type string option. So what exactly is a string option in Python? Well, it’s either a string or None. This allows a user function to take no input or to yield no result.
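
As a tiny illustration (the function name below is hypothetical), both the argument and the result may simply be absent:

# client is an ArakoonClient instance (see the demo further down).
# A user function that needs no input: pass None as the argument.
result = client.userFunction("Demo.cleanup", None)

# The result is either a string or None, so handle both cases.
if result is None:
    print "no result"
else:
    print "got: %s" % result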

Server side API

The server side API is in OCaml, and looks like this:


class type user_db =
  object
    method set : string -> string -> unit
    method get : string -> string
    method delete : string -> unit
    method test_and_set : string -> string option -> string option -> string option
    method range_entries : string option -> bool -> string option -> bool -> int
                           -> (string * string) list
  end

On the server side, a user function gets a handle to the database plus the client’s opaque argument and produces the opaque result; in other words, it matches the following signature:

user_db -> string option -> string option

Queue’s client side

Let’s create the client side in Python. We’ll create a class that uses an Arakoon client and acts as a queue. The problem with push is that we need to fit both the queue name and the value into the one parameter we have available, so we need to do our own serialization. Let’s just be lazy (smart?) and use Arakoon’s serialization. The code is shown below.

from arakoon import Arakoon
from arakoon import ArakoonProtocol as P

class ArakoonQueue:
    def __init__(self, name, client):
        self._name = name
        self._client = client

    def push(self, value):        
        input =   P._packString(self._name) 
        input +=  P._packString(value)
        self._client.userFunction("QDemo.push", input)

    def pop(self):
        value = self._client.userFunction("QDemo.pop", self._name)
        return value



That wasn’t too hard now was it?
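
For completeness: to the best of my knowledge _packString produces a length-prefixed encoding (a 32-bit little-endian length followed by the raw bytes), which is what lets the server side peel the two fields off again with Llio.string_from. Treat the sketch below as an assumption about the wire format, not as a specification.

import struct

def pack_string(s):
    # Assumed layout: 4-byte little-endian length, then the bytes themselves.
    return struct.pack("<I", len(s)) + s

def unpack_string(buf, offset):
    # Mirror of the server-side Llio.string_from: returns (value, new offset).
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    return buf[start:start + length], start + length

# The argument of QDemo.push is then simply
# pack_string(queue_name) + pack_string(value).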

Queue, server side

The whole idea is that the operations happen on the server side, so this will be a tad more complex.
We need to model a queue using a key value store. Code-wise, that’s not too difficult.
For each queue, we’ll keep 2 counters that keep track of both ends of the queue.

A push merely gets the queue name and the value out of the input, calculates the place where we need to store the value, stores it there and updates the counter for the back end of the queue. A pop is similar, but when the queue becomes empty we use the opportunity to reset the counters (maybe_reset_counters). The counter representation is a bit weird, but Arakoon stores keys in lexicographical order and we want to take advantage of this to keep our queue FIFO. Hence, we need to encode the counters in such a way that their order as strings matches the order in which they are generated. The code is shown below.

(* file: plugin_qdemo.ml *)

open Registry 

let zero = ""
let begin_name qname = qname ^ "/@begin" 
let end_name qname = qname ^ "/@end"
let qprefix qname key = qname ^ "/" ^ key

(* Successor of a counter. The encoding is chosen so that the lexicographic
   order of the encoded counters matches the order in which they are generated. *)
let next_counter = function
  | "" -> "A"
  | s -> 
      begin
        let length = String.length s in
        let last = length - 1 in
        let c = s.[last] in
        if c = 'H' 
        then s ^ "A"
        else let () = s.[last] <- Char.chr(Char.code c + 1) in 
             s
      end

let log x= 
  let k s = let s' = "[plugin_qdemo]:" ^ s in
            Lwt.ignore_result (Lwt_log.debug s')
  in
  Printf.ksprintf k x

(* When the begin counter has caught up with the end counter, the queue is
   empty, so reset both counters to their initial value. *)
let maybe_reset_counters user_db qname b1 =
  let e_key = end_name qname in
  let exists = 
    try let _ = user_db # get e_key in true with Not_found -> false 
  in
  if exists
  then 
    let ev = user_db # get e_key in
    if ev = b1 then
      let b_key = begin_name qname in
      let () = user_db # set b_key zero in
      let () = user_db # set e_key zero in
      ()
    else
      ()
  else ()

(* push: unpack (queue name, value) from the argument, store the value under
   the next end counter and bump that counter. *)
let push user_db vo =
  match vo with
    | None -> invalid_arg "push None"
    | Some v -> 
        let qname, p1 = Llio.string_from v 0 in
        let value, _ = Llio.string_from v p1 in
        let e_key = end_name qname in
        let b0 = 
          try user_db # get (end_name qname) 
          with Not_found -> zero 
        in
        let b1 = next_counter b0 in
        let () = user_db # set (qprefix qname b1) value in
        let () = user_db # set e_key b1 in
        None

(* pop: read the value under the next begin counter, advance that counter,
   delete the entry, and reset both counters when the queue ran empty. *)
let pop user_db vo =
  match vo with 
    | None   -> invalid_arg "pop None"
    | Some qname -> 
        let b_key = begin_name qname in
        let b0 = 
          try user_db # get (begin_name qname) 
          with Not_found -> zero
        in
        let b1 = next_counter b0 in
        try 
          let k = qprefix qname b1 in
          let v = user_db # get k in 
          let () = user_db # set b_key b1 in
          let () = user_db # delete k in
          let () = maybe_reset_counters user_db qname b1 in
          Some v
        with
          Not_found ->
            let e_key = end_name qname in
            let () = user_db # set b_key zero in
            let () = user_db # set e_key zero in
            None
              

let () = Registry.register "QDemo.push" push
let () = Registry.register "QDemo.pop" pop

The last two lines register the functions with Arakoon when the module is loaded.
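
To see why the counter encoding keeps the queue FIFO, here is a quick re-implementation of next_counter in Python, purely for illustration: generating a few thousand counters and sorting them as strings gives back exactly the generation order, which is what Arakoon’s lexicographically ordered keys rely on.

def next_counter(s):
    # Same scheme as the OCaml version: "" -> "A", bump the last character,
    # and grow the string by one position once the last character is 'H'.
    if s == "":
        return "A"
    last = s[-1]
    if last == "H":
        return s + "A"
    return s[:-1] + chr(ord(last) + 1)

counters = []
c = ""
for _ in range(5000):
    c = next_counter(c)
    counters.append(c)

# String order equals generation order, so the smallest key is the oldest entry.
assert counters == sorted(counters)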

Compilation

So how do you deploy your user function module into an Arakoon cluster?
First you need to compile your module into something that can be dynamically loaded.
To compile plugin_qdemo.ml I persuade ocamlbuild like this:

ocamlbuild -use-ocamlfind -tag 'package(arakoon_client)' \
-cflag -thread -lflag -thread \
plugin_qdemo.cmxs

It’s not too difficult to write your own testcase for your functionality, so you can run it outside of Arakoon and concentrate on getting the code right.

Deployment

First, you need to put your compilation unit into the Arakoon home directory on all the nodes of your cluster. Second, you need to add its name to the global section of your cluster configuration. Below is the configuration file for my simple, single-node cluster called ricky.


[global]
cluster = arakoon_0
cluster_id = ricky

### THIS REGISTERS THE USER FUNCTION:
plugins = plugin_qdemo

[arakoon_0]
ip = 127.0.0.1
client_port = 4000
messaging_port = 4010
home = /tmp/arakoon/arakoon_0

All right, that’s it. Just one big warning about user functions:

Once a user function is installed, it needs to remain available, with the same functionality for as long as user function calls are stored inside the transaction logs, as they need to be re-evaluated when one replays a transaction log to a store (for example when a node crashed, leaving a corrupt database behind). It’s not a bad idea to include a version in the name of a user function to cater for evolution.

Demo

Let’s use it in a simple Python script.

def make_client():
    clusterId = 'ricky'
    config = Arakoon.ArakoonClientConfig(clusterId,
                                         {"arakoon_0":("127.0.0.1", 4000)})
    client = Arakoon.ArakoonClient(config)
    return client

if __name__ == '__main__':
    client = make_client()
    q = ArakoonQueue("qdemo", client)
    q.push("bla bla bla")
    q.push("some more bla")
    q.push("3")
    q.push("4")
    q.push("5")
    print q.pop()
    print q.pop()
    print q.pop()
    print q.pop()

with the expected results.
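
Concretely, given the first-in-first-out semantics, the four pops should print the oldest entries first:

bla bla bla
some more bla
3
4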

Questions asked

Why don’t you allow user functions to be written in <INSERT YOUR FAVOURITE LANGUAGE HERE>?

This is a good question, and there are several answers, most of them wrong. For example, anything along the lines of “I don’t like your stinkin’ language” needs to be rejected because a language’s cuteness is irrelevant.

There are several difficulties with the idea of offering user functions written in another programming language. For scripting languages like Python, Lua, PHP, … we can either implement our own interpreter and offer a subset of the language, which is a lot of work with a low return on investment, or integrate an existing interpreter/runtime, which will probably not play nice with Lwt or with the OCaml runtime (garbage collector). For compiled languages we might go via the FFI, but it’s still way more complex for us. So for now you’re stuck with OCaml for user functions. There are worse languages.

Wouldn’t it be better to apply the result of the user function to the transaction log instead of the arguments?

Well, we thought about that a lot before we started with user functions. The alternative is to record and log the effect of the user function so that we can always replay that effect later, even when the code is no longer available. It’s an intriguing alternative, but it’s not a clear improvement: it all depends on the size of the arguments versus the size of the effect. Some user functions have a small argument set and a big effect, while for others it’s the other way around.

Closing words

Technically, it’s not too difficult to hook your own functionality into Arakoon. Just make sure the thing you want to hook in does not have major flaws.

have fun,

Romain.

The new metadata server architecture in action

  • 02 March 2015
  • by: Wim Provoost
  • in: features
  • Tags: metadata, metadata server, storage

In our previous blog post, I explained the new metadata server architecture. The video below shows the new architecture in action.

Metadata, the key to taking over the world

  • 13 February 2015
  • by: Wim Provoost
  • in: features
  • Tags: HA, LBA, live migration, master, metadata, metadata server, slave, vMotion

My wife thinks I’m weird but she loves me anyway. She is probably right, as today I got excited about metadata. Metadata of volumes, to be more exact. Let me explain why I got so excited. Open vStorage keeps track of which LBA of a volume contains which data: the metadata of a volume. Instead of storing the 4k data block next to the LBA in some kind of database, we create a mapping between the LBA, the place where the data is actually stored (Storage Container ID and the offset) and a hash of the 4k block. I’ve discussed this already in a previous blog post. Now, why am I so excited about this, you ask?

In version 1.6 of Open vStorage this mapping, the metadata database, is only stored locally on the host where the volume is served (typically where the Virtual Machine is running). Under normal circumstances it isn’t a problem that this data is only local, as only that host uses the metadata database. When you move a Virtual Machine between hosts, however, this locality is a drawback. Before the new host can start accepting data for the moved volume, the metadata database for that volume needs to be reconstructed with data from the backend. In practice this means getting all TLOGs for the volume from the backend and replaying them from first to last. Remember, each TLOG entry contains the metadata for a specific write: the LBA, the location (a combination of the SCO name and the offset within the SCO) and a hash. Once all TLOGs are replayed, the metadata is reconstructed and the volume can start accepting new IO requests. In case the volume has received a lot of write IO, many TLOGs need to be fetched from the backend, so it will take a few seconds, in worst-case scenarios even up to a minute, before the volume is available. It works, but it isn’t ideal.
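
As a rough illustration of what such a replay boils down to (the data shapes below are made up for this example; this is not the actual Open vStorage code), reconstructing the metadata is essentially folding the TLOG entries into a map from LBA to (SCO, offset, hash):

def rebuild_metadata(tlog_entries):
    # Each entry describes one write: which LBA was written, where the data
    # lives on the backend (SCO name plus offset) and the hash of the 4k block.
    # Replaying from first to last means later writes overwrite earlier ones.
    metadata = {}
    for entry in tlog_entries:
        metadata[entry["lba"]] = (entry["sco"], entry["offset"], entry["hash"])
    return metadata

# Made-up entries: the second write to LBA 42 wins.
tlog = [
    {"lba": 42, "sco": "sco_0001", "offset": 0,    "hash": "aa11"},
    {"lba": 43, "sco": "sco_0001", "offset": 4096, "hash": "bb22"},
    {"lba": 42, "sco": "sco_0002", "offset": 0,    "hash": "cc33"},
]
print rebuild_metadata(tlog)[42]   # ('sco_0002', 0, 'cc33')

The more write IO a volume has seen, the more of these entries have to be fetched and replayed, which is exactly where the few-seconds-to-a-minute delay comes from.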

[Slide 1]

In the new version of Open vStorage we have fundamentally changed the metadata architecture: we added role-based functionality to the metadata server. A metadata server can be the master (database) or a slave (database) for a certain volume. Typically the master role runs on the same host as where the volume is running, so performance is top-notch. Next to the master role, by default an additional slave role is created on the metadata server of another host. The slave is almost up to date with the master, so when it is promoted to master only a couple of TLOG entries have to be replayed.

The real value of the master/slave architecture comes into play when you move a Virtual Machine between hosts. When the original metadata server with the master role is still available, for example in case of a live migration, the volume will immediately be able to receive IO requests, as underneath the master metadata server is consulted. Since every metadata lookup now goes over the network, there is a small performance drop. As soon as the new host discovers it needs to go over the network to another host for the metadata, it creates a slave copy locally in the background: it starts fetching the TLOGs from the backend and replays them so the metadata of the moved volume becomes available locally. Once the local metadata server is up to date, the local slave is promoted to master and the metadata is looked up locally from then on.

Let’s illustrate this with an example. In the scenario below, VM3 is moved to a second host. VM3 will immediately be able to receive IO, as the metadata server on the original host is still accessible.

[Slide 3]

Another use case where the master/slave architecture shows its value is when a host goes down and the Virtual Machines and corresponding volumes are restarted on another host. As the slave is almost up to date with the master, only a few TLOG entries need to be replayed, so the volume is almost instantly accessible. It doesn’t really matter whether the Virtual Machine is running on the same host as the promoted slave or on a different host; in case it runs on a different host, the master will first be consulted over the network.

[Slide 5]

In the background a local slave is created and the necessary TLOGs are fetched from the backend to recreate the metadata locally. Once the slave on the host where the Virtual Machine is running is up to date, it is of course promoted to master and metadata lookups happen locally again.

[Slide 6]

Together with this master/slave concept we also added functionality that detects when a metadata server is overloaded. Once this is detected, another metadata server is created on the same host to make sure performance doesn’t suffer. As you can see, I have every reason to be excited about metadata!
