The Distributed Transaction Log explained

During my 1-on-1 sessions I am often asked how Open vStorage makes sure there is no data loss when a host crashes. As you probably already know, Open vStorage uses SSDs and PCIe flash cards inside the host where the VM is running to store incoming writes. All incoming writes for a volume get appended to a log file (SCO, Storage Container Object), and once enough writes have accumulated the SCO gets stored on the backend. Once the SCO is on the backend, Open vStorage relies on the functionality of the backend (erasure coding, 3-way replication, …) to make sure that data is stored safely.
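To make the write path concrete, here is a minimal toy sketch in Python. It is not Open vStorage code: the `Volume` class, the `SCO_CAPACITY` threshold and the dictionary standing in for the backend are all hypothetical names chosen for illustration. It only shows the idea of appending writes to an open SCO and shipping the SCO to the backend once it is full.

```python
from typing import Dict, List

# Hypothetical threshold: number of writes per SCO. The real limit is not
# stated in this post and is likely size based.
SCO_CAPACITY = 4


class Volume:
    """Toy model: append writes to an open SCO, ship full SCOs to the backend."""

    def __init__(self) -> None:
        self.open_sco: List[bytes] = []            # on local flash, lost if the host dies
        self.backend: Dict[int, List[bytes]] = {}  # stand-in for the redundant backend
        self.next_sco_id = 0

    def write(self, data: bytes) -> None:
        # Every incoming write is appended to the SCO under construction.
        self.open_sco.append(data)
        if len(self.open_sco) >= SCO_CAPACITY:
            self._flush()

    def _flush(self) -> None:
        # Store the full SCO on the backend and start a new, empty one.
        self.backend[self.next_sco_id] = self.open_sco
        self.next_sco_id += 1
        self.open_sco = []


vol = Volume()
for i in range(6):
    vol.write(f"block-{i}".encode())

# One full SCO reached the backend; two writes still sit in the open SCO.
print(len(vol.backend), len(vol.open_sco))  # prints: 1 2
```

The two writes left in `open_sco` at the end are exactly the data that only exists on the local host.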

This means there is a window where data is vulnerable: while the SCO is being constructed and not yet stored on the backend. To ensure the vulnerable data isn’t lost when a host crashes, incoming writes are also stored in the Distributed Transaction Log (DTL) on another host in the Open vStorage cluster. Note that the volume can even be restarted on a host other than the one where the DTL was stored.
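The sketch below illustrates how a copy of the in-flight writes on a second host closes that window. It is a simplified model, not the actual implementation: `ProtectedVolume`, `recover` and the write-count threshold are hypothetical, and it assumes the DTL only needs to keep writes that have not yet reached the backend.

```python
from typing import List


class ProtectedVolume:
    """Toy model of a volume whose in-flight writes are mirrored to a DTL."""

    def __init__(self, sco_capacity: int = 3) -> None:
        self.sco_capacity = sco_capacity      # hypothetical threshold (writes per SCO)
        self.open_sco: List[bytes] = []       # local host, lost on a crash
        self.backend: List[List[bytes]] = []  # safe once stored
        self.dtl: List[bytes] = []            # lives on another host, survives the crash

    def write(self, data: bytes) -> None:
        self.open_sco.append(data)
        self.dtl.append(data)                 # mirror the write to the other host
        if len(self.open_sco) >= self.sco_capacity:
            self.backend.append(self.open_sco)
            self.open_sco = []
            # Assumption: once the SCO is on the backend, its DTL copy can be dropped.
            self.dtl = []


def recover(backend: List[List[bytes]], dtl: List[bytes]) -> List[bytes]:
    """Rebuild the write stream from the backend SCOs, then replay the DTL."""
    writes = [w for sco in backend for w in sco]
    writes.extend(dtl)
    return writes


vol = ProtectedVolume()
sent = [f"w{i}".encode() for i in range(5)]
for w in sent:
    vol.write(w)

# Simulate a host crash: the open SCO is gone, the backend and the DTL survive.
surviving_backend, surviving_dtl = vol.backend, vol.dtl
del vol

restored = recover(surviving_backend, surviving_dtl)
print(restored == sent)  # prints: True
```

Because `recover` only needs the backend and the DTL contents, it can run on any host that can reach both, which matches the note that the volume does not have to restart on the host holding the DTL.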

For the DTL of a volume you can select one of the following options as modus operandi:

  • No DTL: when this option is selected, incoming data doesn’t get stored in the DTL on another node. This option can be used when performance is key and some data loss is acceptable when the host or storage router goes down. Test VMs or VMs which are running batch or distributed applications (e.g. transcoding of files to another file) can use this option.
  • Asynchronous: when this option is selected, the incoming writes are added to a queue on the host and replicated to the DTL on the other host once the queue reaches a certain size or a certain time has elapsed. To ensure consistency, all outstanding data is synced to the DTL whenever a sync is executed within the file system of the VM. Virtual Machines running on KVM can use this option. This mode balances data safety and performance.
  • Synchronous: when this option is selected, every write request gets synchronized to the DTL on the other host. This option should be selected when absolutely no data loss is acceptable (distributed NFS, HA iSCSI disks). Since this option synchronizes on every write, it is the slowest mode of the DTL. Note that in case the DTL can’t be reached (e.g. because the host is being rebooted), the incoming I/O isn’t blocked and doesn’t return an I/O error to the VM, but an out-of-band event is generated to restart the DTL on another host.
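The difference between the three modes can be sketched as follows. This is a toy model under stated assumptions, not the Open vStorage API: `DTLMode`, `DTLClient`, `queue_limit` and `at_risk` are hypothetical names, the queue limit counts writes instead of bytes, and the time-based flush of the asynchronous mode is left out to keep the example short.

```python
from enum import Enum
from typing import List


class DTLMode(Enum):
    NONE = "no_dtl"
    ASYNC = "asynchronous"
    SYNC = "synchronous"


class DTLClient:
    """Toy model of how each mode forwards writes to the DTL on another host."""

    def __init__(self, mode: DTLMode, queue_limit: int = 3) -> None:
        self.mode = mode
        self.queue_limit = queue_limit  # hypothetical: flush after this many queued writes
        self.queue: List[bytes] = []    # local queue, only used in asynchronous mode
        self.remote: List[bytes] = []   # stand-in for the DTL on the other host

    def write(self, data: bytes) -> None:
        if self.mode is DTLMode.NONE:
            return                      # nothing is replicated
        if self.mode is DTLMode.SYNC:
            self.remote.append(data)    # every write reaches the DTL immediately
            return
        self.queue.append(data)         # asynchronous: queue first
        if len(self.queue) >= self.queue_limit:
            self.flush()

    def flush(self) -> None:
        """Push all queued writes to the DTL, as a file system sync in the VM would."""
        self.remote.extend(self.queue)
        self.queue.clear()


def at_risk(mode: DTLMode, n_writes: int) -> int:
    """Count writes that exist only on the local host after n_writes."""
    client = DTLClient(mode)
    for _ in range(n_writes):
        client.write(b"x")
    return n_writes - len(client.remote)


for mode in DTLMode:
    print(mode.value, at_risk(mode, 5))
# prints:
# no_dtl 5
# asynchronous 2
# synchronous 0
```

With five writes and a queue limit of three, the asynchronous mode leaves two writes unprotected until the next flush, which is the trade-off between safety and performance described above.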

About the Author
Wim Provoost
Product Manager Open vStorage.
