Michael Hausenblas

@mhausenblas

Backup & recovery of infrastructure services

December 31st 2016

Infrastructure should be boring, right? That doesn’t mean that developing tools for said infra can’t be exciting. In one area I believe there’s room for improvement: backing up (and restoring) critical infra services.

To provide some context: I’m talking about cloud native infrastructure, that is, distributed systems that typically manage containers of some sort. What all of these systems have in common is a distributed infrastructure component that stores state critical to their operation: configuration, metadata about leaders or workers, and so forth. For example:

  • DC/OS uses ZooKeeper, supervised by Exhibitor, both for its distributed kernel (Apache Mesos) and for its services (Marathon, Jobs, Spark, Kafka, Cassandra, and so on).
  • Docker SwarmKit uses an internal Raft-based State Store.
  • Kubernetes uses etcd for persistent storage of all of its REST API objects.
  • Nomad uses an internal Raft-based consensus protocol (as well as a gossip protocol to manage cluster membership).

In any case, you might sometimes find yourself in a situation where you want to take a snapshot of the content of such an infra service, be it to debug it or to keep a backup of a healthy state. This was the motivation to start work on a tool that I called burry, short for BackUp & RecoveRY:

http://burry.sh

In a nutshell, at the time of writing burry lets you take a snapshot of the content of ZooKeeper or etcd and then:

  • dump it to the screen, for example: burry --endpoint localhost:2181
  • store it to the local filesystem, for example: burry --endpoint etcd.mesos:1026 --isvc etcd --target local
  • store it in a remote storage system, for example:
burry --endpoint leader.mesos:2181 --target s3 --credentials play.minio.io:9000,AWS_ACCESS_KEY_ID=Q3AM3UQ867SPQQA43P2F,AWS_SECRET_ACCESS_KEY=zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG

Note: currently, you can use Amazon S3 and Minio as remote storage systems.
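
To make the idea of "taking a snapshot" a bit more concrete: both ZooKeeper and etcd can be viewed as a tree of paths with a payload at each node, so a snapshot boils down to walking that tree and recording every path together with its content. The following is a minimal Python sketch of that walk under stated assumptions. It is an illustration only and not how burry is implemented; the names TreeStore, InMemoryStore and snapshot are hypothetical, and the in-memory store stands in for a real cluster so the example runs on its own.

```python
from typing import Dict, List, Protocol


class TreeStore(Protocol):
    """Hypothetical read-only view of a hierarchical key-value store."""

    def children(self, path: str) -> List[str]: ...

    def get(self, path: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for ZooKeeper or etcd so the sketch runs without a cluster.

    Assumes every intermediate node is present as a key, e.g. "/a" exists
    whenever "/a/b" does.
    """

    def __init__(self, nodes: Dict[str, bytes]) -> None:
        self._nodes = nodes

    def children(self, path: str) -> List[str]:
        # Direct children are the keys exactly one level below `path`.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._nodes
            if p != prefix
            and p.startswith(prefix)
            and "/" not in p[len(prefix):]
        )

    def get(self, path: str) -> bytes:
        return self._nodes.get(path, b"")


def snapshot(store: TreeStore, root: str = "/") -> Dict[str, bytes]:
    """Walk the tree depth-first and collect each node's path and payload."""
    result: Dict[str, bytes] = {}
    stack = [root]
    while stack:
        path = stack.pop()
        result[path] = store.get(path)
        for name in store.children(path):
            stack.append(path.rstrip("/") + "/" + name)
    return result
```

The resulting dictionary could then be printed, written to disk, or uploaded, which mirrors the three options listed above. A real tool would additionally have to deal with connection handling, authentication, and nodes changing while the walk is in progress, none of which this sketch attempts.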

I’m currently working on Azure and Google storage support as well as restoring the state (that’s the recovery part ;).

What would you like to see next? Please let me know, either here or by raising an issue on GitHub.
