Infrastructure should be boring, right? That doesn’t mean that developing tools for said infra can’t be exciting. In one area I believe there’s room for improvement: backing up (and restoring) critical infra services.

To provide some context: I’m talking about cloud native infrastructure, that is, distributed systems that typically manage containers of some sort. What all of those distributed systems have in common is some distributed infrastructure component they use to store state critical to their operation: configuration, metadata about leaders or workers, and so forth:

- DC/OS uses ZooKeeper supervised by Exhibitor for both its distributed kernel (Apache Mesos) as well as its services (Marathon, Jobs, Spark, Kafka, Cassandra, and so on).
- Docker SwarmKit uses an internal Raft-based State Store.
- Kubernetes uses etcd for persistent storage of all of its REST API objects.
- Nomad uses an internal Raft-based consensus protocol (as well as a gossip protocol to manage cluster membership).

In any case, you might sometimes find yourself in a situation where you want to take a snapshot of the content of the infra service, be it to debug it or to keep a backup of a healthy state. This was the motivation to start work on a tool that I called burry, for _B_ack_U_p & _R_ecove_RY_ tool: http://burry.sh

In a nutshell, burry lets you, at time of writing, take a snapshot of the content of ZooKeeper & etcd and then you can:

- dump it to the screen, for example: `burry --endpoint localhost:2181`
- store it to the local filesystem, for example: `burry --endpoint etcd.mesos:1026 --isvc etcd --target local`
- store it in a remote storage system, for example: `burry --endpoint leader.mesos:2181 --credentials play.minio.io:9000,AWS_ACCESS_KEY_ID=Q3AM3UQ867SPQQA43P2F,AWS_SECRET_ACCESS_KEY=zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG --target s3`

Note: currently, you can use Amazon S3 and Minio as remote storage systems.
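To make the idea of a snapshot a bit more concrete, here is a minimal Python sketch of walking a hierarchical key space and mirroring it to the local filesystem. It is only an illustration and not how burry itself is implemented: the in-memory `FAKE_STORE`, the helper names `children` and `snapshot`, and the one-directory-per-node layout with a `content` file are all assumptions made for this example. A real tool would talk to ZooKeeper or etcd through a client library instead.

```python
import tempfile
from pathlib import Path

# Hypothetical in-memory stand-in for a ZooKeeper/etcd key space:
# every key is a slash-separated path, every value is the node's payload.
FAKE_STORE = {
    "/": b"",
    "/marathon": b"",
    "/marathon/leader": b"10.0.0.1:8080",
    "/mesos": b"",
    "/mesos/info": b'{"version": "1.0"}',
}


def children(store, path):
    """Return the direct children of `path`, sorted for a stable walk."""
    prefix = path.rstrip("/") + "/"
    return sorted(
        key
        for key in store
        if key != path
        and key.startswith(prefix)
        and "/" not in key[len(prefix):]
    )


def snapshot(store, root, target_dir):
    """Walk the tree depth-first from `root` and mirror it under `target_dir`.

    Each node becomes a directory and its payload is written to a file
    named `content` (an assumed layout, chosen for this sketch).
    Returns the list of visited paths in visiting order.
    """
    visited = []
    stack = [root]
    while stack:
        path = stack.pop()
        visited.append(path)
        node_dir = Path(target_dir) / path.lstrip("/")
        node_dir.mkdir(parents=True, exist_ok=True)
        (node_dir / "content").write_bytes(store[path])
        # Push children in reverse so they are popped in sorted order.
        stack.extend(reversed(children(store, path)))
    return visited


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for node in snapshot(FAKE_STORE, "/", tmp):
            print(node)
```

Keeping the walk separate from the storage step is what would make it straightforward to swap the local directory for a remote target later on.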
I’m currently working on Azure and Google storage support as well as restoring the state (that’s the recovery part ;). What would you like to see next? Please let me know, either here or by raising an issue on GitHub.