Thomas Bach

@fuzzy_id

Midas Open Sources its Service Discovery Library

Most of our code currently runs on AWS. Like many others who adopt a microservice architecture, we soon faced the problem of service discovery. Since this is such a common problem, you would expect that searching for "aws service discovery" with your search engine of choice turns up loads of nice solutions. Well, we were disappointed. That's why we wrote our own service discovery library, which we are proud to open source. This post explains our motives and why we think that our approach is actually the best.

Find the Single Point of Truth

Service discovery is, at its heart, about asking the right question of a single point of truth: the point which knows about all your services and how to reach them. Let's think about a couple of obvious solutions.

The first take is to declare the network itself the single point of truth. Protocols of this kind usually involve sending multicast or broadcast messages. As there is no support for these on AWS VPC, this possibility is ruled out for our use case.

The second option is to set up an additional service like Apache Zookeeper where your services register on start-up. Discovering a service is then simply a query to this additional service. But wait, how does your original service know where Zookeeper is located? We face the good old chicken-and-egg problem here. While there are surely ways to solve it, none of them satisfied us. Additionally, Zookeeper becomes a single point of failure in this solution.

In the realm of AWS, there is a third alternative: use AWS itself for service discovery. AWS actually is the single point of truth, as it knows which of your services/machines/lambdas/you-name-it are up and running. With the AWS SDK we can easily ask how to reach these instances. Additionally, there is no single point of failure. Well, of course AWS could be down (or at least their endpoint for API queries). But then we probably have to escape to another region anyway.

Although this solution seems obvious once you think about it, you hardly find this advice in the Stack Overflow literature.

Configuration hell

So the idea was to write a thin layer around the AWS SDK. But how should we expose the functionality to our clients? A good API is always nice, but on its own it is not really useful when it comes to service discovery. You usually don't want to hard-code the components with all their filters directly in your code. Your discovery is probably very different in your local dev environment than in your CI test environment, which, in turn, is different from staging and production.

Hence, service discovery itself should be configurable via a simple configuration file. Now, configuration syntaxes usually get it wrong somehow. Either you have way too many options, making the whole thing unusable unless you are the library maintainer yourself, or you have just a handful of options which let you accomplish simple tasks quite easily, but as soon as things get more complex the library reaches its limits.

Service discovery is especially tricky. You want some kind of domain-specific language (DSL) which makes it easy to define the objects you want to discover, the constraints they should satisfy and which attributes of the objects you actually need. However, developing a good DSL is a hard task. So we looked around for existing solutions that could fill the gap. The answer we found might be surprising: GraphQL.

GraphQL was initially developed by Facebook and publicly released in 2015. Implementations exist for a huge set of languages; for Scala there is Sangria. GraphQL is a simple query language leaning towards a JSON-style syntax. You might rather know it as an alternative to REST when developing HTTP APIs. Well, we use it for something different here.

Examples, please!

To see how GraphQL works, let's jump right into an example. Let's say you want to discover EC2 instances with their ids and private IP addresses:

query {
  ec2Instances {
    id
    privateIpAddress
  }
}

This is all very clean and clear. This is what an answer formatted as JSON looks like:

{
  "data" : {
    "ec2Instances" : [
      {
        "id" : "i-5bc9450cebf76ff90",
        "privateIpAddress" : "172.18.42.252"
      },
      {
        "id" : "i-48eb0089ee2cf0324",
        "privateIpAddress" : "172.18.17.226"
      },
      {
        "id" : "i-f54ee382cf9e1f773",
        "privateIpAddress" : "172.12.57.132"
      }
    ]
  }
}

See how only the attributes we requested (namely id and privateIpAddress) are returned? That's actually the outstanding feature and elegance of GraphQL.
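To make the selection idea concrete, here is a minimal sketch in plain Scala. It does not use GraphQL or the AWS SDK at all: the `FieldSelection` object, the `Record` type and the extra `instanceType` attribute are made up for illustration, and the only point is that a selection set keeps the requested attributes and drops everything else.

```scala
object FieldSelection {
  // Hypothetical stand-in for a discovered object: a plain attribute map.
  type Record = Map[String, String]

  // Keep only the requested attributes, in the spirit of a GraphQL selection set.
  def select(fields: Seq[String])(record: Record): Record =
    record.filter { case (key, _) => fields.contains(key) }

  // Sample data; the instanceType values are invented for this sketch.
  val instances: Seq[Record] = Seq(
    Map("id" -> "i-5bc9450cebf76ff90",
        "privateIpAddress" -> "172.18.42.252",
        "instanceType" -> "t2.micro"),
    Map("id" -> "i-48eb0089ee2cf0324",
        "privateIpAddress" -> "172.18.17.226",
        "instanceType" -> "t2.small")
  )

  // Only id and privateIpAddress survive; instanceType is dropped.
  val selected: Seq[Record] =
    instances.map(select(Seq("id", "privateIpAddress")))
}
```

In the real library the selection is of course expressed in the query itself, not in Scala code.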

Now let's add a filter, because you only want the machines named midas-rocks and midas-is-awsome:

query {
  ec2Instances(
    filters: [{
      name: "tag:Name",
      values: ["midas-rocks", "midas-is-awsome"]
    }]
  ) {
    id
    privateIpAddress
  }
}

This didn't become too complicated, right? The filters are simply mapped to the Filter argument of the AWS API, so you can use all the filters documented there. The output of this query would be similar to the one listed above, but would only contain the machines with the right names.
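The following sketch emulates locally how such a filter list is evaluated. In reality the filtering happens on the AWS side; the `FilterSketch` object, the `Instance` type and the sample ids are hypothetical, and only `tag:` filters are modelled (wildcards and all other filter names are left out). The semantics follow the AWS documentation: the values within one filter are alternatives, while several filters must all match.

```scala
object FilterSketch {
  // Same shape as the filters in the query above: a name plus accepted values.
  final case class Filter(name: String, values: Seq[String])

  // Hypothetical stand-in for an EC2 instance; only its tags are modelled.
  final case class Instance(id: String, tags: Map[String, String])

  // A "tag:<key>" filter matches if the tag <key> has one of the given values.
  // Any other filter name is not modelled here and never matches.
  def matches(instance: Instance, filter: Filter): Boolean =
    if (filter.name.startsWith("tag:"))
      instance.tags
        .get(filter.name.stripPrefix("tag:"))
        .exists(value => filter.values.contains(value))
    else false

  // Values inside one filter are OR-ed, separate filters are AND-ed.
  def applyFilters(instances: Seq[Instance], filters: Seq[Filter]): Seq[Instance] =
    instances.filter(instance => filters.forall(f => matches(instance, f)))

  // Made-up sample data.
  val sample: Seq[Instance] = Seq(
    Instance("i-a", Map("Name" -> "midas-rocks")),
    Instance("i-b", Map("Name" -> "midas-is-awsome")),
    Instance("i-c", Map("Name" -> "something-else"))
  )

  // Keeps i-a and i-b, drops i-c.
  val discovered: Seq[Instance] =
    applyFilters(sample, Seq(Filter("tag:Name", Seq("midas-rocks", "midas-is-awsome"))))
}
```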

Now, for something more complicated: ECS. If you want to discover where your service is running on ECS, you'll have to discover the EC2 instance it runs on. (We don't know whether that is going to change with Fargate.) To get all EC2 instances which are part of your ECS cluster you'd use:

query {
  ecsClusters {
    tasks {
      containerInstance {
        ec2Instance {
          privateIpAddress
        }
      }
    }
  }
}

Again, there is a filter for almost every query level. If, for example, you wanted to fetch only the cluster whose ARN ends in cluster/my-test-cluster, and within this cluster only the instances running tasks that belong to the foobar family, you'd write:

query {
  ecsClusters(filterArn: ".*cluster/my-test-cluster") {
    tasks(family: "foobar") {
      # ...
    }
  }
}

The former, unfiltered query would yield a result like this:

{
  "data" : {
    "ecsClusters" : [
      {
        "tasks" : [
          {
            "containerInstance" : {
              "ec2Instance" : {
                "privateIpAddress" : "172.18.42.252"
              }
            }
          },
          {
            "containerInstance" : {
              "ec2Instance" : {
                "privateIpAddress" : "172.18.17.226"
              }
            }
          }
        ]
      }
    ]
  }
}

Things become a bit more complicated here, and quite some work is involved if you want to extract the interesting bits from this result. Also, as indicated above, the query will be defined in a configuration file, so your code has no clue how to navigate this structure to get to the interesting parts. This is where extractors come into play.

Extractors hook into Sangria's query reducer mechanism, which provides a way to step through the query in an early preparation step. We use it to generate a query-tailored function which extracts just the relevant fields from the result of the query. Let's see how that works interactively:

scala> import scala.concurrent.Await
scala> import scala.concurrent.duration._
scala> import scala.concurrent.ExecutionContext.Implicits.global
scala> import sangria.macros._
scala> val query = graphql"""
|   query {
|     ecsClusters {
|       tasks {
|         containerInstance {
|           ec2Instance {
|             privateIpAddress
|           }
|         }
|       }
|     }
|   }
| """
query: sangria.ast.Document = // [...]

scala> import social.midas.discovery.common
scala> val fut = common.prepareQuery(
| query, extractors=List(common.Ip4Extractor),
| )
fut: scala.concurrent.Future // [...]

scala> val prepared = Await.result(fut, 10.seconds)
prepared: sangria.execution.PreparedQuery // [...]

scala> val extractor = prepared.userContext.extractor.get
extractor: Any => Seq[Any] = // [...]

scala> val result = Await.result(prepared.execute(), 10.seconds)
result: Any = // [...]

scala> extractor(result).asInstanceOf[Seq[String]]
res0: Seq[String] = Vector(172.18.42.252, 172.18.17.226)
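To give a feeling for what such an extractor does with the nested result, here is a deliberately simplified sketch. It is not how the library works: instead of generating a function from the query via Sangria's query reducers, it simply searches the result for a field name at run time. The `ExtractorSketch` object and `collectField` are hypothetical names, and the result is modelled as nested Scala maps and sequences, which is an assumption about its shape.

```scala
object ExtractorSketch {
  // Collect every value stored under `field`, descending through
  // nested maps and sequences. Anything else is treated as a leaf.
  def collectField(field: String)(node: Any): Seq[Any] = node match {
    case m: Map[_, _] =>
      m.toSeq.flatMap { case (key, value) =>
        if (key == field) Seq(value) else collectField(field)(value)
      }
    case s: Seq[_] => s.flatMap(collectField(field))
    case _         => Seq.empty
  }

  // The ECS result from above, modelled as plain Scala collections.
  val result: Any = Map(
    "data" -> Map(
      "ecsClusters" -> Seq(
        Map("tasks" -> Seq(
          Map("containerInstance" ->
            Map("ec2Instance" -> Map("privateIpAddress" -> "172.18.42.252"))),
          Map("containerInstance" ->
            Map("ec2Instance" -> Map("privateIpAddress" -> "172.18.17.226")))
        ))
      )
    )
  )

  // Yields the two private IP addresses, in task order.
  val ips: Seq[Any] = collectField("privateIpAddress")(result)
}
```

Generating the function from the query up front, as the library does, avoids this blind search and ties the extraction to the exact shape the query asked for.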

“Dude, this is too complicated!”

Of course, this is all way too complex. So there is a wrapper function which reads the query and the extractors from a configuration file and delivers the final result in a Future:

scala> Await.result(common.discoverFromConfig(), 2.seconds)
| .asInstanceOf[Seq[String]]
res3: Seq[String] = Vector(172.18.42.252, 172.18.17.226)

We use the config library by Lightbend. This is what the configuration file looks like:

discovery.aws.region = "eu-central-1"
discovery.extractors = [ "social.midas.discovery.common.Ip4Extractor$" ]
discovery.query = """
  query {
    ecsClusters {
      tasks {
        containerInstance {
          ec2Instance {
            privateIpAddress
          }
        }
      }
    }
  }
"""

In our main routine we then simply call discoverFromConfig and get the IP addresses of all EC2 instances that are part of our ECS cluster.
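A note on the trailing `$` in the extractor name: on the JVM, a Scala `object` compiles to a class whose name ends in `$` and which holds its single instance in a static field called `MODULE$`. The sketch below shows how an object can be looked up from such a configured name using plain Java reflection. Whether the library resolves its extractors exactly this way is an assumption on our side for this illustration; `ReflectionSketch`, `Extractor`, `SampleExtractor` and `loadObject` are hypothetical names.

```scala
object ReflectionSketch {
  // Hypothetical marker trait standing in for the library's extractor type.
  trait Extractor

  // A sample object; its JVM class name ends in '$'.
  object SampleExtractor extends Extractor

  // Resolve a Scala object from its JVM class name (including the trailing '$')
  // by reading the static MODULE$ field that holds the singleton instance.
  def loadObject(className: String): AnyRef =
    Class.forName(className).getField("MODULE$").get(null)

  // The name as it would appear in a configuration file.
  val configuredName: String = SampleExtractor.getClass.getName

  // Looking the name up again returns the very same singleton.
  val loaded: AnyRef = loadObject(configuredName)
}
```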

What it is and what it isn’t.

Don't expect a fully fledged library which discovers all kinds of services on AWS! As of this writing, the library only supports the discovery of IPv4 addresses on EC2 and ECS. It is under heavy development, though. While we don't expect the API to change dramatically, there surely is room for improvement. Above all, you can expect the set of discoverable services and extractors to grow rapidly in the near future.

After all, we open sourced this piece of software to get your feedback. We think that the proposed solution to wrap the AWS SDK in a query language like GraphQL is elegant and outstanding when it comes to the task of service discovery. Tell us what you think about it. We’d love to hear from you!
