Most of our code currently runs on AWS. Promoting a microservice architecture, we, like many others, soon faced the problem of service discovery. Since others face this problem too, you’d probably think that searching for “aws service discovery” with your search engine of choice would give you loads of nice solutions. Well, we were disappointed. That’s why we wrote our own service discovery library, which we are proud to open source. This post explains our motives and why we think that our approach is actually the best.

Find the Single Point of Truth

Service discovery is, at its heart, about asking the right question of a single point of truth: the point which knows about all your services and how to reach them. Let’s think about a couple of obvious solutions.

The first take is to declare the network itself as the single point of truth. You can use a protocol for this, which usually involves sending multicast or broadcast messages. As there is no support for these on AWS VPC, this possibility is ruled out for our use case.

The second option is to set up an additional service like Apache Zookeeper where your services register on start-up. Discovering a service is then simply a query to this additional service. But wait, how does your original service know where Zookeeper is located? We face the good old chicken-and-egg problem here. While there are surely ways to solve this problem, they didn’t satisfy us. Additionally, Zookeeper becomes a single point of failure in this solution.

In the realm of AWS, there is a third alternative: use AWS itself for service discovery. AWS actually is the single point of truth, knowing which of your services/machines/lambdas/you-name-it are up and running. With the AWS SDK we can easily ask how to reach these instances. Additionally, there is no single point of failure. Well, of course AWS could be down (or at least their endpoint for API queries). But then we probably have to escape to another region anyway.
Although this solution seems obvious once you think about it, you hardly find this advice in the Stack Overflow literature.

Configuration hell

So the idea was to write a thin layer around the AWS SDK. But how should we expose the functionality to our clients? A good API is always nice, but not really useful when it comes to service discovery. You usually don’t want to hard-code the components with all their filters directly in your code. Your discovery is probably very different in your local dev environment than in your CI test environment, which, in turn, is different from staging and production. Hence, service discovery itself should be configurable via a simple configuration file.

Now, configuration syntaxes usually get it wrong somehow. Either you have way too many options, making the whole thing unusable unless you are the library maintainer yourself, or you have just a handful of options which let you accomplish simple tasks quite easily, but as soon as things get more complex the library reaches its limits. Service discovery is especially tricky. You want some kind of domain-specific language (DSL) which makes it easy to define the objects you want to discover, the constraints they should satisfy, and which attributes of the objects you actually need.

Anyway, developing good DSLs is a hard task. So we looked around for existing solutions that could fill the gap somehow. The answer we found might be surprising: GraphQL. GraphQL was initially developed by Facebook and was publicly released in 2015. Several implementations exist for a huge set of languages; for Scala there is Sangria. GraphQL is a simple query language leaning towards a JSON-style syntax. You might rather know it as an alternative to REST when developing HTTP APIs. Well, we use it for something different here.

Examples, please!

To see how GraphQL works, let’s jump right into an example.
Let’s say you want to discover EC2 instances with their id and private IP addresses:

    query {
      ec2Instances {
        id
        privateIpAddress
      }
    }

This is all very clean and clear. This is what an answer formatted in JSON looks like:

    {
      "data" : {
        "ec2Instances" : [
          {
            "id" : "i-5bc9450cebf76ff90",
            "privateIpAddress" : "172.18.42.252"
          },
          {
            "id" : "i-48eb0089ee2cf0324",
            "privateIpAddress" : "172.18.17.226"
          },
          {
            "id" : "i-f54ee382cf9e1f773",
            "privateIpAddress" : "172.12.57.132"
          }
        ]
      }
    }

See how only the attributes we requested (namely id and privateIpAddress) are returned? That’s actually the outstanding feature and elegance of GraphQL.

How about adding a filter because you only want the machines which are called midas-rocks and midas-is-awsome:

    query {
      ec2Instances(filters: [{
        name: "tag:Name",
        values: ["midas-rocks", "midas-is-awsome"]
      }]) {
        id
        privateIpAddress
      }
    }

This didn’t become too complicated, right? The filters simply get mapped to AWS’s Filter argument. Hence you can actually use all the documented filters. The output of this query would be similar to the one listed above, but would only contain the machines with the right names.

Now, for something more complicated: ECS. If you want to discover where your service is running on ECS, you’ll have to discover the EC2 instance it runs on. (We don’t know whether that is going to change with Fargate.) To get all EC2 instances which are part of your ECS cluster you’d use:

    query {
      ecsClusters {
        tasks {
          containerInstance {
            ec2Instance {
              privateIpAddress
            }
          }
        }
      }
    }

Again, there is a filter for almost every query level.
If, for example, you wanted to fetch only the cluster whose ARN ends in cluster/my-test-cluster, and in this cluster you just want the instances running the tasks belonging to the foobar family:

    query {
      ecsClusters(filterArn: ".*cluster/my-test-cluster") {
        tasks(family: "foobar") {
          # ...
        }
      }
    }

The former query (the one without filters) would yield a result like this:

    {
      "data" : {
        "ecsClusters" : [
          {
            "tasks" : [
              {
                "containerInstance" : {
                  "ec2Instance" : {
                    "privateIpAddress" : "172.18.42.252"
                  }
                }
              },
              {
                "containerInstance" : {
                  "ec2Instance" : {
                    "privateIpAddress" : "172.18.17.226"
                  }
                }
              }
            ]
          }
        ]
      }
    }

Things become a bit more complicated here, and there is quite some work involved if you want to extract the interesting bits from this example. Also, as indicated above, the query will be defined in a configuration file. So in your code you don’t have any clue how to navigate this structure in order to get to the interesting parts.

This is where extractors come into the game. Extractors hook into Sangria’s query reducers mechanism, which provides a way to step through the query in an early preparation step. We use it to generate a query-tailored function which extracts just the fields from the result of the query. Let’s see how that works interactively:

    scala> import scala.concurrent.Await
    scala> import scala.concurrent.duration._
    scala> import scala.concurrent.ExecutionContext.Implicits.global
    scala> import sangria.macros._
    scala> val query = graphql"""
         | query {
         |   ecsClusters {
         |     tasks {
         |       containerInstance {
         |         ec2Instance {
         |           privateIpAddress
         |         }
         |       }
         |     }
         |   }
         | }
         | """
    query: sangria.ast.Document = // [...]

    scala> import social.midas.discovery.common
    scala> val fut = common.prepareQuery(
         |   query, extractors=List(common.Ip4Extractor),
         | )
    fut: scala.concurrent.Future // [...]

    scala> val prepared = Await.result(fut, 10.seconds)
    prepared: sangria.execution.PreparedQuery // [...]

    scala> val extractor = prepared.userContext.extractor.get
    extractor: Any => Seq[Any] = // [...]
    scala> val result = Await.result(prepared.execute(), 10.seconds)
    result: Any = // [...]

    scala> extractor(result).asInstanceOf[Seq[String]]
    res0: Seq[String] = Vector(172.18.42.252, 172.18.17.226)

“Dude, this is too complicated!”

Of course, this is all way too complex. So there is an overall wrap-up function which reads the query and the extractor from a configuration file and delivers the final result in a Future:

    scala> Await.result(common.discoverFromConfig(), 2.seconds)
         |   .asInstanceOf[Seq[String]]
    res3: Seq[String] = Vector(172.18.42.252, 172.18.17.226)

We use the config library by Lightbend. This is what the configuration file looks like:

    discovery.aws.region = "eu-central-1"
    discovery.extractors = [ "social.midas.discovery.common.Ip4Extractor$" ]
    discovery.query = """
    query {
      ecsClusters {
        tasks {
          containerInstance {
            ec2Instance {
              privateIpAddress
            }
          }
        }
      }
    }
    """

In our main routine we then simply call discoverFromConfig and get all the IP addresses of the EC2 instances that are part of our ECS cluster.

What it is and what it isn’t.

Don’t expect a fully fledged library which discovers all kinds of services on AWS! The library, as of this writing, only supports the discovery of IPv4 addresses on EC2 and ECS. Yet it is under heavy development. While we don’t expect the API to change dramatically, there surely is room for improvement. Mainly, you can expect the set of discoverable services and extractors to grow rapidly in the near future.

After all, we open sourced this piece of software to get your feedback. We think that the proposed solution of wrapping the AWS SDK in a query language like GraphQL is elegant and outstanding when it comes to the task of service discovery. Tell us what you think about it. We’d love to hear from you!
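
As a footnote to the extractor discussion: the following is a minimal sketch of what an extractor conceptually does, not the library’s actual implementation. It assumes the query result is represented as nested Scala Maps and Seqs, and it simply walks the whole result instead of hooking into Sangria’s query reducers. The names ExtractorSketch, collectField and sampleResult are hypothetical and exist only for this illustration.

```scala
object ExtractorSketch {

  /** Collects every value stored under `field`, at any nesting depth.
    * Assumption: results are plain nested Maps and Seqs; the real
    * library derives its extractor from the query instead.
    */
  def collectField(field: String)(result: Any): Seq[Any] = result match {
    case m: Map[_, _] =>
      m.toSeq.flatMap {
        // Found the field we are looking for: keep its value.
        case (key, value) if key == field => Seq(value)
        // Otherwise descend into the value.
        case (_, value) => collectField(field)(value)
      }
    case xs: Iterable[_] =>
      // Lists of objects, such as clusters or tasks.
      xs.toSeq.flatMap(x => collectField(field)(x))
    case _ =>
      // Scalars that are not stored under the requested field.
      Seq.empty
  }

  // Hand-written stand-in for the ECS result shown in the article.
  val sampleResult: Any = Map(
    "data" -> Map(
      "ecsClusters" -> Seq(
        Map(
          "tasks" -> Seq(
            Map("containerInstance" -> Map(
              "ec2Instance" -> Map("privateIpAddress" -> "172.18.42.252"))),
            Map("containerInstance" -> Map(
              "ec2Instance" -> Map("privateIpAddress" -> "172.18.17.226")))
          )
        )
      )
    )
  )
}
```

Calling ExtractorSketch.collectField("privateIpAddress")(ExtractorSketch.sampleResult) yields the two addresses in the order they appear in the result. Just like the extractor in the REPL session, the function has the shape Any => Seq[Any], so the caller still has to cast the elements to the type it expects.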


Midas Open Sources its Service Discovery Library
