How to Use Consul as a Host Resolver in gRPC

Written by vgukasov | Published 2021/12/09

Recently, I ran into a lack of documentation when I wanted to use Consul as a host resolver for gRPC connections. That's why I want to fill the gap for engineers who struggle with the same problem. So, let's start with a little bit of theory.

Why gRPC

gRPC is a widely used RPC framework developed by Google. It gained its popularity for several reasons:

  • it's fast, largely because it uses Protocol Buffers (protobuf), a powerful binary serialization toolset and language
  • it generates the client and server code from a single protobuf contract
  • it runs over HTTP/2 and supports bidirectional data streams

Why Consul

Consul is a robust service mesh solution developed by HashiCorp. It covers many use cases, including:

  • service discovery
  • config storage
  • key-value storage

In our project, Consul keeps track of all microservice hosts. When a microservice instance goes down or a new one comes up, Consul knows about it as soon as the instance registers or its health check fails. We want to use this information to resolve hosts for every gRPC interaction, so that microservices always connect to live, healthy hosts.
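To make the idea concrete, here is a minimal sketch of how Consul's view of healthy instances can be read over its HTTP API: `GET /v1/health/service/<name>?passing=true` returns only instances whose health checks pass, optionally filtered by `tag`. The helper names `healthURL` and `healthyAddrs` are hypothetical, and the HTTP request itself (for example, `http.Get`) is left out so the parsing can be checked offline. Only the response fields needed here (`Node.Address`, `Service.Address`, `Service.Port`) are declared; this is an illustration, not the code used in the project.

```go
package main

import (
	"encoding/json"
	"net"
	"net/url"
	"strconv"
)

// healthEntry mirrors the subset of fields we need from one element of the
// JSON array returned by Consul's GET /v1/health/service/:service endpoint.
type healthEntry struct {
	Node struct {
		Address string
	}
	Service struct {
		Address string
		Port    int
	}
}

// healthURL builds the Consul HTTP API URL that lists only instances whose
// health checks are passing, optionally filtered by tag.
func healthURL(consulAddr, service, tag string) string {
	q := url.Values{}
	q.Set("passing", "true")
	if tag != "" {
		q.Set("tag", tag)
	}
	u := url.URL{
		Scheme:   "http",
		Host:     consulAddr,
		Path:     "/v1/health/service/" + service,
		RawQuery: q.Encode(), // Encode sorts keys: passing=...&tag=...
	}
	return u.String()
}

// healthyAddrs decodes a health endpoint response into "host:port" strings.
// Consul leaves Service.Address empty when a service registers without an
// explicit address; in that case the node's address is used instead.
func healthyAddrs(body []byte) ([]string, error) {
	var entries []healthEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		return nil, err
	}
	addrs := make([]string, 0, len(entries))
	for _, e := range entries {
		host := e.Service.Address
		if host == "" {
			host = e.Node.Address
		}
		addrs = append(addrs, net.JoinHostPort(host, strconv.Itoa(e.Service.Port)))
	}
	return addrs, nil
}
```

A gRPC resolver built on top of this would hand exactly such a `host:port` list to gRPC, which is what the resolver below does with `resolver.Address` values.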

First non-optimal solution

The main problem is that there is no official documentation on resolving hosts via Consul in gRPC, so we googled for a solution. The suggestion we found was to implement our own host resolver.

OK, let’s do it:

Go is the primary language in our tech stack, so all the code below is in Go.

package grpcclient

import (
	"errors"
	"fmt"
	"net/url"
	"strings"

	"utils/pkg/connection"
	"utils/pkg/consul"

	"github.com/hashicorp/consul/api"
	"github.com/sirupsen/logrus"
	"google.golang.org/grpc/resolver"
)

const (
	resolverSchemeConsul = "consul"
)

var (
	errUnknownScheme = errors.New("unknown scheme. Only 'consul' is applicable")
)

// consulBuilder builds the address resolver for gRPC dialer.
type consulBuilder struct {
	consul         *consul.Cluster
	serviceWatcher connection.Watcher
}

// newConsulBuilder is a constructor.
func newConsulBuilder(consul *consul.Cluster, serviceWatcher connection.Watcher) *consulBuilder {
	return &consulBuilder{
		consul:         consul,
		serviceWatcher: serviceWatcher,
	}
}

// Build builds the Consul address resolver for gRPC.
func (c consulBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
	if target.Scheme != resolverSchemeConsul {
		return nil, errUnknownScheme
	}

	serviceName, tag, err := parseTarget(target)
	if err != nil {
		return nil, fmt.Errorf("parse target: %w", err)
	}

	cr := newConsulResolver(c.serviceWatcher, cc)
	err = cr.watch(serviceName, []string{tag})
	if err != nil {
		return nil, fmt.Errorf("watch: %w", err)
	}

	return cr, nil
}

// parses the target and returns service name and tag.
// example of target endpoint: 127.0.0.1:8500/service?tag=master
func parseTarget(target resolver.Target) (string, string, error) {
	u, err := url.Parse(fmt.Sprintf("%s://%s", resolverSchemeConsul, target.Endpoint))
	if err != nil {
		return "", "", fmt.Errorf("parse: %w", err)
	}

	return strings.Trim(u.Path, "/"), u.Query().Get("tag"), nil
}

// Scheme returns the consul resolver scheme.
func (c consulBuilder) Scheme() string {
	return resolverSchemeConsul
}

type consulResolver struct {
	cc             resolver.ClientConn
	serviceWatcher connection.Watcher
	unwatchFunc    func() error
}

func newConsulResolver(serviceWatcher connection.Watcher, cc resolver.ClientConn) *consulResolver {
	return &consulResolver{
		serviceWatcher: serviceWatcher,
		cc:             cc,
	}
}

func (c *consulResolver) onServiceChanged(entries []*api.ServiceEntry) {
	addresses := make([]resolver.Address, len(entries))
	for i, e := range entries {
		addresses[i] = resolver.Address{
			Addr: connection.BuildAddr(e),
		}
	}

	err := c.cc.UpdateState(resolver.State{
		Addresses: addresses,
	})
	if err != nil {
		logrus.WithError(err).Error("update gRPC consul resolver addresses")
	}
}

func (c *consulResolver) watch(serviceName string, tags []string) error {
	var err error
	c.unwatchFunc, err = c.serviceWatcher.WatchService(serviceName, tags, true, c.onServiceChanged)
	if err != nil {
		return fmt.Errorf("watch service: %w", err)
	}

	return nil
}

// ResolveNow is a no-op: addresses are pushed to gRPC whenever the service changes in Consul.
func (c *consulResolver) ResolveNow(options resolver.ResolveNowOptions) {}

// Close implements resolver.Resolver: it gracefully closes the consulResolver by stopping the service watch.
func (c *consulResolver) Close() {
	err := c.unwatchFunc()
	if err != nil {
		logrus.WithError(err).Error("unwatch service in gRPC consul resolver")
	}
}
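The resolver above delegates all change detection to `serviceWatcher.WatchService`, which lives in the internal `utils/pkg/connection` package and isn't shown. One common way to build such a watcher is Consul's blocking queries: each request passes the last seen `X-Consul-Index`, and Consul holds it open until the data changes or the wait time expires. The sketch below is hypothetical (`fetchFunc` and `watchService` are illustrative names, and the HTTP call is abstracted behind `fetch`); it only shows the index bookkeeping, not how the project actually implements it.

```go
package main

import "time"

// fetchFunc performs one (hypothetical) Consul blocking query: it waits until
// the service's index moves past lastIndex, or the wait time expires, and
// returns the current addresses together with the new X-Consul-Index value.
type fetchFunc func(lastIndex uint64) (addrs []string, index uint64, err error)

// watchService long-polls Consul through fetch and calls onChange only when
// the index actually changes. It returns once stop is closed; stop is checked
// between queries, so an in-flight query finishes first.
func watchService(fetch fetchFunc, onChange func([]string), stop <-chan struct{}) {
	var lastIndex uint64
	for {
		select {
		case <-stop:
			return
		default:
		}

		addrs, index, err := fetch(lastIndex)
		if err != nil {
			// Back off briefly on errors instead of hammering Consul.
			time.Sleep(100 * time.Millisecond)
			continue
		}
		if index < lastIndex {
			// Consul's docs advise resetting the index if it goes backwards.
			lastIndex = 0
			continue
		}
		if index == lastIndex {
			continue // the wait time expired without any change
		}
		lastIndex = index
		onChange(addrs) // e.g. consulResolver.onServiceChanged
	}
}
```

If you would rather not hand-roll this loop, HashiCorp's Go client also ships a `watch` package (`github.com/hashicorp/consul/api/watch`) built on the same blocking-query mechanism.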

With this approach, we have to register the custom resolver explicitly when dialing:

import (
	...
	"google.golang.org/grpc"
)
...

// Dial dials a service through gRPC and returns a new connection.
func Dial(serviceName, tag string, timeout time.Duration) (grpc.ClientConnInterface, error) {
	cfg := consul.Config()
	target := fmt.Sprintf("%s:///%s/%s?tag=%s", resolverSchemeConsul, cfg.Address, serviceName, tag)

	return grpc.Dial(
		target,
		grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy": "round_robin"}`),
		grpc.WithResolvers(newConsulBuilder(consul, consul.ServiceWatcher())),
		grpc.WithInsecure(),
	)
}
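Here is a stdlib-only sketch of how the target built in `Dial` travels back through the `parseTarget` logic. `buildTarget`, `endpointOf` and `parseEndpoint` are hypothetical helpers: `endpointOf` is a simplified stand-in for gRPC's own target parsing (everything after `consul:///`), and `parseEndpoint` repeats the body of `parseTarget` without depending on gRPC types.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildTarget mirrors the fmt.Sprintf call in Dial.
func buildTarget(consulAddr, serviceName, tag string) string {
	return fmt.Sprintf("consul:///%s/%s?tag=%s", consulAddr, serviceName, tag)
}

// endpointOf is a simplified stand-in for gRPC's target parsing: for a
// "consul:///rest" target, the endpoint is everything after the third slash.
func endpointOf(target string) string {
	return strings.TrimPrefix(target, "consul:///")
}

// parseEndpoint repeats parseTarget's logic on a plain endpoint string:
// re-add a scheme so url.Parse treats "127.0.0.1:8500" as the host, then
// take the service name from the path and the tag from the query.
func parseEndpoint(endpoint string) (service, tag string, err error) {
	u, err := url.Parse("consul://" + endpoint)
	if err != nil {
		return "", "", fmt.Errorf("parse: %w", err)
	}
	return strings.Trim(u.Path, "/"), u.Query().Get("tag"), nil
}
```

Note that the Consul address ends up as the URL host and is ignored by `parseTarget`; the resolver relies on the already configured watcher instead. The tag is also inserted unescaped, so `url.QueryEscape` would be needed for tags containing `&` or `#`. Finally, gRPC's target handling has changed across grpc-go releases: in newer versions `resolver.Target.Endpoint` is a method that returns only the path, with the query string available via `target.URL`, so it's worth checking which form your grpc-go version exposes.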

The implementation works, but it's a fair amount of tricky code that we now have to maintain. It looked like there should be a better way to do this.

The undocumented, optimal solution

The official gRPC name resolution docs don't mention Consul, so I found a better solution by accident while working on something else.

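As it turned out, we don't need to write our own Consul resolver: the open-source package github.com/mbobakov/grpc-consul-resolver already implements one and, when imported, registers it with gRPC under the consul scheme.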

To use it, we only need to blank-import the package and format the dial target correctly:

import (
	...
	"google.golang.org/grpc"
	// The blank import registers the "consul" resolver scheme with gRPC.
	_ "github.com/mbobakov/grpc-consul-resolver"
)
...

// Dial dials a service through gRPC and returns a new connection.
func Dial(serviceName, tag string, timeout time.Duration) (grpc.ClientConnInterface, error) {
	cfg := consul.Config()
	target := fmt.Sprintf("consul://%s:%s@%s/%s?tag=%s", cfg.User, cfg.Password, cfg.Address, serviceName, tag)

	return grpc.Dial(
		target,
		// round_robin spreads RPCs across all addresses returned by the resolver.
		grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy": "round_robin"}`),
		grpc.WithInsecure(),
	)
}

The dialing target will be something like:

consul://user:password@127.0.0.1:8500/service_name?tag=service_tag
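Building this target with `fmt.Sprintf` inserts the credentials verbatim, so a password containing `@`, `:` or `/` would produce a broken target. A minimal sketch using `net/url` instead is shown below; `consulTarget` is a hypothetical helper. It also sets `healthy=true`: as an assumption based on the grpc-consul-resolver README, that parameter restricts results to instances passing their health checks (and defaults to false), so verify it against the version you use.

```go
package main

import "net/url"

// consulTarget builds a dial target for the "consul" scheme registered by
// github.com/mbobakov/grpc-consul-resolver. url.URL percent-encodes the
// credentials and the query, so special characters can't break the target.
func consulTarget(user, password, consulAddr, service, tag string) string {
	q := url.Values{}
	if tag != "" {
		q.Set("tag", tag)
	}
	// Assumption: "healthy=true" makes the resolver return only instances
	// whose health checks pass; confirm in the library's README.
	q.Set("healthy", "true")

	u := url.URL{
		Scheme:   "consul",
		Host:     consulAddr,
		Path:     "/" + service,
		RawQuery: q.Encode(), // keys come out sorted: healthy, then tag
	}
	if user != "" {
		u.User = url.UserPassword(user, password)
	}
	return u.String()
}
```

For example, `consulTarget("user", "p@ss:word", "127.0.0.1:8500", "billing", "master")` yields `consul://user:p%40ss%3Aword@127.0.0.1:8500/billing?healthy=true&tag=master`. With an empty user, the userinfo part is omitted entirely.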

And that's it. No custom resolver code to write or maintain.

Conclusion

To sum up, here are a few takeaways from this story:

  • the first solution you find by googling a problem isn't always the best one
  • even a very popular library can have gaps in its documentation, so explore the library internals (and its ecosystem) carefully before implementing your own improvements
  • still, it's fine to start with a non-optimal solution, because all great products are built iteratively from scratch


Written by vgukasov | Software Engineer @ Amazon
Published by HackerNoon on 2021/12/09