This is basically the guide I wish I had: what, why, and how.

## What

With the rise of Google's beyond-corp approach, the concept of "Zero Trust" brought the Identity Aware Proxy to the world. In a nutshell, internal resources or tools sit in private, inaccessible areas of the cloud, while a reverse proxy on top of them offers access to permitted users only. The authentication often relies on an OAuth2 provider, but any kind of user directory does the trick.

## Why

Rarely do you enjoy a combination of more than two of these:

- Security.
- Quality user experience.
- Ease of management/maintenance.

The magic of "Zero Trust" lies in benefiting from all of the above with the same solution.

### Security

The key feature of an identity-aware proxy is that it makes VPN servers redundant. A VPN by its nature offers a single point of access to the internal network. Once authenticated, the user has the keys to the kingdom; all internal systems are reachable. In some cases, when RBAC is correctly implemented, authentication is still in the way and protects user access. However, the system is still accessible network-wise, making it susceptible to scans and attacks that might bypass the standard access point. With a reverse proxy, all access is blocked until the request is authorized; the request is dropped before any routing forward takes place.

Another aspect of security comes from the fact that the user has one identity source to manage. If MFA / 2FA has been enabled (and it should!!!), it stands for all future user authentications. More on that in ease of access and ease of management.

One clarification before moving on though; this is not to say that VPNs are in the past, or that something is broken with the concept of having one. Deployed correctly, VPN servers are incredible at what they do. They're ubiquitous for a reason. That said, most implementations lack basic access control, and those that have it, more often than not, do not monitor internal queries once a user has been authenticated. In some cases, there are not too many alternatives. But for others - e.g. back-office web services - we can do better.

### Ease of Access

This is the more straightforward aspect of things; a user in the organization directory only has to manage one identity. Assuming the user is keeping their passwords safe with something like 1Password, and a mandatory 2-step verification, it makes everyone's life easier. The cookie generated from one successful authentication can be used to access all other systems under the proxy (given the user is permitted to do so).

### Ease of Management

One of the key pain points engineering / IT / Ops teams struggle with is user management when multiple systems and tools are involved in development. Instead of managing a growing number of directories and user sets, SSO offers a single point of authentication, allowing the management of only one directory. All that's left is to integrate SSO with the identity-aware proxy to leverage both a single access point and the reuse of a cookie.

## SSO to the rescue!

While Buzzfeed's SSO is simply wonderful in terms of concept and implementation, its docs are somewhat incomprehensive, for lack of a better word. The issue amplifies when trying to deploy on ECS Fargate. (Somewhat surprising given the nature of Buzzfeed's workload on ECS, but 🤷).

## The How

With that in mind, here's a guide / better-docs for implementing a "Zero-trust proxy with SSO and cookie reuse on ECS Fargate"...
*Probably the world record for the single-line title with the most buzzwords ever.*

Buzzfeed's SSO is an implementation of two proxy entities: one service acts as a proxy to underlying systems, and the other as an auth provider. The reason for the local auth provider is being able to serve one login for all systems, instead of having to re-authenticate with every different upstream. This is cookie reuse for the same purpose, which is a super elegant solution on behalf of Buzzfeed.

Here's a complex, yet comprehensive, visual diagram of the system.

As said, the system comprises a proxy and an auth provider, namely `sso-proxy` and `sso-authenticator` (or `sso-auth` in short). Both systems are configured by a set of environment variables, where the backend routing is described in an upstream Yaml config file. With growing usage, additional features, switches, parameters, and updates, the project kind of grew out of its basic, understandable configuration docs. This is why we're here today.

### The Proxy

This is the entity that serves as the front gate for incoming requests. If the incoming request is identified as valid, the request goes through and is routed based on `upstream_configs.yml`. Otherwise, requests are redirected to the authenticator.

I've chosen to build the container image with the environment wrapping Buzzfeed's image. This is for convenience only and can be shifted to any other method.

Building the image with:

```sh
docker build --build-arg client_id=xxx --build-arg client_secret=xxx ...
```

```dockerfile
FROM buzzfeed/sso

ARG client_id \
    client_secret \
    session_cookie_secret

ENV UPSTREAM_CONFIGFILE="/sso/upstream_configs.yml" \
    UPSTREAM_CLUSTER="" \
    PROVIDER_URL_EXTERNAL="https://sso-auth.domain.co" \
    CLIENT_ID=$client_id \
    CLIENT_SECRET=$client_secret \
    SESSION_COOKIE_SECRET=$session_cookie_secret \
    SESSION_TTL_LIFETIME="1h" \
    UPSTREAM_SCHEME=http

COPY ./upstream_config.yml /sso/upstream_configs.yml

ENTRYPOINT ["/bin/sso-proxy"]
```

### The Authenticator

Receiving unauthenticated requests from the proxy, the authenticator is in charge of contacting the OAuth2 provider for authorization. According to configuration, if the requesting user has the relevant permissions, i.e. authorized domain, correct sub-group, and so on, a cookie is set and the user is redirected back to the proxy, which in turn lets the request through.

Built in the same manner as its twin, the authenticator is based on the same image, only uses a different `ENTRYPOINT` and a different set of configuration variables:

```dockerfile
FROM buzzfeed/sso

ARG client_id \
    client_secret \
    session_cookie_secret \
    session_key

ENV AUTHORIZE_EMAIL_DOMAINS=domain.co \
    AUTHORIZE_PROXY_DOMAINS=domain.co \
    SERVER_HOST=sso-auth.domain.co \
    CLIENT_PROXY_ID=$client_id \
    CLIENT_PROXY_SECRET=$client_secret \
    SESSION_COOKIE_SECURE=true \
    SESSION_COOKIE_SECRET=$session_cookie_secret \
    SESSION_COOKIE_EXPIRE=1h \
    SESSION_KEY=$session_key \
    PROVIDER_X_CLIENT_ID=$client_id \
    PROVIDER_X_CLIENT_SECRET=$client_secret \
    PROVIDER_X_TYPE=google \
    PROVIDER_X_SLUG=google \
    PROVIDER_X_GOOGLE_IMPERSONATE=admin@domain.co \
    PROVIDER_X_GOOGLE_CREDENTIALS=/sso/credentials.json \
    PROVIDER_X_GROUPCACHE_INTERVAL_REFRESH=1m \
    PROVIDER_X_GROUPCACHE_INTERVAL_PROVIDER=1m \
    LOGGING_LEVEL=debug

COPY ./credentials.json /sso/credentials.json

EXPOSE 4180

ENTRYPOINT ["/bin/sso-auth"]
```
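For reference, building both images end to end might look something like the sketch below. The Dockerfile names and credentials are placeholders, and the secret generation assumes sso expects base64-encoded 32-byte values - double-check that against the sso docs:

```sh
# Hypothetical Dockerfile names and placeholder credentials - substitute your own.
# The secret format (base64-encoded 32 bytes) is my assumption from the sso docs.
gen_secret() { openssl rand -base64 32 | head -c 32 | base64; }

# Build the proxy image.
docker build -f Dockerfile.proxy -t sso-proxy \
  --build-arg client_id=xxx \
  --build-arg client_secret=xxx \
  --build-arg session_cookie_secret="$(gen_secret)" .

# Build the authenticator image, which also takes a session key.
docker build -f Dockerfile.auth -t sso-auth \
  --build-arg client_id=xxx \
  --build-arg client_secret=xxx \
  --build-arg session_cookie_secret="$(gen_secret)" \
  --build-arg session_key="$(gen_secret)" .
```

Note that build args end up baked into image layers; for anything past a local experiment, consider injecting the secrets at runtime (e.g. as ECS task definition secrets) instead.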
### OAuth2 Provider of choice - Google

Not much to elaborate on Google. Its workspace directory offers a wide range of user management features and is considered a standard choice. Specifically, Buzzfeed's SSO offers either Google or Okta as providers. If these are not the way your organization is managing users, this post might be irrelevant for the most part. You can, however, make use of the underlying system - OAuth2-proxy - which will provide a similar experience, minus the single, locally managed authentication mechanism. Instead of having two components, the proxy is one system that operates directly against the backend provider.

Instructions (although far from perfect) can be found here. Important notes:

- Please do go through all steps, even if some seem unnecessary, like skipping step three and onward if group segregation is not a requirement. Do go all the way through, and make sure you get the .json file at the end.
- Read carefully; make sure the admin SDK is enabled as shown.
- Make sure the API controls on Google Workspace's security side are set accordingly.
- Make sure `PROVIDER_X_GOOGLE_IMPERSONATE=admin@domain.co` is set with an admin user.

### Upstreams

Upstreams are a configuration file where the proxy routes are set. An upstream takes a public request from the web and, if authenticated, routes it to an internal (or not) service, given some conditions are met. The configuration below describes two services and their internal routes:

```yaml
- service: vault
  default:
    from: vault.sso.domain.co
    to: vault.local:8200
    options:
      allowed_groups:
        - production@domain.co

- service: snappass
  default:
    from: secrets.sso.domain.co
    to: secrets.local:5000
    options:
      allowed_email_domains:
        - domain.co
```

Notes:

- Each entry must have a `default` setting; other custom settings can follow.
- `from` & `to` are the base, while `options` are optional only if one of the top three are set. I've learned this one the hard way...
- Note how `allowed_groups` or `allowed_email_domains` are set and sufficient on their own. The first permits access for a specific directory group, while the latter offers domain-wide access.
- The entire set of options can be found here.

### Deployment

The two services above should be deployed together in an internal network just as any other service would. The best practice here dictates a load balancer on top, to route traffic into the proxy/authenticator.

The only thing to consider here is the reachability of upstream services; a system that's considered "internal" and will be accessed through the proxy has to be accessible from the proxy itself. On the network level, this means that they either have to sit in the same virtual private network or have a peering between the networks. From the proxy's perspective, the IP and the port should be in reach and open. "Open" also means that they are part of the same security group, or have a rule in the respective groups allowing communication back and forth.

### ECS and Service Discovery

Once the proxy is set and the user gains access, they should be able to communicate with endpoints only accessible internally within the private network, the VPC. While we can redirect the request to an IP, those tend to change, and the connection is then lost. An improvement can be an elastic IP that's guaranteed to stay fixed with the resource it was attached to. This brings a few new issues though: a. the resource itself can (and should) be rotated over time - we are dealing with containers after all; b. Elastic IPs will start being charged once their underlying resource no longer exists. Another improvement might be a human-readable DNS A record that's pointed at the same IP. Still, being readable and all, the inherent problem with a static IP remains.

The solution - AWS Service Discovery. Put simply, the service discovery service attaches itself to an ECS service, updating the live running tasks underneath with an internal endpoint managed by Route53. Meaning, the user can hit the same endpoint and trust it to resolve to a dynamic IP that's connected to an existing task.

Here's a quick example using Terraform code (out of convenience only - this can be set manually or using any other language). A global resource of a private DNS namespace has to be created first before it can host internal records:

<img src="{{ site.url }}{{ site.baseurl }}/assets/images/tf-carbon-dns.png"/>

After the namespace is ready, we can start creating private records; note the reference to the namespace inside `dns_config`:

<img src="{{ site.url }}{{ site.baseurl }}/assets/images/tf-carbon-discovery.png"/>

And lastly, connecting the `aws_ecs_service` resource to the service registry:

<img src="{{ site.url }}{{ site.baseurl }}/assets/images/tf-carbon-registries.png"/>
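Since the snippets above are images, here's the same flow as copy-pasteable HCL - a minimal sketch rather than a drop-in module. The resource names are hypothetical, and it assumes a VPC, ECS cluster, task definition, private subnets, and security group (`aws_vpc.main`, `aws_ecs_cluster.main`, `aws_ecs_task_definition.vault`, `var.private_subnet_ids`, `aws_security_group.internal`) are defined elsewhere; it mirrors the `vault.local` upstream from earlier:

```hcl
# A private DNS namespace, "local", attached to the VPC - hosts internal records only.
resource "aws_service_discovery_private_dns_namespace" "internal" {
  name        = "local"
  description = "Namespace for internal service endpoints"
  vpc         = aws_vpc.main.id
}

# A discovery service that maintains A records for live tasks under vault.local.
resource "aws_service_discovery_service" "vault" {
  name = "vault"

  dns_config {
    namespace_id   = aws_service_discovery_private_dns_namespace.internal.id
    routing_policy = "MULTIVALUE"

    dns_records {
      ttl  = 10
      type = "A"
    }
  }

  # Let ECS report task health instead of a Route53 health check.
  health_check_custom_config {
    failure_threshold = 1
  }
}

# The ECS service registers its tasks with the discovery service above.
resource "aws_ecs_service" "vault" {
  name            = "vault"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.vault.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [aws_security_group.internal.id]
  }

  service_registries {
    registry_arn = aws_service_discovery_service.vault.arn
  }
}
```

With the namespace named `local` and the discovery service named `vault`, tasks resolve under `vault.local` - exactly the `to` address the upstream config points at.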
## Alternatives

The solution above is not the only zero-trust option. There are plenty out there, including commercial ones as well as many OSS alternatives. Buzzfeed's solution was the choice here for its elegance, remaining a one-stop authentication system that builds on top of Google's OAuth2 solution. Any kind of solution will usually do pretty much the same work, and as long as the concept of security is kept, the rest is implementation details.

## Extending the power of Zero Trust

Having the system deployed is a great solution for all things web. But sometimes real life pops the security bubble; in some cases engineers will need to gain access to private resources over SSH or other protocols (working against a Redis or a PostgreSQL instance). While working directly against a resource in its own protocol is not achievable, we can extend the reach by deploying web interfaces for SSH, PGAdmin-like applications.

I would address the engineering culture of accessing private resources in the first place, and how to create a workflow that removes the necessity of such operations, but that's material for another post. In the meantime, let's leave it at the notion of avoiding it when you can. And if you're serious, you may consider rotating accessed instances altogether, marking them as contaminated once they've been SSH'ed into and automating their removal. Food for thought...

I hope this post helped with grasping the concept of zero trust and a real-world implementation. This is something I struggled with when I tried incorporating it into our workflow, so I'm happy to share the experience and the "how-to". If you find any mistakes in the information above, or have any questions or comments, please reach out!

Thank you for reading 🖤