As part of my job, I advise a lot of customers who use AWS. Many of them build complex systems on top of AWS services. AWS OpenSearch is often part of that system landscape and covers a wide range of business cases, some of which require an Internet-accessible cluster with HTTP basic authentication. AWS does not provide this capability out of the box, so in this article I will demonstrate how to use AWS-native services to publish your OpenSearch cluster to the Internet.
An OpenSearch cluster can be deployed in one of two modes, which differ as follows:

| VPC based | Public domain |
|---|---|
| Only a private (VPC) endpoint | A public endpoint |
| In the AWS console, the Cluster health tab does not include shard information, and the Indices tab is not present | All information about cluster health and indices is available |
| Security is based on AWS security groups | Security is based on IP-based access policies |
| Dashboards are reachable only from inside the VPC, so accessing them requires a VPN or a similar solution | Dashboards are publicly available to the IP ranges allowed by the IP-based access policies |
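To check programmatically which of the two modes an existing domain uses, you can inspect the `DomainStatus` returned by the describe API. The sketch below assumes the boto3 response shape in which VPC domains report their endpoint under `Endpoints['vpc']` and public domains under `Endpoint`; the helper name `deployment_mode` is hypothetical.

```python
from typing import Any, Dict


def deployment_mode(domain_status: Dict[str, Any]) -> str:
    """Return 'vpc' or 'public' for a DomainStatus dict (hypothetical helper).

    Assumption: VPC domains expose their endpoint under
    DomainStatus['Endpoints']['vpc'], public domains under DomainStatus['Endpoint'].
    """
    endpoints = domain_status.get('Endpoints') or {}
    if 'vpc' in endpoints:
        return 'vpc'
    if domain_status.get('Endpoint'):
        return 'public'
    # Neither key is set while a domain is still being provisioned
    raise ValueError('Domain has no endpoint yet')
```

For example, passing `boto3.client('es').describe_elasticsearch_domain(DomainName='test-domain')['DomainStatus']` should yield `'vpc'` for a domain that can sit behind the ALB setup described in this article.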
AWS currently does not provide any out-of-the-box way to publish an OpenSearch cluster with HTTP basic authentication to the Internet for an arbitrary IP range (e.g., `0.0.0.0/0`). It is possible to configure an access policy that allows a certain IP range, or to use a publicly open cluster with SAML authentication; however, neither option suits cases such as mobile clients or TV set-top boxes, where the client can have any unpredictable IP address and the dev team does not have the capacity to implement SAML.
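To illustrate why IP-based policies do not fit such clients, here is a minimal sketch of a resource-based access policy that admits requests only from listed CIDR ranges via the `aws:SourceIp` condition key. Every client network has to be known in advance and written into the policy. The function name and the example values are hypothetical.

```python
import json
from typing import List


def ip_restricted_access_policy(domain_arn: str, allowed_cidrs: List[str]) -> str:
    """Build a domain access policy limited to the given CIDR ranges (hypothetical helper)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",            # HTTP (data-plane) requests only
                "Resource": f"{domain_arn}/*",
                # aws:SourceIp is the caller's public IP, which mobile clients cannot pin down
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidrs}},
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The resulting string could be applied with `boto3.client('opensearch').update_domain_config(DomainName='test-domain', AccessPolicies=policy)`. Widening `allowed_cidrs` to `0.0.0.0/0` would simply remove the protection the condition is meant to provide.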
Perhaps something like RDS Proxy will appear for OpenSearch in the future, but so far AWS has announced no public plans for such a feature.
The AWS documentation recommends using a reverse proxy based on a highly available EC2 Nginx cluster. It's a good approach, but it requires a lot of extra effort for Nginx maintenance, which would be inconvenient in the case of a large number of OpenSearch clusters. In addition, as AWS does not provide highly available Nginx as a service, you will be unable to configure your infrastructure using a single tool such as Terraform, Pulumi, or CloudFormation.
My approach, which I recommend for production environments, is based on the AWS Application Load Balancer (ALB). Nginx can be skipped because the ALB provides the same functionality for all basic scenarios and requires no additional configuration or automation work.
The ALB forwards requests to an IP-type target group that contains the private IP addresses of the OpenSearch cluster, so the cluster must be deployed in VPC mode. In this configuration, it is also important to set up HTTP redirects at the ALB level, as shown in the image below.
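A minimal sketch of the two ALB pieces mentioned above, written as keyword arguments for the boto3 `elbv2` client: an IP-type target group for the OpenSearch nodes and a port 80 listener. Treating the HTTP redirect as an HTTP-to-HTTPS redirect is an assumption on my part, as are the port choices and the hypothetical helper names.

```python
from typing import Any, Dict


def ip_target_group_params(name: str, vpc_id: str) -> Dict[str, Any]:
    """Arguments for elbv2.create_target_group() (hypothetical helper)."""
    return {
        "Name": name,
        "Protocol": "HTTPS",   # assumption: OpenSearch endpoint is reached over HTTPS
        "Port": 443,
        "VpcId": vpc_id,       # the VPC where the OpenSearch domain lives
        "TargetType": "ip",    # targets are raw private IPs, not EC2 instances
    }


def http_redirect_listener_params(load_balancer_arn: str) -> Dict[str, Any]:
    """Arguments for elbv2.create_listener(): redirect HTTP :80 to HTTPS :443 (assumption)."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "HTTP",
        "Port": 80,
        "DefaultActions": [
            {
                "Type": "redirect",
                "RedirectConfig": {
                    "Protocol": "HTTPS",
                    "Port": "443",             # boto3 expects the redirect port as a string
                    "StatusCode": "HTTP_301",  # permanent redirect
                },
            }
        ],
    }
```

They would be applied with `elbv2.create_target_group(**ip_target_group_params('opensearch-tg', vpc_id))` and `elbv2.create_listener(**http_redirect_listener_params(alb_arn))`, where `vpc_id` and `alb_arn` are placeholders.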
The target group also needs properly configured health checks; otherwise, the ALB will mark the OpenSearch IP addresses as unhealthy. Keep in mind that when you use HTTP basic authentication, the ALB's unauthenticated health check requests receive HTTP 401, so the 401 code must be added to the list of successful health check codes.
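As a sketch, the health check can be adjusted with `elbv2.modify_target_group()`. The `Matcher` value `200,401` tells the ALB to count both codes as healthy. The path, interval, and thresholds below are illustrative assumptions rather than tuned values, and the helper name is hypothetical.

```python
from typing import Any, Dict


def opensearch_health_check_params(target_group_arn: str) -> Dict[str, Any]:
    """Arguments for elbv2.modify_target_group() (hypothetical helper)."""
    return {
        "TargetGroupArn": target_group_arn,
        "HealthCheckProtocol": "HTTPS",
        "HealthCheckPath": "/",            # assumption: root path of the OpenSearch endpoint
        "HealthCheckIntervalSeconds": 30,  # illustrative value
        "HealthyThresholdCount": 3,
        "UnhealthyThresholdCount": 3,
        # Health checks carry no credentials, so basic auth answers 401;
        # accept both 200 and 401 as healthy responses.
        "Matcher": {"HttpCode": "200,401"},
    }
```

Usage: `elbv2.modify_target_group(**opensearch_health_check_params(target_group_arn))`.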
Configuring the ALB target group once is not enough. Any change to the OpenSearch cluster can change its IP addresses, so to keep the target group in sync with all current OpenSearch IPs, the pool should be maintained by a simple Lambda function that runs once per hour, registers any missing OpenSearch IPs, and removes stale ones. The complete function code is provided below.
```python
import socket

import boto3


def lambda_handler(event, context):
    """Sync an ALB IP target group with the current IPs of a VPC OpenSearch domain."""
    # Initialize AWS clients
    es_client = boto3.client('es')
    elbv2 = boto3.client('elbv2')

    # Retrieve target group ARN and OpenSearch domain name from the event
    target_group_arn = event.get('targetGroupARN')
    open_search_domain_name = event.get('openSearchDomainName')
    if not target_group_arn or not open_search_domain_name:
        raise ValueError("Event must contain 'targetGroupARN' and 'openSearchDomainName'")

    # Describe the OpenSearch domain to retrieve its VPC DNS endpoint
    response = es_client.describe_elasticsearch_domain(DomainName=open_search_domain_name)
    opensearch_dns = response['DomainStatus']['Endpoints']['vpc']

    # Resolve DNS to get the current IPv4 addresses
    opensearch_ips = socket.gethostbyname_ex(opensearch_dns)[2]
    print(f"OpenSearch IPs are: {opensearch_ips}")

    # Describe the IPs currently registered in the ALB target group
    target_group_info = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    existing_ips = [target['Target']['Id'] for target in target_group_info['TargetHealthDescriptions']]

    # Compare IPs to determine which to add and which to remove
    ips_to_add = sorted(set(opensearch_ips) - set(existing_ips))
    ips_to_remove = sorted(set(existing_ips) - set(opensearch_ips))

    # Update the target group
    if ips_to_add:
        elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=[{'Id': ip} for ip in ips_to_add])
    if ips_to_remove:
        elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=[{'Id': ip} for ip in ips_to_remove])

    return {
        'statusCode': 200,
        'body': 'Updated ALB target group with OpenSearch IPs.'
    }
```
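The hourly run can be scheduled with an Amazon EventBridge rule. The sketch below, assuming the boto3 `events` client, builds the arguments for `put_rule()` and `put_targets()`. The constant `Input` JSON becomes the Lambda's `event`, so it carries exactly the two keys the function reads. The rule name, target `Id`, and helper names are hypothetical.

```python
import json
from typing import Any, Dict


def hourly_sync_rule(rule_name: str) -> Dict[str, Any]:
    """Arguments for events.put_rule(): fire once per hour (hypothetical helper)."""
    return {"Name": rule_name, "ScheduleExpression": "rate(1 hour)", "State": "ENABLED"}


def sync_rule_target(rule_name: str, lambda_arn: str,
                     target_group_arn: str, domain_name: str) -> Dict[str, Any]:
    """Arguments for events.put_targets(): invoke the sync Lambda (hypothetical helper)."""
    return {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "opensearch-ip-sync",  # hypothetical target ID
                "Arn": lambda_arn,
                # Constant input passed to the Lambda as its `event`
                "Input": json.dumps({
                    "targetGroupARN": target_group_arn,
                    "openSearchDomainName": domain_name,
                }),
            }
        ],
    }
```

EventBridge also needs permission to invoke the function, which can be granted with the Lambda `add_permission()` API using `Action='lambda:InvokeFunction'`, `Principal='events.amazonaws.com'`, and the rule ARN as `SourceArn`.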
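The Lambda function can be deployed in VPC mode; this is more secure but not mandatory. To start, you can grant the Lambda's IAM role broad `es:*` permissions on the domain. Because the function also calls the Elastic Load Balancing API, the role additionally needs `elasticloadbalancing:DescribeTargetHealth`, `elasticloadbalancing:RegisterTargets`, and `elasticloadbalancing:DeregisterTargets`, as shown in the identity-based policy below. Note that a policy attached to a role has no `Principal` element.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "es:*"
      ],
      "Resource": [
        "arn:aws:es:us-west-1:987654321098:domain/test-domain",
        "arn:aws:es:us-west-1:987654321098:domain/test-domain/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticloadbalancing:DescribeTargetHealth",
        "elasticloadbalancing:RegisterTargets",
        "elasticloadbalancing:DeregisterTargets"
      ],
      "Resource": "*"
    }
  ]
}
```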
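Once the setup works, the broad `es:*` grant can be narrowed. The following sketch builds a least-privilege identity-based policy for the Lambda's execution role that covers only the calls the function makes. Treat it as a starting point to verify against the IAM service authorization reference: granting `DescribeTargetHealth` on `*` is my assumption, and the helper name is hypothetical.

```python
import json


def lambda_sync_policy(domain_arn: str, target_group_arn: str) -> str:
    """Identity-based policy for the sync Lambda's role (hypothetical helper, no Principal)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Needed to read the domain's VPC endpoint
                "Effect": "Allow",
                "Action": ["es:DescribeElasticsearchDomain"],
                "Resource": domain_arn,
            },
            {
                # Assumption: describe calls are granted on "*"
                "Effect": "Allow",
                "Action": ["elasticloadbalancing:DescribeTargetHealth"],
                "Resource": "*",
            },
            {
                # Writes are scoped to the single target group the function manages
                "Effect": "Allow",
                "Action": [
                    "elasticloadbalancing:RegisterTargets",
                    "elasticloadbalancing:DeregisterTargets",
                ],
                "Resource": target_group_arn,
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

It could be attached with `boto3.client('iam').put_role_policy(RoleName=role_name, PolicyName='opensearch-ip-sync', PolicyDocument=policy)`, where the role and policy names are placeholders.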
For a variety of reasons, connecting to an OpenSearch cluster through a floating IP address is inconvenient. Modern architectures use DNS to hide raw IP addresses, and we'll do the same with AWS Route 53.
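A sketch of the Route 53 side, assuming an alias `A` record that points a hostname in your hosted zone at the ALB. The ALB's DNS name and canonical hosted zone ID can be read from `elbv2.describe_load_balancers()` (`DNSName` and `CanonicalHostedZoneId`); the record name and helper name are hypothetical.

```python
from typing import Any, Dict


def alb_alias_change_batch(record_name: str, alb_dns_name: str,
                           alb_hosted_zone_id: str) -> Dict[str, Any]:
    """ChangeBatch for route53.change_resource_record_sets() (hypothetical helper)."""
    return {
        "Comment": "Point the public OpenSearch hostname at the ALB",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or update it if it already exists
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "AliasTarget": {
                        # The ALB's canonical zone ID, not the ID of your own hosted zone
                        "HostedZoneId": alb_hosted_zone_id,
                        "DNSName": alb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }
```

Apply it with `route53.change_resource_record_sets(HostedZoneId=your_zone_id, ChangeBatch=batch)`. An alias record is used here rather than a CNAME because it also works at the zone apex.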
Unfortunately, a Route 53 record alone is not enough, because clients connecting through that hostname would hit TLS certificate mismatch errors at the ALB. To avoid HTTPS connection issues when using DNS, we must obtain a valid TLS certificate for the hostname and attach it to an ALB listener. AWS Certificate Manager can issue certificates or import purchased ones, but I prefer free Let's Encrypt certificates. AWS does not natively integrate Let's Encrypt with Certificate Manager, so some time ago I created an AWS Lambda function that automatically issues and renews TLS certificates, and I propose using it in this design. Its source code is available at https://github.com/kvendingoldo/aws-letsencrypt-lambda.
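With a valid certificate in place, the ALB needs an HTTPS listener that presents it and forwards traffic to the OpenSearch target group. This minimal sketch assumes the certificate is available under a certificate ARN (ALB listeners reference certificates by ARN) and picks `ELBSecurityPolicy-TLS13-1-2-2021-06` as the TLS policy; both the policy choice and the helper name are assumptions.

```python
from typing import Any, Dict


def https_listener_params(load_balancer_arn: str, certificate_arn: str,
                          target_group_arn: str) -> Dict[str, Any]:
    """Arguments for elbv2.create_listener(): terminate TLS at the ALB (hypothetical helper)."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "HTTPS",
        "Port": 443,
        # Certificate for the Route 53 hostname, e.g. a Let's Encrypt one imported into ACM
        "Certificates": [{"CertificateArn": certificate_arn}],
        "SslPolicy": "ELBSecurityPolicy-TLS13-1-2-2021-06",  # assumption: TLS 1.2/1.3 policy
        # Forward decrypted requests to the IP target group holding the OpenSearch nodes
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }
```

Usage: `elbv2.create_listener(**https_listener_params(alb_arn, cert_arn, target_group_arn))`, with all three ARNs as placeholders.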
Security highlights
Security is not covered in detail in this article, but I cannot skip highlighting two critical points: