Amazon Elasticsearch Service recently added support for k-nearest neighbor (k-NN) search. It enables you to run high-scale, low-latency k-NN search across thousands of dimensions with the same ease as running any regular Elasticsearch query. k-NN similarity search is powered by Open Distro for Elasticsearch, an Apache 2.0-licensed distribution of Elasticsearch. In this post, I'll show you how to build a scalable similar-questions search API using Amazon SageMaker, Amazon Elasticsearch Service, Amazon Elastic File System (EFS), and Amazon ECS.

What we'll cover in this example:

1. Deploy and run a SageMaker notebook instance in a VPC.
2. Mount EFS to the notebook instance.
3. Download the Quora Question Pairs dataset, then map the variable-length questions in the dataset to fixed-length vectors using a DistilBERT model.
4. Create a downstream task to reduce the embedding dimensions, and save the sentence embedder to EFS.
5. Transform the question text to vectors, and index all vectors into Elasticsearch.
6. Deploy a containerized Flask REST API to ECS.

The following diagram shows the architecture of the above steps:

## Deploying and running a SageMaker notebook instance in a VPC

First, let's create a SageMaker notebook instance that connects to Elasticsearch, making sure they are in the same VPC. In the SageMaker console, within the Network section of the Create notebook instance page, set the VPC configuration details such as the VPC subnet IDs and security group IDs.

## Mounting EFS to the notebook instance

We will do all the necessary sentence-transforming steps in a SageMaker notebook (code found here). Now, mount EFS to the `model` directory; for more details about EFS, please check the AWS official documentation.

```sh
%%sh
mkdir model
sudo mount -t nfs \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    fs-xxxxxx.efs.ap-southeast-2.amazonaws.com:/ \
    ./model
```

NOTE: `fs-xxxxxx.efs.ap-southeast-2.amazonaws.com` is the DNS name of the EFS file system. The EFS mount targets and the SageMaker notebook are in the same VPC.
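Before saving anything to `./model`, it's worth a quick sanity check that the mount actually took effect; a minimal sketch, assuming the directory layout above:

```shell
# Verify ./model is backed by the EFS NFS mount and is writable.
mkdir -p model                       # no-op if the notebook cell above already created it
df -h ./model                        # Filesystem column should show fs-xxxxxx...:/ when mounted
touch ./model/.efs-write-test && rm ./model/.efs-write-test
```

If `df` still reports the local root volume instead of the `fs-xxxxxx` DNS name, the mount silently failed (usually a security-group or subnet mismatch between the notebook and the EFS mount targets).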
## Mapping variable-length questions to fixed-length vectors using a DistilBERT model

To run nearest neighbor search, we have to get sentence and token embeddings. We can use sentence-transformers, which computes sentence embeddings using BERT / RoBERTa / DistilBERT / ALBERT / XLNet with PyTorch. It allows us to map sentences into fixed-length representations in just a few lines of code.

We will use the lightweight DistilBERT model to generate sentence embeddings in this example. Note that DistilBERT's hidden size is 768, which is rather large for an Elasticsearch index; we can reduce the dimension to 256 by adding a dense layer after the pooling:

```python
from sentence_transformers import models, losses, SentenceTransformer

word_embedding_model = models.DistilBERT('distilbert-base-uncased')
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True,
                               pooling_mode_cls_token=False,
                               pooling_mode_max_tokens=False)
# reduce dim from 768 to 256
dense_model = models.Dense(in_features=768, out_features=256)
transformer = SentenceTransformer(modules=[word_embedding_model, pooling_model, dense_model])
```

Next, save the sentence embedder to the EFS-mounted directory:

```python
transformer.save("model/transformer-v1/")
```
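For intuition about the pooling layer used above: mean pooling simply averages the per-token vectors into one sentence vector (the dense layer then projects that 768-dimensional average down to 256). A toy sketch in plain Python with tiny 4-dimensional vectors; `mean_pool` is an illustrative helper, not part of sentence-transformers:

```python
def mean_pool(token_vectors):
    """Element-wise average of per-token vectors -> one sentence vector."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

# Toy example: three "token" vectors of dimension 4 (not real DistilBERT outputs)
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
sentence_vector = mean_pool(tokens)
print(sentence_vector)  # [2.0, 2.0, 2.0, 2.0]
```

Whatever the sentence length, the pooled result always has the token-embedding dimension, which is what makes fixed-length indexing possible.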
We need to make sure the dataset has been downloaded; the dataset for this example is the Quora Question Pairs data, which we can fetch with the Kaggle API:

```python
import pandas as pd
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
api.dataset_download_files("quora/question-pairs-dataset", path='quora_dataset', unzip=True)
```

Next, extract the full text of each question into a dataframe:

```python
import pandas as pd

pd.set_option('display.max_colwidth', -1)
df = pd.read_csv("quora_dataset/questions.csv", usecols=["qid1", "question1"], index_col=False)
df = df.sample(frac=1).reset_index(drop=True)
df_questions_imp = df[:5000]
```

## Transforming question text to vectors, and indexing all vectors into Elasticsearch

To start with, create a k-NN index:

```python
import boto3
from requests_aws4auth import AWS4Auth
from elasticsearch import Elasticsearch, RequestsHttpConnection

region = 'ap-southeast-2'
service = 'es'
ssm = boto3.client('ssm', region_name=region)
es_parameter = ssm.get_parameter(Name='/KNNSearch/ESUrl')
es_host = es_parameter['Parameter']['Value']
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, service, session_token=credentials.token)

es = Elasticsearch(
    hosts=[{'host': es_host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

knn_index = {
    "settings": {
        "index.knn": True
    },
    "mappings": {
        "properties": {
            "question_vector": {
                "type": "knn_vector",
                "dimension": 256
            }
        }
    }
}
es.indices.create(index='questions', body=knn_index, ignore=400)
```

Then transform and index the question vectors to Elasticsearch:

```python
from sentence_transformers import SentenceTransformer

# load the sentence embedder we saved to EFS earlier
local_transformer = SentenceTransformer('model/transformer-v1/')

def es_import(df):
    for index, row in df.iterrows():
        vectors = local_transformer.encode([row["question1"]])
        es.index(index='questions',
                 id=row["qid1"],
                 body={"question_vector": vectors[0].tolist(),
                       "question": row["question1"]})

es_import(df_questions_imp)
```

Questions in Elasticsearch have the following structure:

```python
{'question_vector': [-0.06435434520244598, ..., 0.0726890116930008],
 'question': 'How hard is it to learn to play piano as an adult?'}
```
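Under the hood, the `knn_vector` mapping lets Elasticsearch retrieve the stored vectors closest to a query vector. For intuition, here is what an exact (brute-force) version of that lookup looks like in plain Python; Elasticsearch's k-NN plugin is an approximate, scalable version of the same idea, and `knn_search` here is an illustrative helper, not any real API:

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_search(query, indexed, k):
    """Brute-force k-NN: return the k ids whose vectors are closest to `query`."""
    ranked = sorted(indexed.items(), key=lambda item: l2_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 2-d "question vectors" keyed by document id (ours are 256-d)
indexed = {
    "q1": [0.0, 0.0],
    "q2": [1.0, 1.0],
    "q3": [5.0, 5.0],
}
print(knn_search([0.9, 0.9], indexed, k=2))  # ['q2', 'q1']
```

Brute force is O(n) per query; the plugin's graph-based index is what keeps latency low at the scale of millions of vectors.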
We have embedded the questions into fixed-length vectors and indexed all of the vectors into Elasticsearch. Let's create a REST API that connects to Elasticsearch and test it out!

## Deploying a containerized Flask REST API

We will use a sample CloudFormation template to create the ECS cluster and service in the VPC (templates and bash scripts found here).

We will use EFS volumes with ECS. The search flow is: the Flask application loads the saved sentence embedder from the EFS volume, transforms the input sentence to vectors, then queries the k-nearest neighbors in Elasticsearch.

First, the imports and the Elasticsearch client, request parser, and sentence-transformer setup:

```python
import json
import boto3
from flask import Flask
from flask_restful import reqparse, Resource, Api
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
from sentence_transformers import SentenceTransformer

app = Flask(__name__)
api = Api(app)

region = 'ap-southeast-2'
ssm = boto3.client('ssm', region_name=region)
es_parameter = ssm.get_parameter(Name='/KNNSearch/ESUrl')
host = es_parameter['Parameter']['Value']
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, service, session_token=credentials.token)

parser = reqparse.RequestParser()
parser.add_argument('question')
parser.add_argument('size')
parser.add_argument('min_score')

es = Elasticsearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

# the embedder saved to EFS earlier, mounted into the container
transform_model = SentenceTransformer('model/transformer-v1/')
```
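Before wiring up the request handler, it helps to see the k-NN request body on its own. A sketch of the body the handler sends; `build_knn_query` is a hypothetical helper for illustration, not part of elasticsearch-py:

```python
def build_knn_query(vector, k=5, min_score=0.3):
    """Request body for a k-NN search against the `questions` index."""
    return {
        "size": k,
        "min_score": min_score,
        # keep responses small: the raw 256-d vector is not useful to callers
        "_source": {"exclude": ["question_vector"]},
        "query": {
            "knn": {
                "question_vector": {"vector": vector, "k": k}
            }
        },
    }

body = build_knn_query([0.1] * 256)
print(body["size"], body["query"]["knn"]["question_vector"]["k"])  # 5 5
```

`min_score` filters out weak matches after the k nearest neighbors are found, so asking for `k=5` can still return fewer than five hits.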
The resource class encodes the incoming question and runs the k-NN query:

```python
class SimilarQuestionList(Resource):
    def post(self):
        args = parser.parse_args()
        sentence_embeddings = transform_model.encode([args["question"]])
        res = es.search(index="questions", body={
            "size": args.get("size", 5),
            "_source": {
                "exclude": ["question_vector"]
            },
            "min_score": args.get("min_score", 0.3),
            "query": {
                "knn": {
                    "question_vector": {
                        "vector": sentence_embeddings[0].tolist(),
                        "k": args.get("size", 5)
                    }
                }
            }
        })
        return res, 201

api.add_resource(SimilarQuestionList, '/search')

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=8000)
```

We now have the Flask API running in an ECS container. Let's use the basic search function to find similar questions for the query "What is best way to make money online?":

```sh
$ curl -s -X POST \
    --data 'question=What is best way to make money online?' \
    --data 'size=5' \
    --data 'min_score=0.3' \
    http://knn-publi-xxxx-207238135.ap-southeast-2.elb.amazonaws.com/search
```

Check out the result:

```json
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 15,
      "relation": "eq"
    },
    "max_score": 0.69955945,
    "hits": [
      {
        "_index": "questions",
        "_type": "_doc",
        "_id": "210905",
        "_score": 0.69955945,
        "_source": {
          "question": "What is an easy way make money online?"
        }
      },
      {
        "_index": "questions",
        "_type": "_doc",
        "_id": "547612",
        "_score": 0.61820024,
        "_source": {
          "question": "What is the best way to make passive income online?"
        }
      },
      {
        "_index": "questions",
        "_type": "_doc",
        "_id": "1891",
        "_score": 0.5624176,
        "_source": {
          "question": "What are the easy ways to earn money online?"
        }
      },
      {
        "_index": "questions",
        "_type": "_doc",
        "_id": "197580",
        "_score": 0.46031988,
        "_source": {
          "question": "What is the best way to download YouTube videos for free?"
        }
      },
      {
        "_index": "questions",
        "_type": "_doc",
        "_id": "359930",
        "_score": 0.45543614,
        "_source": {
          "question": "What is the best way to get traffic on your website?"
        }
      }
    ]
  }
}
```

As you can see, the results are pretty good. You can also fine-tune your own sentence embedding methods to get task-specific sentence embeddings for k-NN search.

Great! We have what we need! I hope you have found this article useful.
The complete scripts can be found in my GitHub repo.