Building a Conversational AI Chatbot With AWS Lambda Function and Amazon EFS

Amazon announced the general availability of AWS Lambda support for Amazon Elastic File System (EFS). Amazon EFS is a fully managed, elastic, shared file system designed to be consumed by other AWS services.

With the release of Amazon EFS for Lambda, we can now easily share data across function invocations. It also opens up new capabilities, such as building or importing large libraries and machine learning models directly into Lambda functions. Let’s go over how to build a serverless conversational AI chatbot using a Lambda function and EFS.

In this post, we will:

Create an Amazon Elastic File System

Deploy and run a SageMaker notebook instance and mount EFS on the instance.

Download the PyTorch libraries and the ConvAI pre-trained model to EFS.

Add a dialog history DynamoDB table and a gateway endpoint to save and retrieve conversation history.

Deploy a chatbot engine Lambda function and enable EFS for it.

Here’s the architecture diagram:

Creating an EFS file system

In this example we will use CloudFormation to create the EFS file system and an EFS access point. The configuration is defined as follows:

    FileSystem:
      Type: AWS::EFS::FileSystem
      Properties:
        PerformanceMode: generalPurpose
        FileSystemTags:
          - Key: Name
            Value: fs-pylibs

    MountTargetA:
      Type: AWS::EFS::MountTarget
      Properties:
        FileSystemId:
          Ref: FileSystem
        SubnetId: "{{resolve:ssm:/root/defaultVPC/subsetA:1}}"
        SecurityGroups:
          - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"

    MountTargetB:
      Type: AWS::EFS::MountTarget
      Properties:
        FileSystemId:
          Ref: FileSystem
        SubnetId: "{{resolve:ssm:/root/defaultVPC/subsetB:1}}"
        SecurityGroups:
          - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"

    AccessPointResource:
      Type: "AWS::EFS::AccessPoint"
      DependsOn: FileSystem
      Properties:
        FileSystemId: !Ref FileSystem
        PosixUser:
          Uid: "1000"
          Gid: "1000"
        RootDirectory:
          CreationInfo:
            OwnerGid: "1000"
            OwnerUid: "1000"
            Permissions: "0777"
          Path: "/py-libs"

Note that we will use EFS General Purpose performance mode since it has lower latency than Max I/O.

Working with Amazon SageMaker

We will mount EFS on a SageMaker notebook instance and install PyTorch and the ConvAI model on EFS.

The notebook instance must have access to the same security group and reside in the same VPC as the EFS file system.

Let’s mount the EFS path /py-libs to the /home/ec2-user/SageMaker/libs directory:

    %%sh
    mkdir -p libs
    FILE_SYS_ID=fs-xxxxxx

    # Mount the file system root first so that /py-libs can be created on it.
    sudo mount -t nfs \
        -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
        $FILE_SYS_ID.efs.ap-southeast-2.amazonaws.com:/ \
        libs

    cd libs && sudo mkdir -p py-libs
    cd .. && sudo umount -l /home/ec2-user/SageMaker/libs

    # Remount with /py-libs as the root of the local libs directory.
    sudo mount -t nfs \
        -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
        $FILE_SYS_ID.efs.ap-southeast-2.amazonaws.com:/py-libs \
        libs

Then, install PyTorch and simpletransformers to the libs/py-libs directory:

    !sudo pip --no-cache-dir install torch -t libs/py-libs
    !sudo pip --no-cache-dir install torchvision -t libs/py-libs
    !sudo pip --no-cache-dir install simpletransformers -t libs/py-libs

Once we have all packages installed, download the pre-trained model provided by Hugging Face, then extract the archive to the convai-model directory on EFS.

    !sudo wget https://s3.amazonaws.com/models.huggingface.co/transfer-learning-chatbot/gpt_personachat_cache.tar.gz
    !sudo tar -xvf gpt_personachat_cache.tar.gz -C libs/convai-model
    !sudo chmod -R g+rw libs/convai-model

We are now ready to talk to the pre-trained model: simply call model.interact().

The pre-trained model provided by Hugging Face performs well out of the box and will likely require less fine-tuning when creating a chatbot.

We can see that the Python packages and the model are loaded from EFS correctly, and we are able to start a conversation with the pre-trained model.
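Because the packages live on EFS rather than in the default site-packages, Python only finds them once their directory is on sys.path. The following is a minimal sketch of that step, not the notebook's actual code; the helper name add_package_dir is hypothetical, and the path libs/py-libs is the install target used above.

```python
import os
import sys


def add_package_dir(package_dir, position=1):
    """Put an EFS-backed package directory on sys.path if it is mounted.

    Returns True when the directory exists and is on sys.path afterwards.
    """
    if not os.path.isdir(package_dir):
        # The mount is missing or the path is wrong; leave sys.path untouched.
        return False
    if package_dir not in sys.path:
        sys.path.insert(position, package_dir)
    return True


# Hypothetical usage in the notebook.
if add_package_dir("libs/py-libs"):
    print("EFS packages are importable")
else:
    print("libs/py-libs not found; is EFS mounted?")
```

Checking that the directory exists first turns a confusing ImportError into an explicit hint that the mount is missing.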

Creating an Amazon DynamoDB table

Create a DialogHistory table to store the dialog history with at least the last utterance from the user. We can use sample CloudFormation templates to configure the DynamoDB table.
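To make the table layout concrete, here is a small sketch of how one user's history could be packed into, and read back from, a DynamoDB item in the low-level attribute format. It assumes a string key named userid and a history attribute holding a JSON string, which mirrors the Lambda code later in this post; the helper names to_item and from_response are hypothetical.

```python
import json


def to_item(userid, history):
    """Build a low-level DynamoDB item holding the history as a JSON string."""
    return {
        "userid": {"S": userid},
        "history": {"S": json.dumps({"history": history})},
    }


def from_response(response):
    """Extract the history list from a get_item response, or [] if absent."""
    if "Item" not in response:
        return []
    return json.loads(response["Item"]["history"]["S"])["history"]


# Each utterance is a list of token ids; the values here are made up.
item = to_item("user-1", [[12, 7], [42]])
print(from_response({"Item": item}))
```

Storing the whole history as one JSON string keeps the table schema to a single key attribute, at the cost of rewriting the item on every turn.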

Please note that we have to create a VPC endpoint for DynamoDB even though the Lambda function is running inside a public subnet of a VPC. A Lambda function attached to a VPC gets no public IP address, so without the endpoint it cannot reach DynamoDB.
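As an illustration only, a gateway endpoint for DynamoDB could also be created with boto3 instead of a template. This sketch assumes boto3 is installed and credentials are configured when create_dynamodb_endpoint is called; the helper names and the VPC and route table IDs are hypothetical placeholders.

```python
def gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Build keyword arguments for EC2 create_vpc_endpoint (DynamoDB gateway)."""
    return {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.dynamodb",
        "VpcEndpointType": "Gateway",
        "RouteTableIds": list(route_table_ids),
    }


def create_dynamodb_endpoint(vpc_id, region, route_table_ids):
    """Create the endpoint; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(
        **gateway_endpoint_params(vpc_id, region, route_table_ids)
    )


# Placeholder IDs, shown only to illustrate the shape of the request.
print(gateway_endpoint_params("vpc-xxxxxx", "ap-southeast-2", ["rtb-xxxxxx"]))
```

A gateway endpoint works by adding a route to the listed route tables, so the route tables of the subnets that host the Lambda function need to be included.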

Configuring AWS Lambda to use EFS

We will use AWS SAM to create the Lambda function and mount the EFS access point to it.

First, create a Lambda function resource, then set up the EFS file system for Lambda. Make sure that EFS and Lambda are in the same VPC:

    HelloFunction:
      Type: AWS::Serverless::Function
      DependsOn:
        - LibAccessPointResource
      Properties:
        Environment:
          Variables:
            CHAT_HISTORY_TABLE: !Ref TableName
        Role: !GetAtt LambdaRole.Arn
        CodeUri: src/
        Handler: api.lambda_handler
        Runtime: python3.6
        FileSystemConfigs:
          - Arn: !GetAtt LibAccessPointResource.Arn
            LocalMountPath: "/mnt/libs"
        VpcConfig:
          SecurityGroupIds:
            - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"
          SubnetIds:
            - "{{resolve:ssm:/root/defaultVPC/subsetA:1}}"
            - "{{resolve:ssm:/root/defaultVPC/subsetB:1}}"

    LambdaRole:
      Type: AWS::IAM::Role
      Properties:
        RoleName: "efsAPILambdaRole"
        AssumeRolePolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: "Allow"
              Principal:
                Service:
                  - "lambda.amazonaws.com"
              Action:
                - "sts:AssumeRole"
        ManagedPolicyArns:
          - "arn:aws:iam::aws:policy/AWSLambdaExecute"
          - "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
          - "arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess"
        Policies:
          - PolicyName: "efsAPIRoleDBAccess"
            PolicyDocument:
              Version: "2012-10-17"
              Statement:
                - Effect: Allow
                  Action:
                    - "dynamodb:PutItem"
                    - "dynamodb:GetItem"
                    - "dynamodb:UpdateItem"
                    - "dynamodb:DeleteItem"
                    - "dynamodb:Query"
                    - "dynamodb:Scan"
                  Resource:
                    - !GetAtt ChatHistory.Arn
                    - Fn::Join:
                        - "/"
                        - - !GetAtt ChatHistory.Arn
                          - "*"
                - Effect: Allow
                  Action:
                    - "ssm:GetParameter*"
                  Resource:
                    - !Sub "arn:${AWS::Partition}:ssm:${AWS::Region}:${AWS::AccountId}:parameter/root/defaultVPC*"

Adding the conversation engine: AWS Lambda

In this section, we will create a Lambda function for communication between users and the conversational AI model.

We will place the following source code in src/api.py:

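    import json
    import logging
    import os
    import random
    import sys

    import boto3

    # Make the packages installed on EFS importable before importing torch.
    sys.path.insert(1, '/mnt/libs/py-libs')

    import torch
    import torch.nn.functional as F
    from simpletransformers.conv_ai import ConvAIModel
    from simpletransformers.conv_ai.conv_ai_utils import get_dataset

    # The table name comes from the CHAT_HISTORY_TABLE variable in the SAM template.
    TABLE_NAME = os.environ['CHAT_HISTORY_TABLE']
    dynamodb = boto3.client('dynamodb')

    # Assumption: the post does not show how the model and persona are created.
    # The model path and the persona sentences below are illustrative only.
    convAimodel = ConvAIModel('gpt', '/mnt/libs/convai-model', use_cuda=False)
    character = ['i like computers .', 'i like reading books .']


    def get_chat_history(userid):
        """Load the stored dialog history for a user, or an empty history."""
        response = dynamodb.get_item(
            TableName=TABLE_NAME, Key={'userid': {'S': userid}})
        if 'Item' in response:
            return json.loads(response["Item"]["history"]["S"])
        return {"history": []}


    def save_chat_history(userid, history):
        """Store the JSON-encoded dialog history for a user."""
        return dynamodb.put_item(
            TableName=TABLE_NAME,
            Item={'userid': {'S': userid}, 'history': {'S': history}})


    def lambda_handler(event, context):
        try:
            userid = event['userid']
            message = event['message']
            history = get_chat_history(userid)
            history = history["history"]
            response_msg = interact(message, convAimodel, character, userid, history)
            return {
                'message': json.dumps(response_msg)
            }
        except Exception as ex:
            logging.exception(ex)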

Note that the simpletransformers library allows us to interact with the models locally with input(). To build our chat engine, we need to override the default methods interact and sample_sequence in conv_ai:

    # Assumption: the special tokens of the Hugging Face ConvAI model. The post
    # uses SPECIAL_TOKENS without showing its definition.
    SPECIAL_TOKENS = ["<bos>", "<eos>", "<speaker1>", "<speaker2>", "<pad>"]


    def sample_sequence(aiCls, personality, history, tokenizer, model, args, current_output=None):
        special_tokens_ids = tokenizer.convert_tokens_to_ids(SPECIAL_TOKENS)
        if current_output is None:
            current_output = []

        for i in range(args["max_length"]):
            instance = aiCls.build_input_from_segments(
                personality, history, current_output, tokenizer, with_eos=False)

            input_ids = torch.tensor(
                instance["input_ids"], device=aiCls.device).unsqueeze(0)
            token_type_ids = torch.tensor(
                instance["token_type_ids"], device=aiCls.device).unsqueeze(0)

            logits = model(input_ids, token_type_ids=token_type_ids)
            if isinstance(logits, tuple):  # for gpt2 and maybe others
                logits = logits[0]
            logits = logits[0, -1, :] / args["temperature"]
            logits = aiCls.top_filtering(
                logits, top_k=args["top_k"], top_p=args["top_p"])
            probs = F.softmax(logits, dim=-1)

            # Greedy pick when sampling is disabled, otherwise sample a token.
            prev = torch.topk(probs, 1)[1] if args["no_sample"] else torch.multinomial(probs, 1)
            if i < args["min_length"] and prev.item() in special_tokens_ids:
                while prev.item() in special_tokens_ids:
                    if probs.max().item() == 1:
                        logging.warning(
                            "Warning: model generating special token with probability 1.")
                        break  # avoid infinitely looping over special token
                    prev = torch.multinomial(probs, num_samples=1)

            if prev.item() in special_tokens_ids:
                break
            current_output.append(prev.item())

        return current_output


    def interact(raw_text, model, personality, userid, history):
        args = model.args
        tokenizer = model.tokenizer
        process_count = model.args["process_count"]

        model._move_model_to_device()

        if not personality:
            # No persona given: pick a random one from the dataset.
            dataset = get_dataset(
                tokenizer,
                None,
                args["cache_dir"],
                process_count=process_count,
                proxies=model.__dict__.get("proxies", None),
                interact=True,
            )
            personalities = [dialog["personality"] for split in dataset.values() for dialog in split]
            personality = random.choice(personalities)
        else:
            personality = [tokenizer.encode(s.lower()) for s in personality]

        history.append(tokenizer.encode(raw_text))
        with torch.no_grad():
            out_ids = sample_sequence(
                model, personality, history, tokenizer, model.model, args)
        history.append(out_ids)
        # Keep only the most recent turns before saving them.
        history = history[-(2 * args["max_history"] + 1):]
        out_text = tokenizer.decode(out_ids, skip_special_tokens=True)
        save_chat_history(userid, json.dumps({"history": history}))
        return out_text
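The slice history[-(2 * args["max_history"] + 1):] in interact is what keeps the stored dialog from growing without bound: each turn appends a user utterance and a reply, and only the most recent 2 * max_history + 1 utterances are kept. A standalone sketch of that windowing, using a hypothetical helper name trim_history and made-up token ids:

```python
def trim_history(history, max_history):
    """Keep the last 2 * max_history + 1 utterances, as interact() does."""
    return history[-(2 * max_history + 1):]


# Each utterance is a list of token ids; the values here are made up.
history = [[1], [2], [3], [4], [5], [6], [7]]
print(trim_history(history, max_history=2))  # keeps the last 5 utterances
```

Trimming before the save bounds both the size of the DynamoDB item and the length of the input the model has to process on the next turn.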

Deploying the chatbot service

We are almost there! Now we have to deploy our bot. Run the following command to deploy:

From the output above, we can see the chatbot is now deployed.

Now it’s time to test our bot. Go to the CloudFormation Resources list in the AWS Management Console to find the Lambda function name, then invoke the Lambda function using the following command:

    $ aws lambda invoke --function-name "chat-efs-api-HelloFunction-KQSNKF5K0IY8" out --log-type Tail \
        --query 'LogResult' --output text | base64 -d
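The same invocation can be sketched in Python with boto3. This is an illustrative example rather than part of the original project: it assumes boto3 is installed and credentials are configured when invoke_chatbot is called, and it sends the userid and message fields that lambda_handler reads from the event. The helper names are hypothetical.

```python
import base64
import json


def build_payload(userid, message):
    """Serialize the event that lambda_handler expects."""
    return json.dumps({"userid": userid, "message": message}).encode("utf-8")


def decode_log_tail(log_result):
    """Decode the base64 LogResult returned when LogType is 'Tail'."""
    return base64.b64decode(log_result).decode("utf-8")


def invoke_chatbot(function_name, userid, message):
    """Invoke the function; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        LogType="Tail",
        Payload=build_payload(userid, message),
    )
    reply = json.loads(response["Payload"].read())
    return reply, decode_log_tail(response["LogResult"])
```

decode_log_tail does the same job as the base64 -d step in the CLI command: the tail of the execution log comes back base64-encoded.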

The output will look like below:

Here is an example of dialog:

    >> hi there
    how are you?
    >> good, thank you
    what do you like to do for fun ?
    >> I like reading, yourself?
    i like to listen to classical music
    ......

It works! As we can see from the dialog above, the chatbot returns a response based on input from the user.

However, I am aware of the impact of cold starts on response times. The first request took ~30 seconds for cold starts to complete. To prevent cold starts in our Lambda functions, we can use Provisioned Concurrency to keep functions warm.
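One way to enable Provisioned Concurrency is through boto3. The sketch below is an assumption about how this could be scripted, not the configuration used in this post. Provisioned Concurrency applies to a published version or an alias rather than to $LATEST, so the helper rejects that qualifier. The helper names are hypothetical.

```python
def provisioned_concurrency_params(function_name, qualifier, executions):
    """Build arguments for Lambda put_provisioned_concurrency_config."""
    if qualifier == "$LATEST":
        raise ValueError("use a published version or an alias, not $LATEST")
    if executions < 1:
        raise ValueError("executions must be at least 1")
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,
        "ProvisionedConcurrentExecutions": executions,
    }


def enable_provisioned_concurrency(function_name, qualifier, executions=1):
    """Apply the setting; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("lambda")
    return client.put_provisioned_concurrency_config(
        **provisioned_concurrency_params(function_name, qualifier, executions)
    )
```

For example, enable_provisioned_concurrency("chat-efs-api-HelloFunction-KQSNKF5K0IY8", "live") would keep one execution environment initialized, where "live" is a hypothetical alias name.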

As a result, the latency of the warmed-up function is reduced to ~3 seconds.

That’s it! I hope you have found this article useful. The source code for this post can be found in my GitHub repo.

