AWS announced the general availability of AWS Lambda support for Amazon Elastic File System. Amazon EFS is a fully managed, elastic, shared file system designed to be consumed by other AWS services.
With the release of Amazon EFS for Lambda, we can now easily share data across function invocations. It also opens up new capabilities, such as building or importing large libraries and machine learning models directly into Lambda functions. Let’s go over how to build a serverless conversational AI chatbot using a Lambda function and EFS.
In this post, we will:
- create an EFS file system, mount targets, and an access point with CloudFormation,
- mount the file system on a SageMaker notebook and install PyTorch and a pre-trained ConvAI model on it,
- build a Lambda function that loads the libraries and model from EFS and keeps dialog history in DynamoDB, and
- deploy and test the chatbot with AWS SAM.
Here’s the architecture diagram:
In this example, we will use CloudFormation to create the EFS file system and an EFS access point. The configuration is defined as follows:
FileSystem:
  Type: AWS::EFS::FileSystem
  Properties:
    PerformanceMode: generalPurpose
    FileSystemTags:
      - Key: Name
        Value: fs-pylibs
MountTargetA:
  Type: AWS::EFS::MountTarget
  Properties:
    FileSystemId:
      Ref: FileSystem
    SubnetId: "{{resolve:ssm:/root/defaultVPC/subsetA:1}}"
    SecurityGroups:
      - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"
MountTargetB:
  Type: AWS::EFS::MountTarget
  Properties:
    FileSystemId:
      Ref: FileSystem
    SubnetId: "{{resolve:ssm:/root/defaultVPC/subsetB:1}}"
    SecurityGroups:
      - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"
LibAccessPointResource:
  Type: "AWS::EFS::AccessPoint"
  DependsOn: FileSystem
  Properties:
    FileSystemId: !Ref FileSystem
    PosixUser:
      Uid: "1000"
      Gid: "1000"
    RootDirectory:
      CreationInfo:
        OwnerGid: "1000"
        OwnerUid: "1000"
        Permissions: "0777"
      Path: "/py-libs"
Note that we use the EFS General Purpose performance mode, since it has lower latency than Max I/O.
We will mount EFS on an Amazon SageMaker notebook instance and install PyTorch and the ConvAI model on EFS.
The notebook instance must have access to the same security group and reside in the same VPC as the EFS file system.
Let’s mount the EFS path /py-libs to the /home/ec2-user/SageMaker/libs directory:
%%sh
mkdir -p libs
FILE_SYS_ID=fs-xxxxxx
# Mount the EFS root first so we can create the py-libs directory
sudo mount -t nfs \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    $FILE_SYS_ID.efs.ap-southeast-2.amazonaws.com:/ \
    libs
cd libs && sudo mkdir -p py-libs
cd .. && sudo umount -l /home/ec2-user/SageMaker/libs
# Remount only the py-libs directory
sudo mount -t nfs \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    $FILE_SYS_ID.efs.ap-southeast-2.amazonaws.com:/py-libs \
    libs
Then, install PyTorch and simpletransformers into the libs/py-libs directory:
!sudo pip --no-cache-dir install torch -t libs/py-libs
!sudo pip --no-cache-dir install torchvision -t libs/py-libs
!sudo pip --no-cache-dir install simpletransformers -t libs/py-libs
Once we have all the packages installed, download the pre-trained model provided by Hugging Face, then extract the archive to the convai-model directory on EFS:
!sudo wget https://s3.amazonaws.com/models.huggingface.co/transfer-learning-chatbot/gpt_personachat_cache.tar.gz
!sudo mkdir -p libs/convai-model
!sudo tar -xvf gpt_personachat_cache.tar.gz -C libs/convai-model
!sudo chmod -R g+rw libs/convai-model
We are now ready to talk to the pre-trained model: simply call model.interact().
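In the notebook, that looks roughly like the following sketch (the "gpt" model type and use_cuda=False are assumptions based on the simpletransformers ConvAIModel API):
import sys
# Make the packages installed on EFS importable from this notebook
sys.path.insert(1, "libs/py-libs")

from simpletransformers.conv_ai import ConvAIModel

# Load the extracted pre-trained weights from the EFS-backed directory
model = ConvAIModel("gpt", "libs/convai-model", use_cuda=False)
model.interact()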
The pre-trained model provided by Hugging Face performs well out of the box and will likely require less fine-tuning when creating a chatbot.
We can see that the Python packages and the model are loaded from EFS correctly, and we are able to start a conversation with the pre-trained model.
Create a DialogHistory table to store the dialog history, including at least the last utterance from the user. We can use a sample CloudFormation template to configure the DynamoDB table.
Please note that we have to create a VPC endpoint for DynamoDB, even though the Lambda function is running inside a public subnet of a VPC: a Lambda function attached to a VPC has no internet access, so it cannot reach the public DynamoDB endpoint.
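A minimal sketch of both resources is shown below. The ChatHistory resource name and the TableName parameter match the references in the SAM template later in this post, while the DynamoDBEndpoint resource and the vpcId/routeTable SSM parameters are assumptions following the naming pattern used elsewhere in the template:
ChatHistory:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: !Ref TableName
    BillingMode: PAY_PER_REQUEST
    AttributeDefinitions:
      - AttributeName: userid        # partition key used by the Lambda code
        AttributeType: S
    KeySchema:
      - AttributeName: userid
        KeyType: HASH
DynamoDBEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    ServiceName: !Sub "com.amazonaws.${AWS::Region}.dynamodb"
    VpcEndpointType: Gateway
    VpcId: "{{resolve:ssm:/root/defaultVPC/vpcId:1}}"          # assumed parameter
    RouteTableIds:
      - "{{resolve:ssm:/root/defaultVPC/routeTable:1}}"        # assumed parameter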
We will use AWS SAM to create the Lambda function and mount the EFS access point to it.
First, create a Lambda function resource, then set up the EFS file system for Lambda. Make sure that EFS and Lambda are in the same VPC:
HelloFunction:
  Type: AWS::Serverless::Function
  DependsOn:
    - LibAccessPointResource
  Properties:
    Environment:
      Variables:
        CHAT_HISTORY_TABLE: !Ref TableName
    Role: !GetAtt LambdaRole.Arn
    CodeUri: src/
    Handler: api.lambda_handler
    Runtime: python3.6
    FileSystemConfigs:
      - Arn: !GetAtt LibAccessPointResource.Arn
        LocalMountPath: "/mnt/libs"
    VpcConfig:
      SecurityGroupIds:
        - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"
      SubnetIds:
        - "{{resolve:ssm:/root/defaultVPC/subsetA:1}}"
        - "{{resolve:ssm:/root/defaultVPC/subsetB:1}}"
LambdaRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: "efsAPILambdaRole"
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: "Allow"
          Principal:
            Service:
              - "lambda.amazonaws.com"
          Action:
            - "sts:AssumeRole"
    ManagedPolicyArns:
      - "arn:aws:iam::aws:policy/AWSLambdaExecute"
      - "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
      - "arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess"
    Policies:
      - PolicyName: "efsAPIRoleDBAccess"
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - "dynamodb:PutItem"
                - "dynamodb:GetItem"
                - "dynamodb:UpdateItem"
                - "dynamodb:DeleteItem"
                - "dynamodb:Query"
                - "dynamodb:Scan"
              Resource:
                - !GetAtt ChatHistory.Arn
                - Fn::Join:
                    - "/"
                    - - !GetAtt ChatHistory.Arn
                      - "*"
            - Effect: Allow
              Action:
                - "ssm:GetParameter*"
              Resource:
                - !Sub "arn:${AWS::Partition}:ssm:${AWS::Region}:${AWS::AccountId}:parameter/root/defaultVPC*"
Adding the conversation engine: AWS Lambda
In this section, we will create a Lambda function to handle the communication between users and the conversational AI model.
The following source code goes in src/api.py:
import json
import logging
import sys
import boto3
import random
import os
sys.path.insert(1, '/mnt/libs/py-libs')  # load the Python packages installed on EFS
import torch
import torch.nn.functional as F
from simpletransformers.conv_ai.conv_ai_utils import get_dataset
from simpletransformers.conv_ai import ConvAIModel
def get_chat_history(userid):
    response = dynamodb.get_item(TableName=TABLE_NAME, Key={
        'userid': {
            'S': userid
        }})
    if 'Item' in response:
        return json.loads(response["Item"]["history"]["S"])
    return {"history": []}
def save_chat_history(userid, history):
    return dynamodb.put_item(TableName=TABLE_NAME, Item={'userid': {'S': userid}, 'history': {'S': history}})
def lambda_handler(event, context):
    try:
        userid = event['userid']
        message = event['message']
        history = get_chat_history(userid)
        history = history["history"]
        response_msg = interact(message, convAimodel,
                                character, userid, history)
        return {
            'message': json.dumps(response_msg)
        }
    except Exception as ex:
        logging.exception(ex)
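The handler above relies on a few module-level objects (the DynamoDB client, the table name, the ConvAI model, and its persona) that are initialized once per container, after the imports. A minimal sketch of that initialization might look like this; the SPECIAL_TOKENS list, the persona lines, and the model path are assumptions:
# Module-level initialization (a sketch; names match the handler above)
TABLE_NAME = os.environ['CHAT_HISTORY_TABLE']   # set via the SAM template
dynamodb = boto3.client('dynamodb')

# Special tokens for the ConvAI GPT model (assumed to mirror the
# Hugging Face transfer-learning-conv-ai token set)
SPECIAL_TOKENS = ["<bos>", "<eos>", "<speaker1>", "<speaker2>", "<pad>"]

# Load the pre-trained model from the EFS mount; Lambda has no GPU, so
# use_cuda=False. The path assumes the archive was extracted to the
# convai-model directory on the same access point.
convAimodel = ConvAIModel("gpt", "/mnt/libs/convai-model", use_cuda=False)

# An example persona for the bot (assumed)
character = [
    "i like reading books.",
    "i listen to classical music.",
]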
Note that the simpletransformers library allows us to interact with the model locally via input(). To build our chat engine, we need to override the default interact and sample_sequence methods in conv_ai:
def sample_sequence(aiCls, personality, history, tokenizer, model, args, current_output=None):
    special_tokens_ids = tokenizer.convert_tokens_to_ids(SPECIAL_TOKENS)
    if current_output is None:
        current_output = []
    for i in range(args["max_length"]):
        instance = aiCls.build_input_from_segments(
            personality, history, current_output, tokenizer, with_eos=False)
        input_ids = torch.tensor(
            instance["input_ids"], device=aiCls.device).unsqueeze(0)
        token_type_ids = torch.tensor(
            instance["token_type_ids"], device=aiCls.device).unsqueeze(0)
        logits = model(input_ids, token_type_ids=token_type_ids)
        if isinstance(logits, tuple):  # for gpt2 and maybe others
            logits = logits[0]
        logits = logits[0, -1, :] / args["temperature"]
        logits = aiCls.top_filtering(
            logits, top_k=args["top_k"], top_p=args["top_p"])
        probs = F.softmax(logits, dim=-1)
        prev = torch.topk(probs, 1)[
            1] if args["no_sample"] else torch.multinomial(probs, 1)
        if i < args["min_length"] and prev.item() in special_tokens_ids:
            while prev.item() in special_tokens_ids:
                if probs.max().item() == 1:
                    logging.warn(
                        "Warning: model generating special token with probability 1.")
                    break  # avoid infinitely looping over special token
                prev = torch.multinomial(probs, num_samples=1)
        if prev.item() in special_tokens_ids:
            break
        current_output.append(prev.item())
    return current_output
def interact(raw_text, model, personality, userid, history):
    args = model.args
    tokenizer = model.tokenizer
    process_count = model.args["process_count"]
    model._move_model_to_device()
    if not personality:
        dataset = get_dataset(
            tokenizer,
            None,
            args["cache_dir"],
            process_count=process_count,
            proxies=model.__dict__.get("proxies", None),
            interact=True,
        )
        personalities = [dialog["personality"]
                         for dataset in dataset.values() for dialog in dataset]
        personality = random.choice(personalities)
    else:
        personality = [tokenizer.encode(s.lower()) for s in personality]
    history.append(tokenizer.encode(raw_text))
    with torch.no_grad():
        out_ids = sample_sequence(
            model, personality, history, tokenizer, model.model, args)
    history.append(out_ids)
    history = history[-(2 * args["max_history"] + 1):]
    out_text = tokenizer.decode(out_ids, skip_special_tokens=True)
    save_chat_history(userid, json.dumps({"history": history}))
    return out_text
We are almost there! Now we have to deploy our bot with the SAM CLI.
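A typical build and deploy looks like the following (a sketch; on the first run, --guided prompts for a stack name and parameters, which can then be saved to samconfig.toml):
# Package the function code and deploy the CloudFormation stack
sam build
sam deploy --guided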
From the deployment output, we can see that the chatbot is now deployed.
Now it’s time to test our bot. Go to the CloudFormation Resources list in the AWS Management Console to find the Lambda function name, then invoke the function with a test payload using the following command (the function name and payload values are examples; with AWS CLI v2, also pass --cli-binary-format raw-in-base64-out):
$ aws lambda invoke --function-name "chat-efs-api-HelloFunction-KQSNKF5K0IY8" \
    --payload '{"userid": "user1", "message": "hi there"}' \
    out --log-type Tail \
    --query 'LogResult' --output text | base64 -d
The output will look like the following:
Here is an example dialog:
>>hi there
how are you?
>>good, thank you
what do you like to do for fun?
>>I like reading, yourself?
i like to listen to classical music
......
It works! As we can see from the output above, the chatbot returns a response based on the user’s input.
However, be aware of the impact of cold starts on response times: the first request took ~30 seconds for the cold start to complete. To reduce cold starts in our Lambda function, we can use Provisioned Concurrency to keep the function warm:
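In SAM, this can be configured directly on the function resource. The snippet below is a sketch; the alias name and the concurrency value of 1 are assumptions:
HelloFunction:
  Type: AWS::Serverless::Function
  Properties:
    # Provisioned Concurrency requires publishing a version behind an alias
    AutoPublishAlias: live
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 1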
As a result, the latency of the warmed-up function is reduced to ~3 seconds.
That’s it! I hope you have found this article useful. The source code for this post can be found in my GitHub repo.