Amazon announced the general availability of AWS Lambda support for Amazon Elastic File System. Amazon EFS is a fully managed, elastic, shared file system designed to be consumed by other AWS services. With the release of Amazon EFS for Lambda, we can now easily share data across function invocations. It also opens up new capabilities, such as building or importing large libraries and machine learning models directly into Lambda functions. Let's go over how to build a serverless conversational AI chatbot using a Lambda function and EFS.

In this post, we will:

- Create an Amazon Elastic File System
- Deploy and run a SageMaker notebook instance and mount EFS to the instance
- Download PyTorch libraries and the ConvAI pre-trained model to EFS
- Add a DialogHistory DynamoDB table and a DynamoDB gateway endpoint to save and retrieve conversation history
- Deploy a chatbot engine Lambda function and enable EFS for it

Here's the architecture diagram:

## Creating an EFS file system

In this example we will use CloudFormation to create the EFS file system and an EFS access point. The configuration is defined as follows:

```yaml
FileSystem:
  Type: AWS::EFS::FileSystem
  Properties:
    PerformanceMode: generalPurpose
    FileSystemTags:
      - Key: Name
        Value: fs-pylibs

MountTargetA:
  Type: AWS::EFS::MountTarget
  Properties:
    FileSystemId:
      Ref: FileSystem
    SubnetId: "{{resolve:ssm:/root/defaultVPC/subsetA:1}}"
    SecurityGroups:
      - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"

MountTargetB:
  Type: AWS::EFS::MountTarget
  Properties:
    FileSystemId:
      Ref: FileSystem
    SubnetId: "{{resolve:ssm:/root/defaultVPC/subsetB:1}}"
    SecurityGroups:
      - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"

AccessPointResource:
  Type: "AWS::EFS::AccessPoint"
  DependsOn: FileSystem
  Properties:
    FileSystemId: !Ref FileSystem
    PosixUser:
      Uid: "1000"
      Gid: "1000"
    RootDirectory:
      CreationInfo:
        OwnerGid: "1000"
        OwnerUid: "1000"
        Permissions: "0777"
      Path: "/py-libs"
```

Note that we use the General Purpose performance mode since it has lower latency than Max I/O.

## Working with Amazon SageMaker

We will mount the EFS file system on a SageMaker notebook instance and install PyTorch and the ConvAI model on EFS. The notebook instance must have access to the same security group and reside in the same VPC as the EFS file system.

Let's mount the EFS path /py-libs to the /home/ec2-user/SageMaker/libs directory:

```sh
%%sh
mkdir -p libs
FILE_SYS_ID=fs-xxxxxx

# Mount the EFS root and create the /py-libs directory on it
sudo mount -t nfs \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    $FILE_SYS_ID.efs.ap-southeast-2.amazonaws.com:/ \
    libs
cd libs && sudo mkdir -p py-libs && cd .. && sudo umount -l /home/ec2-user/SageMaker/libs

# Remount only the /py-libs directory
sudo mount -t nfs \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    $FILE_SYS_ID.efs.ap-southeast-2.amazonaws.com:/py-libs \
    libs
```

Then, install PyTorch and simpletransformers into the libs/py-libs directory:

```sh
!sudo pip --no-cache-dir install torch -t libs/py-libs
!sudo pip --no-cache-dir install torchvision -t libs/py-libs
!sudo pip --no-cache-dir install simpletransformers -t libs/py-libs
```

Once we have all packages installed, download the pre-trained model provided by Hugging Face and extract the archive to the convai-model directory on EFS:

```sh
!sudo wget https://s3.amazonaws.com/models.huggingface.co/transfer-learning-chatbot/gpt_personachat_cache.tar.gz
!sudo mkdir -p libs/convai-model   # make sure the target directory exists
!sudo tar -xvf gpt_personachat_cache.tar.gz -C libs/convai-model
!sudo chmod -R g+rw libs/convai-model
```

We are now ready to talk to the pre-trained model: simply call model.interact(). The pre-trained model provided by Hugging Face performs well out of the box and will likely require less fine-tuning when creating a chatbot.
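As a quick sanity check in the notebook, a minimal sketch along these lines can load the model from EFS and chat with it interactively. This is my own illustration rather than part of the original setup: the relative paths assume the mounts above, use_cuda=False keeps inference on the CPU, and the personality strings are made up.

```python
import sys

# Make the packages installed on EFS importable in the notebook kernel
sys.path.insert(1, "libs/py-libs")

from simpletransformers.conv_ai import ConvAIModel

# Load the extracted ConvAI GPT model from the EFS-backed directory
model = ConvAIModel("gpt", "libs/convai-model", use_cuda=False)

# interact() prompts for user input via input(); the personality below is purely illustrative
model.interact(personality=["i like reading.", "i listen to classical music."])
```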
We can see that the Python packages and the model are read from EFS correctly, and we are able to start a conversation with the pre-trained model.

## Creating the DynamoDB table

Create a DialogHistory table to store the dialog history with at least the last utterance from the user. We can use sample CloudFormation templates to configure the DynamoDB table.

Please note that we have to create a VPC endpoint for DynamoDB, even though the Lambda function is running inside a public subnet of a VPC.

## Configuring AWS Lambda to use EFS

We will use AWS SAM to create the Lambda function and mount the EFS access point to the Lambda function.

First, create a Lambda function resource, then set up the EFS file system for Lambda. Make sure that EFS and Lambda are in the same VPC:

```yaml
HelloFunction:
  Type: AWS::Serverless::Function
  DependsOn:
    - LibAccessPointResource
  Properties:
    Environment:
      Variables:
        CHAT_HISTORY_TABLE: !Ref TableName
    Role: !GetAtt LambdaRole.Arn
    CodeUri: src/
    Handler: api.lambda_handler
    Runtime: python3.6
    FileSystemConfigs:
      - Arn: !GetAtt LibAccessPointResource.Arn
        LocalMountPath: "/mnt/libs"
    VpcConfig:
      SecurityGroupIds:
        - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"
      SubnetIds:
        - "{{resolve:ssm:/root/defaultVPC/subsetA:1}}"
        - "{{resolve:ssm:/root/defaultVPC/subsetB:1}}"

LambdaRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: "efsAPILambdaRole"
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: "Allow"
          Principal:
            Service:
              - "lambda.amazonaws.com"
          Action:
            - "sts:AssumeRole"
    ManagedPolicyArns:
      - "arn:aws:iam::aws:policy/AWSLambdaExecute"
      - "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
      - "arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess"
    Policies:
      - PolicyName: "efsAPIRoleDBAccess"
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - "dynamodb:PutItem"
                - "dynamodb:GetItem"
                - "dynamodb:UpdateItem"
                - "dynamodb:DeleteItem"
                - "dynamodb:Query"
                - "dynamodb:Scan"
              Resource:
                - !GetAtt ChatHistory.Arn
                - Fn::Join:
                    - "/"
                    - - !GetAtt ChatHistory.Arn
                      - "*"
            - Effect: Allow
              Action:
                - "ssm:GetParameter*"
              Resource:
                - !Sub "arn:${AWS::Partition}:ssm:${AWS::Region}:${AWS::AccountId}:parameter/root/defaultVPC*"
```

## Adding the conversation engine: AWS Lambda

In this section, we will create the Lambda function that handles the conversation between users and the conversational AI model. src/api.py contains the following source code:

```python
import json
import logging
import sys
import boto3
import random
import os

# Make the packages installed on EFS importable before importing them
sys.path.insert(1, '/mnt/libs/py-libs')

import torch
import torch.nn.functional as F
from simpletransformers.conv_ai.conv_ai_utils import get_dataset
from simpletransformers.conv_ai import ConvAIModel

TABLE_NAME = os.environ['CHAT_HISTORY_TABLE']
dynamodb = boto3.client('dynamodb')


def get_chat_history(userid):
    response = dynamodb.get_item(
        TableName=TABLE_NAME, Key={'userid': {'S': userid}})
    if 'Item' in response:
        return json.loads(response["Item"]["history"]["S"])
    return {"history": []}


def save_chat_history(userid, history):
    return dynamodb.put_item(
        TableName=TABLE_NAME,
        Item={'userid': {'S': userid}, 'history': {'S': history}})


def lambda_handler(event, context):
    try:
        userid = event['userid']
        message = event['message']
        history = get_chat_history(userid)
        history = history["history"]
        response_msg = interact(
            message, convAimodel, character, userid, history)
        return {
            'message': json.dumps(response_msg)
        }
    except Exception as ex:
        logging.exception(ex)
```

Here, convAimodel (the ConvAI model loaded from the EFS mount) and character (its personality) are initialized outside the handler, and the interact and sample_sequence helpers are defined in the next section.

Note that the simpletransformers library's built-in interact() reads user input locally with input(), which is why we need our own interact function for Lambda.
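To make the handler's event contract concrete, here is a hedged smoke-test sketch. It only runs where the EFS mount and the CHAT_HISTORY_TABLE environment variable are available (e.g. inside the Lambda environment), and the userid and message values are made up.

```python
# Hypothetical smoke test for the event shape expected by api.lambda_handler.
# Assumes /mnt/libs is mounted and CHAT_HISTORY_TABLE points at the DialogHistory table.
import json

from api import lambda_handler

event = {
    "userid": "demo-user-001",   # partition key of the DialogHistory table (made-up value)
    "message": "hi there",       # the user's utterance
}

result = lambda_handler(event, None)
print(json.loads(result["message"]))  # the chatbot's reply text
```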
To build our chat engine, we need to override the default interact and sample_sequence methods of simpletransformers' conv_ai:

```python
# Special tokens used by the ConvAI GPT model
SPECIAL_TOKENS = ["<bos>", "<eos>", "<speaker1>", "<speaker2>", "<pad>"]


def sample_sequence(aiCls, personality, history, tokenizer, model, args, current_output=None):
    special_tokens_ids = tokenizer.convert_tokens_to_ids(SPECIAL_TOKENS)
    if current_output is None:
        current_output = []

    for i in range(args["max_length"]):
        instance = aiCls.build_input_from_segments(
            personality, history, current_output, tokenizer, with_eos=False)

        input_ids = torch.tensor(
            instance["input_ids"], device=aiCls.device).unsqueeze(0)
        token_type_ids = torch.tensor(
            instance["token_type_ids"], device=aiCls.device).unsqueeze(0)

        logits = model(input_ids, token_type_ids=token_type_ids)
        if isinstance(logits, tuple):  # for gpt2 and maybe others
            logits = logits[0]
        logits = logits[0, -1, :] / args["temperature"]
        logits = aiCls.top_filtering(
            logits, top_k=args["top_k"], top_p=args["top_p"])
        probs = F.softmax(logits, dim=-1)

        prev = torch.topk(probs, 1)[1] if args["no_sample"] else torch.multinomial(probs, 1)
        if i < args["min_length"] and prev.item() in special_tokens_ids:
            while prev.item() in special_tokens_ids:
                if probs.max().item() == 1:
                    logging.warn(
                        "Warning: model generating special token with probability 1.")
                    break  # avoid infinitely looping over special token
                prev = torch.multinomial(probs, num_samples=1)

        if prev.item() in special_tokens_ids:
            break
        current_output.append(prev.item())

    return current_output


def interact(raw_text, model, personality, userid, history):
    args = model.args
    tokenizer = model.tokenizer
    process_count = model.args["process_count"]

    model._move_model_to_device()

    if not personality:
        # No personality supplied: pick a random one from the dataset
        dataset = get_dataset(
            tokenizer,
            None,
            args["cache_dir"],
            process_count=process_count,
            proxies=model.__dict__.get("proxies", None),
            interact=True,
        )
        personalities = [dialog["personality"]
                         for dataset in dataset.values() for dialog in dataset]
        personality = random.choice(personalities)
    else:
        personality = [tokenizer.encode(s.lower()) for s in personality]

    history.append(tokenizer.encode(raw_text))
    with torch.no_grad():
        out_ids = sample_sequence(
            model, personality, history, tokenizer, model.model, args)
    history.append(out_ids)
    history = history[-(2 * args["max_history"] + 1):]
    out_text = tokenizer.decode(out_ids, skip_special_tokens=True)

    # Persist the updated conversation history to DynamoDB
    save_chat_history(userid, json.dumps({"history": history}))

    return out_text
```

## Deploying the chatbot service

We are almost there! Now we have to deploy our bot with SAM's deploy command. Once the deployment output confirms the stack has been created, the chatbot is deployed.

Now it's time to test our bot. Go to the CloudFormation Resources list in the AWS Management Console to find the Lambda function name, and invoke the function using the following command:

```bash
$ aws lambda invoke --function-name "chat-efs-api-HelloFunction-KQSNKF5K0IY8" out \
    --log-type Tail --query 'LogResult' --output text | base64 -d
```

The output will look like the following example dialog:

```
>> hi there
how are you?
>> good, thank you
what do you like to do for fun?
>> I like reading, yourself?
i like to listen to classical music
......
>>
```

It works! As we can see from the output above, the chatbot returns a response based on the input from the user. However, I am aware of the impact of cold starts on response times: the first request took ~30 seconds to complete.
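The invocation can also be scripted with boto3. A hedged sketch, assuming the same function name as above (yours will differ) and the event keys expected by the handler; the userid value is made up:

```python
import base64
import json

import boto3

lambda_client = boto3.client("lambda")

# Function name taken from the CloudFormation stack above; replace with your own
response = lambda_client.invoke(
    FunctionName="chat-efs-api-HelloFunction-KQSNKF5K0IY8",
    LogType="Tail",
    Payload=json.dumps({"userid": "demo-user-001", "message": "hi there"}),
)

print(json.loads(response["Payload"].read()))            # the handler's return value
print(base64.b64decode(response["LogResult"]).decode())  # tail of the execution log
```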
To reduce the impact of cold starts on our Lambda function, we can use Provisioned Concurrency to keep functions warm (a small sketch of enabling it follows at the end of this post). As a result, the latency of the warmed-up function is reduced to ~3 seconds.

That's it! I hope you have found this article useful. The source code for this post can be found in my GitHub repo.
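For completeness, here is the sketch mentioned above of enabling Provisioned Concurrency with boto3; in this project it could just as well be configured in the SAM template. The concurrency value of 1 is an assumption for illustration, and Provisioned Concurrency must target a published version or alias rather than $LATEST.

```python
import boto3

lambda_client = boto3.client("lambda")

FUNCTION_NAME = "chat-efs-api-HelloFunction-KQSNKF5K0IY8"  # from the stack above; replace with yours

# Publish a version, since Provisioned Concurrency cannot be applied to $LATEST
version = lambda_client.publish_version(FunctionName=FUNCTION_NAME)["Version"]

# Keep one execution environment initialized and ready to respond
lambda_client.put_provisioned_concurrency_config(
    FunctionName=FUNCTION_NAME,
    Qualifier=version,
    ProvisionedConcurrentExecutions=1,
)
```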