Building a Conversational AI Chatbot With AWS Lambda Function and Amazon EFS

By Yi Ai (@yi)

Amazon has announced the general availability of AWS Lambda support for Amazon Elastic File System (EFS). Amazon EFS is a fully managed, elastic, shared file system designed to be consumed by other AWS services.
With the release of Amazon EFS for Lambda, we can now easily share data across function invocations. It also opens up new capabilities, such as building or importing large libraries and machine learning models directly into Lambda functions. Let’s go over how to build a serverless conversational AI chatbot using a Lambda function and EFS.
In this post, we will:
  • Create an Amazon EFS file system.
  • Deploy and run a SageMaker notebook instance and mount EFS on the instance.
  • Download the PyTorch libraries and the ConvAI pre-trained model to EFS.
  • Add a dialog history DynamoDB table and a gateway endpoint to save and retrieve conversation history.
  • Deploy a chatbot engine Lambda function and enable EFS for it.
Here’s the architecture diagram:

Creating an EFS file system

In this example we will use CloudFormation to create the EFS file system and an EFS access point. The configuration is defined as follows:
  FileSystem:
    Type: AWS::EFS::FileSystem
    Properties:
      PerformanceMode: generalPurpose
      FileSystemTags:
        - Key: Name
          Value: fs-pylibs
  MountTargetA:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId:
        Ref: FileSystem
      SubnetId: "{{resolve:ssm:/root/defaultVPC/subsetA:1}}"
      SecurityGroups:
        - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"
  MountTargetB:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId:
        Ref: FileSystem
      SubnetId: "{{resolve:ssm:/root/defaultVPC/subsetB:1}}"
      SecurityGroups:
        - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"
  AccessPointResource:
    Type: "AWS::EFS::AccessPoint"
    DependsOn: FileSystem
    Properties:
      FileSystemId: !Ref FileSystem
      PosixUser:
        Uid: "1000"
        Gid: "1000"
      RootDirectory:
        CreationInfo:
          OwnerGid: "1000"
          OwnerUid: "1000"
          Permissions: "0777"
        Path: "/py-libs"
Note that we will use EFS General Purpose performance mode since it has lower latency than Max I/O.

Working with Amazon SageMaker

We will mount EFS on a SageMaker notebook instance and install PyTorch and the ConvAI model on EFS.
The notebook instance must have access to the same security group and reside in the same VPC as the EFS file system.
Let’s mount the EFS path /py-libs to the /home/ec2-user/SageMaker/libs directory:
%%sh

mkdir -p libs

FILE_SYS_ID=fs-xxxxxx

sudo mount -t nfs \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    $FILE_SYS_ID.efs.ap-southeast-2.amazonaws.com:/ \
    libs

cd libs && sudo mkdir -p py-libs

cd .. && sudo umount -l /home/ec2-user/SageMaker/libs

sudo mount -t nfs \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    $FILE_SYS_ID.efs.ap-southeast-2.amazonaws.com:/py-libs \
    libs
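The mount target address used above follows the pattern <file-system-id>.efs.<region>.amazonaws.com. As a minimal sketch, the hypothetical helpers below (they are not part of the original post) assemble the same mount command from its parts, which may help when scripting the notebook setup for another region or file system:

```python
# Hypothetical helpers, not from the original post: they rebuild the NFS
# mount command shown above from a file system ID, region, and paths.
NFS_OPTIONS = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"


def efs_dns_name(file_system_id, region):
    """Return the regional DNS name of an EFS file system."""
    return f"{file_system_id}.efs.{region}.amazonaws.com"


def efs_mount_command(file_system_id, region, remote_path, local_dir):
    """Return the mount command as an argument list for subprocess.run."""
    source = f"{efs_dns_name(file_system_id, region)}:{remote_path}"
    return ["sudo", "mount", "-t", "nfs", "-o", NFS_OPTIONS, source, local_dir]


# Example: the second mount from the listing above.
command = efs_mount_command("fs-xxxxxx", "ap-southeast-2", "/py-libs", "libs")
print(" ".join(command))
```

Returning an argument list rather than a single string lets the command be passed to subprocess.run without shell quoting.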
Then, install PyTorch and simpletransformers to the libs/py-libs directory:
!sudo pip --no-cache-dir install torch -t libs/py-libs
!sudo pip --no-cache-dir install torchvision -t libs/py-libs
!sudo pip --no-cache-dir install simpletransformers -t libs/py-libs
Once all packages are installed, download the pre-trained model provided by Hugging Face, then extract the archive to the convai-model directory on EFS:
!sudo wget https://s3.amazonaws.com/models.huggingface.co/transfer-learning-chatbot/gpt_personachat_cache.tar.gz
!sudo mkdir -p libs/convai-model
!sudo tar -xvf gpt_personachat_cache.tar.gz -C libs/convai-model
!sudo chmod -R g+rw libs/convai-model
We are now ready to talk to the pre-trained model. Simply call:
model.interact()
The pre-trained model provided by Hugging Face performs well out of the box and will likely require less fine-tuning when creating a chatbot.
We can see that the Python packages and the model are loaded from EFS correctly, and we are able to start a conversation with the pre-trained model.

Creating AWS DynamoDB table

Create a DialogHistory table to store the dialog history, with at least the last utterance from the user. We can use the sample CloudFormation templates to configure the DynamoDB table.
Please note that we have to create a VPC endpoint for DynamoDB even though the Lambda function runs inside a public subnet of a VPC, because a VPC-attached Lambda function is not assigned a public IP address and so cannot reach the public DynamoDB endpoint directly.
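The handler code later in this post looks items up by a single string attribute named userid, so the table needs userid as its partition key. As a minimal sketch of an equivalent table definition for boto3, the hypothetical helper below assumes on-demand billing, which the post does not specify:

```python
def build_table_definition(table_name):
    """Return keyword arguments for the DynamoDB create_table call.

    Assumptions (not from the original post): on-demand billing and a
    single string partition key named 'userid', with no sort key.
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "userid", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "userid", "KeyType": "HASH"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


definition = build_table_definition("DialogHistory")

# With AWS credentials configured, the table could then be created with:
#   import boto3
#   boto3.client("dynamodb").create_table(**definition)
```

Only the key attribute is declared up front; the history attribute is schemaless and is simply written with each item.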

Configuring AWS Lambda to use EFS

We will use AWS SAM to create the Lambda function and mount the EFS access point to it.
First, create a Lambda function resource, then set up the EFS file system for Lambda. Make sure that EFS and Lambda are in the same VPC:
  HelloFunction:
    Type: AWS::Serverless::Function
    DependsOn:
      - LibAccessPointResource
    Properties:
      Environment:
        Variables:
          CHAT_HISTORY_TABLE: !Ref TableName
      Role: !GetAtt LambdaRole.Arn
      CodeUri: src/
      Handler: api.lambda_handler
      Runtime: python3.6
      FileSystemConfigs:
        - Arn: !GetAtt LibAccessPointResource.Arn
          LocalMountPath: "/mnt/libs"
      VpcConfig:
        SecurityGroupIds:
          - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"
        SubnetIds:
          - "{{resolve:ssm:/root/defaultVPC/subsetA:1}}"
          - "{{resolve:ssm:/root/defaultVPC/subsetB:1}}"
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: "efsAPILambdaRole"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - "lambda.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/AWSLambdaExecute"
        - "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
        - "arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess"
      Policies:
        - PolicyName: "efsAPIRoleDBAccess"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - "dynamodb:PutItem"
                  - "dynamodb:GetItem"
                  - "dynamodb:UpdateItem"
                  - "dynamodb:DeleteItem"
                  - "dynamodb:Query"
                  - "dynamodb:Scan"
                Resource:
                  - !GetAtt ChatHistory.Arn
                  - Fn::Join:
                      - "/"
                      - - !GetAtt ChatHistory.Arn
                        - "*"
              - Effect: Allow
                Action:
                  - "ssm:GetParameter*"
                Resource:
                  - !Sub "arn:${AWS::Partition}:ssm:${AWS::Region}:${AWS::AccountId}:parameter/root/defaultVPC*"
  
Adding the conversation engine: AWS Lambda

In this section, we will create a Lambda function that handles communication between users and the conversational AI model.
We will put the following source code in src/api.py:
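import json
import logging
import os
import random
import sys

import boto3

# Make the packages installed on EFS importable before importing them.
sys.path.insert(1, '/mnt/libs/py-libs')
import torch
import torch.nn.functional as F
from simpletransformers.conv_ai.conv_ai_utils import get_dataset
from simpletransformers.conv_ai import ConvAIModel

# DynamoDB client and table name; CHAT_HISTORY_TABLE is set in the SAM template.
dynamodb = boto3.client('dynamodb')
TABLE_NAME = os.environ['CHAT_HISTORY_TABLE']
# The handler also expects convAimodel (a ConvAIModel instance), character
# (the bot's personality sentences, or None to pick a random one), and
# SPECIAL_TOKENS (used by sample_sequence) to be defined at module level.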


def get_chat_history(userid):
    response = dynamodb.get_item(TableName=TABLE_NAME, Key={
        'userid': {
            'S': userid
        }})

    if 'Item' in response:
        return json.loads(response["Item"]["history"]["S"])
    return {"history": []}


def save_chat_history(userid, history):
    return dynamodb.put_item(TableName=TABLE_NAME, Item={'userid': {'S': userid}, 'history': {'S': history}})


def lambda_handler(event, context):
    try:
        userid = event['userid']
        message = event['message']
        history = get_chat_history(userid)
        history = history["history"]
        response_msg = interact(message, convAimodel,
                                character, userid, history)

        return {
            'message': json.dumps(response_msg)
        }
    except Exception as ex:
        logging.exception(ex)
Note that the simpletransformers library lets us interact with the models locally through input(). To build our chat engine, we need to override the default interact and sample_sequence methods in conv_ai:
def sample_sequence(aiCls, personality, history, tokenizer, model, args, current_output=None):
    special_tokens_ids = tokenizer.convert_tokens_to_ids(SPECIAL_TOKENS)
    if current_output is None:
        current_output = []

    for i in range(args["max_length"]):
        instance = aiCls.build_input_from_segments(
            personality, history, current_output, tokenizer, with_eos=False)

        input_ids = torch.tensor(
            instance["input_ids"], device=aiCls.device).unsqueeze(0)
        token_type_ids = torch.tensor(
            instance["token_type_ids"], device=aiCls.device).unsqueeze(0)

        logits = model(input_ids, token_type_ids=token_type_ids)
        if isinstance(logits, tuple):  # for gpt2 and maybe others
            logits = logits[0]
        logits = logits[0, -1, :] / args["temperature"]
        logits = aiCls.top_filtering(
            logits, top_k=args["top_k"], top_p=args["top_p"])
        probs = F.softmax(logits, dim=-1)

        prev = torch.topk(probs, 1)[
            1] if args["no_sample"] else torch.multinomial(probs, 1)
        if i < args["min_length"] and prev.item() in special_tokens_ids:
            while prev.item() in special_tokens_ids:
                if probs.max().item() == 1:
                    logging.warning(
                        "Warning: model generating special token with probability 1.")
                    break  # avoid infinitely looping over special token
                prev = torch.multinomial(probs, num_samples=1)

        if prev.item() in special_tokens_ids:
            break
        current_output.append(prev.item())

    return current_output


def interact(raw_text, model, personality, userid, history):
    args = model.args
    tokenizer = model.tokenizer
    process_count = model.args["process_count"]

    model._move_model_to_device()

    if not personality:
        dataset = get_dataset(
            tokenizer,
            None,
            args["cache_dir"],
            process_count=process_count,
            proxies=model.__dict__.get("proxies", None),
            interact=True,
        )
        personalities = [dialog["personality"]
                         for dataset in dataset.values() for dialog in dataset]
        personality = random.choice(personalities)
    else:
        personality = [tokenizer.encode(s.lower()) for s in personality]

    history.append(tokenizer.encode(raw_text))
    with torch.no_grad():
        out_ids = sample_sequence(
            model, personality, history, tokenizer, model.model, args)
    history.append(out_ids)
    history = history[-(2 * args["max_history"] + 1):]
    out_text = tokenizer.decode(out_ids, skip_special_tokens=True)
    save_chat_history(userid, json.dumps({"history": history}))
    return out_text
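
The slice at the end of interact keeps the conversation context bounded: every turn appends one user utterance and one reply, and only the most recent 2 * max_history + 1 entries are kept before the history is serialized for DynamoDB. A minimal sketch of that windowing, using made-up token IDs and a hypothetical helper name:

```python
import json


def trim_history(history, max_history):
    """Keep only the most recent 2 * max_history + 1 utterances."""
    return history[-(2 * max_history + 1):]


# Each utterance is a list of token IDs; short made-up IDs are used here.
history = [[1], [2], [3], [4], [5], [6], [7], [8]]
trimmed = trim_history(history, max_history=2)
print(trimmed)  # [[4], [5], [6], [7], [8]]

# The trimmed history is stored as a JSON string and parsed back on the
# next request, mirroring save_chat_history and get_chat_history.
stored = json.dumps({"history": trimmed})
restored = json.loads(stored)["history"]
print(restored == trimmed)  # True
```

Because the stored value is plain JSON, the size of each DynamoDB item grows with max_history and the length of the utterances, not with the length of the whole conversation.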

Deploying the chatbot service

We are almost there! Now we have to deploy our bot. Run the following command to deploy:
From the output above, we can see the chatbot is now deployed.
Now it’s time to test our bot. Go to the CloudFormation Resources list in the AWS Management Console to find the Lambda function name, then invoke the function using the following command:
$ aws lambda invoke --function-name "chat-efs-api-HelloFunction-KQSNKF5K0IY8" out --log-type Tail \
    --query 'LogResult' --output text | base64 -d
The output will look like below:
Here is an example of dialog:
>>hi there
how are you?
>>good, thank you
what do you like to do for fun?
>>I like reading, yourself?
i like to listen to classical music
......
It works! As the dialog above shows, the chatbot returns a response based on the user's input.
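The handler reads userid and message from the incoming event, so a test invocation needs a JSON payload with those two keys. As a minimal sketch, the function could be called from Python with boto3 as follows; the helper names and the user ID are hypothetical, and the client is passed in so the helper can be exercised with a stub:

```python
import json


def build_event(userid, message):
    """Build the event that lambda_handler reads ('userid' and 'message')."""
    return {"userid": userid, "message": message}


def ask_bot(lambda_client, function_name, userid, message):
    """Invoke the function synchronously and return the bot's reply text.

    lambda_client is expected to be boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        Payload=json.dumps(build_event(userid, message)).encode("utf-8"),
    )
    body = json.loads(response["Payload"].read())
    if not body:
        # The handler logs exceptions and returns nothing in that case.
        return None
    # The handler wraps the reply with json.dumps, so decode it once more.
    return json.loads(body["message"])


# With AWS credentials configured, a call could look like this:
#   import boto3
#   reply = ask_bot(boto3.client("lambda"),
#                   "chat-efs-api-HelloFunction-KQSNKF5K0IY8",
#                   "user-1", "hi there")
```

Reusing the same userid across calls lets the function pick up the stored dialog history for that user.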
However, cold starts have a noticeable impact on response times: the first request took about 30 seconds to complete. To avoid cold starts in our Lambda function, we can use Provisioned Concurrency to keep it warm.
As a result, the latency of the warmed-up function is reduced to about 3 seconds.
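Provisioned Concurrency is configured per published version or alias rather than on $LATEST. A minimal sketch of enabling it through the Lambda API with boto3, where the helper name, the alias, and the instance count are assumptions rather than values from this post:

```python
def enable_provisioned_concurrency(lambda_client, function_name, qualifier, instances=1):
    """Keep `instances` execution environments initialized for one version or alias.

    lambda_client is expected to be boto3.client("lambda"). Provisioned
    Concurrency cannot target $LATEST, so `qualifier` must be a published
    version number or an alias name.
    """
    if qualifier == "$LATEST":
        raise ValueError("Provisioned Concurrency needs a published version or alias")
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=instances,
    )


# With AWS credentials configured, a call could look like this
# ("live" is a hypothetical alias name):
#   import boto3
#   enable_provisioned_concurrency(boto3.client("lambda"),
#                                  "chat-efs-api-HelloFunction-KQSNKF5K0IY8",
#                                  "live")
```

Provisioned instances are billed while they are configured, so a small count is a reasonable starting point for a demo bot.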
That’s it! I hope you have found this article useful. The source code for this post can be found in my GitHub repo.
