Too Long; Didn't Read
We are often asked whether serverless is the right compute architecture for deploying models. The cost savings that serverless promises are as appealing for ML workloads as for traditional ones. However, the hardware and resource requirements of ML models can be an obstacle to using serverless. This blog post shows how to get started with deploying models on AWS Lambda and covers the pros and cons of using it for inference. In particular, we use the DistilBERT question-answering model from Hugging Face.