A lot of companies struggle to bring their data science projects to production, mainly because of a huge knowledge gap: data scientists understand model building well but lack productionization skills. The simple reason is that these skills are rarely taught in YouTube videos and are hardly touched by data science courses or the Kaggle style of learning.
The objective of this newsletter is to share what I have learned from the various deployments I have done.
Tech rule of deployment: fewer dependencies ∝ faster deployment
Let’s understand data science model deployment with a real problem. A friend of mine called me some time back, asking me to help with his use case and deploy the model to production.
We discussed the problem for an hour or two to understand the constraints.
Data source is Elasticsearch (ES is updated very frequently with new entries)
Real-time or near-real-time inference (a delay of up to 10 minutes is acceptable)
Low on Budget
Minimum Failure Rate with Fallbacks
Alerting system in case any failure occurs
After understanding the constraints and the problem he was trying to solve, I proposed an architecture (check the diagram below) that runs near-real-time batch inference every 5 minutes, falls back to the last model update (the backend fetches the previous update's results from S3), and uses a simple Slack webhook route for alerts.
After about two weeks, he called to share that the solution worked well 🥳. The above is tried and tested: a low-budget, low-maintenance model productionization design.
Let’s reason!
How the architecture design above solves for the constraints
Alternative, SageMaker Batch Inference: this could have been a good option, but we did not go for it because he already had an EC2 instance running 24x7 that was underutilized. Beyond that, for a 5-minute cycle (near-real-time inference architecture), it is safer not to use SageMaker batch inference, while the SageMaker real-time inference option is costlier.
Dev familiarity is another super-important factor when building an architecture design, and EC2 has always been a playground for him.
Another learning point: if you are running a model with an update frequency of one day, and you want the task to compute on a fresh EC2 machine on every run, you can still use the above architecture. One can follow the AWS guide “How do I stop and start Amazon EC2 instances at regular intervals using Lambda?” This helps save EC2 cost by keeping the machine running only during the compute window.
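The start/stop pattern from that guide can be sketched as a small Lambda function. The instance id and the `{"action": ...}` event shape below are assumptions for illustration: you would schedule two EventBridge rules, one invoking the function with `{"action": "start"}` before the compute window and one with `{"action": "stop"}` after it.

```python
# Hypothetical Lambda handler that starts or stops a single EC2 instance.
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder; use your instance id

def resolve_action(event) -> str:
    """Pick a valid EC2 action from the event payload, defaulting to stop."""
    action = event.get("action", "stop")
    return action if action in ("start", "stop") else "stop"

def lambda_handler(event, context):
    # boto3 is imported lazily so the module can be unit-tested without AWS deps
    import boto3

    ec2 = boto3.client("ec2")
    action = resolve_action(event)
    if action == "start":
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
    else:
        ec2.stop_instances(InstanceIds=[INSTANCE_ID])
    return {"action": action, "instance": INSTANCE_ID}
```

Defaulting unknown actions to "stop" is the safer failure mode here: a misconfigured rule leaves the machine off rather than burning money.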
Apache Airflow is an open-source scheduler to manage your regular jobs. It is an excellent tool to organize, execute, and monitor your workflows so that they work seamlessly.
Airflow triggers the data fetch from ES and the prediction task at a defined interval, every 5 minutes for the above case. To write a scheduler expression, one can use Crontab.Guru (this expression-writing tool is excellent; I often use it when writing an Airflow task).
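The 5-minute loop can be sketched as a two-task Airflow DAG. This is a pipeline-configuration sketch, not my friend's actual code: the DAG id, task names, and callables are assumptions, and the fetch/inference bodies are left as stubs.

```python
# Sketch of an Airflow DAG running batch inference every 5 minutes.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_from_es(**context):
    ...  # query Elasticsearch for entries added since the last run

def run_inference(**context):
    ...  # load the model from S3, predict, overwrite results on S3

with DAG(
    dag_id="near_realtime_inference",
    start_date=datetime(2022, 1, 1),
    schedule_interval="*/5 * * * *",  # every 5 minutes; verify on Crontab.Guru
    catchup=False,  # don't backfill missed intervals, only the latest data matters
) as dag:
    fetch = PythonOperator(task_id="fetch_from_es", python_callable=fetch_from_es)
    predict = PythonOperator(task_id="run_inference", python_callable=run_inference)
    fetch >> predict
```

`catchup=False` matters for this use case: if the scheduler is down for an hour, you want one fresh run against current ES data, not twelve stale backfills.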
The model is loaded from S3 into memory for inference, so local disk storage is not occupied by model weights. Advice: I have seen many data scientists keep the model file on machine storage. When something goes wrong with the EC2 instance, all files get deleted, and with them all their effort. Always keep the model on S3 as a backup.
The output is overwritten on S3, and the backend picks up model results from S3; S3 storage is reliable and among the cheapest options on AWS.
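The two S3 habits above (model streamed into memory, results overwritten at a fixed key) can be sketched as below. The bucket and key names are placeholders, and the functions take the S3 client as a parameter so they work with any `boto3`-compatible client.

```python
# Minimal sketch: keep the model and results on S3 only, nothing on local disk.
import io
import pickle

BUCKET = "my-model-bucket"  # placeholder bucket name

def load_model(s3, key="models/latest.pkl"):
    """Stream the model object from S3 straight into memory."""
    buf = io.BytesIO()
    s3.download_fileobj(BUCKET, key, buf)  # no file ever touches the EC2 disk
    buf.seek(0)
    return pickle.load(buf)

def write_results(s3, payload: bytes, key="results/latest.json"):
    """Overwrite the single results key the backend always reads from."""
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)

# usage with a real client:
#   import boto3
#   s3 = boto3.client("s3")
#   model = load_model(s3)
#   write_results(s3, b'{"predictions": []}')
```

Overwriting one fixed key is also what makes the fallback work: if a run fails, the backend simply keeps serving the previous run's results from the same path.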
For alerts, Slack is the best option: it is always on during office hours, and when you are unavailable your team members still have visibility into the alerts. One can even add Airflow failure and retry emails, but I prefer webhook alerts on office communication tools: Slack/Flock/Teams.
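A Slack webhook alert needs only a POST with a small JSON body, so it can be done with the standard library alone. The webhook URL below is a placeholder; you would generate a real one from a Slack incoming-webhook integration and call `send_alert` from the task's failure handler.

```python
# Minimal Slack alert via an incoming webhook (URL is a placeholder).
import json
import urllib.request

WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def build_alert(task_id: str, error: str) -> bytes:
    """Format the JSON payload Slack's incoming webhook expects."""
    return json.dumps(
        {"text": f":rotating_light: Task `{task_id}` failed: {error}"}
    ).encode()

def send_alert(task_id: str, error: str) -> None:
    """POST the failure message to the Slack webhook."""
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=build_alert(task_id, error),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
```

The same shape works for Flock or Teams webhooks with their respective payload formats swapped in.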
I know there are better alternatives to the Airflow component, like Dagster or Prefect; similarly, there are newer, comparable alternatives for the other components in the architecture. But never forget the factor of dev familiarity when choosing tools for your model pipelines. The older the tool, the better the support, and one cannot really downplay that.
“WHAT IF, WE WANT TO DEPLOY A REAL-TIME MODEL IN PRODUCTION?” — my friend asked, even readers here must be thinking the same.
Below is the real-time model productionisation architecture design 👇
I hope data science model productionization architecture design is no longer an out-of-syllabus question. There is much more to it than we can cover here, but do spend some time racking your brain on it..!
My newsletter on LinkedIn is now read by more than 4,500 subscribers. If you are building an AI or a data product or service, you are invited to become a sponsor of one of the future newsletter issues. Feel free to reach out to [email protected] for more details on sponsorships.
I am nominated for the HackerNoon 2022 Noonies. Vote for me: HackerNoon Contributor of the Year - Data.
Data Science Book Recommendations:
[1] The Book of Why
[2] Naked Statistics