The Design of Model Productionization Architecture in Data Science by@shauryauppal



Credibility


A lot of companies struggle to bring their data science projects to production, mainly because of a huge knowledge gap: data scientists understand model building well but lack productionization skills. The simple reason is that these skills are not taught in YouTube videos and are hardly touched by data science courses or the Kaggle learning method.


Knowledge Flowing

The objective of this newsletter is to share my learnings from the various deployments I have done.

Tech rule of deployment: fewer dependencies ∝ faster deployment


Let’s understand data science model deployment with a real problem. A friend of mine called me some time back, requesting that I help him with his use case and deploy his model to production.


Call Ringing

We discussed the problem for an hour or two to understand the constraints.

Summary of the discussed constraints:

  • Data source is Elasticsearch (ES is updated very frequently with new entries)

  • Real-time or near-real-time inference (a delay of up to 10 minutes is acceptable)

  • Low budget

  • Minimum failure rate, with fallbacks

  • Alerting system in case any failure occurs


After understanding the constraints and the problem he was trying to solve, I proposed an architecture (check the diagram below) that is near-real-time (batch inference every 5 minutes), falls back to the last model update (the backend fetches the previous update's results from S3), and uses a simple Slack webhook alert route.


Batch Model Design


After about two weeks, he called to share that the solution worked well 🥳. The above is a tried-and-tested, low-budget, low-maintenance model productionization design.


Let’s reason!

Here is how the above architecture design solves for the constraints.


Alternative (SageMaker Batch Inference): this could have been a good option, but we did not go for it because he already had an EC2 instance running 24x7 that was underutilized. Besides, for a 5-minute inference cadence (near-real-time architecture), it is safer not to use SageMaker batch inference, while the SageMaker real-time inference option is costlier.


Dev familiarity is another super important factor when building an architecture design, and EC2 has always been a playground for him.


Another learning point: if you are running a model with an update frequency of 1 day and want to run the compute task on a new EC2 machine on every run, one can still use the above architecture. One can follow the AWS guide “How do I stop and start Amazon EC2 instances at regular intervals using Lambda?” This helps save EC2 cost by running the machine only during the compute window.
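As a rough sketch, the Lambda side of that start/stop pattern can be as small as the handler below. The environment variable name and instance ID are illustrative assumptions; two EventBridge schedules would invoke it, one with {"action": "start"} before the compute window and one with {"action": "stop"} after it.

```python
import os


def lambda_handler(event, context, ec2=None):
    """Start or stop the EC2 instance named in the INSTANCE_ID env var."""
    if ec2 is None:  # created lazily so the module imports without AWS access
        import boto3
        ec2 = boto3.client("ec2")
    instance_id = os.environ["INSTANCE_ID"]  # e.g. "i-0123456789abcdef0"
    action = event.get("action", "stop")
    if action == "start":
        ec2.start_instances(InstanceIds=[instance_id])
    else:
        ec2.stop_instances(InstanceIds=[instance_id])
    return {"instance": instance_id, "action": action}
```

The `ec2` parameter is only there so the handler can be exercised without AWS credentials; in Lambda it is left as the default boto3 client.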

Why Airflow?

Apache Airflow is an open-source scheduler to manage your regular jobs. It is an excellent tool to organize, execute, and monitor your workflows so that they work seamlessly.


Airflow triggers the data fetch from ES and the prediction task at a defined interval, every 5 minutes in the above case. To write a scheduler expression, one can use Crontab.Guru (an excellent tool that I often use when writing an Airflow task).
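A minimal sketch of that schedule as an Airflow 2.x DAG definition might look like the following; the dag_id, task names, and the fetch/predict callables are illustrative placeholders, not from the original setup:

```python
# Illustrative Airflow DAG: fetch entries from Elasticsearch and run
# batch inference every 5 minutes ("*/5 * * * *", checkable on Crontab.Guru).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_from_es():
    """Pull the latest entries from Elasticsearch (placeholder)."""


def run_inference():
    """Score the fetched entries and write results to S3 (placeholder)."""


with DAG(
    dag_id="near_realtime_batch_inference",   # hypothetical name
    start_date=datetime(2022, 1, 1),
    schedule_interval="*/5 * * * *",          # every 5 minutes
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=1)},
) as dag:
    fetch = PythonOperator(task_id="fetch_from_es", python_callable=fetch_from_es)
    infer = PythonOperator(task_id="run_inference", python_callable=run_inference)
    fetch >> infer
```

The `fetch >> infer` line is what encodes the dependency: inference only runs after the ES fetch succeeds, and the `retries` default gives each task one automatic retry before the failure alert fires.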

Other Reasoning Pointers on the Architecture:

  • The model is loaded from S3 into memory for inference so that model weights do not occupy local disk storage. Advice: I have seen many data scientists keep the model file on machine storage; when something goes wrong with the EC2 instance, all files get deleted, and with them all their effort. Always keep the model on S3 as a backup.


  • Output is overwritten on S3, and the backend picks the model results from S3; S3 storage is reliable and among the cheapest on AWS.


  • For alerts, Slack is the best option: it is on during office hours, and when you are unavailable your team members still have visibility into the alerts. One can also add Airflow failure and retry emails, but I prefer webhook alerts on office communication tools: Slack/Flock/Teams.
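To make the S3 and Slack pointers above concrete, here is a minimal sketch. The function names, bucket/key names, and payload shape are hypothetical; in production `s3` would be a `boto3.client("s3")` and the webhook URL a real Slack incoming-webhook URL.

```python
import io
import json
import pickle


def load_model(s3, bucket, key):
    """Load a pickled model from S3 straight into memory (no local disk write)."""
    buf = io.BytesIO()
    s3.download_fileobj(bucket, key, buf)
    buf.seek(0)
    return pickle.load(buf)


def publish_results(s3, bucket, key, results):
    """Overwrite a fixed results key; the backend always reads this same key."""
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(results).encode("utf-8"))


def alert_slack(webhook_url, message):
    """POST a failure alert to a Slack incoming webhook."""
    import urllib.request
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Because the results key is always overwritten in place, a backend read never needs to know which batch run produced it; if the latest run fails, the previous run's output is simply still there, which is exactly the fallback behavior described above.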

Easy is not the solution always


I know there are better alternatives to the Airflow component, like Dagster or Prefect; similarly, there are newer, comparable alternatives for the other components in the architecture. But never forget the factor of dev familiarity when choosing tools for your model pipelines. The older the tool, the better the support, and one cannot really downplay that.

“WHAT IF WE WANT TO DEPLOY A REAL-TIME MODEL IN PRODUCTION?” my friend asked; readers here must be thinking the same.

I have a question?

We discussed the constraints, summarized below:

  • A pool of items is fetched from ES
  • Real-time model output with a timeout of 100ms is required
  • On timeout, the model falls back to the cached version
  • Logging is required

Below is the real-time model productionisation architecture design 👇

Real-Time Model Design

Reasoning and Other takeaways on real-time Architecture

  • I have shown a dockerized model deployed on EC2. It can be deployed on ECS/SageMaker as well; I would leave that choice to you.
  • For caching, my personal preference is Redis.
  • Kibana is for logging responses and info logs within the model service.
  • For model serving, one can use MLflow, BentoML, FastAPI, Cortex, etc. I prefer BentoML: in under 10 minutes you’ll be able to serve your ML model over an HTTP API endpoint and build a Docker image that is ready to be deployed to production.
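The timeout-plus-cache fallback from the constraints can be sketched with the standard library alone. Here a plain dict stands in for Redis and `model_predict` stands in for the call to the dockerized model service; both names, and the response shape, are illustrative assumptions.

```python
import asyncio

CACHE = {}  # stands in for Redis in this sketch


async def score_with_fallback(item_id, model_predict, timeout=0.1):
    """Run the model with a 100 ms budget; on timeout, serve the cached score."""
    try:
        score = await asyncio.wait_for(model_predict(item_id), timeout=timeout)
        CACHE[item_id] = score  # refresh the cache on every successful call
        return {"item": item_id, "score": score, "source": "model"}
    except asyncio.TimeoutError:
        # Model missed its budget: fall back to the last cached score
        # (None if this item has never been scored successfully).
        return {"item": item_id, "score": CACHE.get(item_id), "source": "cache"}
```

Because the cache is refreshed on every successful call, the fallback answer is at worst one successful inference old, which matches the "fallback to cached version" constraint above.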

Conclusion

I hope data science model productionization architecture design is no longer an out-of-syllabus question. There is much more to it than we can cover here, but do spend some time racking your brain on it!



Also published here.


I hope you learned something new from this post. If you liked it, hit 👍 or ❤️ and share this with others. Stay tuned for the next one!


My newsletter on LinkedIn is now read by more than 4,500 subscribers. If you are building an AI or data product or service, you are invited to become a sponsor of a future newsletter issue. Feel free to reach out to [email protected] for more details on sponsorships.


I am nominated for the HackerNoon 2022 Noonies. Vote for me: HackerNoon Contributor of the Year - Data, and Data Science Demon.


Data Science Book Recommendations:

[1] The Book of Why

[2] Naked Statistics
