Continuous Integration (CI) and Continuous Delivery (CD) are staples of a modern software development workflow that enable developers to release their code rapidly, reproducibly, and reliably. At Modzy, we take the best of traditional continuous integration practices and augment them to fit the needs of a modern data science team. In this way, we empower our agile team of researchers and engineers to continually add to and update a growing portfolio of over 100 machine learning models.

Figure 1: Overview of the continuous integration process to deploy a model into production using Modzy.

To account for the diverse backgrounds of our data scientists, it is essential to define a unifying set of requirements for the development of all models while keeping the process accessible to developers. Figure 1 shows a schematic of the process that allows data scientists to integrate and deploy their models with confidence. In addition to traditional code review, a series of automated requirements is applied and enforced before code is merged.

We perform several levels of checks against all our data science repositories in order to standardize the model development process. This ensures every model release is reliable and traceable.

Ensuring License Compliance

During the process of developing models, we rely on third-party software libraries and open-source software implementations. From the data science perspective, we also rely on publicly available data sources and open-source datasets to train and enhance the performance of our models. Mandatory license checks ensure that we comply with the legal terms associated with the software and datasets we use.

Model Versioning

In traditional software applications, development is often a linear process in which each new version supplants the previous version. In this case, overwriting old artifacts or only having the latest version of the software deployed to a production environment is often sufficient. For data science model development, however, different versions of a model may offer trade-offs in speed, accuracy, or intended use case. To support this, we use semantic versioning in conjunction with model identifiers to maintain and deploy different versions or lineages of models over time.

Container Security

Model security is paramount. Users count on our container security to protect the intellectual property associated with all machine learning models deployed through the Modzy platform. As a result, we’ve developed an assortment of custom, secure Modzy base images for our models. We scan the Docker image during the continuous integration process to ensure that it contains no known Common Vulnerabilities and Exposures (CVEs) that an attacker could exploit at the operating system or network level.

Key Takeaways

It is crucial to establish a CI/CD pipeline to maintain and produce high-quality, reproducible, and secure models while keeping pace with continually increasing demand. By following a consistent, repeatable process and adapting proven techniques from the software development space for data science, organizations can move past the challenges of deploying and managing machine learning models in production systems.
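
As a rough illustration of the idea described under Model Versioning, the sketch below keeps several versions of the same model side by side instead of overwriting older artifacts. It is not Modzy's implementation: the `SemVer` and `ModelRegistry` classes, the in-memory storage, and the artifact strings are all hypothetical names chosen for this example, and only plain MAJOR.MINOR.PATCH versions are handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True, order=True)
class SemVer:
    """A MAJOR.MINOR.PATCH version; ordering compares the fields left to right."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "SemVer":
        parts = text.split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
        return cls(*(int(p) for p in parts))

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


class ModelRegistry:
    """Hypothetical in-memory registry keyed by (model identifier, version)."""

    def __init__(self) -> None:
        self._artifacts: Dict[Tuple[str, SemVer], str] = {}

    def publish(self, model_id: str, version: str, artifact: str) -> None:
        # Refuse to overwrite: every published version stays traceable.
        key = (model_id, SemVer.parse(version))
        if key in self._artifacts:
            raise ValueError(f"{model_id} {version} is already published")
        self._artifacts[key] = artifact

    def versions(self, model_id: str) -> List[str]:
        # Sort numerically, so 1.10.0 comes after 1.9.0.
        found = sorted(v for (m, v) in self._artifacts if m == model_id)
        return [str(v) for v in found]

    def resolve(self, model_id: str, version: str) -> str:
        # Look up the artifact for one exact version of one model.
        return self._artifacts[(model_id, SemVer.parse(version))]
```

With a registry like this, a caller could publish `1.9.0` and `1.10.0` of the same model identifier and later resolve either one, for example to serve a faster but less accurate version alongside a slower, more accurate one. Rejecting a second publish of the same identifier and version is one way to keep releases traceable; a real pipeline would more likely enforce this in its artifact store.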

