Introducing PyMilo: The Power of Transparency in Python ML Model Export

by OpenSciLab, November 10th, 2023

Too Long; Didn't Read

PyMilo is an open-source Python package providing a simple, efficient, and safe way to export trained machine-learning models. PyMilo adopts a fully transparent strategy by which the exported model, encapsulated in a human-readable format, invites exploration. Its transparency allows easy integration into diverse environments.

In the realm of machine learning, developers have always faced a dilemma: how to share trained models without revealing the underlying code or risking the chaos of binary formats. The challenge begged for a solution, and from this need PyMilo emerged as the Python package we had all been waiting for.


What is PyMilo?

PyMilo is an open-source Python package providing a simple, efficient, and safe way to export trained machine-learning models. PyMilo adopts a fully transparent strategy by which the exported model, encapsulated in a human-readable format, invites exploration, allowing users to comprehend the intricacies of the model structure easily. Its transparency allows easy integration into diverse environments, smooth platform transfer, and collaboration without revealing underlying code or training data.





PyMilo's model transport is fully end-to-end: it provides a seamless, standalone process from exporting to importing and executing machine learning models. “End-to-end” here means the full path from exporting the trained model to importing it and running it in inference mode, without any additional dependencies. A user can export a model with PyMilo and then import the exported file in another environment, obtaining the exact model from the original library.
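
The round trip described here can be sketched with nothing but the standard library. This is a toy illustration of the idea, not PyMilo's implementation: `TinyLinearModel`, `export_model`, and `import_model` are hypothetical names invented for this example, and the parameter values are made up.

```python
import json


class TinyLinearModel:
    """Hypothetical stand-in for a fitted estimator (not a PyMilo or scikit-learn class)."""

    def __init__(self, coef, intercept):
        self.coef = list(coef)
        self.intercept = intercept

    def predict(self, features):
        # Plain linear model: dot(coef, features) + intercept
        return sum(c * x for c, x in zip(self.coef, features)) + self.intercept


def export_model(model):
    """Serialize the fitted parameters to human-readable JSON text."""
    document = {
        "model_type": "TinyLinearModel",
        "data": {"coef": model.coef, "intercept": model.intercept},
    }
    return json.dumps(document, indent=4)


def import_model(text):
    """Rebuild a model from JSON text produced by export_model."""
    document = json.loads(text)
    if document["model_type"] != "TinyLinearModel":
        raise ValueError("unsupported model type: " + document["model_type"])
    data = document["data"]
    return TinyLinearModel(data["coef"], data["intercept"])


original = TinyLinearModel([0.1, -2.5, 3.75], 1.25)  # made-up parameters
exported_text = export_model(original)
restored = import_model(exported_text)

sample = [1.0, 2.0, 3.0]
# Python's json module writes floats so that they parse back to the same value,
# which means the restored model predicts exactly what the original does.
same_prediction = restored.predict(sample) == original.predict(sample)
```

Because the exported text is ordinary JSON, the receiving side needs only a JSON parser and the model class itself to rebuild the model.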

End-to-end process diagram of PyMilo



The motivation behind PyMilo

The motivation behind PyMilo is to resolve the risks and inefficiencies of existing model-sharing methods. Developers often face a dilemma when distributing their models: either take on additional dependencies or rely on unsafe formats. PyMilo addresses this by offering an efficient, secure, and transparent way to transport trained models, serving as a conduit between training environments and deployment scenarios without the burden of additional dependencies.
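
To make the "unsafe formats" concern concrete, here is a small sketch. It assumes the format in question is Python's `pickle`, which the article does not name explicitly. The `NotReallyAModel` class is hypothetical and the evaluated expression is deliberately harmless.

```python
import json
import pickle


class NotReallyAModel:
    """Hypothetical payload: on load, pickle calls whatever __reduce__ returned."""

    def __reduce__(self):
        # Harmless here; an attacker could put any expression or callable in its place.
        return (eval, ("'arbitrary code ran: ' + str(6 * 7)",))


payload = pickle.dumps(NotReallyAModel())

# Loading the bytes runs eval(); the "model" we get back is just eval's result.
loaded = pickle.loads(payload)

# Parsing JSON only ever builds dicts, lists, strings, numbers, booleans, and None.
parsed = json.loads('{"coef_": [0.5, -1.2], "intercept_": 3.0}')
```

The binary payload cannot be reviewed by eye before loading, whereas a JSON export can be opened in any text editor and contains data only.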


How PyMilo Works

Using PyMilo is as simple as waving a wand. You can easily export your models with just a few lines of code, and the result is neatly packaged and ready for deployment or sharing.


Here is a simple example of serializing and deserializing a Scikit-learn linear regression model.

Model Preparation

from sklearn import datasets
from sklearn.linear_model import LinearRegression
import os

X, Y = datasets.load_diabetes(return_X_y=True)
threshold = 20
X_train, X_test = X[:-threshold], X[-threshold:]
Y_train, Y_test = Y[:-threshold], Y[-threshold:]
model = LinearRegression()
model.fit(X_train, Y_train)

Save Model

from pymilo import Export

# Wrap the trained model, then write it to a JSON file
exported_model = Export(model)
PATH_TO_JSON_FILE = os.path.join(os.getcwd(), "test.json")
exported_model.save(PATH_TO_JSON_FILE)

Output

>>> exported_model.to_json()

{
    "data": {
        "fit_intercept": true,
        "copy_X": true,
        "n_jobs": null,
        "positive": false,
        "n_features_in_": 10,
        "coef_": [
            0.30609424754267966,
            -237.63557011300716,
            510.53804765114097,
            327.7298779909887,
            -814.1119263534517,
            492.7995945034062,
            102.84123996793083,
            184.6034960903708,
            743.5093875957093,
            76.09664636971895
        ],
        "rank_": 10,
        "singular_": [
            1.9578051002417796,
            1.1797491126040702,
            1.0755406405377144,
            0.9579192686906345,
            0.7980638292867588,
            0.7594342409324799,
            0.7216957209064547,
            0.6459380350140406,
            0.27271507089040337,
            0.0915832239699
        ],
        "intercept_": 152.76429169049118
    },
    "sklearn_version": "1.3.0",
    "pymilo_version": "0.2",
    "model_type": "LinearRegression"
}
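
Since the export is plain JSON, its contents can be inspected and even used without PyMilo or scikit-learn installed. The sketch below embeds a trimmed copy of the output above (only `coef_`, `intercept_`, `n_features_in_`, and the metadata fields) and reproduces a linear-regression prediction by hand as the dot product of `coef_` with the features, plus `intercept_`. The helper `predict_by_hand` is a name made up for this illustration.

```python
import json

# Trimmed copy of the exported file shown above.
EXPORTED = """
{
    "data": {
        "n_features_in_": 10,
        "coef_": [
            0.30609424754267966, -237.63557011300716, 510.53804765114097,
            327.7298779909887, -814.1119263534517, 492.7995945034062,
            102.84123996793083, 184.6034960903708, 743.5093875957093,
            76.09664636971895
        ],
        "intercept_": 152.76429169049118
    },
    "sklearn_version": "1.3.0",
    "pymilo_version": "0.2",
    "model_type": "LinearRegression"
}
"""

document = json.loads(EXPORTED)
params = document["data"]


def predict_by_hand(features):
    """Linear regression prediction: dot(coef_, features) + intercept_."""
    if len(features) != params["n_features_in_"]:
        raise ValueError("expected %d features" % params["n_features_in_"])
    weighted = sum(c * x for c, x in zip(params["coef_"], features))
    return weighted + params["intercept_"]


# With all features at zero, the prediction is just the intercept.
baseline = predict_by_hand([0.0] * 10)

# Raising only the first feature by 1.0 shifts the prediction by coef_[0].
first_feature_only = predict_by_hand([1.0] + [0.0] * 9)
```

The metadata fields (`sklearn_version`, `pymilo_version`, `model_type`) are readable the same way, so a consumer can check them before deciding to load the file.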

Load Model

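from pymilo import Import
# Import the PyMilo-exported model
imported_model = Import(PATH_TO_JSON_FILE)
# Get the underlying scikit-learn model (to_model() as documented in the PyMilo README)
imported_sklearn_model = imported_model.to_model()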


PyMilo vs. the others

Below, we look at the most popular tools with goals similar to PyMilo's and point out how they differ from it.


ONNX

ONNX is an open format built to represent machine learning models. It defines a common set of operators that serve as the building blocks of those models. As a result, a model exported to ONNX is not exactly the same as the original: before export, it is reconstructed (approximately) from the building blocks ONNX provides.

SKOPS

SKOPS is a Python library that helps you share your Scikit-learn-based models and put them into production. At the moment, it includes tools to easily integrate models on the Hugging Face Hub, which allows you to share your models, make them discoverable, and use the Hub’s API inference and widgets to get outputs of the model without having to download or load the model. However, its outputs are binary and have the risk of malicious code injection.

TensorFlow.js

TensorFlow.js is an open-source, hardware-accelerated JavaScript library for training and deploying machine learning models in the browser and on Node.js. It supports only TensorFlow models and is inefficient when dealing with heavy models.


It’s an “All-in-one”!


PyMilo vs. the others



Charting the future

PyMilo is still in its early stages of development and currently supports only a limited number of machine learning models provided by Scikit-learn, such as neural networks, trees, and linear models. In the near future, it will also support other frameworks such as PyTorch and TensorFlow. PyMilo also plans to add two major features to its core: ML Streaming and Model Versioning.

ML Streaming

The PyMilo server/client provides a novel and efficient way to use PyMilo-exported models in web services. Through PyMilo’s client, users can delegate calls to the PyMilo server, which hosts the trained model. This allows for easy interaction with the model, including making predictions, retraining, and downloading it for local use. This process is called “ML streaming”. ML streaming could make it possible to build a marketplace-like platform for hosting pre-trained models, similar to Docker Hub.

PyMilo model streaming flow diagram

Model Versioning

We intend to upgrade PyMilo with a new ML versioning mechanism. Users will be able to request a particular version of an ML model from their own application, while the mechanism keeps track of the changes made between versions. This will improve transparency in both the model and the changes applied to it.



Wait for it! There is more to come!

