For organizations looking to “democratize” data science, models must be accessible across the enterprise in a simple way. In our context, this is part of “model operationalization.” Serving models is a common problem for data scientists, and several solutions already exist for it.
We initially considered building a REST API that would run a model on new data submitted via HTTP POST.
Then came MLflow, which lets us serve models as a REST API without a complicated setup.
To serve models using MLflow, we did the following:
1. Save the R model in RDS format using the saveRDS() function, as in the short sketch below.
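This assumes a simple linear model fitted on a hypothetical feature_dataframe with a target column:
# Fit a linear model (hypothetical formula and data) and persist it as RDS
lr_model <- lm(target ~ ., data = feature_dataframe)
saveRDS(lr_model, file = "./LR_Model.rds")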
2. Convert the RDS model into an MLflow flavor using mlflow_save_model():
library(mlflow)
library(carrier)   # provides crate()
model <- readRDS("./LR_Model.rds")   # load the fitted model from step 1
predictor <- crate(~ stats::predict.lm(model, as.data.frame(.x)), model)   # wrap the predict call so it can be serialized
mlflow_save_model(predictor, "<path_to_save>")   # save in MLflow's crate flavor
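Before committing, it can help to sanity-check the crated function locally; the call below is a sketch that borrows feature_dataframe and column_name_features from step 6:
# Quick local check: the crated predictor takes a data frame of features and returns predictions
predictor(feature_dataframe[column_name_features])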
3. Commit to Git.
4. Set up a server to host the model and clone the Git repo onto it.
5. Serve the model using mlflow_rfunc_serve():
mlflow_rfunc_serve("<path_to_save>", run_uuid = NULL, host = "127.0.0.1", port = 8090)
6. In the users’ Jupyter notebooks, the following R lines query the REST API endpoint; the output variable stores the model’s response.
library(jsonlite)   # toJSON
library(httr)       # POST, content, add_headers
# Row-oriented JSON payload from the selected feature columns
request_body_json <- toJSON(feature_dataframe[column_name_features], dataframe = 'rows')
request_body_json
result <- POST(url = paste0('http://127.0.0.1:8090/', 'predict/'), body = request_body_json,
               add_headers(.headers = c("Content-Type" = "application/json", "Accept" = "application/json")))
output <- content(result)   # parsed response from the model server
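Assuming the server wraps its output in a predictions field (the same field the Python client in step 7 reads), the numeric values can be extracted like this:
# Pull the numeric predictions out of the parsed response (assumes a "predictions" field)
predictions <- unlist(output$predictions)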
7. In Python, a similar function is available:
import requests
import json

def mlflow_predict(dd, model_name='LR12'):
    """POST a list of records to the served model and return its predictions."""
    url = "http://127.0.0.1:8090/predict/"
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    resp = requests.post(url=url, json=dd, headers=headers)
    res = json.loads(resp.content.decode("utf-8"))
    return res['predictions']

# Example call (hypothetical feature names), one dict per row:
# preds = mlflow_predict([{"x1": 1.2, "x2": 3.4}])
If you’re deploying on AWS, make sure to set host = "0.0.0.0" so the server listens on all interfaces and is reachable at the instance’s public address, as shown below.
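With that change, the serve call from step 5 becomes:
# Listen on all interfaces so the endpoint is reachable from outside the instance
mlflow_rfunc_serve("<path_to_save>", run_uuid = NULL, host = "0.0.0.0", port = 8090)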
For Python-based models, MLflow also supports deploying to Amazon SageMaker.
If you have multiple models to serve, each model is served on its own port, as in the sketch below.
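As a sketch, two models could be served by two separate R processes on different ports (the model paths and the second port here are placeholders):
# Run each call in its own R session/process, one port per model
mlflow_rfunc_serve("<path_to_model_A>", run_uuid = NULL, host = "0.0.0.0", port = 8090)
mlflow_rfunc_serve("<path_to_model_B>", run_uuid = NULL, host = "0.0.0.0", port = 8091)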
Make sure you have working R and Python installations; we used Anaconda3 to set up the environment. Also, at least 1 GB of RAM is needed to get R running with MLflow on AWS Lightsail.