Mojo is a new programming language for AI development, created by the company Modular. According to its developers, it combines the best of Python syntax with systems programming and metaprogramming, lets you write portable code that is faster than C, and interoperates seamlessly with the Python ecosystem.
According to the official website, Mojo supports multithreaded parallelism and can be up to 68,000 times faster than Python.
Mojo is becoming more and more popular, and its community is growing actively, so I decided to write my first data science project in Mojo.
I am excited to try a new programming language and to start learning it before its popularity explodes. This article was written in December 2023, so the code is valid as of that date.
I strongly believe this little guide will be valuable for anyone interested in Mojo.
The code from this article is available in this GitHub repo:
https://github.com/OberemokAlexandra/mojo_experiments/tree/main
There are two ways to get started with Mojo. The first is to install it manually on your laptop. Instructions for each operating system can be found here:
https://developer.modular.com/download
Installation takes time, so be prepared.
There is also an easier and faster way to try out Mojo: a free JupyterLab playground is available here:
https://playground.modular.com/
I installed all the components on my laptop, and all the code below was tested in that local environment.
Since this is my very first data project in Mojo, I chose a regression task and a dataset from the sklearn library, the diabetes dataset: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html
The Iris and Titanic datasets are classic choices for classification tasks, so not today 🙂.
This dataset contains measurements from 442 diabetic patients. There are 10 features with different medical measurements (age, sex, body mass index, average blood pressure, and six blood serum measurements), and the target is a quantitative measure of disease progression over a year.
A great explanation of the features is given here: https://rowannicholls.github.io/python/data/sklearn_datasets/diabetes.html
Mojo has built-in support for the Python ecosystem, so it is quite easy to import a required module.
For example, a numpy import looks like this:
# first line imports a Python module in Mojo
from python import Python
# This line imports a numpy module itself
let np = Python.import_module("numpy")
Don’t forget to install the necessary modules in your environment first. For example,
pip install numpy
Variables in Mojo can be declared with two keywords, let and var. The main difference between them is that “let” declares an immutable variable, while “var” declares a mutable one.
In Mojo, there are two ways to declare a function: with the def keyword or with the fn keyword. In short, def behaves like a dynamic Python function, while fn is stricter: argument and return types must be declared, and a function that can raise an error must be marked with raises.
More detailed information about the difference between the keywords can be found here: https://docs.modular.com/mojo/manual/functions.html
Here is a function that returns the dataset for further modeling:
fn get_data() raises -> PythonObject:
    # Import of sklearn module with ready datasets
    # It’s possible to import a whole module
    let datasets = Python.import_module("sklearn.datasets")
    let ds = datasets.load_diabetes()
    return ds
fn main() raises:
    # gets data from previous function
    let ds = get_data()
    let X = ds['data']
    let y = ds['target']
    # import needed sklearn module for train_test split
    let model_selection = Python.import_module("sklearn.model_selection")
    let split = model_selection.train_test_split(X, y)
    # Unpacking looks much harder than in Python
    let X_train = split[0]
    let X_test = split[1]
    let y_train = split[2]
    let y_test = split[3]
    # modeling process
    let linear_model = Python.import_module("sklearn.linear_model")
    var regr = linear_model.LinearRegression()
    regr.fit(X_train, y_train)
    let pred = regr.predict(X_test)
Let’s evaluate the regression model and see the results:
    # (continuing inside main)
    # numpy is needed here for the mean of the test target
    let np = Python.import_module("numpy")
    let metrics = Python.import_module("sklearn.metrics")
    print("mean")
    print(np.mean(y_test))
    print("mse")
    print(metrics.mean_squared_error(y_test, pred))
    print("mae")
    print(metrics.mean_absolute_error(y_test, pred))
    print("r2")
    print(metrics.r2_score(y_test, pred))
The results look like this
Warning: your results may differ from those given here, because the data is randomly split into training and test samples and the metrics are computed on that random test sample.
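To see why the numbers move between runs, here is a minimal plain-Python sketch of a random split. It is not scikit-learn's implementation: the helper name split_indices, its parameters, and the rounding rule are assumptions made for illustration. The point is only that a fixed seed gives the same split every time, while no seed gives a fresh shuffle on each run.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.25, seed=None):
    """Hypothetical helper: shuffle row indices and cut them into train/test parts."""
    indices = list(range(n_samples))
    # A Random instance with a fixed seed always produces the same shuffle
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# Same seed: identical split on every run, so the metrics are reproducible
train_a, test_a = split_indices(442, seed=42)
train_b, test_b = split_indices(442, seed=42)
print(train_a == train_b and test_a == test_b)

# No seed: a new shuffle each run, so the metrics change slightly
train_c, test_c = split_indices(442)
print(len(train_c), len(test_c))
```

In the Python version of this project, the same effect is achieved by passing random_state to train_test_split.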
Brief metrics analysis and explanation:
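For reference, the three metrics printed above can be written out in a few lines of plain Python. This is a minimal sketch, not scikit-learn's implementation; the function names mse, mae and r2 and the sample numbers are made up for illustration and are not results from the diabetes dataset.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only (not taken from the article's run)
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 190.0, 270.0]

print("mse", mse(y_true, y_pred))  # 175.0
print("mae", mae(y_true, y_pred))  # 12.5
print("r2", r2(y_true, y_pred))    # about 0.944
```

MSE is expressed in squared units of the target and MAE in the units of the target itself, so lower is better for both. R2 equals 1 for a perfect fit and 0 for a model that always predicts the mean of the test target, which is why it is useful to print the mean alongside the errors.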
I rewrote the same code in Python. To keep the comparison clear and fair, I used the same settings in Python (trust me, you can do much more with Python).
Here is the code:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np


def get_data():
    """returns dataset for modeling"""
    return load_diabetes()


def main():
    """performs all the modeling actions"""
    ds = get_data()
    X, y = ds['data'], ds['target']
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    regr = LinearRegression()
    regr.fit(X_train, y_train)
    pred = regr.predict(X_test)
    print("mean")
    print(np.mean(y_test))
    print("mse")
    print(mean_squared_error(y_test, pred))
    print("mae")
    print(mean_absolute_error(y_test, pred))
    print("r2")
    print(r2_score(y_test, pred))


# Unlike Mojo, Python does not call main() automatically
if __name__ == "__main__":
    main()
The modeling results obtained with Python are comparable.
The challenges below are valid as of December 2023. By the time you read this article, they may already have been fixed.
For now, the hardest part is that there is no easy way to pass keyword arguments to a Python function from Mojo. For example, this Python call has no direct equivalent:
data = load_diabetes(as_frame=True)
At the moment, I can’t pass parameters like as_frame=True in Mojo code. This can cause difficulties when using and tuning complex models.
According to GitHub, this feature has been requested:
issue link - https://github.com/modularml/mojo/issues/702
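One possible workaround, sketched here under assumptions rather than taken from the article's repo, is to fix the keyword arguments on the Python side and expose a function that needs none. The function load_table below is a made-up stand-in for a library loader with a keyword-only option; in a real project the wrapper would call something like load_diabetes(as_frame=True) instead.

```python
from functools import partial


def load_table(*, as_frame=False):
    """Stand-in (hypothetical) for a library loader with a keyword-only option."""
    rows = [(0.04, 151.0), (-0.002, 75.0)]
    if as_frame:
        return {"columns": ["bmi", "target"], "rows": rows}
    return rows


# Option 1: an explicit wrapper that takes no arguments at the call site
def load_table_as_frame():
    return load_table(as_frame=True)


# Option 2: functools.partial builds the same wrapper in one line
load_table_framed = partial(load_table, as_frame=True)

print(load_table_as_frame() == load_table_framed())
```

If such wrappers live in a small Python helper module that is importable from the Mojo program, they can be loaded with Python.import_module like any other module and called without keyword arguments.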
I also couldn't unpack values in “the Python way.” I can't write a line of code like this:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
I could only do something like this:
let split = model_selection.train_test_split(X, y)
let X_train = split[0]
let X_test = split[1]
let y_train = split[2]
let y_test = split[3]
Mojo is a new programming language that is still under development. Today, it is already possible to write a simple regression model in it. Some features are still missing for more advanced work, but I strongly believe that the language has potential and will gain more popularity in the future.