Machine learning experiments require extensive parametrization, including optimizer parameters, network architecture, and data augmentation. However, we strive for concise, readable code instead of a cumbersome 200 lines dedicated to argparse. Our goal is to focus on programming logic rather than threading new parameters through function signatures.
Additionally, we seek a structure that is easily expandable without burdening the project, ensuring the reproducibility of experiments.
Hydra offers a solution to these challenges. Below, you will find a basic guide on how to use it.
Hydra is a library with rich capabilities for managing configurations. The official site describes it like this:
“The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
The name Hydra comes from its ability to run multiple similar jobs - much like a Hydra with multiple heads.”
But I have my own interpretation of the name: it is simply a bundle of tools that work together impressively well!
Hydra offers a seamless solution to the common headaches faced by ML engineers when attempting to replicate experiments. It elegantly replaces the need for argparse or YAML configurations, allowing access to parameters both from the command line and YAML files.
Consider the typical pain points: juggling multiple bulky config files and threading rigid command-line arguments through the code. Hydra addresses both by enabling dynamic configuration adjustments without either.
Furthermore, it simplifies the process of passing complex configurations, such as model architectures or functions, directly from the config file to the model. This capability eliminates the tedious task of manually feeding parameters into the model, streamlining the workflow and reducing the margin for error.
Let's imagine the simplest setup: multiclass classification on MNIST using an MLP. We have a configuration and a training script.
.
├── configs
│   └── config.yaml
└── main.py
The main script can look like this.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import hydra
from omegaconf import DictConfig


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.model(x.view(-1, 28 * 28))


@hydra.main(version_base=None, config_path="configs", config_name="config")
def main(cfg: DictConfig):
    # Load MNIST dataset
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,))
    ])
    train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
    train_loader = DataLoader(train_dataset, batch_size=cfg.batch_size, shuffle=True)

    # Initialize the network, loss function, and optimizer
    model = Net()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=cfg.lr, momentum=cfg.momentum)

    # Train the network
    for epoch in range(cfg.epochs):  # loop over the dataset multiple times
        for i, (inputs, labels) in enumerate(train_loader, 0):
            optimizer.zero_grad()              # zero the parameter gradients
            outputs = model(inputs)            # forward pass
            loss = criterion(outputs, labels)  # calculate loss
            loss.backward()                    # backward pass
            optimizer.step()                   # optimize

    print('Finished Training')


if __name__ == "__main__":
    main()
The configuration file has the following structure. These are the defaults: their values are used unless you override them.
# configs/config.yaml
batch_size: 64
lr: 0.01
momentum: 0.9
epochs: 1
To start training you can use:
python main.py
And parameters can be changed not only in the YAML file but also from the command line:
python main.py lr=0.03
Values passed on the command line override the ones from the config file.
As you can see, there is no argparse or additional middleware. It's easy to change parameters from the command line.
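If you want to check exactly which values a run ended up with, you can print the resolved configuration at the start of main. A minimal sketch, reusing the decorator from the script above and assuming the config lives in configs/:

import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="configs", config_name="config")
def main(cfg: DictConfig) -> None:
    # Prints the final config after command-line overrides are applied,
    # e.g. `python main.py lr=0.03` will show `lr: 0.03` here.
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()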
The architecture inside the Net class is something we often want to customize. We might even want to swap in a completely different network, e.g. a CNN, without touching the rest of the pipeline.
Hydra can construct almost any Python object with specified parameters: the _target_ key names the class to instantiate, and the remaining keys become its arguments. Let's describe our net in another way, using YAML.
model:
  _target_: torch.nn.Sequential
  _args_:                        # positional arguments for nn.Sequential
    - _target_: torch.nn.Flatten
    - _target_: torch.nn.Linear
      in_features: 784           # 28x28 images are flattened into 784
      out_features: 128
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.Linear
      in_features: 128
      out_features: 64
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.Linear
      in_features: 64
      out_features: 10
And for a CNN:
model:
  _target_: torch.nn.Sequential
  _args_:
    - _target_: torch.nn.Conv2d
      in_channels: 1             # MNIST images are grayscale, so 1 input channel
      out_channels: 32           # number of output channels
      kernel_size: 3             # size of the convolutional kernel
      stride: 1
      padding: 1
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.MaxPool2d
      kernel_size: 2             # pooling window size
      stride: 2
    - _target_: torch.nn.Conv2d
      in_channels: 32
      out_channels: 64
      kernel_size: 3
      stride: 1
      padding: 1
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.MaxPool2d
      kernel_size: 2
      stride: 2
    - _target_: torch.nn.Flatten # flatten the output for the fully connected layer
    - _target_: torch.nn.Linear
      in_features: 3136          # 7*7*64: size after two 2x2 poolings of 28x28 with 64 channels
      out_features: 128
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.Dropout
      p: 0.5                     # dropout rate
    - _target_: torch.nn.Linear
      in_features: 128
      out_features: 10           # number of classes in MNIST
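With the architecture living in YAML, the training script no longer needs the hard-coded Net class: hydra.utils.instantiate builds the object described by _target_. A minimal sketch, assuming one of the blocks above is stored under the model key of configs/config.yaml:

import hydra
from omegaconf import DictConfig

@hydra.main(version_base=None, config_path="configs", config_name="config")
def main(cfg: DictConfig) -> None:
    # Recursively instantiates every _target_ in cfg.model; the entries of
    # _args_ become the positional arguments of torch.nn.Sequential.
    model = hydra.utils.instantiate(cfg.model)
    print(model)

if __name__ == "__main__":
    main()

Swapping the MLP for the CNN then only means pointing the model section at the other YAML block; the training loop stays untouched.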
Sometimes our parameters are themselves functions or other callables, for example an optimizer, which also needs arguments that are only available at runtime.
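Hydra handles this with the _partial_: true flag (available in recent Hydra versions), which makes instantiate return a functools.partial instead of calling the target immediately. A minimal sketch with a hypothetical optimizer section, built here with OmegaConf.create only to keep the example self-contained:

import hydra
import torch
from omegaconf import OmegaConf

# Hypothetical section that could live in configs/config.yaml:
#   optimizer:
#     _target_: torch.optim.SGD
#     _partial_: true   # defer the call: model parameters are unknown at config time
#     lr: 0.01
#     momentum: 0.9
cfg = OmegaConf.create({
    "optimizer": {
        "_target_": "torch.optim.SGD",
        "_partial_": True,
        "lr": 0.01,
        "momentum": 0.9,
    }
})

model = torch.nn.Linear(28 * 28, 10)

# instantiate returns functools.partial(torch.optim.SGD, lr=0.01, momentum=0.9)
make_optimizer = hydra.utils.instantiate(cfg.optimizer)
optimizer = make_optimizer(model.parameters())  # finish the call at runtime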
Hydra also works without the @hydra.main decorator, for example inside a Jupyter notebook or a unit test. In that case you can use the Compose API:
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(version_base=None, config_path="configs", job_name="run_0001"):
    cfg = compose(config_name="config", overrides=["lr=0.03"])
    print(OmegaConf.to_yaml(cfg))
For example, you may want to find good hyperparameters, and Hydra offers a lot for that. With multirun (-m or --multirun), you can launch several experiments sequentially:
python main.py -m batch_size=16,32,64
The runs are launched sequentially by default, but it is easy to parallelize them by switching to a launcher such as joblib (this requires the hydra-joblib-launcher plugin):
python main.py -m batch_size=16,32,64 hydra/launcher=joblib
And if you keep several config files, you can sweep over them with a simple shell loop, passing only the file name (without the directory or extension) to --config-name:
for config_file in configs/*.yaml; do python main.py --config-name="$(basename "${config_file}" .yaml)"; done