Machine learning experiments require extensive parametrization: optimizer parameters, network architecture, data augmentation, and more. At the same time, we want concise, readable code instead of a cumbersome 200 lines dedicated to argparse, and we want to focus on the logic of the program rather than on threading new parameters through function signatures. We also want a structure that can grow without burdening the project, while keeping experiments reproducible. Hydra offers a solution to these challenges, and below is a basic guide on how to use it.

## What is Hydra?

Hydra is a library with rich capabilities for managing configurations. The main site describes the name like this:

> "The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line. The name Hydra comes from its ability to run multiple similar jobs - much like a Hydra with multiple heads."

But I have my own interpretation of the name: it is simply a combination of tools that work together impressively well!

Hydra offers a seamless solution to the common headaches ML engineers face when trying to replicate experiments. Instead of forcing a choice between argparse and plain YAML configs, it makes parameters accessible both from the command line and from YAML files. Consider the pain points:

- Replicating experiments with argparse alone forces everything through string inputs and makes launching from a YAML config awkward.
- Relying solely on YAML files leads to duplicated configs and potential errors when only a single parameter needs to change.

Hydra addresses these issues by enabling dynamic configuration adjustments without multiple bulky files or rigid command-line arguments. It also simplifies passing complex configurations, such as model architectures or functions, directly from the config file to the model, eliminating the tedious task of manually feeding parameters into the model and reducing the margin for error.

## Basic Setup

Let's imagine the simplest setup: multiclass classification on MNIST using an MLP. We have a configuration and a training script.

```
.
├── configs
│   └── config.yaml
└── main.py
```

The main script can look like this:
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import hydra
from omegaconf import DictConfig


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.model(x.view(-1, 28 * 28))


@hydra.main(version_base=None, config_path="configs", config_name="config")
def main(cfg: DictConfig):
    # Load the MNIST dataset
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,))
    ])
    train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
    train_loader = DataLoader(train_dataset, batch_size=cfg.batch_size, shuffle=True)

    # Initialize the network, loss function, and optimizer
    model = Net()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=cfg.lr, momentum=cfg.momentum)

    # Train the network
    for epoch in range(cfg.epochs):  # loop over the dataset multiple times
        for i, (inputs, labels) in enumerate(train_loader):
            optimizer.zero_grad()              # zero the parameter gradients
            outputs = model(inputs)            # forward pass
            loss = criterion(outputs, labels)  # calculate loss
            loss.backward()                    # backward pass
            optimizer.step()                   # optimize

    print('Finished Training')


if __name__ == "__main__":
    main()
```

The configuration has the following structure. It is the default: its values are used unless they are overridden.

```yaml
# configs/config.yaml
batch_size: 64
lr: 0.01
momentum: 0.9
epochs: 1
```

To start training:

```bash
python main.py
```

Parameters can be changed not only in the YAML file but also from the command line:

```bash
python main.py lr=0.03
```

Command-line overrides take precedence over the values in the YAML file. As you can see, there is no argparse or additional middleware, and changing parameters from the command line is easy.
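Each run also gets its own output directory, where Hydra saves the fully composed config and the command-line overrides that were used, which is what makes replaying an experiment straightforward. By default the layout looks roughly like this (the date and time folders are placeholders):

```
outputs/
└── 2024-01-01/
    └── 12-00-00/
        ├── main.log            # job log
        └── .hydra/
            ├── config.yaml     # the composed config actually used for this run
            ├── hydra.yaml      # Hydra's own settings
            └── overrides.yaml  # the command-line overrides that were passed
```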
## Pass Class Object With Hydra

In the `Net` class, the architecture can be highly customizable; we might even want to swap in a different network, e.g. a CNN, without touching the pipeline. Hydra can construct almost any Python object with the specified parameters. Let's describe our net in another way, using YAML:

```yaml
model:
  _target_: torch.nn.Sequential
  _args_:                       # positional arguments for Sequential: the layers, in order
    - _target_: torch.nn.Flatten
    - _target_: torch.nn.Linear
      in_features: 784          # 28x28 images are flattened into 784
      out_features: 128
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.Linear
      in_features: 128
      out_features: 64
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.Linear
      in_features: 64
      out_features: 10
```

and for a CNN:

```yaml
model:
  _target_: torch.nn.Sequential
  _args_:
    - _target_: torch.nn.Conv2d
      in_channels: 1            # MNIST images are grayscale, so 1 input channel
      out_channels: 32          # number of output channels
      kernel_size: 3            # size of the convolutional kernel
      stride: 1
      padding: 1
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.MaxPool2d
      kernel_size: 2            # pooling window size
      stride: 2
    - _target_: torch.nn.Conv2d
      in_channels: 32
      out_channels: 64
      kernel_size: 3
      stride: 1
      padding: 1
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.MaxPool2d
      kernel_size: 2
      stride: 2
    - _target_: torch.nn.Flatten  # flatten the output for the fully connected layer
    - _target_: torch.nn.Linear
      in_features: 3136         # 7*7*64 after the convolutions and pooling
      out_features: 128
    - _target_: torch.nn.ReLU
    - _target_: torch.nn.Dropout
      p: 0.5                    # dropout rate
    - _target_: torch.nn.Linear
      in_features: 128
      out_features: 10          # number of classes in MNIST
```
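With the architecture described in YAML, the training script no longer needs to hard-code `Net`. Below is a minimal sketch of how this plugs into `main()`, assuming the `model:` block above lives in `configs/config.yaml`; `hydra.utils.instantiate` builds the object recursively from the `_target_` entries.

```python
import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path="configs", config_name="config")
def main(cfg: DictConfig):
    model = instantiate(cfg.model)  # builds the torch.nn.Sequential described in the YAML
    ...                             # the rest of the training loop stays unchanged
```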
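If the MLP and the CNN are kept in separate files, Hydra's config groups make it possible to switch between them from the command line. The file layout below is an assumption for illustration: `configs/model/mlp.yaml` and `configs/model/cnn.yaml`, each holding the corresponding `_target_: torch.nn.Sequential` block without the top-level `model:` key.

```yaml
# configs/config.yaml
defaults:
  - model: mlp   # picks configs/model/mlp.yaml unless overridden
  - _self_

batch_size: 64
lr: 0.01
momentum: 0.9
epochs: 1
```

```bash
# switch the whole architecture without touching the code
python main.py model=cnn
```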
## Pass Function Through Config

Sometimes a parameter is not a plain value but a function or another callable, for example an activation, a loss, or an optimizer constructor. These can be passed through the config in the same way, as sketched below.
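The following is a minimal sketch, assuming a reasonably recent Hydra (1.2+): marking a node with `_partial_: true` makes `instantiate` return a `functools.partial` instead of calling the target immediately, so arguments that are only known at runtime (here the model parameters) can be supplied later. The `optimizer` key is an illustrative addition to the config, not something defined earlier in the article.

```yaml
# configs/config.yaml (excerpt)
optimizer:
  _target_: torch.optim.SGD
  _partial_: true   # don't call SGD yet; the params argument is only known at runtime
  lr: 0.01
  momentum: 0.9
```

```python
from hydra.utils import instantiate

optimizer_fn = instantiate(cfg.optimizer)      # functools.partial(torch.optim.SGD, lr=0.01, momentum=0.9)
optimizer = optimizer_fn(model.parameters())   # now a regular optimizer instance
```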
## How to Access Config Without Decorating Main

Sometimes you need the config outside of a script decorated with `@hydra.main`, for example in an IPython or Jupyter notebook. For that you can use the Compose API:

```python
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(version_base=None, config_path="configs", job_name="run_0001"):
    cfg = compose(config_name="config", overrides=["parameter=value"])

print(OmegaConf.to_yaml(cfg))
```

## Multi Runs

Suppose you want to find good hyperparameters; Hydra offers a lot for that. In the simplest case, you can launch several experiments sequentially:

```bash
python main.py -m batch_size=16,32,64
```
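Sweeps over several parameters combine as a Cartesian product, and each run gets its own numbered output directory (under `multirun/` by default). For example, combined with the `model` config group sketched earlier (which is an assumed layout, not something created automatically):

```bash
# 2 models x 2 learning rates = 4 runs
python main.py -m model=mlp,cnn lr=0.01,0.03
```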
The runs are launched sequentially, but it is easy to make them parallel by switching on a launcher such as joblib (provided by the `hydra-joblib-launcher` plugin):

```bash
python main.py -m batch_size=16,32,64 hydra/launcher=joblib
```

## Run All Experiments From the Folder

`--config-name` expects the name of a config rather than a path, so strip the directory and the `.yaml` extension when looping over the folder:

```bash
for config_file in configs/*.yaml; do
    python main.py --config-name="$(basename "${config_file}" .yaml)"
done
```

## Resources

- Hydra: https://hydra.cc/
- A good template for DL projects: https://github.com/ashleve/lightning-hydra-template/tree/main