Tracking PyTorch Lightning Experiments Using NeptuneAI

by @jakubczakon

Senior data scientist building experiment tracking tools for ML projects at https://neptune.ai

Working with PyTorch Lightning and wondering which logger you should choose to keep track of your experiments?

Thinking of using PyTorch Lightning to structure your deep learning code and wouldn’t mind learning about its logging functionality?

Didn’t know that Lightning has a pretty awesome Neptune integration?

This article is (very likely) for you.

Why PyTorch Lightning and Neptune?

If you have never heard of it, PyTorch Lightning is a very lightweight wrapper on top of PyTorch that is more like a coding standard than a framework. The format lets you get rid of a ton of boilerplate code while keeping the code easy to follow.

The result is a framework that gives researchers, students, and production teams the ultimate flexibility to try crazy ideas without having to learn yet another framework while automating away all the engineering details.

Some great features that you can get out-of-the-box are:

  • Train on CPU, GPU or TPUs without changing your code,
  • Trivial multi-GPU and multi-node training,
  • Trivial 16-bit precision support,
  • Built-in performance profiler (Trainer(profiler=True)),

and a ton of other great functionalities.

But with this great power of running experiments easily and the flexibility to tweak anything you want comes a problem.

How do you keep track of all the changes, like:

  • losses and metrics,
  • hyperparameters,
  • model binaries,
  • validation predictions,

and other things that will help you organize your experimentation process?

Fortunately, PyTorch Lightning gives you an option to easily connect loggers to the pl.Trainer, and one of the supported loggers that can track all of the things mentioned before (and many others) is the NeptuneLogger, which saves your experiments in… you guessed it, Neptune.

Neptune not only tracks your experiment artifacts but also:

  • lets you monitor everything live,
  • gives you a nice UI where you can filter, group and compare various experiment runs,
  • lets you access the experiment data you logged programmatically from a Python script or Jupyter Notebook.

The best part is that this integration really is trivial to use.

Let me show you how it looks.

Note: You can also check out this Colab notebook and try the examples from this article yourself.

Basic Integration

In the simplest case you just create the NeptuneLogger:

# the pytorch_lightning.logging package was renamed to pytorch_lightning.loggers
from pytorch_lightning.loggers.neptune import NeptuneLogger
neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project_name="shared/pytorch-lightning-integration")

and pass it to the logger argument of Trainer and fit your model.

from pytorch_lightning import Trainer
trainer = Trainer(logger=neptune_logger)
trainer.fit(model)

By doing so you get your:

  • Metrics and losses logged and charts created,
  • Hyperparameters saved (if defined via Lightning hparams),
  • Hardware utilization logged,
  • Git info and execution script logged.

Check out this experiment.

You can monitor your experiments, compare them, and share them with others.

Not too bad for a 4-liner.

But with just a bit more effort you can get a lot more.
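Before moving on, one practical detail: the snippets above hardcode the public "ANONYMOUS" key, which is fine for the shared demo project. For your own project you would probably keep the token out of the source. Here is a minimal sketch, assuming the token is stored in the NEPTUNE_API_TOKEN environment variable. The helper name logger_credentials and its fallback behaviour are my own illustration, not part of Lightning or Neptune.

```python
import os


def logger_credentials(project_name, env=None):
    """Build the api_key/project_name kwargs for NeptuneLogger.

    Hypothetical helper: reads the token from NEPTUNE_API_TOKEN and
    falls back to the public "ANONYMOUS" key used in this article.
    """
    env = os.environ if env is None else env
    api_key = env.get("NEPTUNE_API_TOKEN") or "ANONYMOUS"
    return {"api_key": api_key, "project_name": project_name}


# An empty mapping is passed here so the example does not depend on your shell.
kwargs = logger_credentials("shared/pytorch-lightning-integration", env={})
print(kwargs["api_key"])  # ANONYMOUS, because the variable is not set here
# neptune_logger = NeptuneLogger(**kwargs)
```

The resulting dictionary can be unpacked straight into the logger, so the rest of the training script stays the same.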

Advanced Options

Neptune gives you a lot of customization options and you can simply log more experiment-specific things, like image predictions, model weights, performance charts and more.

All of that functionality is available for Lightning users and in the next sections I will show you how to leverage Neptune to the fullest.

Logging extra information at NeptuneLogger creation

When you are creating the logger you can log additional useful information:

  • code: snapshot scripts, Jupyter notebooks, config files, and more,
  • hyperparameters: log learning rate, number of epochs and other things (if you are using the hparams object from Lightning, it will be logged automatically),
  • properties: log data locations, data versions, or other things,
  • tags: add tags like “resnet50” or “no-augmentation” to organize your runs,
  • name: every experiment deserves a meaningful name, so let’s not use “default” every time, shall we? 🙂

Just pass this information to your logger:

neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project_name="shared/pytorch-lightning-integration",
    experiment_name="default",  # Optional
    params={"max_epochs": 10,
            "batch_size": 32},  # Optional
    tags=["pytorch-lightning", "mlp"],  # Optional
    upload_source_files=["**/*.py", "*.yaml"],  # Optional
)

… and proceed as before to get an organized dashboard like this one.
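If your hyperparameters live in a nested config (say, loaded from one of those YAML files), you may want to flatten them into simple key:value pairs like the params dictionary above before handing them to the logger. A rough sketch follows. The function flatten_params and the dotted key naming are my own illustration, not something the integration requires.

```python
def flatten_params(config, parent_key="", sep="."):
    """Flatten a nested dict into a single-level dict with dotted keys."""
    flat = {}
    for key, value in config.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-sections and merge their flattened entries.
            flat.update(flatten_params(value, full_key, sep=sep))
        else:
            flat[full_key] = value
    return flat


# Example config invented for this sketch.
config = {
    "max_epochs": 10,
    "data": {"batch_size": 32},
    "optimizer": {"name": "adam", "lr": 0.001},
}
params = flatten_params(config)
print(params)
# The flat dictionary can then be passed as the params argument shown above.
```

Dotted keys such as optimizer.lr keep the origin of each value visible when you later filter or compare runs.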

Logging extra things during training

A lot of interesting information can be logged during training.

You may be interested in monitoring things like:

  • model predictions after each epoch (think prediction masks or overlaid bounding boxes),
  • diagnostic charts like the ROC curve or confusion matrix,
  • model checkpoints or other objects.

It is really simple. Just go to your LightningModule and call methods of the Neptune experiment available as self.logger.experiment.

For example, we can log histograms of losses after each epoch:

import matplotlib.pyplot as plt
import numpy as np
import pytorch_lightning as pl
import torch


class CoolSystem(pl.LightningModule):

    def validation_end(self, outputs):
        # OPTIONAL
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        tensorboard_logs = {'val_loss': avg_loss}

        # log debugging images like histogram of losses
        fig = plt.figure()
        # move tensors to the CPU before converting them to NumPy
        losses = np.stack([x['val_loss'].detach().cpu().numpy() for x in outputs])
        plt.hist(losses)
        self.logger.experiment.log_image('loss_histograms', fig)
        plt.close(fig)  # free the figure once it has been logged

        return {'avg_val_loss': avg_loss, 'log': tensorboard_logs}

Explore them for yourself.

Other things you may want to log during training are:

  • self.logger.experiment.log_metric  # log custom metrics
  • self.logger.experiment.log_text  # log text values
  • self.logger.experiment.log_artifact  # log files
  • self.logger.experiment.log_image  # log images, charts
  • self.logger.experiment.set_property  # add key:value pairs
  • self.logger.experiment.append_tag  # add tags for organization
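Since all of these are plain method calls on self.logger.experiment, you can dry-run your logging code without a network connection by swapping in a stand-in object. Below is a minimal sketch. RecordingExperiment is a hypothetical test double I made up for illustration: it accepts any method name and simply records the call, so it mimics the names listed above but none of the real Neptune behaviour. The function log_epoch_summary is likewise just an example of logging code you might write.

```python
class RecordingExperiment:
    """Hypothetical stand-in that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only invoked for attributes that are not found normally,
        # e.g. log_metric, log_text, set_property, append_tag.
        def record(*args, **kwargs):
            self.calls.append((name, args, kwargs))
        return record


def log_epoch_summary(experiment, epoch, val_losses):
    """Example logging code written against the method names above."""
    avg_loss = sum(val_losses) / len(val_losses)
    experiment.log_metric("avg_val_loss", avg_loss)
    experiment.log_text("status", f"epoch {epoch} done")
    experiment.append_tag("dry-run")
    return avg_loss


exp = RecordingExperiment()
log_epoch_summary(exp, epoch=0, val_losses=[0.5, 0.25, 0.75])
for name, args, _ in exp.calls:
    print(name, args)
```

Because the function only needs an object with the right methods, the same code can later receive self.logger.experiment inside your LightningModule.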

Pretty cool, right?

But … that is not all you can do!

Logging things after training has finished

Tracking your experiment doesn’t have to finish after your .fit loop ends.

You may want to track the metrics from trainer.test(model) or calculate some additional validation metrics and log them.

To do that you just need to tell NeptuneLogger not to close after fit:

neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project_name="shared/pytorch-lightning-integration",
    close_after_fit=False,
    ...
)

… and you can keep logging 🙂

Test metrics:

trainer.test(model)

Additional (external) metrics:

from sklearn.metrics import accuracy_score
...
accuracy = accuracy_score(y_true, y_pred)
neptune_logger.experiment.log_metric('test_accuracy', accuracy)

Performance charts on test set:

from scikitplot.metrics import plot_confusion_matrix
import matplotlib.pyplot as plt
...
fig, ax = plt.subplots(figsize=(16, 12))
plot_confusion_matrix(y_true, y_pred, ax=ax)
neptune_logger.experiment.log_image('confusion_matrix', fig)

The whole model checkpoints directory:

neptune_logger.experiment.log_artifact('my/checkpoints')

Go to this experiment to see how those objects are logged.
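If you compute several metrics after training, a small loop keeps the logging tidy. This is a sketch under assumptions: log_metrics_with_prefix is a helper of my own, and the only thing it relies on is the log_metric(name, value) call shown above. The accuracy function is a plain-Python stand-in for whatever scoring you use, and PrintingExperiment is a dummy object that takes the place of neptune_logger.experiment so the example runs on its own.

```python
def accuracy(y_true, y_pred):
    """Fraction of matching labels; a plain-Python stand-in for accuracy_score."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def log_metrics_with_prefix(experiment, metrics, prefix="test_"):
    """Call experiment.log_metric once per entry and return the logged names."""
    logged = []
    for name, value in sorted(metrics.items()):
        full_name = f"{prefix}{name}"
        experiment.log_metric(full_name, float(value))
        logged.append(full_name)
    return logged


class PrintingExperiment:
    """Dummy object used here in place of neptune_logger.experiment."""

    def log_metric(self, name, value):
        print(f"{name} = {value}")


# Toy labels invented for this sketch.
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
metrics = {"accuracy": accuracy(y_true, y_pred), "n_samples": len(y_true)}
log_metrics_with_prefix(PrintingExperiment(), metrics)
```

A shared prefix such as test_ makes it easy to tell post-training metrics apart from the ones logged during fit.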

But … there is even more!

Neptune lets you fetch experiments after training.

Let me show you how.

Fetching your experiment information directly to the notebooks

You can fetch experiments after they have finished, analyze the results and update metrics, artifacts or other things if you want to.

For example, let’s fetch the experiments dashboard to a pandas DataFrame:

import neptune

project = neptune.init('shared/pytorch-lightning-integration')
project.get_leaderboard().head()

or visualize it with HiPlot via neptune HiPlot integration:

from neptunecontrib.viz import make_parallel_coordinates_plot

make_parallel_coordinates_plot(
    metrics=['train_loss', 'val_loss', 'test_accuracy'],
    params=['max_epochs', 'batch_size', 'lr'])

or fetch a single experiment and update it with some external metric calculated after training:

exp = project.get_experiments(id='PYTOR-63')[0]
exp.log_metric('some_external_metric', 0.92)
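Once the leaderboard is in a DataFrame, picking a winner is ordinary data wrangling. Here is a sketch in plain Python, assuming you converted the table to a list of row dictionaries first (for example with the DataFrame's to_dict('records')). The column name channel_val_loss and the rows below are made up for illustration, so check them against what your own leaderboard actually contains. The helper best_experiment is hypothetical as well.

```python
def best_experiment(records, metric_column, minimize=True):
    """Return the row with the best value in metric_column, skipping blanks."""
    scored = []
    for row in records:
        try:
            scored.append((float(row[metric_column]), row))
        except (KeyError, TypeError, ValueError):
            continue  # metric missing or not numeric for this run
    if not scored:
        return None
    pick = min if minimize else max
    return pick(scored, key=lambda pair: pair[0])[1]


# Made-up rows standing in for a leaderboard export.
records = [
    {"id": "EXP-1", "channel_val_loss": "0.41"},
    {"id": "EXP-2", "channel_val_loss": "0.37"},
    {"id": "EXP-3", "channel_val_loss": None},
]
best = best_experiment(records, "channel_val_loss")
print(best["id"])  # EXP-2
```

Runs that never logged the metric are skipped rather than treated as zero, which avoids crowning an experiment that crashed early.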

As you can see, there are a lot of things you can log to Neptune from PyTorch Lightning.

If you want to go deeper into this:

Final Thought

PyTorch Lightning is a great library that helps you with:

  • organizing your deep learning code to make it easily understandable to other people,
  • outsourcing development boilerplate to a team of seasoned engineers,
  • accessing a lot of state-of-the-art functionalities with almost no changes to your code.

With Neptune integration, you get some additional things for free:

  • you can monitor and keep track of your deep learning experiments,
  • you can share your research with other people easily,
  • you and your team can access experiment metadata and collaborate more efficiently.

Hopefully, with all that power you will know exactly what you (and other people) tried, and your deep learning research will be moving at lightning speed 🙂

This article was originally posted on the Neptune blog, where you can find more in-depth articles for machine learning practitioners.
