Hello there! Consider a scenario: a lone data scientist works away at her system, trying to wade through a huge amount of data - cleaning, sorting, processing, and then building a model to run predictions on the newly processed data. She has a bunch of tools at her disposal - Jupyter Notebooks, Airflow, Anaconda, Pandas, data storage, and a cloud virtual machine. She trains the model for hours and hours, only to fall short of perfection - the model doesn’t perform as well as it should. She looks out the window - it’s nightfall already. She has yet to test her model with a different set of parameters and track a different set of metrics for her experiments. She switches off her system, calls it a day, and will try again the next day with another model, a different approach, and a bunch of new data and parameters.

This is a long process that might stretch for days…weeks…and months. It is difficult to jump back to the point where she tried a specific combination of parameters, and knowledge is sometimes lost, because not every experiment and not every artifact related to the model gets saved. Tracking is crucial for the improvement of the ML model.

I think this lone ranger scenario can be avoided if we had a comprehensive IDE-style environment where we can run multiple experiments, do data management, and track our code, experiment metrics, plots, models, and data artifacts as well. How cool would that be? Sounds too good to be true, but this is what the DVC VSCode Extension is attempting to do.

DVC is an excellent tool to track your experiments, models, and related artifacts, but it’s a CLI - which many in the data science community might not be comfortable or familiar with. Gone are the days when you had to learn a bunch of pesky CLI commands. Using DVC just got a whole lot easier and more fun.
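For a sense of what the extension wraps, here is a minimal sketch of a typical DVC CLI session. The path data/raw.csv is a hypothetical placeholder, and run is our own dry-run helper (not part of DVC) that only prints each command unless RUN=1 is set, so the snippet is safe to paste outside a DVC project.

```shell
#!/bin/sh
# Dry-run helper (our own, not part of DVC): prints each command,
# and only executes it when RUN=1 is set in the environment.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

run dvc add data/raw.csv   # start tracking a data file (hypothetical path)
run dvc commit             # record changes to data already tracked by DVC
run dvc push               # upload tracked data to remote storage
run dvc pull               # download tracked data from remote storage
run dvc checkout           # sync workspace data with the current .dvc files
run dvc exp show           # print the experiments table
```

Run it as-is to see the command list, or with RUN=1 inside a DVC project to execute each step for real.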
DVC VSCode Extension

The Iterative team brings you a VS Code extension that combines the power of DVC CLI commands for data management, versioning, and experimentation with the sleek, elegant coding experience of the Visual Studio Code IDE. The extension in its current form provides you with the following features:

1. Command Palette
Integrated into the VS Code command palette menu. Press F1 to open the palette and type DVC to view a whole bunch of DVC-related commands at your disposal.

2. Experiments Table
Gives you an in-depth view of the experiments run in the workspace. It is the equivalent of the dvc exp show command in the CLI.

3. Plots / Live Plots
You can view the plots generated by the experiments run in the workspace, compare the plots of different experiments, and even watch the plots update in real-time.

4. Source Control Management
You can check the status of the workspace using this feature. You can dvc checkout, dvc commit, dvc add, dvc push & dvc pull from this view.

5. Tracked Artifacts - Datasets, Models, and Tokenizers
A small window for tracking your resources in the workspace. From here you can perform file actions, push & pull specific resources, and manage the data within tracked datasets.

6. DVC View Container / Tray
The View Container can be activated by clicking the DVC icon in the VS Code icon bar. It gives general information about the experiments and resources in the workspace.

Here are some advantages of using the extension compared to the CLI alone:

Hides the complexity of the CLI and removes friction from the experience.
Enhances existing visualizations and provides extra ones.
Moves the data science workflows into the build context - fewer unexpected breaks in focus time.
View experiment performance in real-time.
Everybody loves VS Code ❤️🙂

DVC Extension - Getting Started

Using the DVC Extension can be summarized in 4 steps:

1. Installation (one time)
2. Setting up your project and data
3. Experimentation
4. Plotting Graphs and Model Evaluation

Installation

Make sure you have DVC installed on your system. You can run the following command in your terminal:

$ pip3 install dvc

Or you can follow the guide given here for OS-specific installation.

Go to VS Code and, in the extension menu, search for DVC. Click Install.

https://www.youtube.com/watch?v=INjOkuanRpc

Now you have the DVC extension ready to go. To get familiar with the usage of the extension, we will download a sample ML project.

Download Sample Project

You can download the sample project from the repo. Open the folder in VS Code. The DVC extension should detect the DVC binary and the Python environment.

If you have a specific environment, you can press F1 and select DVC: Setup The Workspace. Provide the compiler path and the Python environment binary path.

Using the DVC Extension

You can view the DVC experiments in the current workspace in the DVC view container tab.

Pulling Data

To begin our experimentation, we need to pull the data. Press F1 to open the VS Code command palette and select DVC: Pull. You can view the output by selecting DVC: Show DVC Output.

Note: As of now, the team is still working on the DVC remote storage option in the VS Code plugin, so you will have to set your storage remote via the command line or a config file.

Experimentation

You can change the parameters in the params.yaml file and select DVC: Modify Experiment Param(s), Reset and Run in the VS Code command palette.

https://www.youtube.com/watch?v=buuoKsGZvvo

Plots / Live Plots

You can check your experiments and view the plotted graphs using the extension as well. And the cherry on top is that the extension allows you to cherry-pick your experiments. Pun intended!
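The workflow above also maps onto a handful of terminal commands, which is handy for the remote-storage step that the extension does not cover yet. This is a rough sketch under stated assumptions, not the extension's internals: the remote name storage, the s3://my-bucket/dvc-store URL, and the train.epochs parameter are made-up placeholders, and run is our own dry-run helper that prints each command and executes it only when RUN=1 is set.

```shell
#!/bin/sh
# Dry-run helper (our own, not part of DVC): prints each command,
# and only executes it when RUN=1 is set in the environment.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Hypothetical values: replace them with your own remote and parameter.
REMOTE_NAME="storage"
REMOTE_URL="s3://my-bucket/dvc-store"
PARAM="train.epochs=20"

run dvc remote add -d "$REMOTE_NAME" "$REMOTE_URL"  # set the default remote
run dvc pull                                        # fetch the tracked data
run dvc exp run --set-param "$PARAM"                # run with a changed param
run dvc exp show                                    # list runs in a table
run dvc plots diff                                  # compare workspace plots with the last commit
```

The --set-param flag plays the same role as editing params.yaml by hand before a run; which parameter names exist depends entirely on your project.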
https://www.youtube.com/watch?v=N0VdjyQCo3Q

That’s not all: you can run individual experiments and change specific parameters. If you wish to view your graphs live for experiments that take a lot of time - say, a DL model with a lot of epochs - you can view them in real-time as well. Just run your experiment and click on the plots button in the DVC tray.

https://www.youtube.com/watch?v=ov5ScDPV6Rw

When all is said and done, you can commit and push your changes as well.

The Iterative team is going to add more exciting features to the extension soon. Stay tuned. Don’t let us keep you; go ahead and start experimenting. Happy DVC time!

A bit of parting philosophy

As an ML Ops practitioner, I deal with various challenges when working with different data science teams. There are various tools available in the market - both paid and open-source. I tend to lean towards open-source tools, as there is a kinship with a community that is actively helping strangers across the world solve similar problems. This approach is of great significance for the ML community, as we are still in the adoption stage, where a good tool can help you solve your problems faster and with more confidence. A centralized tool integrated with multiple stages of the ML pipeline goes a long way in helping data science teams solve problems; they can focus more on model improvement than on infrastructure and setup - this is what drew me to DVC.

A shout out to the team at Iterative for creating this wonderful extension. Hoping to see more magic in the future.

A New Hope for ML Experimentation
