Improve Productivity by Using mamba to Speed up Creating Python Virtual Environment

by Kevin Yang, September 4th, 2023

Too Long; Didn't Read

Improve productivity by using mamba to speed up the creation of Python virtual environments. This resolves the issue of extremely slow environment solving when using conda.

If you’re using Anaconda to create a new virtual environment and have encountered an issue where the environment creation process stalls at the stage of Solving Environment or this stage runs for an extremely long time (e.g., 10+ hours), the following steps outline the journey toward a solution.


What Are the Reasons that “Solving Environment” Takes Excessively Long?

To comprehend the factors contributing to the extended duration of the solving stage, I initially referred to a document on Conda Performance. The document provides a set of questions to consider when experiencing a slowdown:


  1. Are you creating a new environment or installing into an existing one?
  2. Does your environment have pip-installed dependencies in it?
  3. What channels are you using?
  4. What packages are you installing?
  5. Is the channel metadata sane?
  6. Are channels interacting in bad ways?


In my case, I was creating a new virtual environment that contained pip-installed dependencies. I used both the anaconda and conda-forge channels, and the packages were sourced from different channels.


To validate whether the channel metadata is sane, I executed the following commands.


conda search --override-channels --channel=anaconda
conda search --override-channels --channel=conda-forge


Thankfully, no errors were encountered, indicating that the channel metadata appeared to be in order. I remained uncertain about whether the channels were interacting in undesirable ways.
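The same check can be wrapped in a small helper when more than two channels are involved. This is only a sketch: the function name check_channels is made up for illustration, and it treats a non-zero exit status from conda search as a sign that the channel metadata could not be loaded.

```shell
# Hypothetical helper: run the metadata check above for each channel given.
check_channels() {
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found on PATH"
    return 1
  fi
  for channel in "$@"; do
    # Same command as above; only the exit status is inspected here.
    if conda search --override-channels --channel="$channel" >/dev/null 2>&1; then
      echo "$channel: metadata loaded"
    else
      echo "$channel: conda search failed"
    fi
  done
}
```

Calling check_channels anaconda conda-forge would then print one line per channel. Each search downloads the full channel index, so it can take a while on large channels.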


Trying to Improve Conda’s Performance

The document referenced earlier, together with the blog post titled Understanding and Improving Conda’s Performance, provides some suggested approaches to tackle the issue.


  1. Reduce Conda’s problem size (probably refers to the SAT problem) using “conda-metachannel”
  2. Configure channel priority
  3. Reduce the index — specify more specific package specs (e.g., version, build string)


Using “conda-metachannel”

Because of the large number of packages involved and the complexity of the channel interactions, I did not test this method at the time.


Configure Channel Priority

I initially attempted to resolve the issue by adjusting the channel priority in the .condarc file. I placed smaller channels like defaults and anaconda before larger channels like conda-forge, utilizing the strict mode. However, this approach did not effectively address the problem and sometimes even resulted in failure during the solving stage. Subsequently, I experimented with the flexible mode, but the solving time remained excessively long. Here is an example of the configuration in the .condarc file.


channel_priority: flexible
channels:
  - defaults
  - anaconda
  - conda-forge
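For reference, roughly the same configuration can be written from the command line with conda config instead of editing .condarc by hand. The sketch below is an assumption about how one might script it, not the steps I originally followed; configure_channels is a hypothetical helper name.

```shell
# Hypothetical helper: set the priority mode, then append channels in order.
# --append adds each channel to the end of the list, so the order of the
# arguments becomes the priority order (first argument = highest priority).
configure_channels() {
  priority="$1"
  shift
  conda config --set channel_priority "$priority"
  for channel in "$@"; do
    conda config --append channels "$channel"
  done
  # Print the resulting settings to confirm what was written.
  conda config --show channel_priority channels
}
```

For example, configure_channels flexible defaults anaconda conda-forge would aim for the configuration shown above. Note that this modifies your user-level .condarc, and channels already present there are moved rather than duplicated.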


Reduce the Index

To reduce the index that Conda has to consider, I opted to specify the version of each package explicitly. For instance, I used the format numpy==1.15.4 instead of numpy. In theory, this should speed up the solving stage, because Conda has fewer candidate versions to consider for each package.


After implementing this modification, I observed that the solving stage took approximately 70 minutes to complete. To assess the reliability of this solution, I conducted tests on other machines. Surprisingly, the process concluded within approximately 15 minutes on certain machines, while on others, it seemed to run indefinitely. Consequently, it became evident that this method does not provide a definitive resolution to the issue at hand.
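One way to compare solving times across machines without installing anything is conda's --dry-run flag, which stops after the solve. The helper below is a minimal sketch; the name solve_only and the environment name in the usage note are placeholders, not part of my original setup.

```shell
# Hypothetical helper: solve an environment for the given package specs
# without creating it. The first argument is the environment name, the
# remaining arguments are package specs such as "numpy==1.15.4".
solve_only() {
  env_name="$1"
  shift
  conda create --name "$env_name" --dry-run "$@"
}
```

In bash, time solve_only pinned-demo "numpy==1.15.4" would report how long the solve takes for that single pinned spec. Quoting the spec keeps the shell from interpreting any special characters in it.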


Using mamba to Create the Environment

Since none of the suggestions above solved the problem, I decided to explore an alternative approach. Based on a recommendation from a colleague, I looked into mamba, a reimplementation of the Conda package manager in C++. After experimenting with this tool, I successfully resolved the problem. Now, the environment creation process takes only a couple of minutes.


Install mamba

I started by installing mamba with the following commands (for Linux):


curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh 


During the installation process, you may encounter a prompt asking whether to initialize conda. In that case, it is recommended to select “yes”. This ensures that mamba can function properly alongside conda and avoids potential conflicts between the two package managers.


Once the installation completes successfully, the terminal displays a confirmation message. After restarting the terminal, you can verify the installation by typing mamba at the command prompt.


A Potential Issue with the Installation

If you encounter the error Command ‘mamba’ not found even after restarting the terminal, check the conda section in your .bashrc file. Ensure that the path to conda.sh points to the mamba installation directory.
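A quick way to run both checks from the terminal is sketched below. It assumes a bash setup where conda's initialization block lives in ~/.bashrc; the function name check_mamba is made up for this example.

```shell
# Hypothetical helper: report the mamba version if it is available,
# otherwise show the conda.sh lines in ~/.bashrc for inspection.
check_mamba() {
  if command -v mamba >/dev/null 2>&1; then
    mamba --version
  else
    echo "mamba not found on PATH"
    # The printed path should point into the mamba installation directory.
    grep -n "conda.sh" "$HOME/.bashrc" 2>/dev/null \
      || echo "no conda.sh line found in ~/.bashrc"
  fi
}
```

Running check_mamba after restarting the terminal either prints the version information or lists the lines of .bashrc that need a closer look.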


Create a Virtual Environment Using mamba

Following the installation, I created a new virtual environment using mamba. I executed the command mamba env create -f environment.yaml --prefix $(pwd) to create the environment from the specifications in the environment.yaml file, with the environment located in the current directory.
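If the same setup script has to run on machines where mamba may not be installed yet, one option is a wrapper that falls back to conda. This is a sketch under that assumption rather than something from our actual project; create_env is a hypothetical name.

```shell
# Hypothetical wrapper: create the environment with mamba when it is
# available, otherwise fall back to the (slower) conda solver.
create_env() {
  spec_file="$1"   # e.g. environment.yaml
  prefix="$2"      # directory in which the environment is created
  if command -v mamba >/dev/null 2>&1; then
    tool=mamba
  else
    tool=conda
  fi
  "$tool" env create -f "$spec_file" --prefix "$prefix"
}
```

Calling create_env environment.yaml "$(pwd)" mirrors the command above on a machine with mamba installed.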


Compatibility with conda Command

Although the issue was ultimately resolved, a new concern emerged due to the need to use mamba instead of conda for creating the environment. This raised a potential challenge since there are numerous instances in our project where conda is utilized for configuring environment setup and deployment.


However, there is good news: mamba and conda commands are interchangeable to some extent. This means that we can still use conda to interact with an environment created by mamba.



As a result, our team does not have to refactor any of the existing processes, except for the specific step that creates the environment. We can adopt mamba for creating environments without disrupting the rest of our workflow.
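As an illustration of that interoperability, the sketch below uses plain conda commands against an environment that mamba created at a given prefix. The helper name inspect_env is made up, and the second command assumes that python is one of the packages in the environment.

```shell
# Hypothetical helper: inspect a mamba-created environment with conda only.
inspect_env() {
  prefix="$1"
  # List the packages installed in the environment at this prefix.
  conda list --prefix "$prefix"
  # Run a command inside the environment without activating it first.
  conda run --prefix "$prefix" python --version
}
```

For an environment created in the current directory, inspect_env "$(pwd)" lists its packages and prints the Python version it contains.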

