If you’re using Anaconda to create a new virtual environment and have encountered an issue where the environment creation process stalls at the stage of Solving Environment or this stage runs for an extremely long time (e.g., 10+ hours), the following steps outline the journey toward a solution.
To comprehend the factors contributing to the extended duration of the solving stage, I initially referred to a document on
To answer the questions it raises: I was creating a new virtual environment that included pip-installed dependencies, and the packages were sourced from two different channels, anaconda and conda-forge.
To validate whether the channel metadata is sane, I executed the following commands.
conda search --override-channels --channel=anaconda
conda search --override-channels --channel=conda-forge
Thankfully, no errors were encountered, indicating that the channel metadata appeared to be in order. I remained uncertain about whether the channels were interacting in undesirable ways.
The document referenced earlier, together with the blog post titled
Because of the large number of packages involved and the complexity of channel interactions, this particular method had not been tested at the time.
I initially attempted to resolve the issue by adjusting the channel priority in the .condarc file. I placed smaller channels like defaults and anaconda before larger channels like conda-forge, using the strict mode. However, this approach did not effectively address the problem and sometimes even caused the solving stage to fail. Subsequently, I experimented with the flexible mode, but the solving time remained excessively long. Here is an example of the configuration in the .condarc file.
channel_priority: flexible
channels:
- defaults
- anaconda
- conda-forge
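One low-risk way to experiment with settings like these is to keep them in a throwaway file and point conda at it through the CONDARC environment variable, so the real ~/.condarc stays untouched. The following is only a sketch for a bash-like shell; the helper name write_trial_condarc and the temporary file are my own illustrations, not part of conda.

```shell
# write_trial_condarc FILE: write the channel settings shown above to FILE.
# (Hypothetical helper name; the YAML keys are ordinary .condarc keys.)
write_trial_condarc() {
  cat > "$1" <<'EOF'
channel_priority: flexible
channels:
  - defaults
  - anaconda
  - conda-forge
EOF
}

trial_rc="$(mktemp)"
write_trial_condarc "$trial_rc"

# Only talk to conda if it is actually installed on this machine.
if command -v conda >/dev/null 2>&1; then
  # CONDARC adds this file to the configuration files conda reads.
  CONDARC="$trial_rc" conda config --show channel_priority
fi
```

Switching flexible to strict in the heredoc is enough to compare the two modes without editing the real configuration.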
To shrink the solver's search space, I opted to specify the version for each package explicitly. For instance, I used the format numpy==1.15.4 instead of numpy. Theoretically, this approach should expedite the solving stage by allowing Conda to narrow down the candidate options more efficiently.
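When an environment file lists many packages, it is easy to miss one that is still unpinned. The sketch below is a hypothetical helper (list_unpinned is my own name) that scans the dependencies section of an environment file line by line and prints entries with no version constraint. It assumes the usual space-indented layout and is not a real YAML parser.

```shell
# list_unpinned FILE: print entries under "dependencies:" that have no version constraint.
# (Hypothetical helper; a line-based sketch, not a full YAML parser.)
list_unpinned() {
  awk '
    /^dependencies:/ { in_deps = 1; next }   # entering the dependency list
    /^[^ -]/         { in_deps = 0 }         # any other top-level key ends it
    in_deps && /^ *- / {
      item = $0
      sub(/^ *- */, "", item)                # strip the list marker
      if (item ~ /:$/) next                  # skip the nested "pip:" header
      if (item !~ /[=<>]/) print item        # no =, < or > means no pin
    }
  ' "$1"
}
```

Running list_unpinned environment.yaml before creating the environment gives a quick list of entries that still leave the solver free to consider every available version.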
After implementing this modification, I observed that the solving stage took approximately 70 minutes to complete. To assess the reliability of this solution, I conducted tests on other machines. Surprisingly, the process concluded within approximately 15 minutes on certain machines, while on others, it seemed to run indefinitely. Consequently, it became evident that this method does not provide a definitive resolution to the issue at hand.
Despite following the aforementioned suggestions without achieving success, I decided to explore an alternative approach to address the issue. Based on a recommendation from a colleague, I delved into mamba, a C++ implementation of the Conda package manager. After experimenting with this tool, I successfully resolved the problem. Now, the environment creation process takes only a couple of minutes.
I started by downloading and running the Mambaforge installer:
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
During the installation process, you may encounter a prompt asking whether to initialize conda. In that case, it is recommended to select “yes”. This ensures that mamba can function properly alongside conda and avoids potential conflicts between the two package managers.
Once the installation is completed successfully, the terminal will display the information as depicted in the screenshot below. After restarting the terminal, you can verify the installation by typing mamba in the command prompt.
In case you encounter an error stating Command ‘mamba’ not found even after restarting the terminal, it is advisable to check the conda section in your .bashrc file. Ensure that the path to conda.sh points to the mamba installation directory, as illustrated in the second screenshot below.
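The same check can be done from the command line instead of by eye. This is a minimal sketch, assuming the conda section of .bashrc sources conda.sh with a full path; conda_sh_points_to is a hypothetical helper name, and the install directory you pass in depends on where the installer placed Mambaforge on your machine.

```shell
# conda_sh_points_to FILE DIR: succeed if a line of FILE that mentions conda.sh
# also contains DIR (hypothetical helper for checking a shell startup file).
conda_sh_points_to() {
  grep 'conda\.sh' "$1" | grep -qF "$2"
}
```

For example, conda_sh_points_to ~/.bashrc "$HOME/mambaforge" && echo ok prints ok only when the path matches, where $HOME/mambaforge is an assumed install location.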
Following the installation, I proceeded to create a new virtual environment using mamba. I executed the command mamba env create -f environment.yaml --prefix $(pwd) to create the environment from the specifications in the environment.yaml file, with the environment located in the current directory.
Although the issue was ultimately resolved, a new concern emerged due to the need to use mamba instead of conda for creating the environment. This raised a potential challenge since there are numerous instances in our project where conda is utilized for configuring environment setup and deployment.
However, there is good news: mamba and conda commands are interchangeable to some extent. This means that we can still use conda to interact with the environment created by mamba, as illustrated in the screenshot below.
Hence, it is great that our team does not have to refactor any of the existing processes, except for the specific step involving environment creation. This means that we can seamlessly integrate the use of mamba for creating environments without disrupting the rest of our workflow.
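Since only the creation step changes, one way to keep existing scripts working on machines that do not have mamba yet is to choose the tool at run time. The sketch below is an assumption about how that could look, not what our project actually does; pick_solver and create_env are hypothetical names.

```shell
# pick_solver: print "mamba" if it is on PATH, otherwise "conda"; fail if neither exists.
pick_solver() {
  if command -v mamba >/dev/null 2>&1; then
    echo mamba
  elif command -v conda >/dev/null 2>&1; then
    echo conda
  else
    echo "neither mamba nor conda found on PATH" >&2
    return 1
  fi
}

# create_env SPEC PREFIX: build the environment with whichever tool was picked.
create_env() {
  solver="$(pick_solver)" || return 1
  "$solver" env create -f "$1" --prefix "$2"
}
```

A call such as create_env environment.yaml ./env then uses mamba where it is installed and falls back to conda elsewhere, while every later step keeps calling conda as before.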