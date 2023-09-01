
In today’s machine learning development, it is common to package the training application into a container, which is then deployed to a compute infrastructure for training. However, before distributing the container image, it is crucial to test it locally to ensure everything works correctly. In this guide, I will explain how to configure your local machine to run a Docker container with access to your on-premise GPU devices. I will demonstrate the setup process on an Ubuntu 20.04 machine equipped with an Nvidia RTX 2060 GPU, CUDA version 11.8, and cuDNN version 8.6.0.

Prerequisites

- Docker
- Nvidia driver
- CUDA Toolkit
- cuDNN
- NVIDIA Container Toolkit

Here’s a step-by-step guide to achieving this:

Install Docker

You can follow the official documentation to install Docker Desktop. This application includes Docker Engine, Docker CLI client, Docker Compose, and other tools that enable you to build and share containerized apps.

Install Nvidia Driver

Before installing the Nvidia driver, ensure that the driver version is compatible with the CUDA Toolkit you intend to install. You can check the official documentation for information on compatibility. To determine the required version of the CUDA Toolkit, refer to the machine learning framework you will be using. More details are provided in the CUDA Toolkit section below.

After that, you can proceed to the driver download page, where you should enter your machine’s specifications. Once you have entered the details, click the “Search” button to find and download the driver.

Install CUDA Toolkit

On an Ubuntu machine, it is advisable to install the necessary system packages, such as “build-essential”, before proceeding with the CUDA Toolkit installation:
    sudo apt-get install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev

Before installing the CUDA Toolkit, make sure to double-check the versions of cuDNN and CUDA Toolkit required by the machine learning framework you intend to use. The version requirements in this guide come from the TensorFlow library: CUDA Toolkit 11.8 and cuDNN 8.6.0, the versions used throughout.

After verifying the version requirements, proceed to the download page and select your machine’s specifications. Once you have selected the appropriate settings, click the “deb(network)” button to obtain the script for installing the CUDA Toolkit.

At this point, you need to modify the installation script to specify the version of the CUDA Toolkit you want to download. Below is the original script you are likely to receive after entering your machine specifications:

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get -y install cuda

Below is an example of installing CUDA Toolkit 11.8. You will need to change the last line by appending the version number to cuda. Note that the apt package name uses hyphens rather than a dot, so version 11.8 becomes cuda-11-8.

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get -y install cuda-11-8

Afterward, you will need to set your PATH and LD_LIBRARY_PATH to point to the CUDA Toolkit that you just installed, which, in this case, is cuda-11.8. If you are installing a different version, update the paths accordingly. This ensures that your system can locate and use the installed CUDA Toolkit.
    echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

Once you have completed these steps and set up the environment variables, reboot the machine so that the newly installed driver, the CUDA Toolkit, and the environment variables all take effect. After the reboot, your machine should be ready to use the installed CUDA Toolkit and GPU for machine learning tasks.

Install cuDNN

The installation of cuDNN is relatively straightforward: it involves copying specific files to the CUDA Toolkit’s include and lib64 directories. To download cuDNN, you can visit the cuDNN Archive page. Ensure that the cuDNN version you download matches the one specified by your machine learning framework.

Download the cuDNN package as a tar file and extract its contents once the download is complete. After extraction, run the following script to copy the necessary files into the appropriate CUDA Toolkit directories. Make sure that the destination path points to the correct CUDA Toolkit installation directory. cuDNN 8 ships several cudnn*.h headers, so copy all of them. Depending on the archive, the libraries may sit under lib rather than lib64, so adjust the source path to match what you extracted.

    sudo cp -P <extracted_cudnn_path>/include/cudnn*.h /usr/local/cuda-11.8/include
    sudo cp -P <extracted_cudnn_path>/lib64/libcudnn* /usr/local/cuda-11.8/lib64/
    sudo chmod a+r /usr/local/cuda-11.8/include/cudnn*.h /usr/local/cuda-11.8/lib64/libcudnn*

Now, you have both CUDA Toolkit and cuDNN installed.

Install Nvidia Container Toolkit

To configure your Docker container to utilize the on-premise GPU devices, you need to set up the Nvidia Container Toolkit. If you are not using Docker, you can follow the official guide to install the toolkit.

First, register NVIDIA's package repository and its signing key. The toolkit packages cannot be installed until this is done.

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
    && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

To install the NVIDIA Container Toolkit, run the following command. NVIDIA's current Docker instructions install the full nvidia-container-toolkit package; use that instead if the runtime is missing after this step.

    sudo apt-get update
    sudo apt-get install -y nvidia-container-toolkit-base

To validate your installation, run the following command:

    nvidia-ctk --version

Then, you will need to configure the Docker daemon to recognize the NVIDIA Container Runtime. The following command edits the Docker daemon configuration file (/etc/docker/daemon.json) for you:

    sudo nvidia-ctk runtime configure --runtime=docker

Finally, restart the Docker daemon so that the new configuration is loaded:

    sudo systemctl restart docker

Test the Setup

After completing the configuration of the Nvidia Container Toolkit and Docker, you can test your setup by running a base CUDA container. I will use the following Dockerfile and command.

Dockerfile

    FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04

    # NOTE: the NVIDIA Container Toolkit injects the host driver libraries at run time,
    # so nvidia-driver-525 is usually unnecessary inside the image.
    RUN apt-get update --yes --quiet && DEBIAN_FRONTEND=noninteractive apt-get install --yes --quiet --no-install-recommends \
        software-properties-common \
        build-essential apt-utils \
        wget curl vim git ca-certificates kmod \
        nvidia-driver-525 \
        && rm -rf /var/lib/apt/lists/*

    # Python 3.10 comes from the deadsnakes PPA. pip is installed further down with
    # get-pip.py; Ubuntu has no apt package named "pip", so it is not listed here.
    RUN add-apt-repository --yes ppa:deadsnakes/ppa && apt-get update --yes --quiet
    RUN DEBIAN_FRONTEND=noninteractive apt-get install --yes --quiet --no-install-recommends \
        python3.10 \
        python3.10-dev \
        python3.10-distutils \
        python3.10-lib2to3 \
        python3.10-gdbm \
        python3.10-tk

    # Make python3 and python point at Python 3.10 (--set avoids an interactive prompt).
    RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 999 \
        && update-alternatives --set python3 /usr/bin/python3.10 && ln -s /usr/bin/python3 /usr/bin/python

    RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10

    COPY requirements.txt /requirements.txt
    COPY finetune.py /finetune.py

    RUN python3 -m pip install --upgrade pip && \
        python3 -m pip install --no-cache-dir -r /requirements.txt

    ENTRYPOINT [ "python3", "/finetune.py" ]

Run the following command to start a container and confirm that it can see the GPU. It runs nvidia-smi in one of NVIDIA's stock CUDA base images; if the setup is correct, the output lists your GPU.

    sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
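
Before starting any container, it can help to confirm that each piece installed in this guide is actually visible on the host. The script below is a minimal sketch of such a check, not part of the original setup. The helper names (check_cmd, cuda_pkg_name, cudnn_version) are hypothetical, and it assumes that cuDNN 8 records its version in cudnn_version.h under the CUDA include directory and that NVIDIA's apt meta-packages are named with hyphens (for example cuda-11-8).

```shell
#!/bin/sh
# Host sanity check for the GPU container setup (illustrative sketch).

# Report whether a command is on PATH; returns 0 if found, 1 otherwise.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Turn a CUDA version such as 11.8 into the apt meta-package name cuda-11-8.
cuda_pkg_name() {
    echo "cuda-$(echo "$1" | tr . -)"
}

# Print the cuDNN version recorded in a cudnn_version.h header.
# Assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL.
cudnn_version() {
    [ -r "$1" ] || { echo "unreadable: $1"; return 1; }
    awk '/^#define CUDNN_(MAJOR|MINOR|PATCHLEVEL) /{v = v sep $3; sep = "."} END{print v}' "$1"
}

# Count how many of the expected tools are missing from PATH.
missing=0
for tool in nvidia-smi nvcc docker nvidia-ctk; do
    check_cmd "$tool" || missing=$((missing + 1))
done

echo "expected CUDA package: $(cuda_pkg_name 11.8)"
# The header path assumes the cuda-11.8 install location used in this guide.
echo "cuDNN: $(cudnn_version /usr/local/cuda-11.8/include/cudnn_version.h || true)"
echo "tools missing: $missing"
```

If any tool is reported as missing, revisit the matching section above before testing with Docker. A missing nvcc usually means that PATH was not updated or that the shell has not been restarted since editing ~/.bashrc.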