Use nvidia-docker to create awesome Deep Learning Environments for R (or Python), Part I

How long does it take you to install your complete GPU-enabled deep learning environment, including RStudio or Jupyter and all your packages? And do you have to do that on multiple systems? In this blog post series I'm going to show you how and why I manage my data science environment with GPU-enabled Docker containers. In this first post you will read about:

  • Why I completely switched to containers for my data science workflow
  • What docker is and how it compares to virtual machines (skip that if you already know!)
  • How I build my data science images
  • How to build a container with r-base, Keras and TensorFlow with GPU support

The status quo

How are you managing your data science stack? I was never really satisfied with how I did it. Installing the whole stack, including all the packages I use, GPU support, Keras and TensorFlow for R and the underlying Python pieces, on different machines is a tedious and cumbersome process. You end up with a pretty fragile toolchain, and I'm the kind of guy who tends to fiddle around and break things. There are more downsides, too: you don't have the same environment on different machines, for example. I'm doing data science on at least three different machines: my laptop, my workstation with a GTX 1080 Ti, and AWS instances. Decoupling my dev environment from the host OS was long overdue. I always knew that VMs were not the way to go, because they create way too much overhead and you have to allocate all the resources you want to use in advance.

Docker is perfect for this purpose. It gives you an isolated development environment which shields you from messing things up. When I saw the NVIDIA GPU Cloud (NGC) I knew that it would solve my problems. NGC is not a cloud service but a container registry where you can download pre-built, GPU-enabled Docker images that are optimized for different workflows. There are images for TensorFlow, PyTorch, Caffe and other frameworks. The good thing is: you don't have to care about any driver, framework or package installation. You can launch Python, import TensorFlow and torture your GPU right away. Unfortunately the images contain proprietary software from NVIDIA to optimize computation. You can use them for free for all kinds of purposes, including commercial ones, but you are not allowed to redistribute the images. That's probably the reason why you will not find them in projects like Rocker. What you can do, of course, is publish Dockerfiles that use those images as base images, and that's what I'm going to do in this blog post.

Why Docker?

If you already know what Docker is and how it compares to virtual machines, you can just skip this section. I'll only give you a short overview; there are plenty of good tutorials out there that teach how Docker works. By the way: I'm far from being a Docker expert, but you really don't have to be one to do stuff like this. Have a look at the picture below. The left side shows how VMs work: each instance emulates a complete operating system. For me that was never really an option for data science environments. It creates some overhead, but most importantly you have to allocate the resources in advance, and they are not shared with your host system. For a setup where you use the host as your all-purpose system and the VM for data science on the same machine, that's not practical.

Isolation with containers is much better for this scenario: a container is basically just an isolated file system. It uses the kernel of your host system, which of course adds very little overhead. Unfortunately this is not true for Docker on Windows or Mac, because there it runs on a Linux virtual machine underneath. I had been toying with switching completely to Linux, even on my laptop, and this finally made me do it. But even if you want to stick with Windows or Mac, you are probably using a workstation or cloud instance with Linux anyway. The really interesting part is shown on the right side of the image: containers with the nvidia-docker runtime. Those containers can use the GPU of the host system. You just need a CUDA-enabled GPU and the drivers on the host system, nothing more.

Now a few words on how Docker works and the terms associated with it. The two most important concepts in Docker are images and containers. An image contains the blueprint to create a container. It is built in layers, and it can contain just the fundamental basics of an operating system or a more complex stack of software. A container is an instance of a Docker image. It feels exactly the same as a virtual machine, except that you have access to the resources of your host system. So when you start a Docker container from an image (and tell it to be interactive), you end up in a shell environment just like you would after logging in to a VM. A container also starts up very fast: the startup is barely noticeable, whereas in a VM you have to boot the whole OS.

To build a Docker image you write what's called a Dockerfile. This can be done in two fashions: using a parent image, or starting from scratch, which you only need when you want to create a new base image. In most cases you will use a base image as parent and build your software stack on top of that. If you want to build your image on, let's say, Ubuntu 16.04, the first line of your Dockerfile would be: FROM ubuntu:16.04. The most common way of getting images is pulling them from Docker Hub, a platform to share images. Docker does that automatically, or you can pull an image manually with docker pull ImageName. For Ubuntu and other commonly used images there are official images maintained by the Docker team. As you will see in the next section, you can also pull images from other registries than Docker Hub.
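To make this concrete, here is a minimal sketch of such a Dockerfile and its build command. The image name and the installed tool are just examples of my own, not part of the setup described in this post:

```dockerfile
# Dockerfile: a minimal example image with Ubuntu 16.04 as parent
FROM ubuntu:16.04

# add one tool on top of the base image and clean up the apt cache
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```

You would build and try it out with `docker build -t my-example .` followed by `docker run -it --rm my-example bash`, which drops you into a shell inside the new container.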

Get Docker and the TensorFlow container

Docker and the nvidia runtime are really easy to install. The following commands install the free Docker Community Edition with an installation script from get.docker.com. This script can install Docker on all the common Linux distributions (have a look at get.docker.com for details):
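A sketch of that convenience-script installation (check get.docker.com for the current recommended steps before running it):

```shell
# download and run Docker's convenience installation script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# optional: let your user run docker without sudo (log out and back in afterwards)
sudo usermod -aG docker $USER
```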

To install the nvidia-docker runtime you have to add NVIDIA's package repositories, install the runtime and reload the Docker daemon. I'm using Ubuntu 16.04 as host system, but this should work on pretty much any Debian-based distribution using apt.
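At the time of writing, the nvidia-docker2 installation looks roughly like this; check NVIDIA's nvidia-docker repository for the instructions that match your system:

```shell
# add the nvidia-docker package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# install the nvidia-docker2 runtime and reload the docker daemon
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
```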

To check if everything worked out you can load a cuda image and execute nvidia-smi:
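With the nvidia-docker2 runtime installed, this one-liner should do the check (the `nvidia/cuda` image tag may need adjusting to your CUDA version):

```shell
# run nvidia-smi inside a throwaway container based on the cuda image
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```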

This command downloads the cuda image from Docker Hub, fires up a container based on this image, executes the command nvidia-smi inside the container, and then immediately exits and deletes the container. You should see something like this:

This means that my GTX 1080 Ti is available inside the container! This cuda image is one of the images NVIDIA hosts on Docker Hub. For the optimized deep learning containers you have to register for the NVIDIA GPU Cloud (NGC), which is not a cloud service provider but a container registry similar to Docker Hub. It's free, and you can use the containers for your own or commercial purposes, but you are not allowed to redistribute them. Once you are signed up, select Configuration in the menu on the left and generate an API key; save it somewhere safe. With this key you can register your Docker installation for the NVIDIA registry. Use the command docker login nvcr.io to register. As username you have to use $oauthtoken (with the $ sign!) and the API key as password. Now, with the following command, Docker should download the optimized TensorFlow container:
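The login and pull steps then look like this; the image tag below is an example, pick a current one from the NGC catalog:

```shell
# log in to the NGC registry; the username is the literal string $oauthtoken,
# the password is your NGC API key
docker login nvcr.io

# pull the optimized TensorFlow image (tag is an example)
docker pull nvcr.io/nvidia/tensorflow:18.04-py3
```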

How I build

I create the environment I use in several stages. In the first image I just install r-base, Keras for Python and R, TensorFlow for R and their dependencies. In the next image I install RStudio Server and JupyterLab on top of that. In the third image I customize RStudio for my needs and install all the packages I want to use.

The last step is an image for installing further packages. If I notice during a working session that I need an additional package, I can just install it like I naturally would inside RStudio. After I've finished, I add this package to the fourth image. If I added it to the third image instead, I would have to reinstall all the packages every time, and that takes a while. With Docker you can also commit changes to a container; that's another way to permanently install new packages. I decided not to do that so I can keep track of what I'm adding to my images. There is another benefit of creating those containers in steps: you can use the images for different use cases. For example, you can use your RStudio image to create a neural net and save it. Then you can take this model, load it into the base container, and build a Python-based Flask microservice to deploy it. Or you want to make a Shiny app that uses image recognition: you can use the r-base container, add Shiny and deploy it. Or you want to provide differently customized RStudio versions for your colleagues.
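For completeness, committing a container's changes looks roughly like this (the container and image names are placeholders):

```shell
# install something interactively inside a running container, then persist
# the container's current state as a new image layer
docker commit my-running-container my-image:with-extra-packages
```

As said above, I prefer adding packages to the Dockerfile instead, because the Dockerfile documents exactly what went into the image, while commits do not.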

Install R-base and Keras

As we have already pulled the TensorFlow container from NVIDIA, we can now start to build the r-base image. Building a Docker image is essentially just writing a shell script that installs all the things you want in your image. As I already pointed out, it's written down in a text file named Dockerfile. When you run the command docker build -t YourNameTag . in the folder of your Dockerfile, Docker starts a container from the base image you defined with FROM and runs the instructions from your Dockerfile inside this container. What Docker executes inside this container is defined with RUN. Have a look at the following Dockerfile:
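A sketch of what this first stage might look like; the base-image tag, user name and IDs are assumptions of mine, not the exact file from this post:

```dockerfile
# build on top of NVIDIA's optimized TensorFlow image (tag is an example)
FROM nvcr.io/nvidia/tensorflow:18.04-py3

# build-time variables; these can be overridden with --build-arg
ARG USER="docker"
ARG UID="1000"
ARG GID="1000"

# environment variables defined with ENV persist into containers
ENV DEBIAN_FRONTEND=noninteractive

# create a group and a non-root user to work with
RUN groupadd --gid "$GID" "$USER" \
    && useradd --create-home --uid "$UID" --gid "$GID" "$USER"
```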

If I run docker build on this file, Docker does the following: it starts a container based on our downloaded TensorFlow container, adds some environment variables (ARG and ENV) and creates a group and a user. Note that ARG defines variables that are only available during the build process, and that you can pass a value for those variables with the docker build command. So if you start the build with docker build --build-arg USER=Kai, the default USER="docker" is overwritten during the build process. Variables you define with ENV persist and are also available in containers you start from the resulting image. In the next step we install the packages we need with apt:
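The exact package list is mine to guess here; a plausible sketch of this apt step, with the system libraries R and its packages typically need:

```dockerfile
# system libraries commonly required to install R and compile R packages
RUN apt-get update && apt-get install -y --no-install-recommends \
        apt-transport-https \
        ca-certificates \
        gnupg2 \
        libcurl4-openssl-dev \
        libssl-dev \
        libxml2-dev \
        locales \
    && rm -rf /var/lib/apt/lists/*
```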

The next thing is installing R. I'm using the binaries CRAN provides for Ubuntu. Normally CRAN has the latest releases very quickly, but at the moment (May 19th) R 3.5 is not available via CRAN (read here why), so I will wait for 3.5 to appear there. If you desperately want R 3.5, you can install it like in this Dockerfile and install packages from source. The installation shown in the next step is the same as in the r-base Dockerfile from the Rocker project. It installs littler, which is a handy CLI interface for R, plus r-base, r-base-dev and r-recommended for the R version specified in the R_BASE_VERSION environment variable. It also creates links for littler in /usr/local/bin to make it available on the command line.
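Modeled on Rocker's r-base Dockerfile, this step looks roughly as follows; the pinned R version and key details are assumptions based on what was current for Ubuntu 16.04 at the time:

```dockerfile
# register the CRAN apt repository for Ubuntu 16.04 (xenial)
RUN echo "deb https://cloud.r-project.org/bin/linux/ubuntu xenial/" \
        > /etc/apt/sources.list.d/cran.list \
    && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9

ENV R_BASE_VERSION 3.4.4

# install littler, r-base, r-base-dev and r-recommended,
# then link littler's helper scripts into /usr/local/bin
RUN apt-get update && apt-get install -y --no-install-recommends \
        littler \
        r-base=${R_BASE_VERSION}-* \
        r-base-dev=${R_BASE_VERSION}-* \
        r-recommended=${R_BASE_VERSION}-* \
    && ln -s /usr/lib/R/site-library/littler/examples/install.r /usr/local/bin/install.r \
    && ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
    && ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
    && rm -rf /var/lib/apt/lists/*
```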

The TensorFlow and Keras packages for R connect to Python via the reticulate package. That means we need both packages installed for R and for Python. Both R packages are installed via the GitHub installation routine provided by littler. Since we're using the NVIDIA TensorFlow image, we only have to care about Keras for Python (TensorFlow is already installed). If you have ever used Keras for R before, you probably noticed that you have to call the function install_keras() to install the Python backend. In our case this is already done. The only additional thing that function does is create the virtual environment r-tensorflow. It's good practice to use virtual environments in Python, although it would not be strictly necessary in a Docker container, which is already isolated.
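Put together, this step might look like the following sketch; the virtualenv path is an assumption mirroring what keras::install_keras() would set up:

```dockerfile
# R side: install the keras and tensorflow R packages from GitHub via littler
RUN installGithub.r rstudio/keras rstudio/tensorflow

# python side: only Keras is missing, TensorFlow ships with the base image;
# the r-tensorflow virtualenv mirrors what keras::install_keras() would create
RUN pip install keras virtualenv \
    && virtualenv --system-site-packages /home/docker/.virtualenvs/r-tensorflow
```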

Okay, now we have a base image for deep learning with R! You can find the complete Dockerfile here. In the next part I'm going to create the images with RStudio and the customization.