Analytics: Data in Docker

Table of contents

Temporary Data
Persistent Data: Bind Mounts
Persistent Data: Volumes
- CLI Commands
Wrap up and more info

This post is about data inside Docker containers. As I mentioned in the last post #Analytics: Docker for Data Science Environment, data in Docker can either be temporary or persistent. In this tutorial, I will focus on Docker volumes, but I will include some info about temporary data and bind mounts too.



Fig 1: Data in Docker container (source)

Temporary Data

Inside a Docker container, there are two ways to keep data temporarily. By default, files created inside a container are stored in the container's writable layer. You do not have to do anything, but every file that an application or a user writes to this layer is temporary. This means that when the container is stopped or killed and a new container is deployed, the data is lost. You can avoid this loss by restarting the stopped container using its ID, as long as it has not been removed, but if your data is important, this is not an option you should rely on.
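To see the difference between restarting a stopped container and deploying a new one, a small experiment helps. The following is only a sketch: the function name writable_layer_demo, the container name layertest, and the file /tmp/note.txt are made up for illustration, and it assumes a running Docker daemon.

```shell
# Sketch: data in the writable layer survives stop/start,
# but not the replacement of the container.
# layertest and /tmp/note.txt are illustrative names, not from the post.
writable_layer_demo() {
  docker run -d --name layertest nginx:latest
  # Write a file into the container's writable layer.
  docker exec layertest sh -c 'echo hello > /tmp/note.txt'

  # Stop and start the same container: the file is still there.
  docker stop layertest
  docker start layertest
  docker exec layertest cat /tmp/note.txt

  # Remove the container and deploy a new one from the same image:
  # the file is gone, so this last command is expected to fail.
  docker rm -f layertest
  docker run -d --name layertest nginx:latest
  docker exec layertest cat /tmp/note.txt
}
```

Paste the function into your shell and call writable_layer_demo to run the steps in order.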

If you do not need your data to persist beyond the life of the container, there is another, more performant option: a tmpfs mount. This is a temporary mount that uses the host's memory. It has the benefit of faster read and write operations, but it is more volatile than the option above. In this case, if the container is stopped or restarted, you lose the data and cannot get it back.

docker run -d \
  -it \
  --name tmptest \
  --mount type=tmpfs,destination=/app,tmpfs-mode=1770 \
  nginx:latest
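If you want to check how a path is mounted in a container such as tmptest, you can ask Docker for the container's mount configuration. This is a minimal sketch; the helper name show_mounts is my own and not part of the Docker CLI.

```shell
# Sketch: print the mounts Docker recorded for a container as JSON.
# show_mounts is a hypothetical helper name; $1 is the container name.
show_mounts() {
  docker inspect "$1" --format '{{ json .Mounts }}'
}
```

For the container above, show_mounts tmptest should include an entry whose Type is tmpfs and whose Destination is /app.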

If you want your data to exist even after stopping or killing the container, there are two ways to persist it beyond the life of a container: bind mounts and volumes.

Persistent Data: Bind Mounts

Bind mounts let you persist your data by mounting a file or directory of the host system inside the container. This means the files or directories exist both inside and outside the container. As a consequence, processes outside the container can modify these files. Additionally, these mounts are difficult to back up, migrate, or share between containers.

The following types of use case are appropriate for bind mounts:

Sharing configuration files from the host machine to containers. For example, to share the DNS resolution, you can bind the /etc/resolv.conf or /etc/hosts file.
Sharing source code or build artifacts between a development environment on the Docker host and a container.
When the file or directory structure of the Docker host is guaranteed to be consistent with the bind mounts the containers require.

As an example, to bind the host directory "$(pwd)"/target to the container directory /app, you just need to type the following:

docker run -d \
  -it \
  --name devtest \
  -v "$(pwd)"/target:/app \
  nginx:latest

Originally, the -v flag was used for standalone containers and the --mount flag was used for swarm services. However, beginning with Docker 17.06, you can use --mount in all cases. The -v flag is easier to type, while the --mount syntax is a bit more verbose and lets you set more options on the bind mount. The above deployment can be done with --mount by typing:

docker run -d \
  -it \
  --name devtest \
  --mount type=bind,source="$(pwd)"/target,target=/app \
  nginx:latest
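Because the directory exists on both sides, a file written on the host shows up inside the container right away. The sketch below illustrates this for the devtest container started above; the function name bind_mount_demo and the file hello.txt are assumptions for illustration, and it must be run from the same working directory as the docker run command.

```shell
# Sketch: a file created on the host side of the bind mount
# is readable from inside the container.
# bind_mount_demo and hello.txt are illustrative names.
bind_mount_demo() {
  # Create a file in the host directory that is bound to /app.
  mkdir -p "$(pwd)/target"
  echo "written on the host" > "$(pwd)/target/hello.txt"
  # Read the same file through the container's view of the mount.
  docker exec devtest cat /app/hello.txt
}
```

The same works in the other direction: files the container writes to /app appear in target on the host.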

Persistent Data: Volumes

Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. Volumes are completely managed by Docker. They have several advantages over bind mounts:

Easier to back up or migrate
The management can be done using Docker CLI commands or the Docker API
Work on both Linux and Windows containers
They can be more safely shared among multiple containers
Volume drivers let you store volumes on remote hosts or cloud providers, or encrypt their contents
New volumes can have their content pre-populated by a container
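To make the first point more concrete, one common pattern for backing up a volume is to mount it read-only into a short-lived helper container and archive its contents to a host directory. This is a sketch, not the only way to do it; the function name backup_volume, the alpine image, and the paths /data and /backup are my own choices.

```shell
# Sketch: archive the contents of a volume into a host directory.
# backup_volume, /data and /backup are illustrative choices.
#   $1: name of the volume to back up
#   $2: absolute path of the host directory that receives the archive
backup_volume() {
  docker run --rm \
    -v "$1":/data:ro \
    -v "$2":/backup \
    alpine tar czf "/backup/$1.tar.gz" -C /data .
}
```

For example, backup_volume my_volume "$(pwd)" would write my_volume.tar.gz into the current directory, assuming a volume named my_volume exists.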

CLI Commands

You can create a stand-alone volume by typing the following:

docker volume create my_volume

and you can list the created volumes using:

docker volume ls

Moreover, volumes can be inspected with:

docker volume inspect my_volume

and removed using:

docker volume rm my_volume

Volumes that are not used by a container are called dangling volumes. If you want/need to reduce disk usage on your system, you can remove all dangling volumes with:

docker volume prune

Docker will warn you and ask for confirmation before deletion. If the volume is associated with any containers, you cannot remove it until those containers are deleted. If you still have problems removing a volume because Docker says it is associated with a container that does not exist, you can use the following to clean up your unused Docker resources (stopped containers, unused networks, dangling images, and build cache):

docker system prune
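Before pruning, it can help to check which containers still reference a volume. The sketch below uses the volume filter of docker ps; the helper name volume_users is made up for illustration.

```shell
# Sketch: list the names of all containers (running or stopped)
# that have the given volume mounted.
# volume_users is a hypothetical helper name; $1 is the volume name.
volume_users() {
  docker ps -a --filter "volume=$1" --format '{{.Names}}'
}
```

If volume_users my_volume prints nothing, no container is using the volume and docker volume rm my_volume should succeed.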

You can mount a volume using -v or --mount. To mount, e.g., my_volume, type the following:

docker run -d \
  -it \
  --name devtest \
  -v my_volume:/app \
  nginx:latest

The --mount syntax allows you to add, e.g., a read-only option to the volume:

docker run -d \
  -it \
  --name devtest \
  --mount type=volume,source=my_volume,target=/app,readonly \
  nginx:latest

Wrap up and more info

This article explains how to deal with data in Docker containers. If you need to persist your data, you should use bind mounts or volumes. Volumes are the preferred way to persist data in Docker containers and services. They are portable, and because Docker manages them, the data and files inside are not meant to be modified directly by other processes on the host system.

If you need more information about how to "manage data in Docker", please refer to the official documentation. Nigel Poulton also has an excellent book about Docker: Docker Deep Dive. The book is updated regularly, and on leanpub you can always get the current version (I have no association with leanpub and receive no support from it, I just find it nice). This book is my reference for Docker.
