Here is some useful information and best practices for Docker images and image building.
You can look at what makes up an image using the docker image history command, which shows the instruction that was used to create each layer within an image. Use it to see the layers in the todo-app image you created earlier in the tutorial:
docker image history todo-app
You should get output that looks something like this (dates/IDs may be different).
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
2f00d62f2528   24 minutes ago   CMD ["node" "src/index.js"]                     0B        buildkit.dockerfile.v0
<missing>      24 minutes ago   EXPOSE map[3000/tcp:{}]                         0B        buildkit.dockerfile.v0
<missing>      24 minutes ago   RUN /bin/sh -c yarn install --production # b…   85.3MB    buildkit.dockerfile.v0
<missing>      24 minutes ago   COPY . . # buildkit                             4.59MB    buildkit.dockerfile.v0
<missing>      24 minutes ago   WORKDIR /app                                    0B        buildkit.dockerfile.v0
<missing>      8 weeks ago      /bin/sh -c #(nop) CMD ["node"]                  0B
<missing>      8 weeks ago      /bin/sh -c #(nop) ENTRYPOINT ["docker-entry…    0B
<missing>      8 weeks ago      /bin/sh -c #(nop) COPY file:4d192565a7220e13…   388B
<missing>      8 weeks ago      /bin/sh -c apk add --no-cache --virtual .bui…   7.77MB
<missing>      8 weeks ago      /bin/sh -c #(nop) ENV YARN_VERSION=1.22.19      0B
<missing>      8 weeks ago      /bin/sh -c addgroup -g 1000 node && addu…       117MB
<missing>      8 weeks ago      /bin/sh -c #(nop) ENV NODE_VERSION=18.19.0      0B
<missing>      2 months ago     /bin/sh -c #(nop) CMD ["/bin/sh"]               0B
<missing>      2 months ago     /bin/sh -c #(nop) ADD file:1f4eb46669b5b6275…   7.38MB
Each of the lines represents a layer in the image. The layers that are part of the base image (FROM node:18-alpine) are at the bottom (the lines where IMAGE is <missing> and that were created two months or eight weeks ago), and the newest layer is at the top (created 24 minutes ago). This view also lets you quickly see the size of each layer, which helps diagnose large images.
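Several of the lines in the CREATED BY column are truncated. If you want to see the full command behind each layer, add the --no-trunc flag:
docker image history --no-trunc todo-app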
Now that you’ve seen the layering in action, there’s an important lesson to learn to help decrease build times for your container images.
Once a layer changes, all downstream layers have to be recreated as well.
Let’s look at the Dockerfile we were using one more time…
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN yarn install --production
EXPOSE 3000
CMD ["node", "src/index.js"]
Going back to the image history output, we see that each command in the Dockerfile becomes a new layer in the image.
You might remember that when we made a very small change to the image (we simply changed a string in one file), the Node.js dependencies had to be reinstalled, which takes a long time. Is there a way to fix this?
To fix this, we need to restructure the Dockerfile to support caching of the dependencies. For Node-based applications, those dependencies are defined in the package.json file. So, what if we copied only that file in first, installed the dependencies, and then copied in everything else? Then we would only need to recreate the yarn dependencies when there is a change to package.json. Make sense?
Update the Dockerfile to copy in the package.json (and yarn.lock) first, install dependencies, and then copy everything else in.
FROM node:18-alpine
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install --production
EXPOSE 3000
COPY . .
CMD ["node", "src/index.js"]
Create a file named .dockerignore in the same folder as the Dockerfile with the following contents:
node_modules
.dockerignore files are an easy way to exclude files from the build context so that only image-relevant files are copied into the image. In this case, the local node_modules folder should be omitted in the second COPY step because it could otherwise overwrite files that were created by the command in the RUN step.
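Depending on your project, you may want to ignore more than just node_modules. As an illustrative (not required) example, a slightly extended .dockerignore for a Node project could look like this:
node_modules
.git
*.log
Dockerfile
.dockerignore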
The same dependency-caching pattern applies in other language ecosystems as well, for example Python, where the dependencies are typically defined in a requirements.txt file.
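As a minimal sketch (not part of the todo app; the app.py entry point and port 8000 are placeholder assumptions), a Python Dockerfile using the same trick might look like this:
FROM python:3.12-slim
WORKDIR /app
# Copy only the dependency list first so this layer stays cached
# until requirements.txt changes.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Now copy in the rest of the application source.
COPY . .
EXPOSE 8000
# app.py is a placeholder entry point for this sketch.
CMD ["python", "app.py"]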
Back to the todo app: build a new image.
docker build -t todo-app .
You should see output like this…
```
[+] Building 9.0s (10/10) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 183B 0.0s
=> [internal] load metadata for docker.io/library/node:18-alpine 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 54B 0.0s
=> [1/5] FROM docker.io/library/node:18-alpine 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 2.85kB 0.1s
=> CACHED [2/5] WORKDIR /app 0.0s
=> [3/5] COPY package.json yarn.lock ./ 0.0s
=> [4/5] RUN yarn install --production 8.2s
=> [5/5] COPY . . 0.1s
=> exporting to image 0.6s
=> => exporting layers 0.6s
=> => writing image sha256:c150598f61853d4c86c423adcc920f655b23051560a5c3800c63c074d45aa6d5 0.0s
=> => naming to docker.io/library/todo-app
```
You’ll see that most layers were rebuilt. That’s perfectly fine, since we changed the Dockerfile quite a bit.
Now, make a change to the src/static/index.html file (e.g. change the title to say “The Awesome Todo App”). Rebuild the Docker image using docker build -t todo-app . again. This time, your output should look a little different.
```
[+] Building 0.2s (10/10) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 183B 0.0s
=> [internal] load metadata for docker.io/library/node:18-alpine 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 54B 0.0s
=> [1/5] FROM docker.io/library/node:18-alpine 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 3.48kB 0.1s
=> CACHED [2/5] WORKDIR /app 0.0s
=> CACHED [3/5] COPY package.json yarn.lock ./ 0.0s
=> CACHED [4/5] RUN yarn install --production 0.0s
=> [5/5] COPY . . 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:b67d89be99c1317c4355440b51cdd28697998c5b0335d6396f0db01c30235674 0.0s
=> => naming to docker.io/library/todo-app
```
First off, you should notice that the build was MUCH faster (0.2 s vs. 9.0 s)! And you’ll see that steps 2, 3, and 4 are all CACHED, meaning we are using the build cache. Pushing and pulling this image and updates to it will be much faster as well.
During this lab, whenever we built and rebuilt the todo-app container image, we did so without specifying a tag with the name.
List all container images with this command:
docker image ls
and check for the todo-app images:
REPOSITORY   TAG       IMAGE ID       CREATED              SIZE
todo-app     latest    6866f3c2eb2e   34 seconds ago       222MB
<none>       <none>    9c60391389f0   About a minute ago   222MB
<none>       <none>    7c8c52d18f97   19 minutes ago       222MB
<none>       <none>    53c0e6d51d54   36 minutes ago       222MB
...
When you build a container image with a name but without a tag, Docker will give it the tag “latest”.
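In other words, docker build -t todo-app . is the same as running:
docker build -t todo-app:latest .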
Note the three images with name and tag of <none>. Whenever you build a new todo-app image, it replaces the previously built image, which in turn “loses” its name and tag.
In a real scenario, whenever you build a container image, you should give it a meaningful tag, e.g. v1 for version 1:
docker build -t todo-app:v1 .
And when you add functionality, use a new tag to build a new image:
docker build -t todo-app:v2 .
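You can also add another tag to an existing image without rebuilding it by using docker image tag (the stable tag here is just an example):
docker image tag todo-app:v2 todo-app:stable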
To get rid of the “dangling” images (the <none> <none> images), use this command:
docker image prune
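Note that docker image prune only removes dangling images. To also remove all images not referenced by any container, add the -a flag:
docker image prune -a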
While we are not going to dive into it too much in this tutorial, multi-stage builds are an incredibly powerful tool for building an image in multiple stages. They have several advantages: you can separate build-time dependencies from runtime dependencies, and you can reduce overall image size by shipping only what your app needs to run.
When building Java-based applications, a JDK is needed to compile the source code to Java bytecode. However, that JDK isn’t needed in production. Also, you might be using tools like Maven or Gradle to help build the app. Those also aren’t needed in our final image. Multi-stage builds help.
FROM maven AS build
WORKDIR /app
COPY . .
RUN mvn package
FROM tomcat
COPY --from=build /app/target/file.war /usr/local/tomcat/webapps
In this example, we use one stage (called build) to perform the actual Java build using Maven. In the second stage (starting at FROM tomcat), we copy in the files from the build stage. The final image consists only of the last stage that was created (which can be overridden using the --target flag).
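For example, to build only up to the first stage, say for debugging the compile step, you could pass --target to docker build (the image name here is just an example):
docker build --target build -t java-app-build .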
When building React applications, we need a Node environment to compile the JS code (typically JSX), SASS stylesheets, and more into static HTML, JS, and CSS. If we aren’t doing server-side rendering, we don’t even need a Node environment for our production build. Why not ship the static resources in a static nginx container?
FROM node:12 AS build
WORKDIR /app
COPY package* yarn.lock ./
RUN yarn install
COPY public ./public
COPY src ./src
RUN yarn run build
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
Here, we are using a node:12 image to perform the build (maximizing layer caching) and then copying the output into an nginx container.
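To try this pattern out, you could build and run the image like so (the react-app name and the host port 8080 are arbitrary choices; nginx listens on port 80 inside the container):
docker build -t react-app .
docker run -d -p 8080:80 react-app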
By understanding a little bit about how images are structured, we can build images faster and ship fewer changes. Multi-stage builds help us reduce overall image size and increase final container security by separating build-time dependencies from runtime dependencies.
Next Step: Docker Compose