
In this recipe, we will prepare a Dockerfile for the custom Python container image. We will make use of the train and serve scripts that we prepared in the previous recipes. After that, we will run the docker build command to prepare the image before pushing it to an Amazon ECR repository.
Tip
Wait! What's a Dockerfile? It's a text document containing the directives (commands) used to prepare and build a container image. This container image then serves as the blueprint when running containers. Feel free to check out https://docs.docker.com/engine/reference/builder/ for more information on Dockerfiles.
Make sure you have completed the Preparing and testing the serve script in Python recipe.
The initial steps in this recipe focus on preparing a Dockerfile. Let's get started:
Click the Dockerfile file in the file tree to open it in the Editor pane. Make sure that this is the same Dockerfile that's inside the ml-python directory:
Figure 2.55 – Opening the Dockerfile inside the ml-python directory
Here, we can see a Dockerfile inside the ml-python directory. Remember that we created an empty Dockerfile in the Setting up the Python and R experimentation environments recipe. Clicking it in the file tree should open an empty file in the Editor pane:
Figure 2.56 – Empty Dockerfile in the Editor pane
Here, we have an empty Dockerfile. In the next step, we will update this by adding three lines of code.
Update the Dockerfile with the following block of configuration code:
FROM arvslat/amazon-sagemaker-cookbook-python-base:1
COPY train /usr/local/bin/train
COPY serve /usr/local/bin/serve
Here, we are planning to build on top of an existing image called amazon-sagemaker-cookbook-python-base. This image already has a few prerequisites installed. These include the Flask, pandas, and Scikit-learn libraries so that you won't have to worry about getting the installation steps working properly in this recipe. For more details on this image, check out https://hub.docker.com/r/arvslat/amazon-sagemaker-cookbook-python-base:
Figure 2.57 – Docker Hub page for the base image
Here, we can see the Docker Hub page for the amazon-sagemaker-cookbook-python-base image.
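If you want to confirm what is installed in this base image before building on top of it, one optional check (not part of the original recipe) is to pull it and list its Python packages. This assumes the image keeps Ubuntu's default entrypoint so that pip can be invoked directly as the container command:
docker pull arvslat/amazon-sagemaker-cookbook-python-base:1
docker run --rm arvslat/amazon-sagemaker-cookbook-python-base:1 pip list
The output should include Flask, pandas, and scikit-learn, which are the libraries this recipe relies on.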
Tip
You can access a working copy of this Dockerfile in the Machine Learning with Amazon SageMaker Cookbook GitHub repository: https://github.com/PacktPublishing/Machine-Learning-with-Amazon-SageMaker-Cookbook/blob/master/Chapter02/ml-python/Dockerfile.
With the Dockerfile ready, we will proceed with using the Terminal until the end of this recipe:
Figure 2.58 – New Terminal
Here, we can see how to create a new Terminal. Note that the Terminal pane is under the Editor pane in the AWS Cloud9 IDE.
Navigate to the ml-python directory containing our Dockerfile:
cd /home/ubuntu/environment/opt/ml-python
Set the values of the IMAGE_NAME and TAG variables:
IMAGE_NAME=chap02_python
TAG=1
Build the custom container image using the docker build command:
docker build --no-cache -t $IMAGE_NAME:$TAG .
The docker build command makes use of what is written inside our Dockerfile. We start with the image specified in the FROM directive and then we proceed by copying the train and serve files into the container image.
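To double-check that the build produced a local image with the expected name and tag, we can optionally list it with the docker images command (this verification step is not part of the original recipe):
docker images "$IMAGE_NAME:$TAG"
The chap02_python image with the 1 tag should appear in the output.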
Use the docker run command to test if the train script works:
docker run --name pytrain --rm -v /opt/ml:/opt/ml $IMAGE_NAME:$TAG train
Let's quickly discuss some of the different options that were used in this command. The --rm flag makes Docker clean up the container after the container exits. The -v flag allows us to mount the /opt/ml directory from the host system to the /opt/ml directory of the container:
Figure 2.59 – Result of the docker run command (train)
Here, we can see the results after running the docker run command. It should show logs similar to what we had in the Preparing and testing the train script in Python recipe.
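Since the /opt/ml directory of the container is mounted from the host, any artifacts written by the train script remain visible on the host after the container exits. As an optional check (assuming the train script saves its output under /opt/ml/model, as in the earlier recipe), we can list that directory:
ls -lah /opt/ml/model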
Next, use the docker run command to test if the serve script works:
docker run --name pyserve --rm -v /opt/ml:/opt/ml $IMAGE_NAME:$TAG serve
After running this command, the Flask API server starts successfully. We should see logs similar to what we had in the Preparing and testing the serve script in Python recipe:
Figure 2.60 – Result of the docker run command (serve)
Here, we can see that the API is running on port 8080. In the base image we used, we added EXPOSE 8080 to allow us to access this port in the running container.
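Note that EXPOSE only documents the port; it does not publish it to the host. In this recipe, we reach the API through the container's IP address on the Docker bridge network. As an alternative not used in the recipe, the port could be published with the -p flag so that the API is reachable on localhost:
docker run --name pyserve --rm -p 8080:8080 -v /opt/ml:/opt/ml $IMAGE_NAME:$TAG serve
curl http://localhost:8080/ping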
Figure 2.61 – New Terminal
As the API is already running in the first Terminal, we have created a new one.
In the new Terminal, get the IP address of the running container:
SERVE_IP=$(docker network inspect bridge | jq -r ".[0].Containers[].IPv4Address" | awk -F/ '{print $1}')
echo $SERVE_IP
We should get an IP address that's equal or similar to 172.17.0.2. Of course, we may get a different IP address value.
Test the ping endpoint using the curl command:
curl http://$SERVE_IP:8080/ping
We should get an OK after running this command.
Test the invocations endpoint URL using the curl command:
curl -d "1" -X POST http://$SERVE_IP:8080/invocations
We should get a value similar or close to 881.3428400857507 after invoking the invocations endpoint.
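We can also try posting other input values (assuming, as in this recipe, that the model expects a single numeric value in the request body); a different input should return a different prediction:
curl -d "100" -X POST http://$SERVE_IP:8080/invocations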
At this point, it is safe to say that the custom container image we have prepared in this recipe is ready. Now, let's see how this works!
In this recipe, we built a custom container image using the Dockerfile configuration we specified. When you have a Dockerfile, the standard set of steps would be to use the docker build command to build the Docker image, authenticate with ECR to gain the necessary permissions, use the docker tag command to tag the image appropriately, and use the docker push command to push the Docker image to the ECR repository.
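A minimal sketch of that push workflow is shown next. The region value and the choice of repository name are placeholders (assumptions for illustration), and we assume the ECR repository does not exist yet:
# Placeholders - adjust the region and repository name as needed
REGION=us-east-1
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
REPOSITORY_URI=$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$IMAGE_NAME
# Create the ECR repository (skip if it already exists)
aws ecr create-repository --repository-name $IMAGE_NAME --region $REGION
# Authenticate Docker with ECR
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
# Tag and push the image
docker tag $IMAGE_NAME:$TAG $REPOSITORY_URI:$TAG
docker push $REPOSITORY_URI:$TAG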
Let's discuss what we have inside our Dockerfile. If this is your first time hearing about Dockerfiles, they are simply text files containing commands to build the image. In our Dockerfile, we did the following:
- We used arvslat/amazon-sagemaker-cookbook-python-base as the base image. Check out https://hub.docker.com/repository/docker/arvslat/amazon-sagemaker-cookbook-python-base for more details about this image.
- We copied the train and serve scripts to the /usr/local/bin directory inside the container image. These scripts are executed when we use docker run.
Using the arvslat/amazon-sagemaker-cookbook-python-base image as the base image allowed us to write a shorter Dockerfile that focuses only on copying the train and serve files to the /usr/local/bin directory inside the container image. Behind the scenes, we have already pre-installed the flask, pandas, scikit-learn, and joblib packages, along with their prerequisites, inside this container image so that we will not run into issues when building the custom container image. Here is a quick look at the Dockerfile used to build the base image we are using in this recipe:
FROM ubuntu:18.04
RUN apt-get -y update
RUN apt-get install -y python3.6
RUN apt-get install -y --no-install-recommends python3-pip
RUN apt-get install -y python3-setuptools
RUN ln -s /usr/bin/python3 /usr/bin/python && \
    ln -s /usr/bin/pip3 /usr/bin/pip
RUN pip install flask
RUN pip install pandas
RUN pip install scikit-learn
RUN pip install joblib
WORKDIR /usr/local/bin
EXPOSE 8080
In this Dockerfile, we can see that we are using ubuntu:18.04 as the base image. Note that we can use other base images as well, depending on the libraries and frameworks we want installed in the container image.
Once we have the container image built, the next step will be to test if the train and serve scripts will work inside the container once we use docker run. Getting the IP address of the running container may be the trickiest part, as shown in the following block of code:
SERVE_IP=$(docker network inspect bridge | jq -r ".[0].Containers[].IPv4Address" | awk -F/ '{print $1}')
We can divide this into the following parts:
- docker network inspect bridge: This provides detailed information about the bridge network in JSON format. It should return an output with a structure similar to the following JSON value:
[
  {
    ...
    "Containers": {
      "1b6cf4a4b8fc5ea5...": {
        "Name": "pyserve",
        "EndpointID": "ecc78fb63c1ad32f0...",
        "MacAddress": "02:42:ac:11:00:02",
        "IPv4Address": "172.17.0.2/16",
        "IPv6Address": ""
      }
    },
    ...
  }
]
- jq -r ".[0].Containers[].IPv4Address": This parses through the JSON response value from docker network inspect bridge. Piping this after the first command would yield an output similar to 172.17.0.2/16.
- awk -F/ '{print $1}': This splits the result from the jq command using the / separator and returns the value before /. After getting the AA.BB.CC.DD/16 value from the previous command, we get AA.BB.CC.DD after using the awk command.
Once we have the IP address of the running container, we can ping the /ping and /invocations endpoints, similar to how we did in the Preparing and testing the serve script in Python recipe.
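As a side note, if we only need the IP address of a single named container, an alternative to parsing the full bridge network output (not used in this recipe) is to ask docker inspect for it directly, assuming the container is named pyserve as in the earlier step:
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' pyserve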
In the next recipes in this chapter, we will use this custom container image when we do training and deployment with the SageMaker Python SDK.