
Machine Learning with Amazon SageMaker Cookbook

In this recipe, we will launch and configure an AWS Cloud9 instance running an Ubuntu server. This will serve as the experimentation and simulation environment for the other recipes in this chapter. After that, we will resize the volume attached to the instance so that we can build container images later without running into disk space issues while working with Docker containers and container images. In the succeeding recipes, we will prepare the file and directory structure that our train and serve scripts will expect when they are inside the custom container.
Important note
Why go through all this effort of preparing an experimentation environment? Once we have finished preparing the experimentation environment, we will be able to prepare, test, and update the custom scripts quickly, without having to use the fit() and deploy() functions from the SageMaker Python SDK during the initial stages of writing the script. With this approach, the feedback loop is much faster, and we will detect the issues in our script and container image before we even attempt using these with the SageMaker Python SDK during training and deployment.
Make sure you have permission to manage the AWS Cloud9 and EC2 resources if you're using an AWS IAM user with a custom URL. It is recommended to be signed in as an AWS IAM user instead of using the root account in most cases.
The steps in this recipe can be divided into three parts: launching a new Cloud9 environment, modifying the mounted volume, and rebooting the instance.
We'll begin by launching the Cloud9 environment with the help of the following steps:
Figure 2.2 – Looking for the AWS Cloud9 service under Developer Tools
In the preceding screenshot, we can see the services after clicking the Services link on the navigation bar.
Figure 2.3 – Create environment button
Here, we can see that the Create environment button is located near the top-right corner of the page.
Specify the environment name (Cookbook Experimentation Environment) and, optionally, a description for your environment. Click Next step afterward:
Figure 2.4 – Name environment form
Here, we have the Name environment form, where we can specify the name and description of our Cloud9 environment.
Figure 2.5 – Environment settings
We can see the different configuration settings here. Feel free to choose a different instance type as needed.
Figure 2.6 – Other configuration settings
Here, we can see that we have selected a Cost-saving setting of After one hour. This means that after an hour of inactivity, the EC2 instance linked to the Cloud9 environment will be automatically turned off to save costs.
Figure 2.7 – Create environment button
After clicking the Create environment button, it may take a minute or so for the environment to be ready. Once the environment is ready, check the different sections of the IDE:
Figure 2.8 – AWS Cloud9 development environment
As you can see, we have the file tree on the left-hand side. At the bottom part of the screen, we have the Terminal, where we can run our Bash commands. The largest portion, at the center of the screen, is the Editor, where we can edit the files.
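The console steps for launching the environment could also be scripted. The following is a minimal sketch, assuming a boto3 Cloud9 client is passed in and that its create_environment_ec2 call is used. The function names, the instance type, and the image ID are illustrative assumptions and are not values taken from this recipe; only the environment name and the one-hour cost-saving setting come from the steps above.

```python
def build_environment_request(name, description=""):
    """Build keyword arguments for Cloud9's create_environment_ec2 call."""
    request = {
        "name": name,
        # Assumption: the recipe leaves the instance type up to the reader.
        "instanceType": "t3.small",
        # Assumption: an Ubuntu image alias accepted by Cloud9.
        "imageId": "ubuntu-22.04-x86_64",
        # Mirrors the "After one hour" cost-saving setting.
        "automaticStopTimeMinutes": 60,
    }
    if description:
        request["description"] = description
    return request


def create_environment(cloud9_client, name, description=""):
    """Create the environment and return its ID.

    cloud9_client is expected to behave like boto3.client("cloud9").
    """
    response = cloud9_client.create_environment_ec2(
        **build_environment_request(name, description)
    )
    return response["environmentId"]
```

With boto3 installed and credentials configured, calling create_environment(boto3.client("cloud9"), "Cookbook Experimentation Environment") would be the scripted counterpart of the form shown earlier. Passing the client in as a parameter keeps the sketch easy to try with a stand-in object before touching a real account.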
Now, we need to increase the disk space.
lsblk
With the lsblk command, we will get information about the available block devices, as shown in the following screenshot:
Figure 2.9 – Result of the lsblk command
Here, we can see the results of the lsblk command. At this point, the root volume only has 10G of disk space (minus what is already in the volume).
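If we prefer to read these sizes from a script instead of a screenshot, lsblk can emit JSON. The following is a rough sketch, assuming the lsblk version on the instance supports the --json and --bytes options; the helper names are our own and are not part of the recipe.

```python
import json
import subprocess

GIB = 1024 ** 3


def flatten_devices(lsblk_json):
    """Return (name, type, size_in_gib) for every disk and partition."""
    rows = []

    def walk(devices):
        for device in devices:
            # Depending on the util-linux version, size is a number or a string.
            rows.append((device["name"], device["type"], int(device["size"]) / GIB))
            walk(device.get("children", []))

    walk(json.loads(lsblk_json)["blockdevices"])
    return rows


def read_lsblk():
    """Run lsblk and return its JSON output as text."""
    result = subprocess.run(
        ["lsblk", "--json", "--bytes", "--output", "NAME,SIZE,TYPE"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling flatten_devices(read_lsblk()) in the Cloud9 terminal's Python interpreter should list the root volume and its partition with their sizes in GiB, which makes it easy to compare the two values later in this recipe.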
Figure 2.10 – How to go back to the AWS Cloud9 dashboard
This will open a new tab showing the Cloud9 dashboard.
Type ec2 in the search bar and click the EC2 service from the list of results:
Figure 2.11 – Using the search bar to navigate to the EC2 console
Here, we can see that the search bar quickly gives us a list of search results after we have typed in ec2.
Figure 2.12 – Instances (running) link under Resources
We should see the link we need to click under the Resources pane, as shown in the preceding screenshot.
Select the EC2 instance whose name contains aws-cloud9 and the name we specified while creating the environment. In the bottom pane showing the details, click the Storage tab to show Root device details and Block devices.
Figure 2.13 – Storage tab
Here, we can see the Storage tab showing Root device details and Block devices.
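The same lookup can be sketched in code. This is a hedged example, assuming a boto3 EC2 client is passed in and that the instance's Name tag contains aws-cloud9, as described above; the function name and the returned dictionary layout are our own choices.

```python
def find_cloud9_instances(ec2_client, name_pattern="*aws-cloud9*"):
    """List running instances whose Name tag matches name_pattern.

    ec2_client is expected to behave like boto3.client("ec2").
    Each result carries the instance ID, its Name tag, and the IDs of
    the EBS volumes attached to it.
    """
    response = ec2_client.describe_instances(
        Filters=[
            {"Name": "tag:Name", "Values": [name_pattern]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    found = []
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
            found.append(
                {
                    "instance_id": instance["InstanceId"],
                    "name": tags.get("Name", ""),
                    "volume_ids": [
                        mapping["Ebs"]["VolumeId"]
                        for mapping in instance.get("BlockDeviceMappings", [])
                        if "Ebs" in mapping
                    ],
                }
            )
    return found
```

The name field lets us pick the entry that also contains the environment name we chose, and volume_ids gives the Volume ID that the Storage tab displays.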
We should see 10 GiB for the volume size. Click the link under Volume ID (for example, vol-0130f00a6cf349ab37). Take note that this Volume ID will be different for your volume:
Figure 2.14 – Looking for the volume attached to the EC2 instance
You will be redirected to the Elastic Block Store Volumes page, which shows the details of the volume attached to your instance:
Figure 2.15 – Elastic Block Store Volumes page
Here, we can see that the size of the volume is currently set to 10 GiB.
Figure 2.16 – Modify Volume
This is where we can find the Modify Volume option.
Set the volume size to 100 and click Modify:
Figure 2.17 – Modifying the volume
As you can see, we specified a new volume size of 100 GiB. This should be more than enough to help us get through this chapter and build our custom algorithm container image.
Figure 2.18 – Modify Volume confirmation dialog
We should see a confirmation screen here after clicking Modify in the previous step.
Figure 2.19 – Modify Volume Request Succeeded message
Here, we can see a message stating Modify Volume Request Succeeded. At this point, the volume modification is still pending and we need to wait about 10-15 minutes for this to complete. Feel free to check out the How it works… section for this recipe while waiting.
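For reference, the Modify Volume request made through the console has a programmatic counterpart. The sketch below assumes a boto3 EC2 client is passed in and uses its describe_volumes and modify_volume calls; the function name and the size check are our own additions.

```python
def request_resize(ec2_client, volume_id, new_size_gib=100):
    """Ask EBS to grow a volume and return the initial modification state.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    volume = ec2_client.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    current_size_gib = volume["Size"]
    # EBS volumes can be grown but not shrunk, so fail early on a bad size.
    if new_size_gib <= current_size_gib:
        raise ValueError(
            f"New size {new_size_gib} GiB must exceed the current "
            f"{current_size_gib} GiB"
        )
    response = ec2_client.modify_volume(VolumeId=volume_id, Size=new_size_gib)
    return response["VolumeModification"]["ModificationState"]
```

As with the console, a successful call only means the request was accepted; the modification itself continues in the background.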
Figure 2.20 – Refresh button
Clicking the refresh button will update State from in-use (green) to in-use – optimizing (yellow):
Figure 2.21 – In-use state – optimizing (yellow)
Here, we can see that the volume modification step has not been completed yet.
Figure 2.22 – In-use state (green)
When we see what is shown in the preceding screenshot, we should celebrate as this means that the volume modification step has been completed!
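Instead of clicking the refresh button repeatedly, the wait could be scripted. This is a minimal sketch, assuming a boto3 EC2 client is passed in and that the optimizing label shown in the console corresponds to the ModificationState value returned by describe_volumes_modifications. The function name, polling interval, and retry limit are arbitrary choices of ours.

```python
import time


def wait_for_resize(ec2_client, volume_id, poll_seconds=30, max_polls=60,
                    sleep=time.sleep):
    """Poll until the volume modification completes, then return its state.

    ec2_client is expected to behave like boto3.client("ec2").
    The sleep parameter exists only so the wait can be skipped in tests.
    """
    for _ in range(max_polls):
        response = ec2_client.describe_volumes_modifications(VolumeIds=[volume_id])
        state = response["VolumesModifications"][0]["ModificationState"]
        if state == "completed":
            return state
        if state == "failed":
            raise RuntimeError(f"Modification of {volume_id} failed")
        # Still "modifying" or "optimizing": wait and check again.
        sleep(poll_seconds)
    raise TimeoutError(f"{volume_id} did not finish within {max_polls} polls")
```

With the defaults, the function gives up after roughly 30 minutes, which leaves some headroom over the 10-15 minutes mentioned earlier.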
Now that the volume modification step has been completed, our next goal is to make sure that this change is reflected in our environment.
Run lsblk:
lsblk
Running lsblk should yield the following output:
Figure 2.23 – Partition not yet reflecting the size of the root volume
As you can see, while the size of the root volume, /dev/nvme0n1, reflects the new size, 100G, the size of the /dev/nvme0n1p1 partition reflects the original size, 10G.
There are multiple ways to grow the partition, but we will proceed by simply rebooting the EC2 instance so that the size of the /dev/nvme0n1p1 partition will reflect the size of the root volume, which is 100G.
Figure 2.24 – Attachment information
Clicking this link will redirect us to the EC2 Instances page. It will automatically select the EC2 instance of our Cloud9 environment:
Figure 2.25 – EC2 instance of the Cloud9 environment
The preceding screenshot shows the EC2 instance linked to our Cloud9 environment.
Figure 2.26 – Reboot instance
This is where we can find the Reboot instance option.
Figure 2.27 – Instance is still rebooting
We should see a screen similar to the preceding one.
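The reboot can also be triggered outside the console. Here is a small sketch, assuming a boto3 EC2 client is passed in and using its reboot_instances and describe_instance_status calls; the function name is hypothetical.

```python
def reboot_instance(ec2_client, instance_id):
    """Request a reboot and return the instance's current status check value.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    ec2_client.reboot_instances(InstanceIds=[instance_id])
    response = ec2_client.describe_instance_status(
        InstanceIds=[instance_id],
        # Also report instances that are not in the running state.
        IncludeAllInstances=True,
    )
    return response["InstanceStatuses"][0]["InstanceStatus"]["Status"]
```

The returned value is only a snapshot taken right after the request, so it may need to be checked again before reconnecting to the Cloud9 environment.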
Run lsblk in the Terminal:
lsblk
We should get a set of results similar to what is shown in the following screenshot:
Figure 2.28 – Partition now reflecting the size of the root volume
As we can see, the /dev/nvme0n1p1 partition now reflects the size of the root volume, which is 100G.
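As an additional sanity check, we can confirm from Python that the root filesystem actually has the extra room. The following sketch uses shutil.disk_usage from the standard library; the helper names and the idea of a minimum free-space threshold are our own.

```python
import shutil

GIB = 1024 ** 3


def filesystem_gib(path="/"):
    """Return total, used, and free space in GiB for the filesystem at path."""
    usage = shutil.disk_usage(path)
    return {
        "total": usage.total / GIB,
        "used": usage.used / GIB,
        "free": usage.free / GIB,
    }


def has_room_for_images(min_free_gib, path="/"):
    """Report whether at least min_free_gib GiB is free at path."""
    return filesystem_gib(path)["free"] >= min_free_gib
```

For instance, has_room_for_images(50) could be called before building container images, with the threshold chosen to suit the images being built.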
That was a lot of setup work, but this will definitely be worth it, as you will see in the next few recipes in this chapter. Now, let's see how this works!
In this recipe, we launched a Cloud9 environment where we will prepare the custom container image. When building Docker container images, it is important to note that each container image consumes a bit of disk space. This is why we had to go through a couple of steps to increase the size of the volume attached to the EC2 instance of our Cloud9 environment. This recipe was composed of three parts: launching a new Cloud9 environment, modifying the mounted volume, and rebooting the instance.
Launching a new Cloud9 environment involves using a CloudFormation template behind the scenes. This CloudFormation template is used as the blueprint when creating the EC2 instance:
Figure 2.29 – CloudFormation stack
Here, we have a CloudFormation stack that was successfully created. What's CloudFormation? AWS CloudFormation is a service that helps developers and DevOps professionals manage resources using templates written in JSON or YAML. These templates get converted into AWS resources using the CloudFormation service.
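To see the link between a stack and the resources it created, we could query CloudFormation directly. This is a hedged sketch, assuming a boto3 CloudFormation client is passed in and that the stack created for the environment has a name starting with aws-cloud9-; both function names are our own.

```python
def cloud9_stacks(cfn_client, prefix="aws-cloud9-"):
    """Return (stack_name, stack_status) pairs for stacks matching prefix.

    cfn_client is expected to behave like boto3.client("cloudformation").
    Only the first page of results is read, which is enough for a sketch.
    """
    stacks = cfn_client.describe_stacks()["Stacks"]
    return [
        (stack["StackName"], stack["StackStatus"])
        for stack in stacks
        if stack["StackName"].startswith(prefix)
    ]


def stack_resources(cfn_client, stack_name):
    """Return (resource_type, physical_id) pairs created by a stack."""
    response = cfn_client.describe_stack_resources(StackName=stack_name)
    return [
        (resource["ResourceType"], resource.get("PhysicalResourceId", ""))
        for resource in response["StackResources"]
    ]
```

Passing one of the names returned by cloud9_stacks to stack_resources should list the resource types the template produced, including the EC2 instance behind the environment.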
At this point, the EC2 instance should be running already and we can use the Cloud9 environment as well:
Figure 2.30 – AWS Cloud9 environment
We should be able to see the preceding output once the Cloud9 environment is ready. If we were to use the environment right away, we would run into disk space issues as we will be working with Docker images, which take up a bit of space. To prevent these issues from happening later on, we modified the volume in this recipe and restarted the EC2 instance so that this volume modification gets reflected right away.
Important note
In this recipe, we took a shortcut and simply restarted the EC2 instance. If we were running a production environment, we should avoid having to reboot and follow this guide instead: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html.
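To give a rough idea of what the no-reboot route involves, the sketch below builds the growpart and filesystem-resize commands described in that guide. It assumes the device names seen in this recipe and an ext4 root filesystem, which should be verified on the instance first; the helper names are our own, and the commands only run when run_commands is called explicitly.

```python
import subprocess


def partition_device(disk, partition_number):
    """Return the partition path, e.g. /dev/nvme0n1 + 1 -> /dev/nvme0n1p1."""
    # Disks whose names end in a digit (NVMe) use a "p" before the number.
    separator = "p" if disk[-1].isdigit() else ""
    return f"{disk}{separator}{partition_number}"


def grow_commands(disk="/dev/nvme0n1", partition_number=1, filesystem="ext4"):
    """Build the commands that extend the partition, then the filesystem."""
    commands = [["sudo", "growpart", disk, str(partition_number)]]
    if filesystem == "ext4":
        commands.append(
            ["sudo", "resize2fs", partition_device(disk, partition_number)]
        )
    elif filesystem == "xfs":
        # Assumption: the partition being grown is mounted at /.
        commands.append(["sudo", "xfs_growfs", "-d", "/"])
    else:
        raise ValueError(f"Unsupported filesystem: {filesystem}")
    return commands


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Printing the result of grow_commands() before running anything is a cheap way to double-check the device names against the lsblk output.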
Note that we can also use a SageMaker Notebook instance that's been configured with root access enabled as a potential experimentation environment for our custom scripts and container images, before using them in SageMaker. The issue here is that a SageMaker Notebook instance reverts to how it was originally configured every time we stop and restart it. This makes us lose certain directories and installed packages, which is not ideal.