
Learn Amazon SageMaker

Experimentation is a key part of the ML process. Developers and data scientists use a collection of open source tools and libraries for data exploration and processing, and of course, to evaluate candidate algorithms. Installing and maintaining these tools takes a fair amount of time, which would probably be better spent on studying the ML problem itself!
In order to solve this problem, Amazon SageMaker makes it easy to fire up a notebook instance in minutes. A notebook instance is a fully managed Amazon EC2 instance that comes preinstalled with the most popular tools and libraries: Jupyter, Anaconda (and its conda package manager), numpy, pandas, deep learning frameworks, and even NVIDIA GPU drivers.
Note:
If you're not familiar with S3 at all, please read the following documentation: https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html
Let's create one such instance using the AWS Console (https://console.aws.amazon.com/sagemaker/):
Figure 1.6 Creating a notebook instance
Note:
The AWS console is a living thing. By the time you're reading this, some screens may have been updated. Also, you may notice small differences from one region to the next, as some features or instance types are not available in every region.
ml.t2.medium for now. As a matter of fact, it's an excellent default choice if your notebooks only invoke SageMaker APIs that create fully managed infrastructure for training and deployment: there's no need for anything larger. If your workflow requires local data processing and model training, then feel free to scale up as needed. We can ignore Elastic Inference for now; it will be covered in Chapter 13, Optimizing Prediction Cost and Performance. Thus, your setup screen should look like the following screenshot:
Figure 1.7 Creating a notebook instance
Figure 1.8 Creating a notebook instance
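The console is not the only way to create a notebook instance. As a minimal sketch, the settings discussed in this section (instance type, IAM role, root access, encryption key, Git repository, and tags) map to parameters of the SageMaker CreateNotebookInstance API. The helper name build_notebook_request, the instance name, the account ID, and the repository URL below are hypothetical placeholders, not values from this chapter:

```python
from typing import Any, Dict, Optional


def build_notebook_request(
    name: str,
    role_arn: str,
    instance_type: str = "ml.t2.medium",
    root_access: bool = True,
    kms_key_id: Optional[str] = None,
    git_repository: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the CreateNotebookInstance API."""
    request: Dict[str, Any] = {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        # The API expects the strings "Enabled" or "Disabled".
        "RootAccess": "Enabled" if root_access else "Disabled",
    }
    # Optional settings are only sent when they are actually set.
    if kms_key_id:
        request["KmsKeyId"] = kms_key_id
    if git_repository:
        request["DefaultCodeRepository"] = git_repository
    if tags:
        request["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    return request


# Placeholder values for illustration only.
request = build_notebook_request(
    name="my-notebook-instance",
    role_arn="arn:aws:iam::123456789012:role/MySageMakerRole",
    git_repository="https://github.com/example/my-notebooks.git",
)
print(request)

# With boto3 installed and AWS credentials configured, the request
# could then be submitted like this:
#   import boto3
#   boto3.client("sagemaker").create_notebook_instance(**request)
```

Keeping the request in a plain dictionary makes it easy to review before anything is created in your account.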
Select Create a new role, which opens the following screen:
Figure 1.9 Creating an IAM role
The only decision we have to make here is whether we want to allow our notebook instance to access specific Amazon S3 buckets. Let's select Any S3 bucket and click on Create role. This is the most flexible setting for development and testing, but we'd want to apply much stricter settings for production. Of course, we can edit this role later on in the IAM console, or create a new one.
Optionally, we can disable root access to the notebook instance, which helps lock down its configuration. We can also enable storage encryption using AWS Key Management Service (https://aws.amazon.com/kms). Both features are extremely important in high-security environments, but we won't enable them here.
Once you've completed this step, your screen should look like this, although the name of the role will be different:
Figure 1.10 Creating an IAM role
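To give an idea of what stricter settings could look like, here is a minimal sketch of two IAM policy documents built in Python: a trust policy that lets the SageMaker service assume the role, and a permissions policy limited to named S3 buckets instead of any bucket. The function names and the bucket name are hypothetical, and the list of S3 actions is an assumption that you would adjust to your own needs:

```python
import json
from typing import Any, Dict, List


def sagemaker_trust_policy() -> Dict[str, Any]:
    """Trust policy allowing the SageMaker service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_bucket_policy(bucket_names: List[str]) -> Dict[str, Any]:
    """Permissions policy restricted to the given S3 buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing applies to the bucket itself.
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arns,
            },
            # Reading and writing apply to the objects inside the bucket.
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arns,
            },
        ],
    }


# "my-training-data" is a placeholder bucket name.
print(json.dumps(s3_bucket_policy(["my-training-data"]), indent=2))
```

These documents could be pasted into the IAM console when editing the role, or passed to the IAM APIs as JSON strings.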
Figure 1.11 Setting the VPC
Let's clone one of my repositories to illustrate, and enter its name as seen in the following screenshot. Feel free to use your own!
Figure 1.12 Setting Git repositories
Figure 1.13 Setting tags
Under the hood, SageMaker fires up a fully managed Amazon EC2 instance, using an Amazon Machine Image (AMI) preinstalled with Jupyter, Anaconda, deep learning libraries, and so on. Don't look for it in the EC2 console: you won't see it.
Figure 1.14 Opening a notebook instance
We'll jump straight into Jupyter Lab. As shown in the following screenshot, we see in the left-hand panel that the repository has been cloned. In the Launcher panel, we see the many conda environments that are readily available for TensorFlow, PyTorch, Apache MXNet, and more:
Figure 1.15 Notebook instance welcome screen
The rest is vanilla Jupyter, and you can get to work right away!
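Opening the notebook instance can also be scripted. The following sketch waits until the instance is in service, and then asks SageMaker for a presigned URL that opens Jupyter in a browser. It assumes that client is a boto3 SageMaker client created with boto3.client("sagemaker"); the function name open_notebook_url and the instance name in the usage comment are hypothetical:

```python
from typing import Any


def open_notebook_url(client: Any, name: str, session_seconds: int = 1800) -> str:
    """Wait for the notebook instance to be in service, then return a presigned URL."""
    # Block until the instance status becomes InService.
    waiter = client.get_waiter("notebook_instance_in_service")
    waiter.wait(NotebookInstanceName=name)

    # Request a time-limited URL that opens Jupyter without a console login.
    response = client.create_presigned_notebook_instance_url(
        NotebookInstanceName=name,
        SessionExpirationDurationInSeconds=session_seconds,
    )
    return response["AuthorizedUrl"]


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   url = open_notebook_url(boto3.client("sagemaker"), "my-notebook-instance")
#   print(url)
```

Since the function only depends on the client object passed to it, it can be exercised with a stub client before running it against a real account.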
Coming back to the AWS console, we see that we can stop, start, and delete a notebook instance, as shown in the next screenshot:
Figure 1.16 Stopping a notebook instance
Stopping a notebook instance is identical to stopping an Amazon EC2 instance: storage is persisted until the instance is started again.
When a notebook instance is stopped, you can then delete it: the storage will be destroyed, and you won't be charged for anything any longer.
If you're going to use this instance to run the examples in this book, I'd recommend stopping it when you're done working and restarting it when you need it. This will save you the trouble of recreating it again and again, your work will be preserved, and the costs will really be minimal.
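The stop-then-delete sequence described above can be sketched with the SageMaker API as well. This is only an illustration under a few assumptions: client is a boto3 SageMaker client, the helper names stop_notebook and delete_notebook are hypothetical, and only the most common instance states are handled:

```python
from typing import Any


def stop_notebook(client: Any, name: str) -> None:
    """Stop a notebook instance and wait until it is fully stopped."""
    description = client.describe_notebook_instance(NotebookInstanceName=name)
    status = description["NotebookInstanceStatus"]
    if status == "Stopped":
        return  # Nothing to do.
    if status == "InService":
        client.stop_notebook_instance(NotebookInstanceName=name)
    elif status != "Stopping":
        # Other states (for example Pending) are not handled in this sketch.
        raise RuntimeError(f"Cannot stop notebook instance in state {status!r}")
    client.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)


def delete_notebook(client: Any, name: str) -> None:
    """Delete a notebook instance, stopping it first if needed."""
    # An instance has to be stopped before it can be deleted.
    stop_notebook(client, name)
    client.delete_notebook_instance(NotebookInstanceName=name)
    client.get_waiter("notebook_instance_deleted").wait(NotebookInstanceName=name)


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   sm = boto3.client("sagemaker")
#   stop_notebook(sm, "my-notebook-instance")      # storage is preserved
#   delete_notebook(sm, "my-notebook-instance")    # storage is destroyed
```

Remember that deleting is irreversible, whereas a stopped instance can be restarted later with its storage intact.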