
Machine Learning with Amazon SageMaker Cookbook

In this recipe, we will convert and serialize the synthetic data stored in CSV format into the protobuf recordIO format. With the data serialized into the protobuf recordIO format, we can take advantage of Pipe mode, where training starts sooner because the training job streams data directly from the S3 bucket source. In addition, the SageMaker algorithms may perform much better with this training file format.
This recipe continues from Generating a synthetic dataset for analysis and transformation.
In the first few steps of this recipe, we will focus on scaling and transforming the synthetic labeled dataset into a set of values between 0 and 1 using MinMaxScaler from sklearn:
my-experiments/chapter04 directory inside your SageMaker notebook instance. Feel free to create this directory if it does not exist yet.
conda_python3...
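The scaling step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the small array stands in for the synthetic labeled dataset, and the variable names are hypothetical. MinMaxScaler rescales each column independently so that its minimum maps to 0 and its maximum maps to 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the synthetic labeled dataset (two columns).
data = np.array([[10.0, 200.0], [20.0, 400.0], [40.0, 800.0]])

# feature_range=(0, 1) is the default; it is spelled out here for clarity.
scaler = MinMaxScaler(feature_range=(0, 1))

# Learn the per-column minimum and maximum, then rescale each column.
scaled = scaler.fit_transform(data)
print(scaled)

# The fitted scaler can map scaled values back to the original units.
restored = scaler.inverse_transform(scaled)
```

Keeping the fitted scaler around is useful because predictions made on scaled values can later be converted back with inverse_transform.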