Sam Palani

Summary

The website content explains how to set up an Amazon SageMaker environment on a local machine, allowing users to build, train, and deploy machine learning models using their own IDE and Jupyter notebooks, and then execute these tasks on AWS cloud services.

Abstract

The article provides a comprehensive guide on configuring a local development environment to leverage Amazon SageMaker's capabilities. It outlines the advantages of using a local setup, such as cost savings and the flexibility to work offline or with preferred tools. The process involves creating a Python virtual environment using Conda, installing necessary data science packages, setting up AWS SDKs, and configuring Jupyter for use with SageMaker. The guide emphasizes the use of the SageMaker Python SDK for high-level interactions with SageMaker services and the AWS SDK for lower-level resource management. It also details the steps to import packages, create estimators, define training channels, and initiate the training process on AWS infrastructure, all while working from a local development environment.

Opinions

  • The author suggests that cost and flexibility are significant benefits of setting up a local SageMaker environment.
  • Using a local IDE and Jupyter notebooks with SageMaker is presented as a viable alternative to using SageMaker Notebook Instances or SageMaker Studio Notebooks.
  • The article implies that managing environments and packages with Conda is the recommended and most compatible approach when working with SageMaker.
  • The author recommends directly specifying the AWS IAM role ARN when running code locally, as the get_execution_role() method is not applicable outside of SageMaker managed environments.
  • The use of AWS SDK for Python (boto3) and awscli is emphasized for resource management and interaction with AWS services.
  • The author encourages readers to check for breaking changes in the SageMaker Python SDK, particularly when using version 2.

Setting up Amazon SageMaker Environment On Your Local Machine

Photo by Erico Marcelino on Unsplash

Amazon SageMaker is more than just managed Jupyter notebooks. It is a fully managed service that enables you to build, train, optimize and deploy machine learning models. A common misconception, especially when you are starting out with SageMaker, is that you need a SageMaker Notebook Instance or a SageMaker (Studio) Notebook in order to use these services. You can in fact kick off all of these services directly from your local machine or even from your favorite IDE.

Before we go further, let’s consider how we interact with Amazon SageMaker services. You have two APIs:

SageMaker Python SDK: This is a high-level API in Python that abstracts the code to build, train and deploy machine learning models. Specifically, it provides estimators for first-class (built-in) algorithms as well as for supported frameworks like TensorFlow, MXNet, etc. In most cases you will use this SDK for your interactive machine learning tasks.

AWS SDK: This is a low-level API that is used to interact with all supported AWS services and is not specific to SageMaker. The AWS SDK is available in most popular programming languages, such as Java, JavaScript and Python (boto3). In most cases you will use these service-level APIs for things such as creating resources for automation or interacting with other AWS services that are not supported by the SageMaker Python SDK.

Why a local environment

Cost is probably the first benefit that comes to mind, but a local environment also gives you the flexibility to use your own IDE and the ability to work offline, kicking off jobs on the AWS cloud only when you are ready.

How a local environment works

You write the code to build your model as you normally would, but instead of using a SageMaker Notebook Instance (or a SageMaker Studio Notebook), you do this on your local machine running Jupyter or from your IDE. Then, when you are ready, you kick off your training on SageMaker instances on AWS. Once the training is complete, the model is stored in AWS. You can then kick off a deployment or run a batch transformation job from your local machine.

Environment setup Using conda

It is recommended that you set this up as a Python virtual environment. In our case we are using conda to manage our virtual environments, but you can also use virtualenv. Amazon SageMaker also uses conda to manage environments and packages. It is assumed that you already have conda set up; if not, install it first by following the conda installation instructions.

Create a new conda environment

conda create -n sagemaker python=3

Activate and verify the environment

Image by Author
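After activating the environment (with conda, typically `conda activate sagemaker`), you can also confirm from Python that the expected environment and interpreter are in use. This is a minimal sketch, assuming that conda exports the `CONDA_DEFAULT_ENV` variable on activation; the helper name `active_conda_env` is illustrative and not part of any library.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    env = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return env.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
    print("Python executable:", sys.executable)
    print("Python version:", sys.version.split()[0])
```

If the first line does not show `sagemaker`, the environment was probably not activated in the current shell.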

Install the standard data science packages.

You can use conda or pip to install the packages. We will stick to conda.

conda install -y pandas numpy matplotlib

Install AWS packages

Install the AWS SDK for Python (boto3), awscli and the SageMaker Python SDK. The SageMaker Python SDK is not available as a conda package, so we will use pip here.

pip install boto3 awscli sagemaker

If you are using the awscli for the first time, you must configure it with your AWS credentials and a default region before use. The AWS CLI documentation describes how to configure it.
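Once the awscli is configured, a quick way to check that Python picks up the same credentials is to ask AWS STS who you are. The sketch below is illustrative: `caller_identity` is a hypothetical helper name, and it takes the client as an argument so it can be tried with any object that exposes `get_caller_identity()`.

```python
def caller_identity(sts_client):
    """Return (account_id, arn) for the credentials the given STS client uses."""
    identity = sts_client.get_caller_identity()
    return identity["Account"], identity["Arn"]


# Usage, once boto3 is installed and credentials are configured:
#   import boto3
#   account, arn = caller_identity(boto3.client("sts"))
#   print(f"Authenticated as {arn} in account {account}")
```

If this call fails with a credentials error, training jobs started from the same environment will fail as well, so it is worth checking first.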

By default, version 2 of the SageMaker Python SDK is installed. Be sure to review the changes in version 2 of the SDK, especially the breaking changes.

Install Jupyter and create a kernel

conda install -c conda-forge jupyterlab
python -m ipykernel install --user --name sagemaker

Verify the environment and check the versions

Start Jupyter by running jupyter lab and choose the sagemaker kernel created above.

Image by Author

Next verify the versions in the notebook to make sure everything is as expected.

Image by Author
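One way to do this check in a notebook cell is with the standard library's `importlib.metadata`, which reads the installed version of each package without importing it. This is a minimal sketch; the package list simply mirrors the packages installed in the steps above, and `installed_versions` is an illustrative helper name.

```python
from importlib import metadata

# Packages installed in the earlier steps of this guide.
PACKAGES = ["pandas", "numpy", "matplotlib", "boto3", "awscli", "sagemaker", "jupyterlab"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:12s} {version or 'not installed'}")
```

A sagemaker version starting with 2 confirms that version 2 of the SDK is in use.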

Build and Train

You can now start building your model locally and kick off training on AWS when you are ready.

Import Packages

Import the necessary packages and specify the role. The key difference here is that you specify the ARN of the role directly instead of calling get_execution_role(). Because you are running this from your local machine with your own AWS credentials, rather than on a notebook instance with an attached role, get_execution_role() will not work.
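A minimal sketch of this step is shown below. The role ARN is a hypothetical placeholder that you would replace with your own, the region passed to `make_session` is your choice, and the helper names `is_role_arn` and `make_session` are illustrative. The format check only catches typos; it does not verify that the role exists or that SageMaker can assume it.

```python
import re

# Hypothetical placeholder: replace with the ARN of an IAM role SageMaker can assume.
ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleSageMakerExecutionRole"

_ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def is_role_arn(value):
    """Return True if value looks like an IAM role ARN (format check only)."""
    return bool(_ROLE_ARN_PATTERN.match(value))


def make_session(region_name):
    """Create a SageMaker session backed by the local AWS credentials."""
    # Imported lazily so the format check above works without the SDKs installed.
    import boto3
    import sagemaker

    return sagemaker.Session(boto_session=boto3.Session(region_name=region_name))


# Usage (the region is an assumption; use the one your data lives in):
#   role = ROLE_ARN
#   session = make_session("us-east-1")
```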

Create Estimator

Create the estimator and set hyperparameters as you normally would. In the example below, we are training an image classifier using the built-in image classification algorithm. You also specify the type of SageMaker instance and the number of instances you want to use for training.
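As a rough sketch of what this can look like with version 2 of the SageMaker Python SDK, the code below builds an `Estimator` for the built-in image classification algorithm. The hyperparameter values, the instance type and the helper name `build_classifier` are assumptions chosen for illustration, not recommendations; `role_arn`, `session` and `output_path` are expected to come from your own setup.

```python
# Illustrative values only; tune them for your own dataset.
HYPERPARAMETERS = {
    "num_layers": 18,
    "image_shape": "3,224,224",
    "num_classes": 2,
    "num_training_samples": 1000,
    "mini_batch_size": 32,
    "epochs": 5,
    "learning_rate": 0.01,
}


def build_classifier(role_arn, session, output_path,
                     instance_type="ml.p3.2xlarge", instance_count=1):
    """Create an Estimator for the built-in image classification algorithm."""
    # Imported lazily so this code loads even where the SDK is absent.
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    # Look up the algorithm's container image for the session's region.
    image_uri = image_uris.retrieve("image-classification", session.boto_region_name)
    classifier = Estimator(
        image_uri=image_uri,
        role=role_arn,
        instance_count=instance_count,
        instance_type=instance_type,
        output_path=output_path,  # S3 URI where the trained model is written
        sagemaker_session=session,
    )
    classifier.set_hyperparameters(**HYPERPARAMETERS)
    return classifier
```

Nothing runs on AWS at this point. The estimator only describes the job, and instances are provisioned when `fit` is called.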

Training channels

Specify the training channels. Again, nothing changes here compared to how you would do this on a notebook instance.
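The channels can be sketched as a dictionary that maps a channel name to a `TrainingInput` pointing at data in S3. The bucket layout (`train/` and `validation/` under a common prefix), the RecordIO content type and both helper names are assumptions made for this example.

```python
def channel_uris(bucket, prefix):
    """Return the S3 URI for each training channel under s3://bucket/prefix/."""
    base = f"s3://{bucket}/{prefix.strip('/')}"
    return {"train": f"{base}/train/", "validation": f"{base}/validation/"}


def build_data_channels(bucket, prefix, content_type="application/x-recordio"):
    """Wrap each channel URI in a TrainingInput, ready to pass to estimator.fit()."""
    from sagemaker.inputs import TrainingInput  # lazy import; needs the SageMaker SDK

    return {
        name: TrainingInput(uri, content_type=content_type, s3_data_type="S3Prefix")
        for name, uri in channel_uris(bucket, prefix).items()
    }


# Usage (bucket and prefix are placeholders):
#   data_channels = build_data_channels("my-bucket", "image-data")
```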

Start training job

Start the training job by calling the fit method, which kicks off training on SageMaker instances on AWS.

classifier.fit(inputs=data_channels, logs=True)
Image by Author

You can check the status of your training jobs with the AWS CLI command aws sagemaker list-training-jobs.
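The same check can be made from Python through the boto3 SageMaker client, which exposes the operation as `list_training_jobs`. This is a small sketch in which `recent_training_jobs` is an illustrative helper name, and the client is passed in as an argument.

```python
def recent_training_jobs(sagemaker_client, max_results=5):
    """Return (name, status) pairs for the most recently created training jobs."""
    response = sagemaker_client.list_training_jobs(
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=max_results,
    )
    return [
        (job["TrainingJobName"], job["TrainingJobStatus"])
        for job in response["TrainingJobSummaries"]
    ]


# Usage, with configured credentials:
#   import boto3
#   for name, status in recent_training_jobs(boto3.client("sagemaker")):
#       print(name, status)
```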

That’s it. Here we saw how you can set up a SageMaker environment locally and build machine learning models on your local machine using Jupyter. In addition to using Jupyter, you can also do the same from your IDE.

Happy training!
