
How I Passed the AWS Machine Learning Specialty in a Week

I had been thinking about taking the AWS Machine Learning Specialty exam for a while. Finally, during the Covid-19 lockdown, I ended up booking the exam and started studying. I got a free AWS exam voucher from joining the AWS ML community (I would recommend it to anyone who's interested in cloud computing or ML: lots of great resources and free gifts! Big thanks to AWS), and booked the earliest exam slot, which gave me only one week to study. I wanted to see how well I could do on the exam starting from zero with only a week of study. I also had to work during the day, so I spent about 4 hours on each weekday and around 12 hours over the weekend, 32 hours in total, studying for the exam.

I did this on purpose because I find that limited review time forces you to focus on the most important things. I passed the exam last Friday with a score of 970/1000 and would like to share my study guide with anyone who is planning to take it. It should be really helpful, since in my experience everything I share below has a very high chance of showing up on the exam.

How I Studied

There are lots of training videos and other materials on the internet to prepare you for the exam, but if you are short on time and have no previous knowledge, I would recommend spending your time wisely and focusing only on the Udemy course AWS Certified Machine Learning Specialty 2021 by Frank Kane and the practice questions on Exam Topics. Both are relatively inexpensive (Exam Topics is free), and I passed the exam by studying only those two plus the AWS docs.

The Udemy course's 10 hours of video are well structured. You can skip the labs, since they are not tested on the exam, but I recommend watching them if you have extra time, as they are useful for building your future AWS projects. Don't skip any of the non-lab videos; most of that material will show up on the exam.

The Exam Topics ML question bank has 131 questions. I saw at least 15 questions on the exam that were exactly the same as in the question bank; those will be your easy points. The question bank also covers about 90% of the knowledge domains tested on the exam. I strongly recommend going through the question bank at least three times, reading the discussions, and understanding each answer in depth. Keep in mind that a lot of the posted answers are wrong, so you need to read the discussion carefully and search the AWS docs to find the correct answer. I spent most of my time going through the question bank and researching each question online. Once you can answer all the questions in there, you have a good chance of passing the exam.

Important Key Concepts

This is the most important section. I will share some key concepts that frequently show up on the exam and how deeply you need to understand them in order to pass.

Data Engineering

Glue

When you see ETL and serverless, there is a 90% chance the answer is Glue. You need to understand how to use Glue for ETL, how to schedule ETL workflows, what transformations you can do, and which other services you can integrate with Glue. Glue shows up a lot on the exam in case-study questions about decoupling applications.

Kinesis Family

Understand when to use which Kinesis product and how those products integrate with other services (e.g., Glue, EMR, Lambda). Know what a shard is and how you can use Kinesis Data Analytics to do ad hoc transformations. When you see streaming, just look for Kinesis. This also came up a lot on the exam.
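To make the shard concept concrete, here is a rough sizing sketch. It assumes the provisioned-mode limits for Kinesis Data Streams (1 MB/s or 1,000 records/s of writes per shard, 2 MB/s of shared reads per shard); the function name and the example traffic numbers are my own, for illustration only.

```python
import math

# Per-shard limits for Kinesis Data Streams in provisioned mode
# (assumed here; check the current AWS quotas before relying on them).
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(records_per_sec, avg_record_kb, consumers=1):
    """Estimate the shard count from write and shared-read throughput."""
    write_mb = records_per_sec * avg_record_kb / 1024
    # Consumers without enhanced fan-out share the read limit of a shard.
    read_mb = write_mb * consumers
    return max(
        1,
        math.ceil(write_mb / WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb / READ_MB_PER_SEC),
    )


# 5,000 records/s at 2 KB each is about 9.8 MB/s of writes.
print(shards_needed(5000, 2))
```

The stream needs enough shards to satisfy whichever limit is hit first, which is why the function takes the maximum of the three estimates.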

Data Migration Tools

Understand when to use the other data migration tools: Data Pipeline, DMS, EMR, etc.

ETL Transformation

Know how to combine different services into ETL workflows that fulfill different ETL requirements.

S3

When you see data storage for machine learning, choose S3. Understand the different ways to get your data into S3.

Exploratory Data Analysis

Missing Values

Know how to handle missing values; the best option is usually to use an ML technique to impute them.

Unbalanced Data

Use SMOTE, add more weight to the smaller class in the evaluation metric, or tune the classification threshold.
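As a reminder of what SMOTE does, here is a minimal pure-Python sketch of the idea: create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors. This is a simplified illustration, not a library implementation; the function name, the parameters, and the sample points are hypothetical.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself),
        # ranked by squared Euclidean distance.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


minority_class = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
new_points = smote(minority_class, n_new=4)
print(len(new_points))
```

Every synthetic point lies on a line segment between two real minority samples, so the minority class grows without simply duplicating rows.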

Outliers

Always go with RCF (Random Cut Forest).

Visualization

Understand the different visualization techniques (scatter plot, histogram, etc.). Know how to read an elbow graph for early stopping. Use Amazon QuickSight for visualizations and dashboards.

Data Transformation

Use log transforms or numerical value binning for skewed data.

Understand one-hot encoding and ordinal vs. nominal data.

Know normalization vs. standardization for large-scale data.
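The transformations above are easy to mix up, so here is a small standard-library sketch of each one. The helper names and sample values are my own; in practice you would likely use a library such as scikit-learn instead.

```python
import math
import statistics


def log_transform(values):
    """Compress a right-skewed feature; log1p keeps zero at zero."""
    return [math.log1p(v) for v in values]


def normalize(values):
    """Min-max normalization: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score standardization: zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def one_hot(values):
    """One-hot encode nominal categories (columns in sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


print(normalize([2, 4, 6]))
print(one_hot(["red", "blue", "red"]))
```

One-hot encoding suits nominal data (no order, such as colors), while ordinal data (such as small/medium/large) can be mapped to ordered integers instead.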

Modeling

Built-in Models

Understand all the AWS built-in models and when to use which. Spend some time understanding BlazingText, Word2Vec, LDA, XGBoost, and Seq2Seq; those come up a lot on the exam.

Hyperparameter Tuning

Understand how to do hyperparameter tuning and how to use the automatic model tuning functionality in AWS.

Evaluation

Classification metrics come up a lot. Remember how to calculate recall, accuracy, precision, and sensitivity (another name for recall), and what each is used for. Use AUC for classification and MAPE for regression.
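Since the exam expects you to calculate these by hand, here is a short sketch of the formulas from confusion-matrix counts, plus MAPE. The function names and the example counts are made up for illustration.

```python
def precision(tp, fp):
    """Of everything predicted positive, how much was actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything actually positive, how much was found (sensitivity)."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mape(actual, predicted):
    """Mean absolute percentage error for regression, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Hypothetical confusion matrix: 8 TP, 2 FP, 4 FN, 86 TN.
print(precision(8, 2), recall(8, 4), accuracy(8, 86, 2, 4))
print(mape([100, 200], [110, 180]))
```

Precision matters most when false positives are costly, and recall matters most when false negatives are costly.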

Overfitting

There are lots of questions around overfitting. Remember these methods to reduce overfitting:

Lower the max depth of the tree

Reduce the number of layers

Enable dropout

Use early stopping

Apply L1 or L2 regularization and dropout during training

Decrease feature combinations

For underfitting, get more data through data augmentation.
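Of the methods above, early stopping is the easiest to show in a few lines. This is a minimal sketch assuming you already have a list of per-epoch validation losses; the function name and the `patience` parameter are my own naming, not a specific framework's API.

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the epoch with the best validation loss seen before stopping.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss stopped improving
    return best_epoch


losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.5]
print(early_stop_epoch(losses, patience=2))
```

A larger `patience` tolerates more noisy epochs before stopping, at the cost of extra training time.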

Run Your Own Code/Image/Docker

This also comes up a lot on the exam: how to run your own code, image, or Docker container in SageMaker. Read the documentation.

AI Services

Remember the different AWS AI services and what they can do. There are a lot of questions on how to combine those AI services to create an application.

Machine Learning Implementation and Operations

Model Deployment

Understand Elastic Inference, SageMaker Neo, Multi-AZ deployment by deploying more than one instance, and how to do auto scaling (know the concept of the cooldown period).

A/B Test

Know how to deploy multiple models behind the same endpoint for A/B testing.
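As a sketch of how this looks in practice, a SageMaker endpoint configuration takes a list of production variants, and traffic is split in proportion to each variant's weight. The list below has the shape of the `ProductionVariants` argument to `create_endpoint_config`; the variant names, model names, and instance type are hypothetical, and the helper function is mine.

```python
# Shape of the ProductionVariants list for an endpoint configuration.
# Model and variant names are placeholders for illustration.
production_variants = [
    {
        "VariantName": "model-a",
        "ModelName": "churn-model-a",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 9.0,
    },
    {
        "VariantName": "model-b",
        "ModelName": "churn-model-b",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    },
]


def traffic_share(variants):
    """Fraction of requests each variant receives, based on its weight."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


print(traffic_share(production_variants))
```

With weights of 9 and 1, the first variant gets 90% of the requests and the second gets 10%, which is a common way to test a new model on a small slice of traffic.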

Security

There are a lot of security-related questions. Remember the concepts of accessing SageMaker via a VPC endpoint, accessing S3 via an S3 endpoint, and the EnableNetworkIsolation setting for SageMaker training jobs and models. Also understand how IAM, S3 bucket policies, endpoint policies, security groups, and KMS work.

Important!! Things that will show up on the exam that nobody will tell you about and that you just need to remember.

Glue job bookmarks
Seq2Seq uses an attention mechanism to handle long sentences
Personalize uses an event tracker to learn from recent data
Polly lexicons let you customize how specific words and acronyms are pronounced
AWS Forecast can optimize a predictor with PerformAutoML and PerformHPO
How to update the Jupyter notebook version
SHAP for ML explainability
Glue fuzzy matching
How to stop Amazon Rekognition from using your data to improve its models
Distributed TensorFlow training in SageMaker with the Horovod framework
Amazon Kinesis Data Analytics Random Cut Forest (RCF)
FindMatches ML Transform in Glue
Stratified k-fold cross-validation for imbalanced classification
LDA for customer clustering
Transfer learning
Reading an elbow plot
SMOTE
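One item from the list above that is quick to illustrate is stratified k-fold: each fold keeps roughly the same class ratio as the full dataset. Below is a minimal sketch without shuffling; the function name and the toy labels are my own, and a real project would more likely use a library implementation.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Split sample indices into k folds that preserve the class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal the indices of each class round-robin across the folds.
    for indices in by_class.values():
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds


# Imbalanced toy labels: eight negatives and four positives.
labels = [0] * 8 + [1] * 4
for fold in stratified_folds(labels, k=4):
    print(fold, [labels[i] for i in fold])
```

Each of the four folds ends up with two negatives and one positive, so no fold is left without minority-class examples.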

I hope this blog helps you pass the certification exam!

Thanks for reading. I am looking forward to hearing your questions and thoughts. If you want to learn more about data science and cloud computing, you can find me on LinkedIn (https://www.linkedin.com/in/andrewngai9255/).