How I Passed the AWS Machine Learning Specialty in a Week

I had been thinking about taking the AWS Machine Learning Specialty exam for a while. Finally, during the Covid-19 lockdown, I booked the exam and started studying. I got a free AWS exam voucher for joining the AWS ML community (I would recommend it to anyone who’s interested in cloud computing or ML: lots of great resources and free gifts! Big thanks to AWS), and booked the earliest exam slot, which gave me only one week to study. I wanted to see how well I could do on the exam starting from zero with just a week of study. I also had to work during the day, so I spent on average 4 hours on each weekday and around 12 hours over the weekend, 32 hours in total, studying for the exam.

I did this on purpose, because I find that limited review time forces you to focus on the most important things. I passed the exam with a score of 970/1000 last Friday and would like to share my study guide with anyone who is planning to take it. It should be really helpful: in my experience, everything I share below has about a 99% chance of showing up on the exam.
How I Study
There are lots of training videos and other materials on the internet to prepare you for the exam, but if you don’t have much time or any previous knowledge, I would recommend spending your time wisely and focusing only on the Udemy course AWS Certified Machine Learning Specialty 2021 by Frank Kane and the practice questions on Exam Topics. Both are relatively inexpensive (Exam Topics is free), and I passed the exam by studying only those two plus reading the AWS docs.
The 10 hours of Udemy course videos are well structured. You can skip the labs, since they will not be tested on the exam, but I recommend watching them if you have extra time, as they are a good foundation for your future AWS projects. Don’t skip any of the non-lab videos; most of that material will show up on the exam.
The Exam Topics ML question bank has 131 questions. I saw at least 15 questions on the exam that were exactly the same as in the question bank; those will be your easy points. The question bank also covers about 90% of the knowledge domains tested on the exam. I strongly recommend going through it at least three times, reading the discussions, and understanding each answer in depth. Keep in mind that a lot of the posted answers are wrong, so you need to read the discussion carefully and search the AWS docs to find the correct answer. I spent most of my time going through the question bank and understanding each question by searching online. Once you can answer all the questions in there, you have a good chance of passing the exam.
Important Key Concepts
This is the most important section. I will share some key concepts that frequently show up on the exam and how deeply you need to understand them in order to pass.
Data Engineering
- Glue
When you see ETL and serverless, there is a 90% chance the answer is Glue. You need to understand how to use Glue for ETL, how to schedule ETL workflows, what transformations you can do, and which other services you can integrate with Glue. Glue shows up a lot on the exam in case-study questions about decoupling applications.
- Kinesis Family
Understand when to use which Kinesis product and how those products integrate with other services (e.g. Glue, EMR, Lambda). Know what a shard is, and how you can use Kinesis Data Analytics to do ad-hoc transformations. When you see streaming, just look for Kinesis. This also came up a lot on the exam.
- Data Migration Tools
Understand when to use the other data migration tools: Data Pipeline, DMS, EMR, etc.
- ETL Transformation
Know how to combine different services into ETL workflows that meet different requirements.
- S3
When you see data storage for machine learning, choose S3. Understand the different ways to get your data into S3.
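To make the shard idea from the Kinesis item a bit more concrete, here is a minimal sketch of how you might estimate the number of shards for a Kinesis data stream. It assumes the per-shard limits for provisioned streams as I understand them from the AWS docs (1 MB/s or 1,000 records/s in, 2 MB/s out); `shards_needed` and its parameters are hypothetical names for illustration, so check the current docs before relying on the numbers.

```python
import math

# Assumed per-shard limits (provisioned mode, shared-throughput consumers).
WRITE_MB_PER_SEC = 1.0         # ingest bandwidth per shard
WRITE_RECORDS_PER_SEC = 1000   # ingest record rate per shard
READ_MB_PER_SEC = 2.0          # read bandwidth per shard, shared by consumers


def shards_needed(avg_record_kb, records_per_sec, consumers=1):
    """Estimate the shard count for a stream (hypothetical helper)."""
    ingest_mb = avg_record_kb * records_per_sec / 1024
    egress_mb = ingest_mb * consumers  # every consumer reads all the data
    return max(
        1,
        math.ceil(ingest_mb / WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(egress_mb / READ_MB_PER_SEC),
    )
```

For example, `shards_needed(2, 3000, consumers=2)` is driven by bandwidth, while `shards_needed(0.5, 5000)` is driven by the record-rate limit, which is the kind of trade-off the exam questions tend to probe.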
Exploratory Data Analysis
- Missing Values
Know how to handle missing values. On the exam, the best option is always to fill them in with an ML-based technique (imputation).
- Imbalanced Data
Use SMOTE, give more weight to the smaller class in the evaluation metric, or tune the classification threshold.
- Outliers
On the exam, always go with Random Cut Forest (RCF).
- Visualization
Understand the different visualization techniques (scatter plot, histogram, etc.). Know how to read an elbow graph to decide where to stop. Use Amazon QuickSight for visualization and dashboards.
- Data Transformation
Use a log transform or numerical value binning for skewed data.
Understand one-hot encoding and ordinal vs. nominal data.
Know normalization vs. standardization for features with large or differing scales.
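The data transformations above are easier to remember once you have seen them side by side. Here is a minimal pure-Python sketch, not any AWS API; all function names are made up for illustration, and in practice you would use a library such as scikit-learn.

```python
import math


def log_transform(values):
    """log(1 + x) compresses a long right tail; requires x >= 0."""
    return [math.log1p(v) for v in values]


def min_max_normalize(values):
    """Normalization: rescale to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: shift to mean 0 and scale to standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def one_hot(labels):
    """Nominal data (no order): one 0/1 column per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def ordinal_encode(labels, order):
    """Ordinal data (has an order): map each category to its rank."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[label] for label in labels]
```

The rule of thumb this illustrates: colors get `one_hot`, because "red" is not greater than "blue", while sizes like low/medium/high get `ordinal_encode`, because their order carries information.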
Modeling
- Built-in Model
Understand all the AWS built-in models and when to use which. Spend some time understanding BlazingText, Word2Vec, LDA, XGBoost, and Seq2Seq; those come up a lot on the exam.
- Hyperparameter Tuning
Understand how to do hyperparameter tuning and how the AWS automatic tuning functionality works.
- Evaluation
Classification metrics come up a lot. Remember how to calculate recall, accuracy, precision, and sensitivity (which is the same quantity as recall), and what each one is used for. Use AUC for classification and MAPE for regression.
- Overfitting
Lots of questions are about overfitting. Remember these methods to reduce it:
Lower the max depth of the tree
Lower the number of layers
Enable dropout
Stop early
Apply L1 or L2 regularization during training
Decrease feature combinations
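Getting more data, for example through data augmentation, also helps against overfitting. For underfitting, go the other way: add features or model capacity and reduce regularization.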
- Run your own code/image/docker
This also comes up a lot on the exam: how to run your own code, image, or Docker container in SageMaker. Read the documentation.
- AI Service
Remember the different AWS AI services and what they can do. There are a lot of questions on how to combine those AI services to create an application.
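For the Evaluation item above, it helps to work through the formulas once by hand. Here is a minimal sketch that computes the classification metrics from confusion-matrix counts, plus MAPE for regression; the function names and sample numbers are made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute metrics from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives,
    false negatives, true negatives.
    """
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Of everything predicted positive, how much really was positive.
        "precision": tp / (tp + fp),
        # Of everything really positive, how much we caught.
        # Recall and sensitivity are two names for the same number.
        "recall": tp / (tp + fn),
    }


def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)
```

With 40 true positives, 10 false positives, 20 false negatives, and 30 true negatives, accuracy is 0.7, precision is 0.8, and recall is about 0.67. Precision matters when false positives are expensive (e.g. spam filtering); recall matters when false negatives are expensive (e.g. fraud or disease detection).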
Machine Learning Implementation and Operations
- Model Deployment
Understand what Elastic Inference and SageMaker Neo are, how to get a Multi-AZ deployment by deploying more than one instance, and how to do auto-scaling (know the concept of the cooldown period).
- A/B Test
Know how to deploy multiple models behind the same endpoint for A/B testing.
- Security
There are a lot of security-related questions. Remember the concepts of accessing SageMaker via a VPC endpoint, accessing S3 via an S3 endpoint, and the EnableNetworkIsolation setting in SageMaker. Also understand how IAM, S3 bucket policies, endpoint policies, security groups, and KMS work.
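For the A/B Test item above, the key idea is that one endpoint configuration lists several production variants and each variant's weight decides its share of traffic. Here is a minimal sketch that only builds the request; the field names follow the SageMaker `create_endpoint_config` API as I understand it, while the helper functions, model names, and instance type are hypothetical.

```python
def ab_test_endpoint_config(config_name, variants, instance_type="ml.m5.large"):
    """Build kwargs for SageMaker's create_endpoint_config (not called here).

    variants: list of (variant_name, model_name, weight) tuples.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
                "InitialVariantWeight": weight,
            }
            for variant_name, model_name, weight in variants
        ],
    }


def traffic_share(config):
    """Each variant receives weight / sum(all weights) of the requests."""
    variants = config["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}
```

With weights of 9 and 1, the first variant gets 90% of the traffic and the second gets 10%. Assuming boto3 is installed and credentials are set up, the dictionary could then be passed along as `boto3.client("sagemaker").create_endpoint_config(**config)`.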
Important!! Things that will show up on the exam that nobody will tell you about; you just need to remember them.
- Glue Bookmark
- Seq2Seq uses an attention mechanism to handle long sentences better
- Personalize uses an event tracker to learn from recent data
- Polly lexicons let you customize how specific words/acronyms are pronounced
- Amazon Forecast can optimize a predictor with PerformAutoML and PerformHPO
- How to update the Jupyter notebook version
- SHAP ML explainability
- Glue Fuzzy Matching
- How to stop Amazon Rekognition from using your data to improve its models
- Distributed TensorFlow training in SageMaker with the Horovod framework
- Amazon Kinesis Data Analytics Random Cut Forest (RCF)
- FindMatches ML Transform in Glue
- Stratified k-Fold Cross-Validation for Imbalanced Classification
- LDA for customer clustering
- Transfer Learning
- Read Elbow Plot
- SMOTE
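Since SMOTE shows up both here and in the imbalanced-data item, here is a minimal pure-Python sketch of the core idea: each synthetic point is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbors. This is a simplified illustration, not the reference implementation; `smote_sketch` and its parameters are made-up names, and real projects would typically use a library such as imbalanced-learn.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    minority: list of feature tuples from the minority class (at least 2).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbors of the base point (excluding itself).
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic
```

Because the new points are interpolated rather than copied, the minority class gets denser without exact duplicates, which is why SMOTE is usually preferred over plain random oversampling.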
Hope this blog helps you to pass the certification exam!
Thanks for reading! I look forward to hearing your questions and thoughts. If you want to learn more about Data Science and Cloud Computing, you can find me on LinkedIn.
