Getting Started with Machine Learning (Part 1): An Absolute Beginner’s Guide — Algorithms
Machine learning, in brief
Machine learning has made huge strides in the past few years and has become pervasive in many industries. But if you’re new to machine learning, it can be challenging to know where to start. In this guide, we’ll help you navigate the basics of machine learning so that you can start using it in your own projects.
First, it’s essential to understand what machine learning is. At its core, machine learning is a branch of artificial intelligence that enables machines to make predictions and decisions without being explicitly programmed for each task. Instead, it uses algorithms to “learn” from data, identify patterns, and make decisions with minimal human intervention.
How to get started with machine learning
1. Collect data
The first step is to gather your data. The data should reflect the problem you’re trying to solve, and it should include enough data points for a model to learn meaningful patterns and make accurate predictions.
2. Choose a Machine Learning Algorithm
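Depending on your project, there are a variety of algorithms to choose from. Pick one that matches the problem you are trying to solve: for example, regression for predicting continuous values, classification for predicting categories, and clustering for grouping unlabeled data.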
3. Pre-process the data
Once you’ve collected the data and chosen your algorithm, you’ll need to pre-process the data. This step might include removing unnecessary data, normalizing the data, or dealing with missing values.
4. Split the data
Next, you’ll need to split the data into training and testing sets. The training set will be used to “train” your machine learning model, while the testing set will be used to test its accuracy.
5. Train the model
Once the data is ready, it’s time to train the model. This is done by feeding the training set into your chosen machine learning algorithm and allowing it to adjust its parameters until it fits the data well.
6. Evaluate and Test the Model
Once the model is trained, you’ll need to evaluate its performance. You can run it on the testing set and compare its predictions with the known correct answers (the ground truth).
7. Deploy the Model
Now that the model is trained and evaluated, you can deploy it. Depending on your project, this could entail putting the model into production on a web server, running it locally on a device, or integrating it into an existing application.
By following these steps, you’ll be well on your way to using machine learning in your own projects. While it can be intimidating at first, once you’re familiar with the basics, you’ll be able to apply machine learning in various scenarios.
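The steps above can be sketched in a few lines of scikit-learn. This is only an illustration under stated assumptions: the data is synthetic (generated with make_classification), logistic regression stands in for whatever algorithm fits your project, and the 25% test split is an arbitrary choice. Note that in this sketch the scaler is fitted on the training portion only, which is why the split appears before scaling; simple cleaning such as dropping unused columns can still happen before the split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1. "Collect" data: here, 200 synthetic samples with 4 features.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Step 4. Split into training and testing sets (25% held out for testing).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 3. Pre-process: learn the scaling from the training set only,
# then apply the same scaling to both sets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Steps 2 and 5. Choose an algorithm and train it on the training set.
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# Step 6. Evaluate: compare predictions with the ground truth.
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Deployment (step 7) is left out because it depends entirely on where the model will run.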

Top 11 machine learning algorithms
1. Linear Regression: Linear regression is a supervised machine learning algorithm for predicting continuous values. It is one of the most widely used algorithms to model the relationship between a dependent variable and one or more independent variables. It is implemented in Python with the help of the scikit-learn library.
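To see what “modeling the relationship” means in practice, here is a minimal sketch that fits a straight line by least squares using NumPy alone. The data points are made up for illustration and lie exactly on y = 2x + 1, so the recovered slope and intercept should come out very close to 2 and 1.

```python
import numpy as np

# Hypothetical data that follows y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Add a column of ones so the fit can include an intercept.
A = np.column_stack([x, np.ones_like(x)])

# Least squares: find the slope and intercept that minimize the squared error.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The scikit-learn sample that follows does the same kind of fit, with the library handling the intercept for you.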
# sample code -> LinearRegression
# import libraries
import numpy as np
from sklearn.linear_model import LinearRegression
# load the data
X=np.array([[1,2],[3,4],[5,6]])
y=np.array([7,9,11])
# create the model
model=LinearRegression()
# train the model using the data
model.fit(X,y)
2. Logistic Regression: Logistic regression is a supervised classification algorithm that adapts the linear model to predict binary values (Yes/No). It passes a weighted sum of the features through the logistic (sigmoid) function to model the probability of an event occurring. For example, it is often used to estimate an individual’s likelihood of belonging to a particular class or group. It is implemented in Python with the help of the scikit-learn library.
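The probability mentioned above comes from the sigmoid function, which squashes any number into the range between 0 and 1. The sketch below shows that single step with NumPy. The weights, bias, and feature values are hypothetical numbers chosen for illustration; a trained model would learn the weights and bias from data.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias (a trained model would learn these).
weights = np.array([0.8, -0.4])
bias = -0.5

features = np.array([2.0, 1.0])
score = weights @ features + bias        # linear part, as in linear regression
probability = sigmoid(score)             # squashed into the range (0, 1)
predicted_class = int(probability >= 0.5)  # 0.5 is the usual cut-off
print(probability, predicted_class)
```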
# sample code -> LogisticRegression
# import libraries
import numpy as np
from sklearn.linear_model import LogisticRegression
# load the data
X=np.array([[1,2],[3,4],[5,6]])
y=np.array([0,1,1])
# create the model
model=LogisticRegression()
# train the model using the data
model.fit(X,y)
3. Support Vector Machine (SVM): SVM is a supervised machine learning algorithm for classification and regression problems. The goal is to find the decision boundary that maximizes the margin, that is, the distance between the boundary and the closest data points of each class (the support vectors). It is implemented in Python with the help of the scikit-learn library.
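To make the idea of support vectors concrete, the sketch below fits a linear SVM on two made-up, clearly separated groups of points and then prints which training points ended up as support vectors. The data and the choice of a linear kernel are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

# Two made-up, clearly separated groups of points.
X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]])
y = np.array([0, 0, 1, 1])

# A linear kernel keeps the decision boundary a straight line.
model = SVC(kernel="linear")
model.fit(X, y)

# The support vectors are the training points closest to the boundary.
print(model.support_vectors_)

# Classify two new points, one near each group.
predictions = model.predict([[0.5, 0.5], [3.5, 3.5]])
print(predictions)
```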
# sample code -> SVM
# import libraries
import numpy as np
from sklearn import svm
# load the data
X=np.array([[1,2],[3,4],[5,6]])
y=np.array([0,1,1])
# create the model
model = svm.SVC()
# train the model using the data
model.fit(X,y)
4. Decision Trees: A decision tree is a tree-shaped model of decisions and their possible outcomes, in which each internal node tests a feature and each leaf gives a prediction. Decision trees are used to solve classification and regression problems. They are implemented in Python with the help of the scikit-learn library.
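A nice property of decision trees is that you can read the learned rules directly. The sketch below trains a tiny tree on made-up data with a single feature (named “size” here purely for illustration) and prints its rules with scikit-learn’s export_text helper.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data: one feature ("size") and two classes.
X = [[1], [2], [8], [9]]
y = [0, 0, 1, 1]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# export_text renders the learned if/else rules as plain text.
rules = export_text(model, feature_names=["size"])
print(rules)

# Classify two new values, one on each side of the learned split.
predictions = model.predict([[3], [7]])
print(predictions)
```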
# sample code -> DecisionTreeClassifier
# import libraries
import numpy as np
from sklearn.tree import DecisionTreeClassifier
# load the data
X=np.array([[1,2],[3,4],[5,6]])
y=np.array([0,1,1])
# create the model
model=DecisionTreeClassifier()
# train the model using the data
model.fit(X,y)
5. Hierarchical Clustering: Hierarchical clustering is an unsupervised learning algorithm that groups data points into a hierarchy of nested clusters based on similarity. It is used to analyze data and cluster it into meaningful groups. It is implemented in Python with the help of the scikit-learn library.
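To see the “hierarchy” part, it helps to look at the order in which points get merged. The sketch below uses SciPy (an assumption on my part; the sample that follows uses scikit-learn) on four made-up points. Ward linkage is just one possible merging rule, picked here because it is also scikit-learn’s default.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Four made-up points forming two obvious pairs.
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])

# Each row of Z records one merge: the two clusters joined, the distance
# between them, and the number of points in the new cluster.
Z = linkage(X, method="ward")
print(Z)

# Cut the hierarchy so that at most two clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The two nearby pairs are merged first, and only the final merge joins everything into one cluster.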
# sample code -> AgglomerativeClustering
# import libraries
import numpy as np
from sklearn.cluster import AgglomerativeClustering
# load the data
X=np.array([[1,2],[3,4],[5,6]])
# create the model
model=AgglomerativeClustering()
# train the model using the data
model.fit(X)
6. K-Means Clustering: K-Means clustering is an unsupervised learning algorithm that solves clustering problems. It is used to group data points into clusters based on their similarity. It is implemented in Python with the help of the scikit-learn library.
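Under the hood, K-Means repeats two simple steps: assign each point to its nearest center, then move each center to the mean of its points. Here is a bare-bones NumPy sketch of that loop. The data and the starting centers are made up, and the sketch ignores edge cases (such as a cluster ending up empty) that a library implementation handles.

```python
import numpy as np

# Made-up data: two small groups of points.
X = np.array([[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]])

# Hypothetical starting centers: simply two of the data points.
centers = X[[0, 2]].copy()

for _ in range(10):
    # Assignment step: distance from every point to every center.
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)

    # Update step: move each center to the mean of its assigned points.
    new_centers = np.array(
        [X[labels == k].mean(axis=0) for k in range(len(centers))]
    )
    if np.allclose(new_centers, centers):
        break  # centers stopped moving, so the clustering has settled
    centers = new_centers

print(labels)
print(centers)
```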
# sample code -> K-means
# import libraries
import numpy as np
from sklearn.cluster import KMeans
# load the data
X=np.array([[1,2],[3,4],[5,6]])
# create the model
model=KMeans(n_clusters=2)
# train the model using the data
model.fit(X)
7. Naive Bayes: Naive Bayes is a supervised machine learning algorithm for classification and prediction. It is based on Bayes’ theorem and calculates the probability of each class given the evidence, under the “naive” assumption that the features are independent of one another within each class. It is implemented in Python with the help of the scikit-learn library.
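Bayes’ theorem itself is just a little arithmetic. The sketch below works through one example with invented numbers: how likely is an email to be spam, given that it contains a particular word? All three input probabilities are hypothetical.

```python
# Hypothetical probabilities, invented for illustration.
p_spam = 0.2              # P(spam): share of all emails that are spam
p_word_given_spam = 0.5   # P(word | spam)
p_word_given_ham = 0.05   # P(word | not spam)

# Total probability of seeing the word in any email.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | word) = {p_spam_given_word:.3f}")
```

A Naive Bayes classifier repeats this calculation for every class, combining the evidence from each feature as if the features were independent.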
# sample code -> Gaussian Naive Bayes
# import libraries
import numpy as np
from sklearn.naive_bayes import GaussianNB
# load the data
X=np.array([[1,2],[3,4],[5,6]])
y=np.array([0,1,1])
# create the model
model=GaussianNB()
# train the model using the data
model.fit(X,y)
8. Random Forest: Random forest is an ensemble machine learning algorithm that solves classification and regression problems. It trains many decision trees on random subsets of the data and features and combines their predictions, which usually gives a more accurate and robust model than a single tree. It is implemented in Python with the help of the scikit-learn library.
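To see how a forest combines its trees, the sketch below averages the class probabilities of the individual trees by hand and checks that the result matches what the forest itself reports. The data is synthetic, and the number of trees (25) is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X, y)

X_new = X[:5]

# Average the class probabilities predicted by the individual trees.
tree_average = np.mean(
    [tree.predict_proba(X_new) for tree in forest.estimators_], axis=0
)

# scikit-learn's forest reports this same average as its own probabilities.
matches = np.allclose(tree_average, forest.predict_proba(X_new))
print(matches)
```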
# sample code -> Random Forest
# import libraries
import numpy as np
from sklearn.ensemble import RandomForestClassifier
# load the data
X=np.array([[1,2],[3,4],[5,6]])
y=np.array([0,1,1])
# create the model
model=RandomForestClassifier()
# train the model using the data
model.fit(X,y)
9. Gradient Boosting: Gradient boosting is an ensemble machine learning algorithm that solves regression and classification problems. It is based on decision trees and builds them one after another, with each new tree trained to correct the residual errors of the trees before it. It is implemented in Python with the help of the scikit-learn library.
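The “residual errors” idea is easiest to see in a regression setting, so the sketch below hand-rolls a simplified boosting loop for squared error: start from the average, then repeatedly fit a small tree to whatever error is left. The data (a sine curve), the learning rate, the tree depth, and the number of rounds are all assumptions for illustration, and this is a simplification of what GradientBoostingClassifier does internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up regression data: points along a sine curve.
X = np.linspace(0, 6, 60).reshape(-1, 1)
y = np.sin(X).ravel()

learning_rate = 0.5
prediction = np.full_like(y, y.mean())   # start from the average value
errors = [np.mean((y - prediction) ** 2)]

for _ in range(20):
    residuals = y - prediction           # what the current model gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)               # the new tree learns the residuals
    prediction += learning_rate * tree.predict(X)
    errors.append(np.mean((y - prediction) ** 2))

print(f"error before boosting: {errors[0]:.4f}")
print(f"error after boosting:  {errors[-1]:.4f}")
```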
# sample code -> GradientBoostingClassifier
# import libraries
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
# load the data
X=np.array([[1,2],[3,4],[5,6]])
y=np.array([0,1,1])
# create the model
model=GradientBoostingClassifier()
# train the model using the data
model.fit(X,y)
10. K-Nearest Neighbors (KNN): KNN is a supervised machine learning algorithm that solves classification and prediction problems. It is based on the similarity of a data point to its nearest neighbors in the labeled training data and predicts the class of a new data point from the classes of those neighbors. It is implemented in Python with the help of the scikit-learn library.
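The whole algorithm fits in a few lines, which makes it a good one to write by hand once. The sketch below is a minimal NumPy version: measure the distance to every training point, take the k closest, and let them vote. The training data and the helper name knn_predict are made up for illustration.

```python
from collections import Counter

import numpy as np

# Made-up labeled training data: two groups of three points.
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(x_new, k=3):
    """Predict the class of x_new by majority vote of its k nearest neighbors."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())   # count the neighbors' classes
    return votes.most_common(1)[0][0]

print(knn_predict(np.array([2, 2])))
print(knn_predict(np.array([7, 9])))
```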
# sample code -> KNeighborsClassifier
# import libraries
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
# load the data
X=np.array([[1,2],[3,4],[5,6]])
y=np.array([0,1,1])
# create the model
# n_neighbors must not exceed the number of training samples (the default is 5)
model=KNeighborsClassifier(n_neighbors=3)
# train the model using the data
model.fit(X,y)
11. Dimensionality Reduction: Dimensionality reduction is a family of unsupervised machine learning techniques used to reduce the number of features or dimensions of a dataset. It is used to make datasets easier to analyze and visualize. One common technique, principal component analysis (PCA), is implemented in Python with the help of the scikit-learn library.
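As a quick illustration of what “reducing dimensions” buys you, the sketch below squeezes a made-up two-feature dataset down to one feature with PCA and reports how much of the original variation survives. The data is invented so that the two features move almost in lockstep, which is the situation where dropping a dimension loses very little.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: the second feature closely tracks the first.
X = np.array([[1.0, 2.0], [3.0, 4.1], [5.0, 5.9], [7.0, 8.2]])

# Keep only the single direction that captures the most variation.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)
```

A ratio close to 1 means the single remaining feature still describes almost all of the variation in the original two.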
# sample code -> dimensionality reduction using principal component analysis
# import libraries
import numpy as np
from sklearn.decomposition import PCA
# load the data
X=np.array([[1,2],[3,4],[5,6]])
# create the model
model=PCA()
# train the model using the data
model.fit(X)
