avatarNaina Chaturvedi

Summary

The website content provides a comprehensive guide on performance metrics for classification models, including accuracy, precision, recall, F1 score, confusion matrix, ROC curve, and AUC score, as part of a 30-day data analytics series.

Abstract

The provided web content is an installment of a 30-day series on data analytics, focusing on performance metrics essential for evaluating classification models. It introduces the concept of bias and variance in model predictions and discusses various metrics such as accuracy, precision, recall, F1 score, and the confusion matrix. The article also delves into the nuances of Type I and Type II errors, the importance of the ROC curve, and the AUC score in assessing model performance. Additionally, the content includes a complete

Day 27 of 30 days of Data Analytics with Projects Series — Performance Metrics

Pic credits : DS Stack Exchange

Welcome back peep. Hope all’s well. This is Day 27 of 30 days of data analytics where we will be covering Performance Metrics.

  1. Confusion Matrix
  2. F1 Score
  3. Recall
  4. Precision
  5. Type I error
  6. Type II error
  7. Accuracy
  8. ROC Curve
  9. AUC score

What’s covered in 30 days of Data Analytics Series till now —

Day 1 : Data Analytics basics and kickstart of Data analytics with projects series

Day 2: Business Understanding — Data Driven Decision Making, Descriptive Analysis, Predictive Analysis, Diagnostic Analysis, Prescriptive Analysis

Day 3 : Data Analytics Ecosystem — Data Life Cycle, Data Analysis complete process ( most important things)

Day 4 : Probability, Conditional Probability, Binomial Distribution, Probability Density Function, Sampling Distribution

Day 5 : Statistics

Day 6 : Basic and Advanced SQL

Day 7 : Data Collection, Data Cleaning and Python

Day 8 : Pandas and Numpy

Day 9 : Data Manipulation

Day 10 : Data Visualization — Part 1

Day 11 : Project 1 : Data Visualization — Part 2

Day 12 : Data Visualization — Part 3

Day 13: Tableau — Part 1

Day 14: Tableau — Part 2

Day 15: Tableau — Part 3

Tableau Project

Day 16 : Data Analysis Project 2

Day 17 : Data Analysis Project 3

Day 18: Data Analysis Project 4

Day 19: Data Analysis Project 5

Day 20 : Data Analysis Project 6

Categorical and Numerical Features

Missing Value Analysis

Fill the missing Values

Unique Value Analysis

Univariate Analysis

Bivariate Analysis

Multivariate Analysis

Correlation Analysis

Day 21 : Data Analysis Project 7

Data Profiling

Feature Engineering

GroupBy Features

Categorical and Numerical Features

Missing Value Analysis

Fill the missing Values

Unique Value Analysis

Univariate Analysis

Bivariate Analysis

Multivariate Analysis

Correlation Analysis

Day 22 : Data analysis Project 8

Linear Regression

Data Profiling

Feature Engineering

Sort Values

Categorical and Numerical Features

Missing Value Analysis

Unique Value Analysis

Univariate Analysis

Bivariate Analysis

Multivariate Analysis

Correlation Analysis

Correlation Coefficients

Day 25 : Data Analysis Day 11

Summary Functions

Indexing

Grouping

Sorting

Data Profiling

Categorical and Numerical Features

Missing Value Analysis

Unique Value Analysis

Data Visualization

Take Complete Hands On Tableau Course : Link

Projects Videos —

All the projects, data structures, SQL, algorithms, system design, Data Science and ML , Data Analytics, Data Engineering, , Implemented Data Science and ML projects, Implemented Data Engineering Projects, Implemented Deep Learning Projects, Implemented Machine Learning Ops Projects, Implemented Time Series Analysis and Forecasting Projects, Implemented Applied Machine Learning Projects, Implemented Tensorflow and Keras Projects, Implemented PyTorch Projects, Implemented Scikit Learn Projects, Implemented Big Data Projects, Implemented Cloud Machine Learning Projects, Implemented Neural Networks Projects, Implemented OpenCV Projects,Complete ML Research Papers Summarized, Implemented Data Analytics projects, Implemented Data Visualization Projects, Implemented Data Mining Projects, Implemented Natural Leaning Processing Projects, MLOps and Deep Learning, Applied Machine Learning with Projects Series, PyTorch with Projects Series, Tensorflow and Keras with Projects Series, Scikit Learn Series with Projects, Time Series Analysis and Forecasting with Projects Series, ML System Design Case Studies Series videos will be published on our youtube channel ( just launched).

Subscribe today!

Tech Newsletter —

If you are interested, you can join my newsletter through which I send tech interview tips, techniques, patterns, hacks — Software Development, ML, Data Science, Startups and Technology projects to more than 30K readers. You can subscribe to Tech Brew :

Let’s get started!

Performance Metrics

Performance metrics for Classification Models can be divided into —

Class Labels

Probabilities

Under Class labels —

  1. Confusion Matrix
  2. F1 Score
  3. Recall
  4. Precision
  5. Type I error
  6. Type II error
  7. Accuracy

Under Probabilities —

  1. ROC Curve
  2. AUC score

Before we jump into further digging, let’s understand some basics —

True Positives ( TP)

Those values that are correctly predicted positive values. For example if a bird’s color actual class is yes and the predicted class is also yes.

True Negatives ( TN)

Those values that are correctly predicted negative values. For example if a bird’s color actual class is no and the predicted class is also no .

False Positives ( FP)

Those values that are incorrectly predicted positive values. For example if a bird’s color actual class is no and the predicted class is yes.

False Negatives (FN)

Those values that are incorrectly predicted negative values. For example if a bird’s color actual class is yes and the predicted class is no.

Before I move forward, lets understand some basic terms —

Pic credits : Stack Exchange

Bias

In layman terms, bias is the difference between the predicted and actual values.

Pic credits : O’Reilly

These biases are of different types —

  1. Selection Bias : It’s about collecting that data which doesn’t represent the target population.
  2. Measurement Bias : It’s about being biased towards a particular result while observing other features/columns.
  3. Algorithm Bias : It’s about the ML algorithm getting biased over the training data.
  4. Exclusion Bias : It’s about excluding valuable data which is tagged as not important.
  5. Recall Bias : It’s about labeling same data inconsistently.

Low bias : In this the model makes only few assumptions about the form of the target function.

High Bias : In this the model makes more assumptions about the form of the target function.

Low bias ML algorithms : KNN, SVM and Decision Trees

High bias ML algorithms : Logistic Regression and Linear Regression

Example —

If the model is trained on the shape of the fruit, then bias example would be predicting a lychee to be a dragon fruit.

Variance

Variance is an error which measures randomness of the predicted results from the actual results. It’s defined as model’s sensitivity to randomness in the data.

Low Variance : It’s about the model having minor variation in the predicted values wrt to any changes in the training data set.

High Variance : It’s about the model having major variation in the predicted values wrt to any changes in the training data set.

Low variance ML algorithms : Logistic Regression, Linear Regression and Linear Discriminant Analysis

High variance ML algorithms : Decision Trees, Support Vector Machines and KNN

Example —

Let’s say if we consider red color as a feature then that will be a noise as many fruits have red color.

Bias vs Variance

  • Non Linear algorithms have low bias but a high variance.
  • Linear ML algorithms have high bias but low variance.
  • Increasing the bias will reduce/decrease the variance.
  • Increasing the variance will decrease the bias.
Pic credits : Alteryx

Training Error

It’s an error that occurs during model training and inappropriately handling the dataset during pre-processing.

Pic credits : samlau

Test Error

It’s an error that occurs during model testing and more inclined towards overfitting or under fitting.

Under Fitting

Under fitting is about high training error and high test error.

Over Fitting

Overfitting is about low training error and high test error.

Errors

Mean Square Error

It is the mean of square of all errors.

Pic credits : Quora

Mean Absolute Error

It is the mean of all the absolute errors. It is less sensitive to the outliers as many small errors is aggregated to form one large error.

It is used with Regression Models.

Root Mean Squared Error

It is the error rate by the square root of Mean Square error between the predicted and actual values.

It is more sensitive to the outliers as compared to the MAE.

R-squared

It is used to represent the coefficient of how well the observations fit. It is the percentage of the dependent variable variation which is explained using linear model. It is always between 0 to 100%.

Pic credits : CFI

To summarize,

Variance with overfitting leads to high test error.

Bias with under fitting leads to high train and test error.

Let’s talk about each performance metrics and when to use—

Pic credits : ResearchGate

Accuracy

What is Accuracy?

It’s the ratio of correctly predicted values to the total values.

It gives you the overall performance of the model.

Formula —

Accuracy = (TP+TN)/(TP+FP+FN+TN)

Precision

What is Precision?

It is the ratio of correctly predicted positive values to the total predicted positive observations.

It tells us how accurate the positive predictions are.

Formula —

Precision = TP/ (TP + FP)

Recall

What is Recall?

It is the ratio of correctly predicted positive values to all the values in the actual class which is yes.

It tell us the coverage of the actual positive samples.

Formula —

Recall = TP / (TP + FN)

F1 score

What is F1 Score?

It is the weighted average of recall and precision. This metric is more useful than accuracy in the case of uneven class distribution.

It is used to as a hybrid metric for the unbalanced classes.

Formula —

F1 Score = 2*(Recall * Precision) / (Recall + Precision)

Confusion Matrix

What is Confusion Matrix?

Pic credits : sciencedirect

It defines the performance of the classification algorithm by summarizing and visualizing the data.

It analyzes the classifier’s potential.

Type I error

What is Type I error?

Type 1 error is a false positive result.

Pic credits : Laptrinhx

Type II error

What is Type II Error?

Type II error is a false negative result.

ROC Curve

Pic credits : AIwiki

What is ROC Curve?

Receiver Operator Characteristics is an evaluation metric for the classification problems that are binary in nature.

It is used to assess the overall performance of the test results.

AUC score

What is AUC Score?

Area under the Curve calculates the area under the ROC. It measures how well the predictions are ordered and ranked.

To summarize —

  1. Confusion matrix: A confusion matrix is a table that is used to define the performance of a classification model. It compares the predicted values with the actual values and shows the number of correct and incorrect predictions for each class. The matrix is usually represented in the form of a table with rows representing the actual class and columns representing the predicted class.
  2. F1 Score: The F1 score is a measure of a model’s accuracy that takes both precision and recall into account. It is the harmonic mean of precision and recall, with a higher score indicating better performance.
  3. Recall: Recall is a measure of a model’s ability to correctly identify all relevant instances. It is calculated as the number of true positives divided by the number of true positives plus the number of false negatives.
  4. Precision: Precision is a measure of a model’s ability to correctly identify positive instances. It is calculated as the number of true positives divided by the number of true positives plus the number of false positives.
  5. Type I error: A Type I error, also known as a false positive, occurs when a model incorrectly classifies a negative instance as positive.
  6. Type II error: A Type II error, also known as a false negative, occurs when a model incorrectly classifies a positive instance as negative.
  7. Accuracy: Accuracy is a measure of a model’s overall performance, calculated as the number of correct predictions divided by the total number of predictions.
  8. ROC Curve: Receiver Operating Characteristic (ROC) curve is a graphical representation of the performance of a binary classifier system as the discrimination threshold is varied. It plots the true positive rate against the false positive rate.
  9. AUC Score: AUC stands for “Area Under the ROC Curve.” It is a measure of a model’s overall performance, with a higher AUC score indicating better performance.

Complete Code Implementation —

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Generate a sample dataset
X = np.random.randn(1000, 10)  # Features
y = np.random.randint(0, 2, size=(1000,))  # Target variable (binary classification)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

# Calculate the F1 Score, Recall, and Precision
f1_score = classification_report(y_test, y_pred)['1']['f1-score']
recall = classification_report(y_test, y_pred)['1']['recall']
precision = classification_report(y_test, y_pred)['1']['precision']
print("F1 Score:", f1_score)
print("Recall:", recall)
print("Precision:", precision)

# Calculate the Type I error and Type II error
tn, fp, fn, tp = cm.ravel()
type_i_error = fp / (fp + tn)
type_ii_error = fn / (fn + tp)
print("Type I Error:", type_i_error)
print("Type II Error:", type_ii_error)

# Calculate the Accuracy
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("Accuracy:", accuracy)

# Calculate the ROC Curve and AUC Score
probs = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, probs)
auc = roc_auc_score(y_test, probs)

# Plot the ROC Curve
plt.plot(fpr, tpr, label='ROC Curve (AUC = {:.2f})'.format(auc))
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()

print("AUC Score:", auc)

Which Metric to use and when?

  1. Use RMSE when you want to use more weights to the samples that are further from mean, other wise use MAE.
  2. For Regression problem use MSE, MAE, R squared and Adjusted R square.
  3. Use accuracy for classification problems which don’t have class imbalance.
  4. Use precision to confirm the prediction results.
  5. Use recall to get as many positives as possible.
  6. Use F1 score when you want good precision and recall on a model.
  7. For the classification problems, use accuracy, precision, recall, Log-Loss.
  8. Both ROC and AUC are used to evaluate the binary classification problems as well as multi class classification problems — to assess their performance.

That’s it for now. Day 28 coming soon: Advanced Regression.

Let me know if you have questions in the comment section below. Subscribe/ Follow, Like/Clap as it would encourage me to write more in my free time

Stay Tuned!!

Read More —

11 most important System Design Base Concepts

1. System design basics

2. Horizontal and vertical scaling

3. Load balancing and Message queues

4. High level design and low level design, Consistent Hashing, Monolithic and Microservices architecture

5. Caching, Indexing, Proxies

6. Networking, How Browsers work, Content Network Delivery ( CDN)

7. Database Sharding, CAP Theorem, Database schema Design

8. Concurrency, API, Components + OOP + Abstraction

9. Estimation and Planning, Performance

10. Map Reduce, Patterns and Microservices

11. SQL vs NoSQL and Cloud

12. Most Popular System Design Questions

13. System Design Template — How to solve any System Design Question

14. Quick RoundUp : Solved System Design Case Studies

System Design Case Studies — In Depth

Design Instagram

Design Messenger App

Design Twitter

Design URL Shortener

Design Dropbox

Design Youtube

Design API Rate Limiter

Design Web Crawler

Design Facebook’s Newsfeed

Design Yelp

Design Uber

Design Tinder

Design Tiktok

Design Whatsapp

Most Popular System Design Questions

Mega Compilation : Solved System Design Case studies

Complete Data Structures and Algorithm Series

Complexity Analysis

Backtracking

Sliding Window

Greedy Technique

Two pointer Technique

Arrays

Linked List

Strings

Stack

Queues

Hash Table/Hashing

Binary Search

1- D Dynamic Programming

Divide and Conquer Technique

Recursion

Some of the other best Series —

60 days of Data Science and ML Series with projects

30 Days of Natural Language Processing ( NLP) Series

30 days of Machine Learning Ops

30 days of Data Structures and Algorithms and System Design Simplified

60 Days of Deep Learning with Projects Series

30 days of Data Engineering with projects Series

Data Science and Machine Learning Research ( papers) Simplified **

100 days : Your Data Science and Machine Learning Degree Series with projects

23 Data Science Techniques You Should Know

Tech Interview Series — Curated List of coding questions

Complete System Design with most popular Questions Series

Complete Data Visualization and Pre-processing Series with projects

Complete Python Series with Projects

Complete Advanced Python Series with Projects

Kaggle Best Notebooks that will teach you the most

Complete Developers Guide to Git

Exceptional Github Repos — Part 1

Exceptional Github Repos — Part 2

All the Data Science and Machine Learning Resources

210 Machine Learning Projects

Tech Newsletter —

If you are interested, you can join my newsletter through which I send tech interview tips, techniques, patterns, hacks — Software Development, ML, Data Science, Startups and Technology projects to more than 30K readers. You can subscribe to Tech Brew :

For Python Projects —

For complete 60 days of Data Science and ML : Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML

Follow for more updates. Stay tuned and keep coding!

For other projects, tune to —

Build Machine Learning Pipelines( With Code)

Recurrent Neural Network with Keras

Clustering Geolocation Data in Python using DBSCAN and K-Means

Facial Expression Recognition using Keras

Hyperparameter Tuning with Keras Tuner

Custom Layers in Keras

Data Science
Machine Learning
Tech
Programming
Artificial Intelligence
Recommended from ReadMedium