Naina Chaturvedi

Summary

The web content outlines the second part of a clustering project in machine learning, providing detailed code snippets, visualizations, and insights into customer data analysis, along with announcements for upcoming projects and educational resources.

Abstract

The article is a continuation of a machine learning clustering project series, focusing on analyzing customer personality data. It begins by welcoming readers back to the series and linking to the first part of the project. The author then lists various other series and projects available for readers interested in different aspects of data science, machine learning, and related fields. The post introduces the reader to the dataset used for the clustering analysis, which is available on GitHub, and walks through the process of data preprocessing, visualization, and initial insights using Python and libraries such as pandas, seaborn, and matplotlib. The visualizations include age distribution, marital status, education levels, income analysis related to complaints, number of kids at home, and response to marketing campaigns. The article also discusses the removal of non-informative constant value features and the use of heatmaps to understand feature correlations. The author concludes by teasing the upcoming third part of the project series and providing links to other machine learning projects and tutorials. Additionally, the post encourages readers to subscribe to a newsletter and a newly launched YouTube channel for more content and to stay updated with tech interview tips, coding exercises, and system design questions.

Opinions

  • The author emphasizes the educational value of their content, offering a comprehensive guide to practical machine learning applications through projects.
  • The inclusion of code snippets and visualizations suggests a hands-on approach to learning, implying that the author believes in learning by doing.
  • By providing a diverse range of related series and projects, the author indicates a commitment to covering a broad spectrum of topics within data science and machine learning.
  • The encouragement to follow the YouTube channel and subscribe to the newsletter shows the author's dedication to building a community and continuously engaging with their audience.
  • The quote by Vincent van Gogh at the end of the post reflects the author's belief in the importance of perseverance and optimism in the challenging journey of learning and coding.

Day 29: 60 Days of Data Science and Machine Learning Series

ML Clustering Project 2 (Part 2)

Welcome back, peeps. In this post we will implement Part 2 of the clustering project in ML. Part 1 of the project can be found here:

Some of the other best series:

30 Days of Natural Language Processing ( NLP) Series

30 days of Data Engineering with projects Series

60 days of Data Science and ML Series with projects

100 days : Your Data Science and Machine Learning Degree Series with projects

23 Data Science Techniques You Should Know

Tech Interview Series — Curated List of coding questions

Complete System Design with most popular Questions Series

Complete Data Visualization and Pre-processing Series with projects

Complete Python Series with Projects

Complete Advanced Python Series with Projects

Kaggle Best Notebooks that will teach you the most

Complete Developers Guide to Git

All the Data Science and Machine Learning Resources

210 Machine Learning Projects

30 days of Machine Learning Ops

Projects Videos —

Videos on all of the following will be published on our YouTube channel (just launched): projects, data structures, SQL, algorithms, system design, Data Science and ML, Data Analytics, Data Engineering, Implemented Data Science and ML Projects, Implemented Data Engineering Projects, Implemented Deep Learning Projects, Implemented Machine Learning Ops Projects, Implemented Time Series Analysis and Forecasting Projects, Implemented Applied Machine Learning Projects, Implemented Tensorflow and Keras Projects, Implemented PyTorch Projects, Implemented Scikit Learn Projects, Implemented Big Data Projects, Implemented Cloud Machine Learning Projects, Implemented Neural Networks Projects, Implemented OpenCV Projects, Complete ML Research Papers Summarized, Implemented Data Analytics Projects, Implemented Data Visualization Projects, Implemented Data Mining Projects, Implemented Natural Language Processing Projects, MLOps and Deep Learning, Applied Machine Learning with Projects Series, PyTorch with Projects Series, Tensorflow and Keras with Projects Series, Scikit Learn Series with Projects, Time Series Analysis and Forecasting with Projects Series, and ML System Design Case Studies Series.

Subscribe today!

Tech Newsletter —

If you are interested, you can join my newsletter, through which I send tech interview tips, techniques, patterns and hacks, plus Software Development, ML, Data Science, Startups and Technology projects, to more than 30K readers. You can subscribe to Tech Brew:

The data for this project can be found at the link below:

Let's dive in:

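import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be the customer DataFrame loaded in Part 1
# Age as of 2021, derived from the year of birth
df['Age'] = 2021 - df['Year_Birth']

# Age distribution: histogram with a rug plot marking individual ages
plt.figure(figsize=(25, 6))
plt.title('Age distribution')
ax = sns.histplot(df['Age'], bins=56)
sns.rugplot(data=df['Age'], height=.05)
plt.xticks(np.linspace(df['Age'].min(), df['Age'].max(), 56, dtype=int, endpoint=True))
plt.grid(False)
plt.show()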

Output —
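To complement the histogram, the same age derivation can be checked numerically. The sketch below is only an illustration: the `demo` frame, its birth years and the decade-wide bands are made up for the example and are not taken from the project data.

```python
import pandas as pd

# Hypothetical stand-in for the customer DataFrame; only Year_Birth is needed.
demo = pd.DataFrame({"Year_Birth": [1957, 1954, 1965, 1984, 1981, 1996]})

REFERENCE_YEAR = 2021  # same reference year as in the post
demo["Age"] = REFERENCE_YEAR - demo["Year_Birth"]

# Bucket ages into decade-wide bands; right=False makes each band [start, end).
bins = list(range(20, 101, 10))                  # 20, 30, ..., 100
labels = [f"{b}-{b + 9}" for b in bins[:-1]]     # "20-29", ..., "90-99"
demo["AgeBand"] = pd.cut(demo["Age"], bins=bins, labels=labels, right=False)

# Count customers per band, keeping the bands in ascending order.
age_band_counts = demo["AgeBand"].value_counts().sort_index()
print(age_band_counts)
```

A table like this makes it easy to spot sparsely populated or implausible age bands that a wide histogram can hide.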

# Education and Marital Status
# colors1 is assumed to be the color list defined in Part 1
cc = df.groupby('Marital_Status').count()['Age']   # customers per marital status
fig, ax = plt.subplots(1, 2, figsize=(10, 12))
ax[0].pie(cc, labels=cc.index, shadow=True, autopct='%1.2f%%', explode=[0.1] * len(cc), radius=2, colors=colors1, startangle=45)
ax[0].set_title('Marital Status', y=-0.6)
cc1 = df.groupby('Education').count()['Age']       # customers per education level
ax[1].pie(cc1, labels=cc1.index, shadow=True, autopct='%1.2f%%', explode=[0.1] * len(cc1), radius=2, colors=colors1, startangle=45)
ax[1].set_title('Education Qualification', y=-0.6)
plt.subplots_adjust(wspace=1.5, hspace=0)
plt.show()

Output —
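The percentages shown on the pie charts can also be read as a plain table. This is a minimal sketch on a made-up `demo` frame; `category_shares` is a hypothetical helper, not something from the original project.

```python
import pandas as pd

# Hypothetical sample rows; the real project uses the full customer DataFrame.
demo = pd.DataFrame({
    "Marital_Status": ["Single", "Married", "Married", "Together", "Married", "Single"],
    "Education": ["Graduation", "PhD", "Graduation", "Master", "Graduation", "PhD"],
})

def category_shares(frame, column):
    """Return the percentage share of each category, largest first."""
    return (frame[column].value_counts(normalize=True) * 100).round(2)

marital_shares = category_shares(demo, "Marital_Status")
education_shares = category_shares(demo, "Education")
print(marital_shares)
print(education_shares)
```

`value_counts(normalize=True)` returns fractions that sum to 1, so multiplying by 100 gives the same kind of numbers that `autopct` prints on the pie slices.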

# Income and Complain
plt.figure(figsize=(25,20))
sns.kdeplot(
   data=df, x="Income", hue="Complain", log_scale= True,
   fill=True, common_norm=False,palette='mako',
   alpha=.5, linewidth=0,
)
plt.gca().axes.get_yaxis().set_visible(False) # Set y invisible
plt.xlabel('Income')

plt.show()

Output —

# No of Kids home vs Income
plt.figure(figsize=(15,10))
sns.kdeplot(
   data=df, x="Income", hue="Kidhome", log_scale= True,
   fill=True, common_norm=False,palette='mako',
   alpha=.5, linewidth=0,
)
plt.gca().axes.get_yaxis().set_visible(False) 
plt.xlabel('Income')

plt.show()

Output —

# No of Teens home vs Income
plt.figure(figsize=(15,10))
sns.kdeplot(
   data=df, x="Income", hue="Teenhome", log_scale= True,
   fill=True, common_norm=False,palette='crest',
   alpha=.5, linewidth=0,
)
plt.gca().axes.get_yaxis().set_visible(False) # Set y invisible
plt.xlabel('Income')

plt.show()

Output —

# Income and Response
plt.figure(figsize=(28,20))
sns.kdeplot(
   data=df, x="Income", hue="Response", log_scale= True,
   fill=True, common_norm=False,palette='mako',
   alpha=.5, linewidth=0,
)
plt.gca().axes.get_yaxis().set_visible(False)
plt.xlabel('Income')

plt.show()

Output —
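The KDE plots above compare the income distribution across groups (Complain, Kidhome, Teenhome, Response). A grouped summary gives the same comparison in numbers. The sketch below uses invented income values and a hypothetical `income_summary` helper, so treat it as an illustration rather than the project's actual results.

```python
import pandas as pd

# Made-up rows using the same column names as the plots above.
demo = pd.DataFrame({
    "Income":   [58000, 46000, 71000, 26000, 58000, 62000, 33000, 80000],
    "Kidhome":  [0, 1, 0, 1, 1, 0, 2, 0],
    "Response": [1, 0, 0, 0, 0, 1, 0, 1],
})

def income_summary(frame, group_col):
    """Count, median and mean of Income for each value of group_col."""
    return frame.groupby(group_col)["Income"].agg(["count", "median", "mean"])

by_kids = income_summary(demo, "Kidhome")
by_response = income_summary(demo, "Response")
print(by_kids)
print(by_response)
```

The median is worth reporting next to the mean because income is usually skewed, which is also why the plots use a log scale on the x-axis.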

Z_Revenue and Z_CostContact each hold a single constant value. A constant column carries no information that could separate one customer from another, so we drop both.
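A quick check like the following can confirm which columns are constant before dropping them. It is a sketch on a tiny made-up frame: the values are invented and `constant_columns` is a hypothetical helper name.

```python
import pandas as pd

# Invented values; only the column names mirror the project data.
demo = pd.DataFrame({
    "Income": [58000, 46000, 71000],
    "Z_CostContact": [3, 3, 3],
    "Z_Revenue": [11, 11, 11],
})

def constant_columns(frame):
    """List columns that contain at most one distinct value (NaN counted)."""
    n_unique = frame.nunique(dropna=False)
    return n_unique[n_unique <= 1].index.tolist()

constant_cols = constant_columns(demo)
print(constant_cols)
```

Passing `dropna=False` counts missing values as their own category, so a column that mixes one value with NaN is not reported as constant.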

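# Drop the two constant columns
df.drop(['Z_CostContact', 'Z_Revenue'], axis=1, inplace=True)

# Heatmap of pairwise correlations
plt.figure(figsize=(30, 25))
# numeric_only=True skips text columns such as Education,
# which would otherwise raise an error in pandas 2.x
df_cor = df.corr(numeric_only=True)
sns.heatmap(df_cor, annot=True, cmap=colors1)

plt.show()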

Output —
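A large annotated heatmap can be hard to read, so it may help to list the strongest correlations directly. The sketch below is one possible way to do that; the `demo` values, the `MntWines` column and the `strongest_pairs` helper are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Invented values; Education is included to show that text columns are skipped.
demo = pd.DataFrame({
    "Income":    [20, 40, 60, 80, 100],
    "MntWines":  [1, 2, 3, 4, 5],        # rises in step with Income
    "Kidhome":   [2, 2, 1, 0, 0],
    "Education": ["PhD", "Master", "PhD", "Basic", "Master"],
})

def strongest_pairs(frame, top_n=3):
    """Return the top_n feature pairs ranked by absolute correlation."""
    corr = frame.select_dtypes(include="number").corr()
    # Keep only the upper triangle (k=1 excludes the diagonal) so that
    # each pair appears once and self-correlations are ignored.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top_n)

top_pairs = strongest_pairs(demo)
print(top_pairs)
```

Sorting by absolute value keeps strong negative correlations in the ranking, which matters for clustering because highly correlated features in either direction carry overlapping information.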

Part 3 of this project : Coming soon

Follow and Stay tuned.

For other projects, check out:

Build Machine Learning Pipelines (With Code)

Recurrent Neural Network with Keras

Clustering Geolocation Data in Python using DBSCAN and K-Means

Facial Expression Recognition using Keras

Hyperparameter Tuning with Keras Tuner

Custom Layers in Keras

That’s it fellas. Peace out and keep coding :)

Stay tuned, and of course let me end this post with a quote by Vincent van Gogh:

“The beginning is perhaps more difficult than anything else, but keep heart, it will turn out all right.”

Machine Learning
Tech
Artificial Intelligence
Programming
Data Science