Summary

The web content outlines the second part of a clustering project in machine learning, providing detailed code snippets, visualizations, and insights into customer data analysis, along with announcements for upcoming projects and educational resources.

Abstract

The article is a continuation of a machine learning clustering project series, focusing on analyzing customer personality data. It begins by welcoming readers back to the series and linking to the first part of the project. The author then lists various other series and projects available for readers interested in different aspects of data science, machine learning, and related fields. The post introduces the reader to the dataset used for the clustering analysis, which is available on GitHub, and walks through the process of data preprocessing, visualization, and initial insights using Python and libraries such as pandas, seaborn, and matplotlib. The visualizations include age distribution, marital status, education levels, income analysis related to complaints, number of kids at home, and response to marketing campaigns. The article also discusses the removal of non-informative constant value features and the use of heatmaps to understand feature correlations. The author concludes by teasing the upcoming third part of the project series and providing links to other machine learning projects and tutorials. Additionally, the post encourages readers to subscribe to a newsletter and a newly launched YouTube channel for more content and to stay updated with tech interview tips, coding exercises, and system design questions.

Opinions

The author emphasizes the educational value of their content, offering a comprehensive guide to practical machine learning applications through projects.
The inclusion of code snippets and visualizations suggests a hands-on approach to learning, implying that the author believes in learning by doing.
By providing a diverse range of related series and projects, the author indicates a commitment to covering a broad spectrum of topics within data science and machine learning.
The encouragement to follow the YouTube channel and subscribe to the newsletter shows the author's dedication to building a community and continuously engaging with their audience.
The quote by Vincent van Gogh at the end of the post reflects the author's belief in the importance of perseverance and optimism in the challenging journey of learning and coding.

Day 29 : 60 days of Data Science and Machine Learning Series

ML Clustering Project 2 (Part 2)

Welcome back, peeps. In this post we will implement Part 2 of the ML clustering project. Part 1 can be found here:

Day 28 : 60 days of Data Science and Machine Learning Series

ML Clustering Project 2 ( Part 1)..

medium.com

Some of the other best Series —

30 Days of Natural Language Processing ( NLP) Series

30 days of Data Engineering with projects Series

60 days of Data Science and ML Series with projects

100 days : Your Data Science and Machine Learning Degree Series with projects

23 Data Science Techniques You Should Know

Tech Interview Series — Curated List of coding questions

Complete System Design with most popular Questions Series

Complete Data Visualization and Pre-processing Series with projects

Complete Python Series with Projects

Complete Advanced Python Series with Projects

Kaggle Best Notebooks that will teach you the most

Complete Developers Guide to Git

All the Data Science and Machine Learning Resources

210 Machine Learning Projects

30 days of Machine Learning Ops

Projects Videos —

Subscribe today!

Ignito

Excited to share that we have launched our Youtube channel — Ignito to cover all the projects and coding exercise for …

www.youtube.com

Tech Newsletter —

If you are interested, you can join my newsletter through which I send tech interview tips, techniques, patterns, hacks — Software Development, ML, Data Science, Startups and Technology projects to more than 30K readers. You can subscribe to Tech Brew :

Ignito

Data Science, ML, AI and more… Click to read Ignito, by Naina Chaturvedi, a Substack publication. Launched 7 months…

naina0405.substack.com

The data for this project can be found at the link below:

GitHub - Pikachu0405/Customer-Personality-Analysis

Customer Personality Analysis is a detailed analysis of a company's ideal customers. It helps a business to better…

github.com
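A minimal sketch of loading the data with pandas. The helper name `load_customers`, the tab separator, and the sample rows are assumptions made for illustration; check how the file in the repository is actually delimited before relying on it.

```python
import io

import pandas as pd


def load_customers(source, sep="\t"):
    """Read the customer table from a path or a file-like object.

    The tab separator is an assumption; pass sep="," for a comma-separated copy.
    """
    return pd.read_csv(source, sep=sep)


# Tiny in-memory stand-in for the real file (made-up rows, illustration only).
sample = io.StringIO(
    "ID\tYear_Birth\tEducation\tMarital_Status\tIncome\n"
    "1\t1980\tGraduation\tSingle\t50000\n"
    "2\t1975\tPhD\tMarried\t\n"
)
df_sample = load_customers(sample)

print(df_sample.shape)         # rows and columns that were read
print(df_sample.isna().sum())  # missing values per column
```

With the real data you would pass the path of your downloaded file instead of the in-memory sample.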

Let's dive in:

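import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# df is the customer DataFrame prepared in Part 1
df['Age'] = 2021 - df.Year_Birth  # age as of 2021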

# Age Level

plt.figure(figsize=(25, 6))
plt.title('Age distribution')
ax = sns.histplot(df['Age'].sort_values(), bins=56)
sns.rugplot(data=df['Age'], height=.05)
plt.xticks(np.linspace(df['Age'].min(), df['Age'].max(), 56, dtype=int, endpoint = True))
plt.grid(False)

plt.show()
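The age calculation above hard-codes 2021. Here is a small sketch of the same step with the reference year as a parameter, followed by a quick summary to spot implausible ages before plotting. The function name `add_age` and the sample birth years are hypothetical.

```python
import pandas as pd


def add_age(frame, reference_year=2021):
    """Return a copy of frame with an Age column derived from Year_Birth."""
    out = frame.copy()
    out["Age"] = reference_year - out["Year_Birth"]
    return out


# Made-up birth years, for illustration only.
demo_age = add_age(pd.DataFrame({"Year_Birth": [1957, 1984, 1996]}))

# min and max are a cheap sanity check for outliers in the age column.
print(demo_age["Age"].describe())
```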

Output —

# Education and Marital Status
# colors1 is assumed to be the color palette defined earlier in the series (Part 1)

cc=df.groupby("Marital_Status").count()['Age']
label=df.groupby('Marital_Status').count()['Age'].index
fig, ax = plt.subplots(1, 2, figsize = (10, 12))
ax[0].pie(cc, labels=label, shadow=True, autopct='%1.2f%%',explode=[0.1 for i in cc.index],radius=2,colors=colors1,startangle=45)
ax[0].set_title('Marital Status', y=-0.6)

cc1 = df.groupby("Education").count()['Age']
label = df.groupby('Education').count()['Age'].index
ax[1].pie(cc1, labels=label, shadow=True, autopct='%1.2f%%',explode=[0.1 for i in cc1.index],radius=2,colors=colors1,startangle=45)
ax[1].set_title('Education Qualification', y=-0.6)
plt.subplots_adjust(wspace = 1.5, hspace =0)

plt.show()
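The pie charts show each category's share of customers. If you want the same percentages as a table, `value_counts(normalize=True)` is one way to get them. This is a sketch on made-up rows, not the project's actual figures.

```python
import pandas as pd

# Made-up rows, for illustration only.
demo_status = pd.DataFrame(
    {"Marital_Status": ["Married", "Single", "Married", "Together"]}
)

# normalize=True returns fractions; multiply by 100 to get percentages.
shares = (
    demo_status["Marital_Status"]
    .value_counts(normalize=True)
    .mul(100)
    .round(2)
)
print(shares)
```

Note that `value_counts` orders categories by frequency, while the `groupby(...).count()` used for the charts orders them alphabetically.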

Output —

# Income and Complain
plt.figure(figsize=(25,20))
sns.kdeplot(
   data=df, x="Income", hue="Complain", log_scale= True,
   fill=True, common_norm=False,palette='mako',
   alpha=.5, linewidth=0,
)
plt.gca().axes.get_yaxis().set_visible(False) # Set y invisible
plt.xlabel('Income')

plt.show()

Output —

# No of Kids home vs Income

plt.figure(figsize=(15,10))

sns.kdeplot(
   data=df, x="Income", hue="Kidhome", log_scale= True,
   fill=True, common_norm=False,palette='mako',
   alpha=.5, linewidth=0,
)
plt.gca().axes.get_yaxis().set_visible(False) 
plt.xlabel('Income')

plt.show()

Output —

# No of Teens home vs Income
plt.figure(figsize=(15,10))

sns.kdeplot(
   data=df, x="Income", hue="Teenhome", log_scale= True,
   fill=True, common_norm=False,palette='crest',
   alpha=.5, linewidth=0,
)
plt.gca().axes.get_yaxis().set_visible(False) # Set y invisible
plt.xlabel('Income')

plt.show()

Output —

# Income and Response

plt.figure(figsize=(28,20))

sns.kdeplot(
   data=df, x="Income", hue="Response", log_scale= True,
   fill=True, common_norm=False,palette='mako',
   alpha=.5, linewidth=0,
)

plt.gca().axes.get_yaxis().set_visible(False)
plt.xlabel('Income')

plt.show()
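The KDE plots compare income distributions across groups on a log scale. A numeric companion, sketched below, is a per-group count and median of income. It keeps only positive incomes, since missing or non-positive values cannot be placed on a log axis. The helper name `income_summary` and the sample rows are hypothetical.

```python
import pandas as pd


def income_summary(frame, group_col):
    """Count and median of positive incomes for each value of group_col."""
    # NaN > 0 evaluates to False, so missing incomes are filtered out too.
    valid = frame[frame["Income"] > 0]
    return valid.groupby(group_col)["Income"].agg(["count", "median"])


# Made-up rows, for illustration only.
demo_income = pd.DataFrame(
    {
        "Income": [30000, 45000, float("nan"), 80000, 52000],
        "Response": [0, 0, 1, 1, 1],
    }
)
summary = income_summary(demo_income, "Response")
print(summary)
```

The same call works for the other grouping columns used above, for example `income_summary(df, "Kidhome")`.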

Output —

Z_Revenue and Z_CostContact each hold a single constant value, so they carry no information for clustering and we should drop them.

df.drop(['Z_CostContact', 'Z_Revenue'], axis=1, inplace=True)
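Rather than spotting constant columns by eye, you can detect them by counting distinct values. This is a minimal sketch; the helper name `constant_columns` and the sample values are made up for illustration.

```python
import pandas as pd


def constant_columns(frame):
    """Return the names of columns that contain at most one distinct value."""
    return [
        col for col in frame.columns
        if frame[col].nunique(dropna=False) <= 1
    ]


# Made-up rows, for illustration only.
demo_const = pd.DataFrame(
    {
        "Income": [30000, 45000, 80000],
        "Z_CostContact": [3, 3, 3],
        "Z_Revenue": [11, 11, 11],
    }
)
to_drop = constant_columns(demo_const)
demo_const = demo_const.drop(columns=to_drop)

print(to_drop)
print(list(demo_const.columns))
```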

# Heatmap
plt.figure(figsize = (30,25))
# numeric_only=True (pandas >= 1.5) skips text columns such as Education
df_cor = df.corr(numeric_only=True)
sns.heatmap(df_cor, annot = True, cmap = colors1)

plt.show()
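A large annotated heatmap can be hard to read. One complementary approach, sketched here, is to list the most strongly correlated feature pairs. The helper name `top_correlations` and the sample columns are hypothetical, and `numeric_only=True` assumes pandas 1.5 or newer.

```python
import numpy as np
import pandas as pd


def top_correlations(frame, n=5):
    """Return the n feature pairs with the largest absolute correlation."""
    corr = frame.corr(numeric_only=True)
    # Keep only the upper triangle (k=1 excludes the diagonal) so that
    # each pair appears once and self-correlations are left out.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Made-up rows, for illustration only.
demo_corr = pd.DataFrame(
    {
        "Income": [10, 20, 30, 40],
        "Spending": [1, 2, 3, 4],
        "Kidhome": [2, 1, 1, 0],
        "Education": ["a", "b", "c", "d"],  # text column, ignored by corr
    }
)
top = top_correlations(demo_corr, n=2)
print(top)
```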

Output —

Part 3 of this project: coming soon.

Follow and stay tuned.

For other projects, check out:

Build Machine Learning Pipelines( With Code)

Build Machine Learning Pipelines( With Code) — Part 1

Complete implementation…

medium.datadriveninvestor.com

Recurrent Neural Network with Keras

Project Implementation and cheatsheet…

medium.datadriveninvestor.com

Clustering Geolocation Data in Python using DBSCAN and K-Means

Project Implementation…

medium.datadriveninvestor.com

Facial Expression Recognition using Keras

Project Implementation…

medium.datadriveninvestor.com

Hyperparameter Tuning with Keras Tuner

Project Implementation….

medium.datadriveninvestor.com

Custom Layers in Keras

Code implementation …

medium.datadriveninvestor.com

That’s it fellas. Peace out and keep coding :)

Stay tuned, and of course let me end this post with a quote by Vincent van Gogh:

“The beginning is perhaps more difficult than anything else, but keep heart, it will turn out all right.”