Naina Chaturvedi

Day 46: 60 days of Data Science and Machine Learning Series

Language Classification…

Pic credits : ResearchGate

Welcome back peeps. In this post we are going to understand the basics of Multinomial Naive Bayes through a project.

Bayes Theorem —

Bayes' theorem is a way to compute conditional probabilities, that is, the probability of one event occurring given that another event has already occurred.

The conditional probability is defined as:

P(A | B) = P(A, B) / P(B)

Since P(A, B) = P(B | A) P(A), this gives Bayes' theorem:

P(A | B) = P(B | A) P(A) / P(B)

Bayes' theorem is used extensively in the medical sciences (for example, to estimate how likely it is that someone has a disease given a test result). It is also used in other applications, such as spam filtering.
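To make the medical example concrete, here is a small worked calculation. The prevalence, sensitivity and false positive rate are made-up numbers, chosen only to show how the formula is applied.

```python
# Hypothetical numbers, chosen only for illustration
prevalence = 0.01           # P(disease)
sensitivity = 0.95          # P(positive | disease)
false_positive_rate = 0.05  # P(positive | no disease)

# P(positive) via the law of total probability
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays low here because the assumed disease is rare.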

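Naive Bayes (NB) classifiers are a family of classification algorithms based on Bayes' theorem, with the "naive" assumption that the features are conditionally independent given the class.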

A good reference for the different Naive Bayes variants is the scikit-learn user guide: https://scikit-learn.org/stable/modules/naive_bayes.html

MultinomialNB implements the Naive Bayes algorithm for multinomially distributed data and is commonly used for text classification. In this post we are going to implement language classification with Naive Bayes through a project.

Let’s dive in!

Import Necessary Libraries

# Plotting setup (the % lines are Jupyter magics and only work in a notebook)
import matplotlib
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
plt.style.use('ggplot')

import numpy as np
import string
from collections import defaultdict

# Metric, model and bag-of-words vectorizer
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Loading and saving fitted models
import joblib

Load the data

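# Preview of the end result: load a previously saved model and vectorizer
# (the paths are placeholders for your own locations)
model = joblib.load('Path to Data/Models/final_model.joblib')
vectorizer = joblib.load('Path to Data/Vectorizers/final_model.joblib')

text = 'Its a beautiful day'
# preprocess_function and split_into_subwords_function are not defined in this post;
# they stand for the preprocessing applied before vectorizing
text = preprocess_function(text)
text = [split_into_subwords_function(text)]
tv = vectorizer.transform(text)
model.predict(tv)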

Output —

array(['en'], dtype='<U2')
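The snippet above loads a saved model, but the post does not show how the files were written. A minimal sketch of the save and load round trip with joblib might look like this. The toy sentences, the temporary directory and the file names are assumptions made for illustration, not the author's actual setup.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy data, made up for this sketch
toy_sentences = ['the day is beautiful', 'den je krásny']
toy_labels = ['en', 'sk']

toy_vectorizer = CountVectorizer()
toy_model = MultinomialNB().fit(toy_vectorizer.fit_transform(toy_sentences), toy_labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names; use your own paths in a real project
    model_path = os.path.join(tmp_dir, 'toy_model.joblib')
    vectorizer_path = os.path.join(tmp_dir, 'toy_vectorizer.joblib')

    # Persist both objects: the model is useless without its fitted vocabulary
    joblib.dump(toy_model, model_path)
    joblib.dump(toy_vectorizer, vectorizer_path)

    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vectorizer_path)

reloaded_prediction = loaded_model.predict(loaded_vectorizer.transform(['beautiful day']))
print(reloaded_prediction)
```

The vectorizer is saved alongside the model because new text has to be mapped onto the same vocabulary that was used during training.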

Exploratory data analysis

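def open_file(filename):
    # Read all lines of a text file (one sentence per line)
    with open(filename, 'r', encoding='utf-8') as f:
        data = f.readlines()
    return data

# Training sentences keyed by language code
dr = dict()
dr['sk'] = open_file('Data/Sentences/train_sentences.sk')
dr['cs'] = open_file('Data/Sentences/train_sentences.cs')
dr['en'] = open_file('Data/Sentences/train_sentences.en')

def show_statistics(data):
    for language, sentences in data.items():
        word_list = ' '.join(sentences).split()
        number_of_sentences = len(sentences)
        number_of_words = len(word_list)
        number_of_unique_words = len(set(word_list))
        # First seven words of the first sentence
        sample_extract = ' '.join(sentences[0].split()[0:7])

        print(f'Language: {language}')
        print('-----------------------')
        print(f'Number of sentences\t:\t {number_of_sentences}')
        print(f'Number of words\t\t:\t {number_of_words}')
        print(f'Number of unique words\t:\t {number_of_unique_words}')
        print(f'Sample extract\t\t:\t {sample_extract}...\n')

show_statistics(dr)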

Output —

Language: sk
-----------------------
Number of sentences	:	 100
Number of words		:	 2016
Number of unique words	:	 1322
Sample extract		:	 Pán de Grandes Pascual jasne vysvetlil, aká...

Language: cs
-----------------------
Number of sentences	:	 10
Number of words		:	 158
Number of unique words	:	 141
Sample extract		:	 Upozorňujeme, že jejím cílem je šetřit penězi...

Language: en
-----------------------
Number of sentences	:	 100
Number of words		:	 2381
Number of unique words	:	 1037
Sample extract		:	 I can understand your approach a little...

Data preprocessing

def preprocess(text):
    # Lowercase and turn hyphens into spaces so hyphenated words split apart
    preprocessed_text = text.lower().replace('-', ' ')
    # Map newlines to spaces and delete ASCII punctuation and digits
    translation_table = str.maketrans('\n', ' ', string.punctuation + string.digits)
    preprocessed_text = preprocessed_text.translate(translation_table)
    return preprocessed_text

data_preprocessed = {k: [preprocess(sentence) for sentence in v] for k, v in dr.items()}
show_statistics(data_preprocessed)

Output —

Language: sk
-----------------------
Number of sentences	:	 100
Number of words		:	 1996
Number of unique words	:	 1207
Sample extract		:	 pán de grandes pascual jasne vysvetlil aká...

Language: cs
-----------------------
Number of sentences	:	 10
Number of words		:	 155
Number of unique words	:	 133
Sample extract		:	 upozorňujeme že jejím cílem je šetřit penězi...

Language: en
-----------------------
Number of sentences	:	 100
Number of words		:	 2366
Number of unique words	:	 904
Sample extract		:	 i can understand your approach a little...
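To see what the preprocessing step does to a single sentence, here is a small standalone check. The sample sentence is made up, and the function is repeated only so that the snippet runs on its own.

```python
import string

def preprocess(text):
    # Same steps as the function above, repeated so this snippet is self-contained
    text = text.lower().replace('-', ' ')
    table = str.maketrans('\n', ' ', string.punctuation + string.digits)
    return text.translate(table)

# Made-up sample with diacritics, a hyphen, digits and punctuation
sample = 'Upozorňujeme, že Wi-Fi stojí 20 EUR!\n'
cleaned = preprocess(sample)
print(cleaned.split())
# ['upozorňujeme', 'že', 'wi', 'fi', 'stojí', 'eur']
```

Letters with diacritics survive, which matters here because they are a strong signal for telling Slovak and Czech apart from English. Note that string.punctuation only covers ASCII punctuation, so typographic quotes or dashes would pass through unchanged.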

Vectorize

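# Flatten the per-language dictionary into parallel lists of sentences and labels
strain, y_train = [], []
for k, v in data_preprocessed.items():
    for sentence in v:
        strain.append(sentence)
        y_train.append(k)

# Learn the vocabulary and build the word-count matrix in one step
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(strain)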
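To get a feel for what the vectorizer produces, here is a tiny sketch on two made-up sentences. The variable names are hypothetical and not part of the project code.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two toy sentences, made up for illustration
toy_sentences = ['the day is beautiful', 'the day is the day']

toy_vectorizer = CountVectorizer()
X_toy = toy_vectorizer.fit_transform(toy_sentences)

# Mapping from each word to its column index (columns are sorted alphabetically)
vocab = toy_vectorizer.vocabulary_
# Dense view of the sparse count matrix: one row per sentence, one column per word
counts = X_toy.toarray()

print(vocab)
print(counts)
```

Each row counts how often every vocabulary word appears in that sentence, which is the kind of input MultinomialNB expects. With the default settings, CountVectorizer ignores single-character tokens, so short words such as "a" or "i" do not end up in the vocabulary.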

Build Model and Training

nc = MultinomialNB()
nc.fit(X_train,y_train)
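To get an intuition for what happens inside the fit and predict calls, here is a rough pure-Python sketch of multinomial Naive Bayes with additive smoothing. It is a simplification for illustration, not scikit-learn's implementation. The function names and the toy sentences are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods for each class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    # Class prior: fraction of training documents in each class
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}

    # Smoothed likelihood: (count + alpha) / (total count + alpha * vocabulary size)
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood

def predict(doc, log_prior, log_likelihood):
    """Pick the class with the highest log posterior; unknown words are skipped."""
    scores = {}
    for c in log_prior:
        known = [w for w in doc.split() if w in log_likelihood[c]]
        scores[c] = log_prior[c] + sum(log_likelihood[c][w] for w in known)
    return max(scores, key=scores.get)

# Toy data, made up for this sketch
toy_docs = ['the day is beautiful', 'the weather is nice',
            'den je krásny', 'počasie je pekné']
toy_labels = ['en', 'en', 'sk', 'sk']

log_prior, log_likelihood = train_multinomial_nb(toy_docs, toy_labels)
prediction = predict('the day is nice', log_prior, log_likelihood)
print(prediction)  # en
```

The alpha argument plays the same role as the alpha parameter of MultinomialNB: it keeps a word that was never seen in a class from forcing that class's probability to zero.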

Evaluate the model

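# Load the validation sentences for each language
dv = dict()
dv['sk'] = open_file('Data/Sentences/val_sentences.sk')
dv['cs'] = open_file('Data/Sentences/val_sentences.cs')
dv['en'] = open_file('Data/Sentences/val_sentences.en')
data_val_preprocessed = {k: [preprocess(sentence) for sentence in v] for k, v in dv.items()}

sval, y_val = [], []
for k, v in data_val_preprocessed.items():
    for sentence in v:
        sval.append(sentence)
        y_val.append(k)

# Reuse the vectorizer fitted on the training data (transform only, no refit)
X_val = vectorizer.transform(sval)
# Predictions from the default model (alpha=1.0, fit_prior=True)
pred = nc.predict(X_val)

# Retrain with much lighter smoothing and a uniform class prior
# (the uniform prior likely helps offset the small 'cs' training set)
nc = MultinomialNB(alpha=0.0001, fit_prior=False)
nc.fit(X_train, y_train)
pred = nc.predict(X_val)
f1_score(y_val, pred, average='weighted')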

Output —

0.8368507601649364
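The weighted F1 score averages the per-class F1 scores, weighting each class by its number of true samples. A small hand-rolled sketch shows the arithmetic. The labels below are made up and unrelated to the result above, and the function name is hypothetical.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's true count."""
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = tp + fn  # number of true samples of this class
        score += f1 * support / total
    return score

# Toy labels, made up for illustration: the single 'cs' sentence is misclassified
toy_true = ['en', 'en', 'en', 'sk', 'sk', 'cs']
toy_pred = ['en', 'en', 'sk', 'sk', 'sk', 'sk']

toy_score = weighted_f1(toy_true, toy_pred)
print(round(toy_score, 4))  # 0.6222
```

In this toy case 'en' scores 0.8, 'sk' about 0.67 and 'cs' 0.0, so a class the model never predicts correctly pulls the average down in proportion to its size.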

Learnings —

How to clean and preprocess data for language classification as well as build, train and evaluate a multinomial Naive Bayes model.

Day 47: Coming soon!

Follow and Stay tuned. Keep coding :)

That’s it fellas. Peace out and keep coding :)

Stay tuned, and of course let me end this post with a quote by Steve Jobs ;)

“Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven’t found it yet, keep looking. Don’t settle. As with all matters of the heart, you’ll know when you find it.”
