Day 8: 30 days of Natural Language Processing Series with Projects
NLTK …Part 1

Welcome back peeps. I hope all’s well at your end. For me, the last few weeks have been crazy busy and exhausting at work (with lots of travel). Anyway, let’s hop on to Day 8 of the NLP series with projects. In this post we will be starting with NLTK.
What is NLTK?
Natural Language Toolkit (NLTK) is a Python library for working with natural language and linguistics. It covers tokenization, stemming, tagging, parsing, classification, and semantic reasoning, and it gives you access to many text corpora.

NLTK provides tools and building blocks for tasks such as the following. Some of them, such as speech recognition, need other libraries alongside NLTK:
- Relationship Extraction: This is the process of identifying and extracting relationships between entities in a text, such as the relationship between two people or a person and an organization. This can be useful for tasks such as information extraction, question answering, and knowledge base construction.
- Sentiment Analysis: This is the process of determining the sentiment or emotion expressed in a piece of text, such as positive, negative, or neutral. It can be used to analyze customer reviews, social media posts, and other forms of text to understand public opinion or customer sentiment.
- Speech Recognition and Translation: This is the process of converting speech to text and then translating that text into another language. This can be used for tasks such as speech-to-text dictation, voice commands, and multilingual communication.
- Topic Segmentation: This is the process of dividing a text into different segments, each of which covers a specific topic or theme. This can be useful for tasks such as document summarization, text classification, and information retrieval.
- Automatic Summarization: This is the process of generating a shorter version of a text that retains the most important information. This can be used to condense news articles, reports, and other long documents.
- Named Entity Recognition (NER): This is the process of identifying and classifying named entities, such as people, organizations, and locations, in a text. NER can be used in a variety of applications, such as information extraction, question answering, and text classification.
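As a small taste of the entity-related tasks above, here is a minimal sketch of chunking with nltk.RegexpParser. It is not a real NER system: the POS tags are written by hand (an assumption, so that no model download is needed), and the grammar simply groups consecutive proper nouns into one chunk. The example sentence and the chunk label ENTITY are made up for illustration.

```python
import nltk

# Hand-tagged tokens (hypothetical sentence; tags follow the Penn Treebank set).
tagged = [
    ("Steve", "NNP"), ("Jobs", "NNP"), ("founded", "VBD"),
    ("Apple", "NNP"), ("in", "IN"), ("California", "NNP"),
]

# One chunk rule: one or more consecutive proper nouns form an ENTITY chunk.
grammar = "ENTITY: {<NNP>+}"
parser = nltk.RegexpParser(grammar)
tree = parser.parse(tagged)

# Collect the words of every ENTITY subtree.
entities = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda st: st.label() == "ENTITY")
]
print(entities)
```

A real pipeline would get the tags from a trained tagger and would also classify each chunk (person, organization, location), which this sketch does not attempt.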
Let’s get started with NLTK:
To import NLTK:
import nltk

Tokenization
Tokenization is the process of breaking text into smaller units called tokens, such as words or sentences.
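To see what tokenization does in practice, here is a small sketch that compares a plain str.split with two of NLTK's rule-based tokenizers. TreebankWordTokenizer and wordpunct_tokenize are chosen here (my choice, not the post's) because they do not need any downloaded data; the sample sentence is made up.

```python
from nltk.tokenize import TreebankWordTokenizer, wordpunct_tokenize

sentence = "NewYork isn't small, is it?"

# Plain whitespace split keeps punctuation glued to the words.
by_space = sentence.split(" ")

# Penn Treebank rules: punctuation is separated and contractions are split.
treebank_tokens = TreebankWordTokenizer().tokenize(sentence)

# Regex-based: runs of word characters and runs of punctuation become separate tokens.
wordpunct_tokens = wordpunct_tokenize(sentence)

print(by_space)
print(treebank_tokens)
print(wordpunct_tokens)
```

The three results differ mostly in how they treat punctuation and the contraction, which is usually the deciding factor when picking a tokenizer.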
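from nltk.tokenize import word_tokenize  # needs the Punkt data: nltk.download('punkt'), or 'punkt_tab' on recent NLTK releases
t = "NewYork is the best city in the world"
tn = word_tokenize(t)  # list of word tokens

To split by space:
t.split(" ")

To split a text into sentences:
doc = "NewYork is big. It never sleeps."
nltk.sent_tokenize(doc)

Lemmatization & Stemming
Lemmatization and stemming are both used to reduce words to a root form. Stemming strips suffixes with fixed rules, so the result may not be a real word. Lemmatization looks words up in a vocabulary (WordNet in NLTK) and returns a dictionary form.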
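Here is a minimal sketch that puts the two side by side on a few made-up words. The stemmer works out of the box; the lemmatizer needs the WordNet corpus, so that part is wrapped in a try/except in case the data has not been downloaded.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

words = ["cities", "running", "beautiful"]

# Stemming is rule-based suffix stripping and needs no downloaded data.
stemmer = PorterStemmer()
stems = {w: stemmer.stem(w) for w in words}
print(stems)

# Lemmatization looks words up in WordNet, which must be downloaded once
# with nltk.download("wordnet"); without it NLTK raises LookupError.
lemmatizer = WordNetLemmatizer()
try:
    lemmas = {w: lemmatizer.lemmatize(w) for w in words}
    print(lemmas)
except LookupError:
    lemmas = None
    print("WordNet data not found; run nltk.download('wordnet') first.")
```

Stems such as "citi" and "beauti" are not dictionary words, which is the typical trade-off of stemming. The lemmatizer treats every word as a noun unless you pass a pos argument, for example lemmatizer.lemmatize("running", pos="v").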
Stemming:
from nltk.stem import PorterStemmer
tn = ["NewYork", "is", "a", "beautiful", "city"]
s = PorterStemmer()
sm = [s.stem(tk) for tk in tn]  # To create list of stems

Lemmatization:
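from nltk.stem import WordNetLemmatizer  # needs the WordNet corpus: nltk.download('wordnet')
tn = ["NewYork", "is", "a", "beautiful", "city"]
l = WordNetLemmatizer()
lm = [l.lemmatize(tk) for tk in tn]  # To create list of lemmas (words are treated as nouns by default)

Stopwords
Stopwords are very common words (such as "is", "a", "the") that serve a mainly grammatical purpose and carry little content, so they are often removed before analysis.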
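A minimal sketch of stopword removal is shown here. It uses NLTK's English list when the stopwords corpus is available and otherwise falls back to a tiny hand-written set, which is purely an assumption for illustration and not NLTK's list.

```python
from nltk.corpus import stopwords

try:
    # Full English list; needs nltk.download("stopwords") to have been run once.
    stop_set = set(stopwords.words("english"))
except LookupError:
    # Fallback for illustration only: a tiny hand-written list (not NLTK's).
    stop_set = {"is", "a", "the", "in", "of", "and"}

tokens = ["NewYork", "is", "the", "best", "city", "in", "the", "world"]

# Compare in lowercase because NLTK's stopword list is lowercase.
content_tokens = [tk for tk in tokens if tk.lower() not in stop_set]
print(content_tokens)
```

Keeping the stopwords in a set makes each membership check fast, which matters once you filter long documents.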
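from nltk.corpus import stopwords  # needs the stopwords corpus: nltk.download('stopwords')
stp = set(stopwords.words('english'))

Parts of Speech
Part-of-speech (POS) tagging assigns a part of speech, such as noun, verb, or adjective, to every token in the text.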
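The sketch below shows what tagged output looks like: a list of (token, tag) pairs using Penn Treebank tags. It calls nltk.pos_tag when the pre-trained model is available. If the model has not been downloaded, it falls back to a crude RegexpTagger whose patterns are hypothetical and only there so the example still runs.

```python
import nltk

tokens = ["NewYork", "is", "a", "beautiful", "city"]

try:
    # Pre-trained tagger; its model has to be downloaded once via nltk.download().
    tagged = nltk.pos_tag(tokens)
except LookupError:
    # Crude fallback for illustration only (hypothetical patterns, not NLTK's model).
    fallback = nltk.RegexpTagger([
        (r"^is$", "VBZ"),        # third-person singular verb
        (r"^(a|an|the)$", "DT"),  # determiners
        (r".*ful$", "JJ"),        # many adjectives end in -ful
        (r".*", "NN"),            # default: tag everything else as a noun
    ])
    tagged = fallback.tag(tokens)

for word, tag in tagged:
    print(word, tag)
```

The exact tags depend on which branch runs, so treat the printed output as an example of the format and not as a reference tagging.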
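nltk.pos_tag(tn)  # tn is a list of tokens; needs the tagger model from nltk.download()

To work with files
with open('myfile.txt') as f:  # the file is closed automatically
    t = f.read()
tn = nltk.word_tokenize(t)
text = nltk.Text(tn)  # wraps the tokens for exploration

Day 9 and Part 2 of this series: coming soon!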
Follow for more updates and stay tuned. And of course, let me end this post with a quote by Steve Jobs ;)
“Your time is limited, so don’t waste it living someone else’s life.”