Summary

The web content introduces Day 8 of a 30-day Natural Language Processing (NLP) series with a focus on NLTK, providing an overview of its capabilities, and announces upcoming projects and resources.

Abstract

The article is part of a "30 days of Natural Language Processing (NLP) Series" and marks the commencement of exploring the Natural Language Toolkit (NLTK) library in Python. It acknowledges the author's recent busy schedule and sets the stage for delving into NLTK's functionalities, such as tokenization, lemmatization, stemming, and handling stopwords, as well as more advanced tasks like relationship extraction, sentiment analysis, and named entity recognition. The post also highlights a range of other educational series and projects available on various topics, including data science, machine learning, data engineering, and system design. It emphasizes the practical aspect of these series by pointing to implemented projects and coding exercises. Additionally, the author announces the launch of a YouTube channel, Ignito, which will feature videos on these projects and coding exercises. The article concludes by inviting readers to subscribe to a tech newsletter for further insights and tips on tech interviews, machine learning, data science, and more, and ends with an inspirational quote from Steve Jobs.

Opinions

The author expresses excitement about the launch of their YouTube channel, Ignito, which is dedicated to covering all the projects and coding exercises related to their educational series.
There is an emphasis on the practical application of the concepts taught in the series, with a focus on real-world projects and their implementation.
The author believes in the importance of continuous learning and staying updated in the tech field, as evidenced by the invitation to join the tech newsletter and the quote by Steve Jobs about the value of time and living authentically.
The article conveys a sense of community and shared learning journey, with the author encouraging readers to follow for more updates and to stay tuned for upcoming content.
There is a clear endorsement of NLTK as an "amazing library" for linguistics and natural language processing tasks, suggesting that the author holds this tool in high regard for those working in the field.

Day 8: 30 days of Natural Language Processing Series with Projects

NLTK …Part 1

Welcome back, peeps. I hope all's well at your end. For me, the last few weeks have been crazy busy and exhausting at work (with lots of travel). Anyway, let's hop on to Day 8 of the NLP series with projects. In this post we will be starting with NLTK.

Some of the other best Series —

30 Days of Natural Language Processing ( NLP) Series

30 days of Data Engineering with projects Series

60 days of Data Science and ML Series with projects

100 days : Your Data Science and Machine Learning Degree Series with projects

23 Data Science Techniques You Should Know

Tech Interview Series — Curated List of coding questions

Complete System Design with most popular Questions Series

Complete Data Visualization and Pre-processing Series with projects

Complete Python Series with Projects

Complete Advanced Python Series with Projects

Kaggle Best Notebooks that will teach you the most

Complete Developers Guide to Git

All the Data Science and Machine Learning Resources

210 Machine Learning Projects

30 days of Machine Learning Ops

Projects Videos —

Subscribe today!

Ignito

Excited to share that we have launched our Youtube channel — Ignito to cover all the projects and coding exercise for …

www.youtube.com

Tech Newsletter —

If you are interested, you can join my newsletter through which I send tech interview tips, techniques, patterns, hacks — Software Development, ML, Data Science, Startups and Technology projects to more than 30K readers. You can subscribe to Tech Brew :

Ignito

Data Science, ML, AI and more… Click to read Ignito, by Naina Chaturvedi, a Substack publication. Launched 7 months…

naina0405.substack.com

What is NLTK?

Natural Language Toolkit (NLTK) is an amazing Python library for working with linguistics and natural language. It lets you analyze linguistic structure and covers classification, tokenization, stemming, tagging, parsing, semantic reasoning, working with corpora, categorizing text, etc.

NLTK can support the following tasks —

Relationship Extraction
Sentiment Analysis
Speech Recognition and Translation
Topic Segmentation
Automatic Summarization
Named Entity Recognition

Relationship Extraction: This is the process of identifying and extracting relationships between entities in a text, such as the relationship between two people or a person and an organization. This can be useful for tasks such as information extraction, question answering, and knowledge base construction.
Sentiment Analysis: This is the process of determining the sentiment or emotion expressed in a piece of text, such as positive, negative, or neutral. It can be used to analyze customer reviews, social media posts, and other forms of text to understand public opinion or customer sentiment.
Speech Recognition and Translation: This is the process of converting speech to text and then translating that text into another language. This can be used for tasks such as speech-to-text dictation, voice commands, and multilingual communication.
Topic Segmentation: This is the process of dividing a text into different segments, each of which covers a specific topic or theme. This can be useful for tasks such as document summarization, text classification, and information retrieval.
Automatic Summarization: This is the process of generating a shorter version of a text that retains the most important information. This can be used for tasks such as summarizing news articles, reports, and other long documents.
Named Entity Recognition (NER): This is the process of identifying and classifying named entities, such as people, organizations, and locations, in a text. NER can be used in a variety of applications, such as information extraction, question answering, and text classification.

Let’s get started with NLTK —

To import NLTK

import nltk

Tokenization

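Tokenization is the process of breaking text into smaller units called tokens.

from nltk.tokenize import word_tokenize

# Requires the Punkt tokenizer data: nltk.download('punkt')
# (newer NLTK releases use 'punkt_tab' instead)
t = "NewYork is the best city in the world"
tn = word_tokenize(t)  # list of word tokens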

To split by space —

t.split(" ")
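To see why a tokenizer is usually preferred over a plain split, here is a minimal sketch using only the standard library. The regular expression is a rough stand-in for illustration, not what word_tokenize does internally, and the sample sentence is made up for this example.

```python
import re

sample = "NewYork is the best city in the world, right?"

# Splitting on a single space keeps punctuation glued to the words.
space_tokens = sample.split(" ")

# Rough regex tokenizer: runs of word characters, or single punctuation marks.
regex_tokens = re.findall(r"\w+|[^\w\s]", sample)

print(space_tokens)
print(regex_tokens)
```

With the split, "world," stays one token; the regex version separates the comma and the question mark into their own tokens.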

To tokenize text into sentences —

nltk.sent_tokenize(doc)  # doc is a string containing one or more sentences

Lemmatization & Stemming

Lemmatization and stemming are used to reduce words in the text to their root forms.

Stemming —

from nltk.stem import PorterStemmer

tn = ["NewYork", "is", "a", "beautiful", "city"]

s = PorterStemmer()
sm = [s.stem(tk) for tk in tn]  # To create list of stems
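As a small illustration of what stemming returns, the sketch below runs the Porter stemmer on a few words. The word list is illustrative and not from the original post; PorterStemmer itself needs no downloaded data.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Illustrative words (not from the original post).
words = ["running", "cities", "beautiful", "city"]

# Map each word to its Porter stem; stems are not always dictionary words.
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(word, "->", stem)
```

Note that a stem such as "citi" is not a real word. Stemming strips suffixes by rule, whereas lemmatization looks words up in a vocabulary.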

Lemmatization —

from nltk.stem import WordNetLemmatizer

# Requires the WordNet corpus: nltk.download('wordnet')
tn = ["NewYork", "is", "a", "beautiful", "city"]

l = WordNetLemmatizer()
lm = [l.lemmatize(tk) for tk in tn]  # To create list of lemmas

Stopwords

These are non-content words that serve only a grammatical purpose in the text.

from nltk.corpus import stopwords

# Requires the stopwords corpus: nltk.download('stopwords')
stp = set(stopwords.words('english'))
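Once you have a stopword set, removing stopwords is a simple filter over the tokens. The sketch below uses a small hand-picked set as a stand-in for stopwords.words('english') so that it runs without downloading the corpus; the real NLTK list is much longer.

```python
# Hand-picked stand-in for stopwords.words('english') (assumption for this sketch).
stop_words = {"is", "the", "in", "a"}

tokens = ["NewYork", "is", "the", "best", "city", "in", "the", "world"]

# Compare in lowercase, since the stopword list is lowercase.
content_tokens = [tok for tok in tokens if tok.lower() not in stop_words]

print(content_tokens)  # ['NewYork', 'best', 'city', 'world']
```

In practice you would swap the hand-picked set for the stp set built above.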

Parts of Speech

POS tagging assigns a part of speech to every token in the text.

nltk.pos_tag(words)  # words is a list of tokens; returns (token, tag) pairs

To work with files

# 'myfile.txt' is a placeholder path
with open('myfile.txt') as f:
    t = f.read()
tn = nltk.word_tokenize(t)
text = nltk.Text(tn)
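Here is a hedged sketch of what you might do with the nltk.Text object after reading a file. The file content is made up for the example and written to a temporary file so the snippet is self-contained. It uses wordpunct_tokenize instead of word_tokenize only because it needs no downloaded data.

```python
import os
import tempfile

import nltk
from nltk.tokenize import wordpunct_tokenize

# Write a small sample file so the sketch is self-contained (hypothetical content).
sample = "The city is big. The city is beautiful."
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path, encoding="utf-8") as f:
    raw = f.read()
os.remove(path)

# wordpunct_tokenize is regex-based and needs no downloaded models.
tokens = wordpunct_tokenize(raw)
text = nltk.Text(tokens)

city_count = text.count("city")            # occurrences of one token
top_tokens = text.vocab().most_common(3)   # frequency distribution over all tokens

print(city_count)
print(top_tokens)
```

nltk.Text wraps the token list and adds simple exploration helpers such as count() and vocab().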

Day 9 and Part 2 of this series : coming soon!

For Complete Data Science and Machine Learning with projects series —

Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML

Connect the ML dots…

medium.com

Follow for more updates, stay tuned, and of course let me end this post with a quote by Steve Jobs ;)

“Your time is limited, so don’t waste it living someone else’s life.”

For other projects, tune in to —

Build Machine Learning Pipelines( With Code) — Part 1

Complete implementation…

medium.datadriveninvestor.com

Recurrent Neural Network with Keras

Project Implementation and cheatsheet…

medium.datadriveninvestor.com

Clustering Geolocation Data in Python using DBSCAN and K-Means

Project Implementation…

medium.datadriveninvestor.com

Facial Expression Recognition using Keras

Project Implementation…

medium.datadriveninvestor.com

Hyperparameter Tuning with Keras Tuner

Project Implementation….

medium.datadriveninvestor.com

Custom Layers in Keras

Code implementation …

medium.datadriveninvestor.com
