NLP: Text Part Of Speech Tagging
How Part Of Speech Tagging Works And How It Can Be Implemented In Python
Part Of Speech (PoS) tagging is a useful technique in NLP projects. This article gives an overview of PoS tagging and shows how we can implement it in Python.
What Is Part Of Speech (PoS)?
Every language has a set of parts of speech, such as verbs, nouns, adverbs and adjectives.
PoS tagging assigns the appropriate part of speech to each word in a text.
NLTK is a fantastic library to support your NLP project. It provides a number of tagging models. Older NLTK releases used maxent_treebank_pos_tagger as the default tagging model, while more recent releases default to an averaged perceptron tagger (PerceptronTagger). Both produce tags from the Penn Treebank tagset. In Treebank notation, a simple sentence (S) is composed of a noun phrase (NP), a verb phrase (VP) and the full stop.
A number of PoS taggers are available, including maxent_treebank_pos_tagger, HiddenMarkovModelTagger, PerceptronTagger and StanfordPOSTagger.
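To make the idea of a tagging model concrete, here is a minimal sketch of a most-frequent-tag (unigram) baseline in plain Python. It is not how the NLTK taggers listed above work internally; they use richer models trained on large corpora. The training sentences, the function names and the default tag are all assumptions made up for this illustration.

```python
from collections import Counter, defaultdict

# Tiny hand-labelled training data (hypothetical, for illustration only).
# The tags follow the Penn Treebank convention.
TRAINING_SENTENCES = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("run", "NN"), ("was", "VBD"), ("long", "JJ")],
    [("dogs", "NNS"), ("run", "VBP"), ("quickly", "RB")],
    [("they", "PRP"), ("run", "VBP")],
]


def train_unigram_tagger(sentences):
    """Return a dict mapping each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {
        word: tag_counts.most_common(1)[0][0]
        for word, tag_counts in counts.items()
    }


def tag(words, model, default_tag="NN"):
    """Tag each word with its most frequent tag, or default_tag if unseen."""
    return [(word, model.get(word.lower(), default_tag)) for word in words]


model = train_unigram_tagger(TRAINING_SENTENCES)
print(tag(["the", "dogs", "run"], model))
# [('the', 'DT'), ('dogs', 'NNS'), ('run', 'VBP')]
```

Because "run" appears twice as a verb and once as a noun in the toy data, the baseline always tags it as a verb. Ignoring context in this way is the main weakness that the more sophisticated taggers address.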
This example illustrates how we can use the PoS functionality:
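import nltk

# One-time setup: word_tokenize and pos_tag need their models downloaded
# with nltk.download(); the resource names depend on the NLTK version.
text = 'where are you going'
words = nltk.word_tokenize(text)  # split the text into word tokens
tags = nltk.pos_tag(words)        # list of (word, tag) tuples
print(tags)
# Each tag is a Penn Treebank code. With the default tagger this typically
# gives WRB (wh-adverb) for 'where', PRP (personal pronoun) for 'you'
# and VBG (verb, present participle) for 'going'.
When the tags are returned, we can use the following command to find out more about a tag: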
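nltk.help.upenn_tagset('N.*')  # the argument is a regular expression; this describes every tag starting with N (the noun tags)
The common tags can be recognised by their first letter:
Tags starting with J are adjectives, N are nouns, V are verbs and R are adverbs.
Summary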
Part Of Speech (PoS) tagging is a useful technique in NLP projects. This article gave an overview of PoS tagging and showed how we can use it in Python.
We can combine it with lemmatisation and stemming to process text better. For example, a lemmatiser can use the PoS tag of a word to pick the correct base form.
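As a rough sketch of how tagging and lemmatisation can be combined, the code below maps Penn Treebank tags to the single-letter part-of-speech codes used by WordNet ('a', 'n', 'v', 'r') based on the tag's first letter. The helper names are hypothetical, and the mapping is a simplification: every tag that does not start with J, N, V or R falls back to noun, and a few tags such as RP (particle) are treated as adverbs.

```python
# Map the first letter of a Penn Treebank tag to the single-letter
# part-of-speech codes that WordNet-style lemmatisers expect.
PENN_PREFIX_TO_WORDNET = {
    "J": "a",  # adjective
    "N": "n",  # noun
    "V": "v",  # verb
    "R": "r",  # adverb
}


def penn_to_wordnet(tag, default="n"):
    """Return the WordNet POS code for a Penn Treebank tag."""
    return PENN_PREFIX_TO_WORDNET.get(tag[:1], default)


def lemmatise_tagged(tagged_words, lemmatiser):
    """Lemmatise (word, tag) pairs.

    `lemmatiser` can be any object with a lemmatize(word, pos) method.
    """
    return [
        lemmatiser.lemmatize(word, penn_to_wordnet(tag))
        for word, tag in tagged_words
    ]


print(penn_to_wordnet("VBG"))  # v
print(penn_to_wordnet("JJ"))   # a
```

With NLTK installed and the WordNet data downloaded, one way to use this would be lemmatise_tagged(nltk.pos_tag(words), nltk.stem.WordNetLemmatizer()).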
