The Beginner’s Guide to Summarizing Articles Using NLP
Summarize text using this simple technique.
Hello everyone! With the overload of information available today, I have always wished for a way to read just the summary instead of going through page after page of an article. This led me to learn how to summarize an article and extract the most relevant information from it. Auto summarization is not limited to articles: it can be used for newsletters, legal documents, social media marketing content, and the list goes on.
There are two ways to summarize text in natural language processing:
- Extraction-based summarization: Here we extract key phrases or sentences from the source and combine them into a summary without adding any new text.
- Abstraction-based summarization: Here we create new phrases that paraphrase the original source. This is closer to how a person writes a summary, but it is harder to implement than extraction.
Auto summarization using natural language processing can be applied to a variety of sources, e.g., web pages (URLs), emails, PDF files, and text files. It can be done for a single document or for multiple documents together.
We will see how to do extraction-based summarization using natural language processing.
Here is the input data I used: https://github.com/poonam-ydv/Auto-Summarization---NLP/blob/main/Auto-Sum.txt
The following is an explanation of the code behind the extraction-based summarization technique:
Step 1: Install the spaCy and scikit-learn libraries required for the task.
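As a rough sketch of this step, the snippet below checks whether the two libraries can be imported and prints the pip command if they cannot. The install commands in the comments are the usual ones; the model name `en_core_web_sm` is my assumption for the "general-purpose" English model, and the helper name `missing_packages` is made up for illustration.

```python
import importlib.util

# Module name -> pip package name for the two libraries used in this article.
# Note that scikit-learn is imported as "sklearn".
REQUIRED = {"spacy": "spacy", "sklearn": "scikit-learn"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be imported."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


# Typical install commands (run these in a shell, not in Python):
#   pip install spacy scikit-learn
#   python -m spacy download en_core_web_sm   # assumed model name
missing = missing_packages()
if missing:
    print("Missing, install with: pip install " + " ".join(missing))
else:
    print("spaCy and scikit-learn are installed.")
```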


Step 2: Load the general-purpose English spaCy model, then open the file that contains the text to be summarized (Auto-Sum.txt in this case). Then apply the pipeline to the loaded text.
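A minimal sketch of this step could look like the following. It assumes the model is `en_core_web_sm` and that Auto-Sum.txt sits in the working directory; the function names `read_text` and `make_doc` are only illustrative.

```python
from pathlib import Path


def read_text(path):
    """Read the whole file to be summarized as one UTF-8 string."""
    return Path(path).read_text(encoding="utf-8")


def make_doc(text, model_name="en_core_web_sm"):
    """Run the spaCy pipeline over the text and return the parsed Doc."""
    import spacy  # imported here so read_text works even without spaCy

    nlp = spacy.load(model_name)  # assumed model name, see the lead-in
    return nlp(text)


# Usage, assuming Auto-Sum.txt is in the working directory:
#   doc = make_doc(read_text("Auto-Sum.txt"))
#   sentences = list(doc.sents)  # sentence spans found by the pipeline
```

Once the pipeline has run, the Doc gives us both the individual tokens and the sentence boundaries, which is everything the next steps need.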




Step 3: Next, we remove the stop words from the text and create a dictionary of words with their respective frequencies.
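One way to sketch this step is shown below. It uses `collections.Counter` from the standard library instead of scikit-learn, which is a substitution on my part to keep the example short. The function relies on the `text`, `is_stop`, `is_punct` and `is_space` attributes that spaCy tokens provide; the `Tok` tuple is just a stand-in so the example runs without a model.

```python
from collections import Counter, namedtuple


def word_frequencies(tokens):
    """Count lowercase words, skipping stop words, punctuation and whitespace."""
    return Counter(
        tok.text.lower()
        for tok in tokens
        if not (tok.is_stop or tok.is_punct or tok.is_space)
    )


# Stand-in for spaCy tokens (hypothetical, for demonstration only).
Tok = namedtuple("Tok", "text is_stop is_punct is_space")
demo = [
    Tok("The", True, False, False),
    Tok("cat", False, False, False),
    Tok("chased", False, False, False),
    Tok("the", True, False, False),
    Tok("cat", False, False, False),
    Tok(".", False, True, False),
]
freqs = word_frequencies(demo)
print(dict(freqs))  # {'cat': 2, 'chased': 1}
```

With a real spaCy Doc you could call `word_frequencies(doc)` directly, since iterating over a Doc yields its tokens.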

Step 4: This is where we determine the relevance of each sentence. We measure it by the cumulative frequency of the words it contains.
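The scoring can be sketched as below, assuming each sentence is given as a list of word strings and `freqs` is the frequency dictionary from the previous step. The name `score_sentences` and the sample data are made up for illustration.

```python
def score_sentences(sentences, freqs):
    """Score each sentence as the sum of the frequencies of its words.

    Words that are not in freqs (for example stop words) contribute 0.
    """
    return [
        sum(freqs.get(word.lower(), 0) for word in words)
        for words in sentences
    ]


# Hypothetical sample data.
freqs = {"cat": 2, "chased": 1, "mouse": 1}
sentences = [
    ["The", "cat", "chased", "the", "mouse"],
    ["The", "cat", "slept"],
]
scores = score_sentences(sentences, freqs)
print(scores)  # [4, 2]
```

With spaCy, the word lists could be built as `[[tok.text for tok in sent] for sent in doc.sents]`. Keep in mind that a plain sum favors longer sentences, so dividing by sentence length is a common variation.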

Step 5: In the last step, we rank all sentences in the article in order of importance. We can then extract the top N (say 10) sentences to create a summary.
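Here is a small sketch of the ranking and extraction, assuming we already have the sentences as strings and one score per sentence. Putting the selected sentences back in their original order is my own choice so the summary reads naturally; the function name `summarize` and the sample data are illustrative.

```python
import heapq


def summarize(sentences, scores, n=10):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Indices of the n best sentences, ranked by score.
    top = heapq.nlargest(n, range(len(sentences)), key=lambda i: scores[i])
    # Sort the indices so the summary follows the order of the source text.
    return [sentences[i] for i in sorted(top)]


# Hypothetical sample data.
sentences = ["Cats hunt mice.", "It rained.", "Mice fear cats.", "Hello."]
scores = [5, 1, 4, 0]
print(" ".join(summarize(sentences, scores, n=2)))
# Cats hunt mice. Mice fear cats.
```

If N is larger than the number of sentences, the function simply returns all of them.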

There are many other ways to do this. This article is meant to build an understanding of the basics of auto summarization.
Feel free to get in touch with me if you have any questions.
