Sentiment Analysis of Stock Market in Python (Part 1): Web Scraping Financial News

Stock market sentiment can be valuable information that hints at future price action. Stock investors often react to market sentiment when deciding whether to buy or sell their assets. Hence, stock sentiment analysis has become a popular and useful technique for gauging investors’ opinions of a specific stock and planning an investment strategy.
One direct way to understand market sentiment is to follow and read the news on a daily basis. However, this can be quite a tedious process. Here, we are going to explore how we can use Python to perform the stock sentiment analysis for us.
We will break this sentiment analysis process into two main parts:
- Web scraping financial news and preprocessing the text data
- Calculating sentiment score and visualization (Presented in Part 2 Article)
In this article, we will focus only on the first part; the second part will be presented in a separate article.
Disclaimer: This article is only intended to demonstrate the steps for performing stock market sentiment analysis in Python. It is not intended to promote any stock or to give any specific investment advice.
Prerequisite Python Packages
- BeautifulSoup — https://pypi.org/project/beautifulsoup4/
- Pandas — https://pandas.pydata.org/
- NLTK — https://pypi.org/project/nltk/ (Will be used in the Part 2 Article)
GitHub
The original full source code presented in this article is available on my GitHub repo. Feel free to download it (SentimentAnalysis_part1.py) if you wish to follow along with the article.
Web Scraping Financial News
1. Identifying sources of financial news
First, we need to identify the source of financial news from which we would like to gather the sentiment data. There are many potential sources, such as Google Finance, Yahoo Finance, FINVIZ and MarketWatch.
In this article, we are going to gather our sentiment data from Financial Modeling Prep (FMP).

FMP offers us clean and well-structured financial information. We can simply type a ticker symbol such as “AAPL” in the search bar at the top-left corner to find further details of Apple stock.

The search will lead us to a financial summary page as below.

If we look at the bottom part of the financial summary page of AAPL, there is a list of the latest AAPL news.

This news is the source of the sentiment that we will extract for sentiment analysis using Python.
2. Examining HTML Structure of Web Page
To extract the financial news, we will first need to examine the HTML structure of the page. HTML is a markup language that lays down the structure of a webpage.
We can right-click on the web page and click “Inspect” to view the HTML code. (This step assumes we are using Google Chrome.)

We shall see the HTML code that renders our web page as below.

We can traverse the HTML tags to hunt for the tag that is responsible for rendering the news content. We do so by placing our mouse cursor on each of the tags (e.g. div) and examining the highlighted area of the webpage. We can also click on the “triangle” button to expand an HTML tag.
We will find that the news content is rendered by a “div” with the class name “articles”.

If we expand the div with class “articles” further, we shall see that each news item is wrapped inside an anchor tag <a> with the class name “article-item”. Inside the anchor tag, the news title, date and text are marked up by h4, h5 and p tags, respectively.

The same HTML structure, as shown above, is repeated for every news item on the web page.
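To make the structure described above concrete, here is a minimal sketch of how BeautifulSoup could walk through it. The sample markup, the function name extract_news and the dictionary keys are assumptions made for illustration only; the sample mimics the div, a, h4, h5 and p layout described in this section, and the real page's HTML may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the structure described above.
# It is NOT copied from the real page, whose HTML may differ.
SAMPLE_HTML = """
<div class="articles">
  <a class="article-item" href="#">
    <h4>Sample headline one</h4>
    <h5>2020-01-01</h5>
    <p>Sample body text one.</p>
  </a>
  <a class="article-item" href="#">
    <h4>Sample headline two</h4>
    <h5>2020-01-02</h5>
    <p>Sample body text two.</p>
  </a>
</div>
"""


def extract_news(html):
    """Return one dict per news item (hypothetical helper for illustration)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Each news item is assumed to be an <a> tag with class "article-item".
    for item in soup.find_all("a", class_="article-item"):
        title = item.find("h4")  # news title
        date = item.find("h5")   # news date
        text = item.find("p")    # news text
        rows.append({
            "title": title.get_text(strip=True) if title else "",
            "date": date.get_text(strip=True) if date else "",
            "text": text.get_text(strip=True) if text else "",
        })
    return rows


news = extract_news(SAMPLE_HTML)
for row in news:
    print(row["date"], "|", row["title"])
```

Because every news item shares the same tags, a single loop over the anchor tags is enough to collect the title, date and text of each one. The resulting list of dictionaries can be passed straight to pandas.DataFrame if a table is preferred.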
Our next task is to use Python to perform the web scraping on the financial news page.
3. Extracting HTML contents
Now, we are going to use Python to extract the content of the financial web page. To do so, let us examine the URL of the financial web page again. We can see the URL can be split into two components: a static base URL, and a ticker.

Based on this observation, we can generate a dynamic link to the FMP financial page for different tickers.
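As a rough sketch of this idea, the snippet below joins a static base URL with a ticker. The base URL shown is a placeholder and not the real FMP address, and the names BASE_URL and build_url are assumptions introduced only for this example.

```python
from urllib.parse import quote

# Placeholder base URL (assumption): replace it with the static part of the
# address shown in the browser when viewing a ticker's financial summary page.
BASE_URL = "https://example.com/financial-summary/"


def build_url(ticker, base_url=BASE_URL):
    """Join the static base URL with a cleaned, URL-safe ticker symbol."""
    # Strip stray whitespace and upper-case the ticker before appending it.
    return base_url + quote(ticker.strip().upper())


# Build one link per ticker of interest.
tickers = ["AAPL", "msft"]
urls = {ticker: build_url(ticker) for ticker in tickers}
for ticker, url in urls.items():
    print(ticker, "->", url)
```

Keeping the base URL in one constant means that only the ticker changes from one request to the next, so the same scraping code can be reused for any stock.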









