How to Extract Bonds Information on Markets Insider via Python? Part 2
Previously, in How to Extract Corporate Bonds Information on Markets Insider via Python? Part 1, we discussed how to extract bond information and store it in a SQLite database. In this article, we go one step further and extract the hyperlink for each bond so that we can follow it to pull in-depth information.

We will use BeautifulSoup to parse the response and grab the hyperlink of each row. Let’s take one page as an example:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://markets.businessinsider.com/bonds/finder?p=1&borrower=&maturity=midterm&yield=0&bondtype=2%2C3%2C4%2C16&coupon=0&currency=333&rating=&country=18'
df = pd.read_html(url)[0]
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
The target information is located under <table> -> <tr> -> <td> -> <a>, in the href attribute.
<table class="table">
...
<tr class="table__tr">
<td class="table__td">
<a href="/bonds/dl-inflation-prot_secs_1828-Bond-2028-us912828y388">
United States of America</a>
</td>
</tr>
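As a quick check of this path, here is a minimal sketch that runs BeautifulSoup's CSS selector support on an inline copy of the snippet rather than on the live page. The html string (with a closing </table> added) and the selector are illustrative assumptions, not the site's full markup.

```python
from bs4 import BeautifulSoup

# Inline copy of the structure shown above; the closing </table> is added
# here so the snippet is complete.
html = """
<table class="table">
  <tr class="table__tr">
    <td class="table__td">
      <a href="/bonds/dl-inflation-prot_secs_1828-Bond-2028-us912828y388">
        United States of America</a>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# "table tr td a[href]" follows table -> tr -> td -> a and keeps only
# anchors that actually carry an href attribute.
links = [a["href"] for a in soup.select("table tr td a[href]")]
print(links)
# ['/bonds/dl-inflation-prot_secs_1828-Bond-2028-us912828y388']
```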
As a result, we can parse it based on this structure to extract the link values as follows.
table = soup.find('table')
links = []
for tr in table.find_all("tr"):
    tds = tr.find_all("td")
    for each in tds:
        try:
            # each.find('a') is None for a cell without a link (TypeError);
            # an <a> tag without href raises KeyError
            link = each.find('a')['href']
            links.append(link)
        except (TypeError, KeyError):
            pass
Then, we can attach this information to the DataFrame we created via pandas (read_html).
df['Link'] = links
In addition, we add a check that breaks out of the loop when the fetched DataFrame is empty, so the script does not keep requesting pages that have no data.
if len(df) == 0:
    break
Here is the full script for your reference.
from datetime import date
from datetime import timedelta
import pandas as pd
import requests
import sqlite3
from google.colab import drive
from bs4 import BeautifulSoup

# Mount Google Drive so the database path below is reachable (Google Colab only)
drive.mount('/content/drive')
con = sqlite3.connect('/content/drive/MyDrive/data/Stock.db')
# Snapshot date for the As_Of column (assumed definition: the script uses today without setting it)
today = date.today()
for i in range(1, 100):
    url = 'https://markets.businessinsider.com/bonds/finder?p=' + str(i) + '&borrower=&maturity=midterm&yield=5&bondtype=6%2C7%2C8%2C19&coupon=5&currency=333&rating=&country=18'
    df = pd.read_html(url)[0]
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    table = soup.find('table')
    links = []
    for tr in table.find_all("tr"):
        tds = tr.find_all("td")
        for each in tds:
            try:
                link = each.find('a')['href']
                links.append(link)
            except (TypeError, KeyError):
                # Cell without a link, or <a> without href
                pass
    df['Link'] = links
    df['As_Of'] = today
    # Stop once a page returns no rows
    if len(df) == 0:
        break
    df.to_sql('Corporate_Bond_Markets_Insider_2', con, if_exists='append')
con.close()
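The href values stored in the Link column are relative paths (for example /bonds/...), so they need the site root in front of them before they can be requested. Below is a minimal sketch of that step, assuming the root is the host of the finder URL used above. BASE_URL and to_absolute are hypothetical names introduced here for illustration.

```python
from urllib.parse import urljoin

# Assumption: the site root is the host of the finder URL used in the script.
BASE_URL = "https://markets.businessinsider.com"


def to_absolute(links, base=BASE_URL):
    """Return absolute URLs for a list of relative href values."""
    return [urljoin(base, link) for link in links]


relative_links = ["/bonds/dl-inflation-prot_secs_1828-Bond-2028-us912828y388"]
absolute_links = to_absolute(relative_links)
print(absolute_links[0])
# https://markets.businessinsider.com/bonds/dl-inflation-prot_secs_1828-Bond-2028-us912828y388
```

The same conversion could be applied to the DataFrame column before saving, for instance with df['Link'].map(lambda href: urljoin(BASE_URL, href)).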
Thank you. In the next article, we will show how to use this hyperlink to grab more detailed information for the most attractive bond candidates. More to come!
