The Scraper Guy

EASIEST Way to Scrape Bet365 Premier League Football Match Odds Using Selenium and Python V2

Welcome to another scraping article. Today we will be tackling the most difficult site to scrape betting odds from, Bet365. Check out my other articles and consider leaving a clap and follow if you gained some value from this.

As always, you can check out the full code here -

To begin, you need to download ChromeDriver, which can be found at the link below. Make sure the version you download matches the version of Chrome you are currently running.
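
As a rough sketch of what a matching version means in practice: ChromeDriver is generally expected to share Chrome's major version, i.e. the number before the first dot. The helper names below are my own and the version strings are typed in by hand purely as examples; you would read the real ones from chrome://version and from running chromedriver --version.

```python
def major_version(version: str) -> int:
    """Return the leading (major) number of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_compatible(chrome_version: str, driver_version: str) -> bool:
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Hand-typed example version strings, for illustration only.
print(versions_compatible("113.0.5672.127", "113.0.5672.63"))  # True
print(versions_compatible("114.0.5735.90", "113.0.5672.63"))   # False
```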

Now beginning with the actual code, simply import the libraries.

import json
import statistics
import time
from datetime import date, datetime

import numpy as np
import pandas as pd
import requests  # The requests library for HTTP requests in Python
from scipy import stats  # The SciPy stats module
from selenium import webdriver
from selenium.webdriver import ChromeOptions
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

This is just a copied and pasted list of the libraries I use for each project. We will not be using most of them in this one.

We can then move on to some functions we will need. Firstly, we have our driver_code function, which houses all of the settings we will use for our ChromeDriver instance. You will need to replace CHROMEDRIVER PATH HERE with the location of your newly downloaded chromedriver.exe file.

def driver_code():
    options = ChromeOptions()
    # Selenium 4 style: the page load strategy is set on the options object
    # instead of being passed in through DesiredCapabilities.
    options.page_load_strategy = "normal"

    useragentarray = [
        "Mozilla/5.0 (Linux; Android 13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.5672.76 Mobile Safari/537.36"
    ]

    # Hide the most obvious signs that the browser is automated.
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("disable-infobars")
    # Arguments only take effect if they are added before the driver starts.
    options.add_argument("--disable-popup-blocking")
    # options.add_argument(f"--user-data-dir=./profile{driver_num}")

    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)

    # Selenium 4 style: the chromedriver path goes through a Service object.
    driver = webdriver.Chrome(
        service=Service("CHROMEDRIVER PATH HERE"),
        options=options,
    )
    # Note: this patches navigator.webdriver in the currently loaded document only.
    driver.execute_script(
        "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
    )

    # Present the browser as a mobile Chrome user agent.
    driver.execute_cdp_cmd(
        "Network.setUserAgentOverride", {"userAgent": useragentarray[0]}
    )

    driver.get("https://www.bet365.com/#/AC/B1/C1/D1002/E91422157/G40/H^1/")

    # Phone-sized window to match the mobile user agent.
    driver.set_window_size(390, 844)
    time.sleep(1)
    return driver

Moving on to the other functions, we have an accept_cookies function, which clicks the cookie consent button if one is present.

#Accept Cookies
def accept_cookies(driver):
    cookies = driver.find_elements(By.CSS_SELECTOR, ".ccm-CookieConsentPopup_Accept ")
    if(len(cookies) > 0):
        cookies[0].click()

We also have an open_tab function, which takes the current driver and a link as arguments, opens the link in a new tab and switches the driver's active window to that new tab. This is a crucial function, as we will see next.

def open_tab(driver,link):
    driver.execute_script(f"""window.open('{link}', "_blank");""")
    time.sleep(2)
    driver.switch_to.window(driver.window_handles[-1])
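
One thing to be aware of: the f-string above drops the link straight into single quotes, so a link containing a single quote would break the JavaScript. A minimal sketch of a safer way to build the script is shown here. The helper name build_open_tab_script is my own invention, and it relies on json.dumps producing a quoted, escaped string that JavaScript also accepts in the vast majority of cases.

```python
import json


def build_open_tab_script(link: str) -> str:
    """Build the JavaScript that opens `link` in a new tab.

    json.dumps wraps the link in double quotes and escapes any quotes or
    backslashes inside it, so the result stays a single string literal.
    """
    return f"window.open({json.dumps(link)}, '_blank');"


print(build_open_tab_script("https://www.bet365.com/#/AC/B1/C1/D1002/E91422157/G40/H^1/"))
```

Inside open_tab, the result would simply be passed to driver.execute_script in place of the hand-written f-string.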

Now we move on to the main code of this project. We begin by initialising our driver. We then need to open two new tabs by calling the above function twice. This is because Bet365 is clever and won't show any of the matches or odds in the initial window we open, or even when one new tab is opened, as shown below.

So we have to open two new tabs and finally we have access to the data we are looking for.

new_driver = driver_code()
open_tab(new_driver, 'https://www.bet365.com/#/AC/B1/C1/D1002/E91422157/G40/H^1/')
accept_cookies(new_driver)
time.sleep(1)
open_tab(new_driver, 'https://www.bet365.com/#/AC/B1/C1/D1002/E91422157/G40/H^1/')
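
The fixed sleeps assume the page finishes loading within a second or two, which may not hold on a slow connection. As a hedged alternative, here is a small polling helper that keeps looking for a selector until something shows up or a timeout passes. The name wait_for_elements and its default timings are my own choices, not part of the original code. Selenium's built-in WebDriverWait, already imported above, is the more standard tool for this.

```python
import time


def wait_for_elements(driver, selector, timeout=10.0, poll=0.5):
    """Poll until `selector` matches at least one element or `timeout` passes.

    Returns the list from the last attempt, which is empty on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
        found = driver.find_elements("css selector", selector)
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(poll)
```

For example, calling wait_for_elements(new_driver, ".rcl-ParticipantFixtureDetailsTeam_TeamName") after the second open_tab call would hold off scraping until the team names have rendered.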

We can then initialise some lists and search for the elements we will need, i.e. the team names, the match times and, of course, the odds.

teams = []
times = []
odds = []
dates = []

# Locate the team names, kick-off times and odds on the page.
teams_ = new_driver.find_elements(
    By.CSS_SELECTOR, ".rcl-ParticipantFixtureDetailsTeam_TeamName"
)
times_ = new_driver.find_elements(
    By.CSS_SELECTOR, ".rcl-ParticipantFixtureDetails_BookCloses"
)
odds_ = new_driver.find_elements(
    By.CSS_SELECTOR, ".sgl-ParticipantOddsOnly80_Odds"
)

# Keep only the visible text of each element.
for i in teams_:
    teams.append(i.text)
for i in times_:
    times.append(i.text)
for key in odds_:
    odds.append(key.text)

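We then need to split the odds into home, draw and away odds, and the teams into home and away teams. The team names alternate between home and away, while the odds slicing assumes the page lists every home price first, then every draw price, then every away price. Both splits can be done with some list slicing, as shown below.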

home_teams = teams[::2]
away_teams = teams[1::2]
home_odds = odds[0:len(times)]
draw_odds = odds[len(times):len(times) * 2]
away_odds = odds[len(times)*2:len(odds)]
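
If the page layout changes or some elements fail to load, these slices will quietly produce misaligned lists. Below is a small sketch that does the same slicing but checks the lengths first. The function name split_fixture_lists and the placeholder values are mine and are not real scraped data.

```python
def split_fixture_lists(teams, odds, n_matches):
    """Split flat scraped lists into per-match columns.

    Assumes two team names and three odds per match, with the odds grouped
    as all home prices, then all draw prices, then all away prices.
    """
    if len(teams) != 2 * n_matches or len(odds) != 3 * n_matches:
        raise ValueError(
            f"expected {2 * n_matches} teams and {3 * n_matches} odds, "
            f"got {len(teams)} and {len(odds)}"
        )
    return {
        "home_teams": teams[::2],
        "away_teams": teams[1::2],
        "home_odds": odds[:n_matches],
        "draw_odds": odds[n_matches:2 * n_matches],
        "away_odds": odds[2 * n_matches:],
    }


# Placeholder values for two matches, for illustration only.
teams = ["Team A", "Team B", "Team C", "Team D"]
odds = ["1.50", "2.10", "4.00", "3.40", "6.00", "3.25"]
result = split_fixture_lists(teams, odds, n_matches=2)
print(result["home_teams"])  # ['Team A', 'Team C']
print(result["draw_odds"])   # ['4.00', '3.40']
```

In the scraper itself, n_matches would be len(times).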

Then, we create a dataframe and add all of our data to it, which will allow us to verify its accuracy and allow us to manipulate the data easily.

columns = ['Home Team', 'Away Team',"Home Odds","Draw Odds","Away Odds"]

# Initialize a new DataFrame with columns
new_dataframe = pd.DataFrame(columns=columns)

# Add arrays to columns
new_dataframe['Home Team'] = home_teams
new_dataframe['Away Team'] = away_teams
new_dataframe['Home Odds'] = home_odds
new_dataframe['Draw Odds'] = draw_odds
new_dataframe['Away Odds'] = away_odds
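
At this point the odds are still strings, so any calculation on them needs a conversion first. Here is a minimal sketch of one, assuming the site shows either decimal odds such as 3.50 or fractional odds such as 5/2, depending on your settings. The function name to_decimal_odds is my own, and text labels such as "Evens" are not handled.

```python
from fractions import Fraction


def to_decimal_odds(text: str) -> float:
    """Convert an odds string to decimal odds.

    Fractional odds a/b become a/b + 1 (the stake is included in the
    return); anything else is parsed as a plain decimal number.
    """
    text = text.strip()
    if "/" in text:
        return float(Fraction(text) + 1)
    return float(text)


print(to_decimal_odds("5/2"))   # 3.5
print(to_decimal_odds("3.50"))  # 3.5
```

It could then be applied column by column, for example with new_dataframe['Home Odds'].map(to_decimal_odds).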

Finally we quit the driver instance and check our dataframe.

new_driver.quit()
new_dataframe

Which should look something like this.

That is all for now folks. Stay tuned because tomorrow I will be dropping a part 2 to this article which will show you how to actually click into each match and scrape the other odds markets e.g. Over 2.5 goals, Both Teams To Score etc.

Hope you enjoyed, drop a clap and follow and consider following me on X/Twitter at PaulConish. Let me know if there is anything specific you would like me to cover and see you next time.
