Esteban Thilliez

Summary

The web content provides a tutorial on using Selenium with Python for web scraping, focusing on dynamic content extraction and interaction with web elements.

Abstract

The article is part of a series on web scraping with Python, specifically addressing the extraction of dynamic content from websites that require user interactions, such as clicking buttons or entering data. It introduces Selenium as a tool for automating browser actions, demonstrates how to set up and use Selenium with Python, and explains how to locate and interact with web elements using various methods like XPath and CSS selectors. The tutorial also covers the importance of waiting for elements to load and suggests combining Selenium with BeautifulSoup for more powerful scraping capabilities. The author, Esteban Thilliez, provides code examples and links to further resources, encouraging readers to follow his Medium account for more Python-related content.

Opinions

  • The author believes that web scraping static content is limited and that handling dynamic content is essential for comprehensive data extraction.
  • Selenium is presented as a versatile tool for web automation, capable of simulating user actions in a web browser.
  • The author emphasizes the ease of integrating Selenium with Python and the convenience of using Selenium with common web browsers without specifying their paths.
  • The article suggests that waiting mechanisms are crucial in web scraping to ensure elements are fully loaded before interacting with them.
  • Combining Selenium with BeautifulSoup is recommended for efficient web scraping, leveraging the strengths of both tools.
  • The author acknowledges that there are more complex aspects of Selenium not covered in the article, implying they may be less commonly needed.
  • The author invites engagement by asking readers to clap, comment, and follow for more content, indicating a desire to build a community around their work.

Web Scraping with Python — 2. Dynamic Content

Photo by Luca Bravo on Unsplash

This story is part of the Web Scraping series. If you missed the previous story, you can find it here:

There is also a GitHub repo associated with this series if you want to find code examples: Web Scraping Series

In the last story, we saw how easily you can scrape static content. But that approach is limited: sometimes you will have to scrape websites that require interaction, such as clicking buttons, typing on the keyboard, etc…

The content you get from these actions is called dynamic content. It’s usually content rendered in the browser by JavaScript, often with data fetched from a server-side script (written in PHP, for example).

Selenium

Selenium is a bundle of several tools used for web automation projects. It has a Python implementation. Let’s install it now:

pip install selenium

As Selenium simulates user actions, it works directly through the browser. So you need a web driver that matches your browser. For example, if you’re on Chrome, you can download it here: https://chromedriver.chromium.org/downloads (choose the version corresponding to your Chrome version). Note that Selenium 4.6 and later ship with Selenium Manager, which can usually download a matching driver automatically, so the manual download is often unnecessary.

If you’re not on Chrome, just download the web driver corresponding to your web browser.

Launching the Driver

Now, you can open a Python project, and we’ll start configuring the driver. Let’s begin by importing the web driver and configuring it:

As I use Brave, I need to specify its location. If you use Chrome, Firefox, or any common web browser, you don’t need to do this. Also, as my web driver is in the same directory as my script, I don’t strictly need to specify its location. So, with Chrome, I could have just done this:

Now, I can launch the driver with a URL using driver.get(url).
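As a small sketch, assuming driver is an already started WebDriver instance (open_page is an illustrative helper name, not part of Selenium):

```python
def open_page(driver, url):
    """Navigate the browser to `url` and return the title of the loaded page."""
    # With the default page-load strategy, get() returns once the
    # initial page load has finished.
    driver.get(url)
    return driver.title
```

For instance, open_page(driver, "https://example.com") would load that page in the browser window and give back its title.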

Find Elements

Selenium provides two methods to find elements, both called on the driver: find_element returns the first matching element, and find_elements returns a list of all matching elements.

Both methods take two parameters, by and value:

  • by specifies the strategy used to find the element: By.ID, By.CLASS_NAME, By.XPATH, By.CSS_SELECTOR, etc…
  • value is the locator string interpreted according to by, for example an ID, a class name, an XPath expression, or a CSS selector.

For example:

I won’t explain the XPath or CSS selector syntax; you can find the XPath syntax here and the CSS syntax here.

Interacting with the Elements

We have several ways to interact with elements. Perhaps we want to retrieve their content, their attributes, or execute actions with them.

To retrieve their content, you can use element.text. To retrieve an attribute, use element.get_attribute(attribute) instead.

Then, you can also execute actions such as clicking on a button, or on a link, sending keys to a search bar, etc…

Waiting

Sometimes, you will need to wait before scraping content or executing actions. For example, if you click on a button, perhaps you have to wait 3 seconds before anything happens.

You have two main ways to wait using Selenium:

  • You can wait a predefined time.
  • You can wait until an element is present on the page.

Selenium with BeautifulSoup

A powerful way to scrape the web is to combine Selenium with BeautifulSoup. It’s easy to do, as the web driver exposes the page’s source code through its page_source attribute.

Then, you just have to create a soup from it and proceed as we did in the previous story:
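A minimal sketch of the combination, assuming the beautifulsoup4 package is installed and driver is an already started WebDriver. The h2 tag and the function names are hypothetical examples:

```python
from bs4 import BeautifulSoup


def soup_from_driver(driver):
    """Parse the HTML currently held by the browser into a BeautifulSoup tree."""
    # page_source is read at the time of the call, so interact with the
    # page (and wait for it) before building the soup.
    return BeautifulSoup(driver.page_source, "html.parser")


def heading_texts(soup):
    """Return the stripped text of every <h2> element in the soup."""
    return [heading.get_text(strip=True) for heading in soup.find_all("h2")]
```

A typical flow would be to let Selenium click, type, and wait, then hand the resulting page to BeautifulSoup for the parsing itself.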

Final Note

Now, you know most of the things you can do with Selenium. There are still other things, but they’re a bit complex, and not so useful, so I won’t talk about them.

To find the other stories of this series, check this: Web Scraping with Python.

To explore more of my Python stories, click here! You can also access all my content by checking this page.

