Coşkun Deniz

Browser Automation with Python and Selenium — 3: Architecture

Bird’s eye view of Selenium applications

Photo by LSE Library on Unsplash

In the previous post, we looked at a simple, complete example. I will try to explain the high-level architecture of Selenium applications in this post.

Primary Building Blocks

  • Selenium Client Libraries: Also called Selenium Language Bindings, these let us write automation scripts in the language of our choice, such as Python, Ruby, or Java.
  • JSON Wire Protocol: A REST-style API over HTTP that carries commands and responses between Selenium scripts and browser drivers. Selenium 4 replaces it with the W3C WebDriver protocol, which follows the same request/response model.
  • Browser Drivers: A Selenium script communicates with the actual browser through the matching browser driver (for example, chromedriver for Chrome or geckodriver for Firefox). The driver controls the real browser and runs on the same system as the browser. Thanks to drivers, the Selenium framework does not need to know the implementation details of each browser.
  • Browsers: The actual browsers on which the automation tasks are performed. A browser receives a command from its driver, executes it, and returns the response along the same route.
Drawing by the author with Inkscape
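
To make the division of labour concrete, here is a minimal in-memory sketch of the four blocks. Nothing in it is Selenium code: ClientBinding, FakeDriver, and FakeBrowser are hypothetical stand-ins, and the JSON payload shape is simplified for illustration rather than taken from the protocol.

```python
import json


class FakeBrowser:
    """Stands in for the real browser: carries out a command."""

    def __init__(self):
        self.current_url = None

    def navigate(self, url):
        self.current_url = url
        return None  # navigation itself returns no value


class FakeDriver:
    """Stands in for a browser driver: decodes JSON and calls the browser."""

    def __init__(self, browser):
        self.browser = browser

    def handle(self, raw_request):
        request = json.loads(raw_request)
        if request["command"] == "navigate":
            value = self.browser.navigate(request["url"])
            return json.dumps({"value": value})
        return json.dumps({"value": {"error": "unknown command"}})


class ClientBinding:
    """Stands in for a language binding: turns method calls into JSON."""

    def __init__(self, driver):
        self.driver = driver

    def get(self, url):
        raw_request = json.dumps({"command": "navigate", "url": url})
        raw_response = self.driver.handle(raw_request)  # same route back
        return json.loads(raw_response)["value"]


browser = FakeBrowser()
client = ClientBinding(FakeDriver(browser))
result = client.get("https://www.python.org")
```

The script only ever talks to the binding; the binding only ever exchanges JSON with the driver; only the driver touches the browser.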

Types of Communication

Selenium WebDriver talks to browsers in 2 different ways: direct and remote.

1. Direct Communication

WebDriver talks to a browser through a driver. WebDriver, browser driver, and the real browser are on the same system in this type of communication. WebDriver passes commands and receives responses through the same route.

Drawing by the author with Inkscape

2. Remote Communication

Communication can also be remote, through Selenium Server or RemoteWebDriver. In this case, RemoteWebDriver runs on the same system as the driver and the browser, and commands and responses travel through it.

Drawing by the author with Inkscape

Remote communication can also go through Selenium Server or Selenium Grid components, which in turn talk to the browser driver on the host system.

Drawing by the author with Inkscape

As stated in the official documentation:

“WebDriver has one job and one job only: communicate with the browser via any of the methods above.”
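
From the script's point of view, the practical difference between the two modes is where the commands are sent. The sketch below builds the target URL for the same command in both cases. The command_url helper, the session ID, and the Grid host name are hypothetical; 9515 is the port chromedriver listens on by default when started by hand, and 4444 is the default Selenium Server port.

```python
LOCAL_DRIVER_URL = "http://localhost:9515"          # local chromedriver
REMOTE_SERVER_URL = "http://grid.example.com:4444"  # hypothetical Grid host


def command_url(server_url, session_id, command):
    """Build the URL a client would send a session command to."""
    return f"{server_url.rstrip('/')}/session/{session_id}/{command}"


# Same command, same session path; only the server address changes.
direct = command_url(LOCAL_DRIVER_URL, "abc123", "url")
remote = command_url(REMOTE_SERVER_URL, "abc123", "url")
```

In Selenium's Python bindings, webdriver.Chrome() corresponds to the direct case, while webdriver.Remote(command_executor=REMOTE_SERVER_URL, options=webdriver.ChromeOptions()) corresponds to the remote one.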

Steps of Communication

from selenium.webdriver import Chrome

driver = Chrome()
driver.get("https://www.python.org")

What happens when the above code snippet is executed?

  • Selenium WebDriver first launches the browser-specific driver server (in this case, chromedriver).
  • An HTTP request is created and sent to the browser driver using the JSON Wire Protocol over HTTP.
  • The browser driver receives the HTTP request through its HTTP server.
  • The browser driver passes the command to the real browser (Chrome).
  • The command is executed on the browser.
  • The browser sends the response back to the browser driver, which returns it to the automation script.
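
The round trip in the steps above can be imitated with the standard library alone. The sketch below is not chromedriver: FakeDriverHandler is a hypothetical stand-in that accepts a navigation request and answers with an empty JSON value, and the session ID is made up. It only shows the shape of the exchange, a JSON body sent over HTTP and a JSON reply coming back.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a browser driver's HTTP server."""

    received = []  # (path, payload) pairs, kept for inspection

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        FakeDriverHandler.received.append((self.path, payload))
        # A real driver would now make the browser carry out the command.
        body = json.dumps({"value": None}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


# Step 1: the "driver" server starts (port 0 means any free port).
server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Steps 2-3: the client builds an HTTP request and sends it to the driver.
conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
conn.request(
    "POST",
    "/session/fake-session/url",
    body=json.dumps({"url": "https://www.python.org"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Steps 4-6: the driver handles the command and the response travels back.
response = conn.getresponse()
status = response.status
reply = json.loads(response.read())

conn.close()
server.shutdown()
server.server_close()
```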

More detailed explanation with an example
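
The script that the numbered steps walk through is not reproduced in the text. Going by the step descriptions, it probably resembles the sketch below; this is an assumption, not the author's listing. The read_title helper and the FakeDriver stand-in are hypothetical, added so the sketch runs without a browser installed.

```python
def read_title(driver_factory, url):
    """Run the four steps against any WebDriver-like object."""
    driver = driver_factory()  # step 1: create the WebDriver instance
    try:
        driver.get(url)        # step 2: open the page
        return driver.title    # step 3: query the page title
    finally:
        driver.quit()          # step 4: quit the browser


class FakeDriver:
    """Hypothetical stand-in that records calls instead of driving Firefox."""

    last = None  # most recently created instance, for inspection

    def __init__(self):
        self.calls = []
        FakeDriver.last = self

    def get(self, url):
        self.calls.append(("get", url))

    @property
    def title(self):
        self.calls.append(("title",))
        return "Welcome to Python.org"

    def quit(self):
        self.calls.append(("quit",))


title = read_title(FakeDriver, "https://www.python.org")
```

With Selenium and geckodriver installed, the same function should work against a real browser by passing webdriver.Firefox (from "from selenium import webdriver") in place of FakeDriver.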

1. In this step, we create a Firefox WebDriver instance. Through this instance, we make API calls that run actions on the real browser.

2. After creating the driver instance, we make the get API call.

  • The Selenium WebDriver API takes the command from the language binding and prepares it for the browser driver using the JSON Wire Protocol.
  • An HTTP request is created for the browser driver: the get method makes a POST request to the /session/:sessionId/url endpoint, which instructs the browser to open the given URL.
  • The browser driver (geckodriver in this case) receives the HTTP request through its HTTP server.
  • The browser driver passes the command to the real browser (Firefox).
  • The response is sent from the browser to the browser driver, and then back to the automation script along the same route.

3. In this step, we query the title of the page through the title property of the WebDriver interface.

4. In the last step, we quit the browser by calling the quit method of the WebDriver interface.

  • This method makes a DELETE request to the /session/:sessionId endpoint, following the steps described in item 2.
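
As a rough illustration of how the calls above map onto HTTP, the sketch below builds (but never sends) the request for each one, using the endpoints listed in the W3C WebDriver specification. The build_request helper and the session ID are hypothetical; a real session ID is returned by the driver when the session is created, and port 4444 is assumed here because it is geckodriver's default.

```python
import json
from urllib.request import Request

# HTTP method and path for the three calls discussed above.
ENDPOINTS = {
    "get": ("POST", "/session/{session_id}/url"),
    "title": ("GET", "/session/{session_id}/title"),
    "quit": ("DELETE", "/session/{session_id}"),
}


def build_request(base_url, session_id, command, payload=None):
    """Return an unsent urllib Request for one WebDriver command."""
    method, path = ENDPOINTS[command]
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return Request(
        base_url + path.format(session_id=session_id),
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


BASE = "http://localhost:4444"  # assumed geckodriver address
SESSION = "example-session-id"  # hypothetical session ID

get_req = build_request(BASE, SESSION, "get", {"url": "https://www.python.org"})
title_req = build_request(BASE, SESSION, "title")
quit_req = build_request(BASE, SESSION, "quit")
```

Only the navigation call carries a JSON body; reading the title and ending the session are fully described by the HTTP method and the path.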

Things to Remember

  • Selenium Client Libraries or Selenium Language Bindings allow us to write automation scripts in different languages. They send commands to perform some actions on the real browser through browser drivers using the JSON Wire Protocol.
  • Selenium WebDriver talks to browsers in 2 different ways: direct and remote communication.
  • WebDriver has one job and one job only: communicate with the browser.

In the next post, I will write about locating elements on a web page to interact with them.

References

  1. https://www.selenium.dev/documentation/en/webdriver/understanding_the_components/
  2. https://www.lambdatest.com/blog/automated-browser-testing-with-opera-and-selenium-in-python/
  3. https://w3c.github.io/webdriver/#endpoints
  4. https://artoftesting.com/selenium-webdriver-architecture
  5. https://www.journaldev.com/25698/selenium-webdriver-architecture
  6. http://makeseleniumeasy.com/2017/03/03/architecture-of-selenium-webdriver/

Thank you for your time.
