Lynn Kwong

Summary

This post is a guide to using Selenium in Docker on Linux and macOS, with a focus on building custom Docker images for Selenium.

Abstract

The post explains why web scraping or web testing with Selenium is often run in Docker: it is more stable and easier to deliver. It notes that setting up Selenium in Docker can be challenging for those not well-versed in Docker and DevOps techniques. The guide shows how to start standalone Selenium in Docker on both Linux and macOS, and explains that macOS needs a different Dockerfile because the latest Mac CPUs are not supported by the classical Docker images of Selenium. It also walks through creating a custom Docker image with Selenium using a Dockerfile and lists the critical points to consider during the process.

Opinions

  • Running web scraping or testing in Docker with Selenium is beneficial for production due to its stability and ease of delivery.
  • Setting up Selenium in Docker can be challenging for those not proficient in Docker and DevOps techniques.
  • Different Dockerfiles are needed for macOS and Linux due to compatibility issues with the latest Mac CPUs and classical Docker images of Selenium.
  • Installing third-party libraries inside the Docker container is necessary for web scraping or testing.
  • The official Docker images for Selenium use a non-root user called 'seluser', requiring a switch to the root user to install libraries and then switching back.
  • Python virtual environments should be used to install libraries, as installing them system-wide may not work with the latest Docker images.
  • The environment variable PATH should be updated with the path to the virtual environment folder so the libraries can be accessed without specifying the whole path to the folder.

How to Use Selenium in Docker on Linux and macOS

Learn a trick that may save you hours of debugging time

Image by 1602904 on Pixabay

In production, we often need to run web scraping or web testing with Selenium in Docker rather than on bare-metal machines, because it is more stable and easier to deliver. Setting it up can be demanding if you are not experienced with Docker and general DevOps techniques.

In this post, I will show how to start standalone Selenium in Docker on both Linux and macOS. As you will find out later, a different Dockerfile is needed for macOS because the latest Mac CPUs are not supported by the classical Docker images of Selenium.

Build a custom Docker image for Selenium on Linux

Selenium releases official Docker images, which can be found on Docker Hub. If you want to use Selenium with Chrome, you can use selenium/standalone-chrome. It’s called standalone because all Grid components are put into one Docker image and can thus be run with a single Docker container.

Normally we would need to install some third-party libraries inside the Docker container so we can run web scraping or testing in it. Let’s put the libraries in a file called requirements.txt:

selenium==4.16.0

Here selenium is the library used to automate web browser interaction from Python.
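Because the version is pinned, it can be handy to confirm that the environment really contains that exact version. The following is a minimal sketch using only the Python standard library; the helper names parse_pins and find_mismatches are made up for this example and are not part of Selenium or pip, and only simple name==version lines are handled:

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and unpinned requirements
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Map each package whose installed version differs to (wanted, installed)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling find_mismatches(parse_pins(open("requirements.txt").read())) inside the container should return an empty dictionary when the installed libraries match the pins.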

Then we need to create a Dockerfile that builds a custom Docker image based on the official one, with the third-party libraries installed:

FROM selenium/standalone-chrome:4.16.1-20231219

# Use root to install libraries.
USER root

WORKDIR /app
COPY requirements.txt requirements.txt

# Give permissions to seluser for the folders needed.
RUN chown -R seluser:seluser /app && \
    chown -R seluser:seluser /tmp

# Create a virtual environment and install libraries there.
RUN apt-get update -yqq && \
    apt-get -yqq install python3.10-venv python3-pip && \
    python3 -m venv /venv && \
    /venv/bin/pip install -r requirements.txt && \
    rm -rf /tmp/*

# Update PATH so the libraries installed can be accessed directly.
ENV PATH="/venv/bin:$PATH"

COPY . /app

# Change back to the default seluser.
USER seluser

EXPOSE 4444

The above Dockerfile is the key to creating a custom Docker image with Selenium. Some critical points are highlighted here:

  1. In the official Docker image for Selenium, the default user is a non-root user called seluser. We need to switch to the root user to install the libraries and then switch back to seluser at the end.
  2. We need to install the libraries in a Python virtual environment. We can install them system-wide with some older images, but not with the latest ones.
  3. We need to give permission to seluser for it to access the project folder (here /app) and the /tmp folder (if you need to write to the /tmp folder).
  4. The environment variable PATH should be updated with the path to the virtual environment folder so the libraries can be accessed without specifying the whole path to the folder.
  5. We should copy the project files last, so Docker can reuse the cached layers with the installed libraries and we don’t need to rebuild the image from scratch whenever some code is updated.
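The points above can be checked from inside a running container with a small script. This is only a sketch under the assumptions of this post: the project folder is /app, the virtual environment is /venv, and the function name check_environment is invented for illustration:

```python
import getpass
import os
import sys


def current_user():
    """Return the login name, falling back to the numeric user id."""
    try:
        return getpass.getuser()
    except Exception:
        return str(os.getuid())


def check_environment(project_dir="/app", venv_dir="/venv"):
    """Collect a few facts about the Python setup of the current process."""
    path_entries = os.environ.get("PATH", "").split(os.pathsep)
    return {
        "user": current_user(),
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "venv_on_path": os.path.join(venv_dir, "bin") in path_entries,
        "project_dir_writable": os.access(project_dir, os.W_OK),
        "tmp_writable": os.access("/tmp", os.W_OK),
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If the image was built as described, one would expect the user to be seluser and the remaining values to be True when the script is run with python inside the container.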

Then we can write some simple application code, saved as scrape.py in the project folder, that will run inside the Docker container for Selenium:

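from selenium import webdriver

# Run Chrome in headless mode (no visible browser window).
options = webdriver.ChromeOptions()
options.add_argument("--headless")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://superdataminer.com")
    print(driver.title)
finally:
    # Always close the browser so no Chrome process is left behind.
    driver.quit()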

More examples of using Selenium can be found in the references of this post.
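When Chrome runs inside a container, a few extra Chrome flags are often suggested in addition to --headless. The sketch below is a variation on the script above, not something this post requires: --no-sandbox, --disable-dev-shm-usage and --window-size are existing Chrome flags, but whether you need them depends on your setup, and the function names build_chrome_arguments and fetch_title are invented for this example:

```python
def build_chrome_arguments(headless=True, window_size=(1920, 1080)):
    """Return Chrome command-line flags commonly used in containers."""
    arguments = []
    if headless:
        arguments.append("--headless")
    # Assumption: these two flags help in some container setups; they are optional.
    arguments.append("--no-sandbox")
    arguments.append("--disable-dev-shm-usage")
    arguments.append(f"--window-size={window_size[0]},{window_size[1]}")
    return arguments


def fetch_title(url):
    """Open the URL in Chrome and return the page title."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments():
        options.add_argument(argument)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # close the browser even if loading the page fails
```

Inside the container, print(fetch_title("https://superdataminer.com")) would then do the same job as the script above.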

Now let’s build the custom Docker image and run the above script in it:

docker build -t custom-chrome-selenium:latest .

docker run --rm -it custom-chrome-selenium:latest python scrape.py
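If you prefer to trigger the two commands from Python, for example from a small helper script, a sketch along these lines may work. It assumes the docker CLI is on PATH and that the script is called scrape.py; the -it flags are left out because no interactive terminal is needed here, and all function names are made up for this example:

```python
import shlex
import subprocess

IMAGE_TAG = "custom-chrome-selenium:latest"


def build_command(tag=IMAGE_TAG, context="."):
    """Return the docker build command as an argument list."""
    return ["docker", "build", "-t", tag, context]


def run_command(script, tag=IMAGE_TAG):
    """Return the docker run command that executes a script in the image."""
    return ["docker", "run", "--rm", tag, "python", script]


def build_and_run(script="scrape.py"):
    """Build the image, then run the script in a throwaway container."""
    for command in (build_command(), run_command(script)):
        print("$", shlex.join(command))
        # check=True raises CalledProcessError if docker exits with an error.
        subprocess.run(command, check=True)
```

Calling build_and_run() from the project folder runs the same build and run steps as the commands above.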

If the above commands are run on a modern Linux machine, they work properly. However, if you run them on a Mac with one of the latest CPUs, they fail with the following error, which says that Chrome has crashed:

selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
  (session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

Let’s fix it on Mac.

Build a custom Docker image for Selenium on macOS

To make it work on a Mac with an Apple silicon (ARM-based) CPU, we cannot use the official Docker image for Selenium because, at the time of writing, it is not built for that CPU architecture. Fortunately, some brilliant developers have created another Docker image that works properly on these Macs, which we will use as the source Docker image in our example:

# The source image is changed (and python3.11-venv is used below).
FROM seleniarm/standalone-chromium:4.10.0-20230926

# Use root to install libraries.
USER root

WORKDIR /app
COPY requirements.txt requirements.txt

# Give permissions to seluser for the folders needed.
RUN chown -R seluser:seluser /app && \
    chown -R seluser:seluser /tmp

# Create a virtual environment and install libraries there.
RUN apt-get update -yqq && \
    apt-get -yqq install python3.11-venv python3-pip && \
    python3 -m venv /venv && \
    /venv/bin/pip install -r requirements.txt && \
    rm -rf /tmp/*

# Update PATH so the libraries installed can be accessed directly.
ENV PATH="/venv/bin:$PATH"

COPY . /app

# Change back to the default seluser.
USER seluser

EXPOSE 4444

Two things change in the above Dockerfile: the source Docker image and the venv package. At the time of writing, this Docker image uses Python 3.11 rather than Python 3.10, so python3.10-venv must be changed to python3.11-venv in the Dockerfile; otherwise, it won’t work.
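The venv package name has to match the Python version of the source image. As a rough sketch, the name can be derived from the interpreter's version; this assumes the Debian/Ubuntu naming scheme seen in both Dockerfiles, and the function name venv_package_name is invented for this example:

```python
import sys


def venv_package_name(version_info=sys.version_info):
    """Return the apt package name of the venv module for a Python version."""
    major, minor = version_info[0], version_info[1]
    return f"python{major}.{minor}-venv"


if __name__ == "__main__":
    # Run this with the python3 of the source image to see which package fits.
    print(venv_package_name())
```

For Python 3.10 this gives python3.10-venv, and for Python 3.11 it gives python3.11-venv, which matches the two Dockerfiles in this post.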

Now if you run the same commands above, everything should work properly.
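If the same project is built on both kinds of machines, the choice of source image can be made from the CPU architecture. The following sketch uses the two image tags from this post; it assumes that platform.machine() reports arm64 or aarch64 on ARM machines (a Python running under emulation may report something else), and the function name pick_base_image is hypothetical:

```python
import platform

AMD64_IMAGE = "selenium/standalone-chrome:4.16.1-20231219"
ARM64_IMAGE = "seleniarm/standalone-chromium:4.10.0-20230926"


def pick_base_image(machine=None):
    """Return the source image that fits the given CPU architecture."""
    machine = (machine or platform.machine()).lower()
    if machine in {"arm64", "aarch64"}:
        return ARM64_IMAGE
    # Everything else is treated as a classic x86-64 machine in this sketch.
    return AMD64_IMAGE


if __name__ == "__main__":
    print(pick_base_image())
```

Remember that the venv package differs between the two images as well, so the matching Dockerfile still has to be used with the image that is printed.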

In this post, we have covered how to build custom Docker images for Selenium on Linux and macOS. These images can then be run on platforms that run Docker containers, such as Kubernetes or Airflow. Hopefully, this is helpful for your work.
