How to Scrape Websites Requiring Login with Python
It’s Easy!
This story is part of the Web Scraping series. If you missed the previous story, you can find it here:
There is also a GitHub repo associated with this series if you want to find code examples: Web Scraping Series
When scraping websites, you’ll sometimes have to log in to access the data you want. But sometimes, it can be a bit tricky to log in and keep track of your cookies through all the requests you want to make.
That’s why I’ll show you two ways to easily scrape websites requiring authentication.
To follow this article properly, you should have read the two previous ones, because I won’t re-explain everything.
1. Requests
My favorite way of doing this is with requests. It’s the most convenient option, but it only works when the content you want to scrape is static, meaning it is already present in the HTML the server returns rather than rendered by JavaScript. Otherwise, you’ll have to use Selenium.
For the example, we’ll try to log in to the following website: https://www.pianostreet.com/.
First, we want to get to the login page: https://www.pianostreet.com/amember/login/.
Then, we have to understand the page a little bit. Let’s use “Inspect” to open the developer tools!

Most of the time, login pages are built with forms. Here, we can easily find the form in the Elements panel.
We’re looking for the “method” and “action” attributes. Here, “method” is “post” and “action” is “/amember/login”. It means we can log in by sending a POST request to “https://www.pianostreet.com/amember/login”.
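The step from a relative “action” to a full URL can be sketched with the standard library. This is only an illustration: the variable names are mine, and the two values are the ones read from the page above.

```python
from urllib.parse import urljoin

# URL of the page that contains the login form
page_url = "https://www.pianostreet.com/amember/login/"

# Value of the form's "action" attribute, as seen in the developer tools
action = "/amember/login"

# urljoin resolves the action against the page URL, the same way a browser does
login_url = urljoin(page_url, action)
print(login_url)
```

Because the action starts with a slash, it is resolved from the root of the site, which is why the result is the URL given above.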
Then we want to find the names of the fields, which will allow us to build our payload.

The Username field is named “amember_login” and the Password field is named “amember_pass”. So our payload will look like this:
{
    "amember_login": "user",
    "amember_pass": "pass",
}
We now have everything we need to log in to this website using requests!
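If you would rather not read these attributes by eye, here is a minimal sketch that pulls them out of the HTML with the standard library. Note the assumptions: SAMPLE_HTML is a simplified stand-in I wrote for illustration, not the real markup of the page, and the FormInspector class is a hypothetical helper, not something from the series.

```python
from html.parser import HTMLParser

# Simplified stand-in for the login form (an assumption, not the real page markup)
SAMPLE_HTML = """
<form method="post" action="/amember/login">
  <input type="text" name="amember_login">
  <input type="password" name="amember_pass">
  <input type="submit" value="Login">
</form>
"""


class FormInspector(HTMLParser):
    """Collects the form's method and action, plus the names of its inputs."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.field_names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # Browsers fall back to GET when no method is given
            self.method = attrs.get("method", "get").lower()
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            # Inputs without a name (like the submit button here) are not sent
            self.field_names.append(attrs["name"])


inspector = FormInspector()
inspector.feed(SAMPLE_HTML)

# Build the payload skeleton from the field names we found
payload = {name: "" for name in inspector.field_names}
print(inspector.method, inspector.action, payload)
```

On a real page, you would feed the parser the HTML you downloaded instead of the sample string, and a real form may also contain hidden inputs that need to be sent along.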
But wait! What if the login page is more complicated, or if it’s not a form? In this case, we can open the “Network” tools, and log in to see what happens.

That’s a lot of requests… The one that interests us is “login”. Sometimes it won’t be called “login”, so you’ll have to search through all the requests.

We can find a lot of information to try to reproduce the request in Python. In this example, the payload isn’t visible, but sometimes it will be visible and you will be able to understand how it is done. Example:

Well, here it’s not easy to understand even if we can see the payload. So let’s look into the page:

The form has an “onsubmit” attribute. So it executes a JS function before sending the request. By looking through the page source we can find this function and try to reproduce it in Python.
It’s a bit harder to do but it’s still doable!
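To give an idea of what “reproducing it in Python” can look like, here is a purely hypothetical sketch. It assumes the page’s JS function replaces the password with its MD5 hex digest before the form is submitted. The real function on your target site will almost certainly differ, and the field names “username” and “password” are invented for the example.

```python
import hashlib


def hash_password(password: str) -> str:
    """Python equivalent of the assumed JS step: MD5 hex digest of the password."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def build_payload(username: str, password: str) -> dict:
    """Build the payload the way the (hypothetical) onsubmit function would."""
    return {
        "username": username,  # hypothetical field name
        "password": hash_password(password),  # transformed before sending
    }


payload = build_payload("user", "pass")
print(payload)
```

The idea is always the same: read the JS function, apply the same transformation to your values in Python, and compare the result with the payload shown in the “Network” tools.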
Let’s get back to our previous website. We can now write the code to log in with requests:
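Here is a minimal sketch of that code, built from the URL and field names found above. The function names and the timeout value are my own choices, and "user" / "pass" are placeholders for real credentials.

```python
import requests

# URL built from the form's "action" attribute
LOGIN_URL = "https://www.pianostreet.com/amember/login"


def build_payload(username, password):
    """Map the credentials to the field names used by the login form."""
    return {
        "amember_login": username,
        "amember_pass": password,
    }


def log_in(session, username, password):
    """Send the login POST request through the given session."""
    response = session.post(
        LOGIN_URL,
        data=build_payload(username, password),
        timeout=10,
    )
    # Raise an exception on HTTP errors (4xx / 5xx)
    response.raise_for_status()
    return response


# Usage (sends a real request, so it is left commented out):
# with requests.Session() as session:
#     log_in(session, "user", "pass")
#     # The session stores the cookies it received and sends them again here
#     page = session.get("https://www.pianostreet.com/")
```

Using a Session is what solves the cookie problem mentioned at the beginning: every request made through it reuses the cookies set at login. Keep in mind that a 200 status code doesn’t prove the login worked, since many sites return 200 with an error message, so check the page content for something only a logged-in user would see.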






