Python Concurrency
Concurrent Web Requests with Aiohttp: Get More Done in Less Time
Discovering Aiohttp for Faster, Concurrent Web Requests
Are you tired?
Tired of waiting for your requests to complete one by one?
Being stuck. Waiting. Only to be overwhelmed by frustration and disappointment when the request finally times out? Have you tried using async/await everywhere, only to find out that most libraries are blocking anyway?
Fear not, as the answer to your problems lies in Aiohttp.
In the following sections, we will explore the beautifully concurrent world of Aiohttp, a popular asynchronous HTTP client/server library for Python.
We will discover how to make non-blocking web requests that run concurrently and improve our application’s performance. By the end of this blog post, you’ll know not only how to use Aiohttp but also how to handle exceptions and how asynchronous context managers work.
So don’t go anywhere, take a seat, fire up your IDE, and let’s get started.
No More Blocking: Introducing Aiohttp
It’s all about concurrency: letting multiple tasks make progress during the same period of time. With asyncio, a single thread switches between tasks whenever one of them is waiting, for example on a network response. That’s why asynchronous programming and libraries like Python’s Asyncio exist in the first place.
However, one of the most common mistakes we tend to make (yep, I did it too) is to apply the async/await syntax to every line of code we can get our hands on and hope for the best.
Well, most of the time the best is — nothing. Nothing happens at all. No concurrency. No sweet performance gains. But why?
Unfortunately, many popular libraries, such as requests, are blocking. They block the thread they run on, and with it the event loop, which renders async/await basically ineffective. This is where non-blocking libraries like Aiohttp come into play. Aiohttp uses non-blocking sockets and asynchronous context managers to acquire and close HTTP sessions efficiently. This leads to improved performance and a more pythonic way of working [1].
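Sprinkling async/await over blocking code buys you nothing, and a small stdlib-only sketch shows why. It uses time.sleep as a stand-in for a blocking call such as requests.get, and asyncio.sleep as a stand-in for a non-blocking one. No real network traffic is involved, and the helper names are hypothetical.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable


async def blocking_task(delay: float) -> None:
    # time.sleep blocks the whole thread, so the event loop cannot
    # switch to another task while this one is "waiting"
    time.sleep(delay)


async def non_blocking_task(delay: float) -> None:
    # asyncio.sleep suspends only this coroutine and hands control
    # back to the event loop, which can run other tasks meanwhile
    await asyncio.sleep(delay)


async def run_three(task_factory: Callable[[float], Awaitable[None]]) -> float:
    # Start three 0.2 s "requests" concurrently and measure the wall time
    start = time.perf_counter()
    await asyncio.gather(*(task_factory(0.2) for _ in range(3)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    blocking = await run_three(blocking_task)
    non_blocking = await run_three(non_blocking_task)
    print(f"blocking:     {blocking:.2f} s")      # about 3 x 0.2 s
    print(f"non-blocking: {non_blocking:.2f} s")  # about 0.2 s
    return blocking, non_blocking


blocking_time, non_blocking_time = asyncio.run(main())
```

Both versions use async def and asyncio.gather. Only the one that actually yields control to the event loop runs the three waits concurrently.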
Before diving deep into the inner workings of Aiohttp, let’s take a small detour and talk about asynchronous context managers first.
Managing Asynchrony: The Pythonic Way
It’s very common to deal with resources that need to be opened and then closed. Think of a file, for example.
We open it. We read it. We close it. Nothing fancy so far.
However, we need to be careful not to leak any resources. If an exception is raised for any reason, our resource might never be closed properly. To avoid leaking resources, we have several options to choose from.
First, we can wrap our code in a try/finally block, making sure the resource will be closed no matter what. Second, we can apply a more pythonic way of dealing with resources: context managers [2].
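The first option might look like the following minimal sketch. It writes a throwaway file into a temporary directory (an assumption, made only so the example runs anywhere). The file is closed in a finally clause, so it is released even if read() raises.

```python
import tempfile
from pathlib import Path

# Create a small file to read in a temporary directory (only for this sketch)
path = Path(tempfile.mkdtemp()) / "example.txt"
path.write_text("hello")

# Option 1: try/finally closes the file even if read() raises
f = open(path, "r")
try:
    contents = f.read()
finally:
    f.close()

print(contents)
```

This works, but every resource needs its own try/finally boilerplate. That repetition is exactly what context managers remove.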
```python
# Use of a synchronous context manager
with open("example.txt", "r") as f:
    contents = f.read()
    print(contents)
```

In Python, context managers ensure that resources are properly closed even if an exception is raised at runtime [3]. However, traditional context managers only work with synchronous code: their __enter__ and __exit__ methods are plain functions, so they cannot await anything while acquiring or releasing a resource.
Asynchronous context managers were introduced in Python 3.5 [4]. With them, we can manage resources asynchronously using the async with syntax. This lets us acquire and close resources such as HTTP sessions more cleanly and in a more Pythonic way. That is why asynchronous context managers lie at the core of Aiohttp.
Let’s take a look at a super basic example to illustrate and understand the way asynchronous context managers work.
```python
import asyncio

# Implement the context manager protocol
class AsyncContextManager:
    async def __aenter__(self):
        print("Entering async context...")
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        print("Exiting async context...")
        return False

# Define the main coroutine
async def main():
    async with AsyncContextManager():
        print("Inside async context...")

asyncio.run(main())
```

In this example, we define an AsyncContextManager class that implements the async context manager protocol by defining the __aenter__ and __aexit__ methods.
When the async with statement is reached, the __aenter__ coroutine is awaited. In this case, it simply prints a message to indicate that the async context has been entered. The code inside the async with block is then executed, which in this example just prints another message.
When the async with block is exited, the __aexit__ method is awaited, which also prints a message to indicate that the async context has been exited.
If an exception occurs inside the async with block, __aexit__ receives the exception type, value, and traceback, allowing the context manager to handle the exception if necessary. If __aexit__ returns a truthy value, the exception is suppressed. If it returns False, as in our example, the exception propagates to the caller.
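The return value of __aexit__ decides what happens to the exception. The following sketch uses a hypothetical SuppressingContext class. It swallows ValueError by returning True and lets every other exception propagate by returning False.

```python
import asyncio


class SuppressingContext:
    """Hypothetical example: swallows ValueError, lets anything else propagate."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # A truthy return value tells Python the exception was handled;
        # returning False (or None) re-raises it after __aexit__ finishes
        if exc_type is not None and issubclass(exc_type, ValueError):
            print(f"Handled: {exc_value!r}")
            return True
        return False


async def main() -> list[str]:
    outcomes = []

    async with SuppressingContext():
        raise ValueError("bad value")
    # Execution continues here because __aexit__ returned True
    outcomes.append("ValueError suppressed")

    try:
        async with SuppressingContext():
            raise KeyError("missing")
    except KeyError:
        # __aexit__ returned False, so the KeyError reached us
        outcomes.append("KeyError propagated")
    return outcomes


outcomes = asyncio.run(main())
print(outcomes)
```

This sketch uses a class for clarity. The standard library's contextlib.asynccontextmanager decorator offers a generator-based alternative for simple cases.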
Making Concurrent Web Requests with Aiohttp
Now that we know about non-blocking libraries and resource handling with asynchronous context managers, it’s finally time to make some requests.
Non-blocking requests. Concurrently. Of course.
But before we do any of that, let’s do it the old-fashioned way first: synchronously.
```python
import time
import requests

def fetch_status(url: str) -> int:
    response = requests.get(url)
    return response.status_code

def main() -> None:
    start = time.time()
    urls = ['http://python.org' for _ in range(10)]
    results = [fetch_status(url) for url in urls]
    print(results)
    end = time.time()
    print(f"Total time: {end-start:.4f} seconds")

main()
```

In this example, we use the requests library, which is blocking. We simply execute 10 requests sequentially and fetch each status code. Every call to requests.get has to finish before the next one can start.
Running this code took about 1.4 seconds in my test. Your numbers will vary with network latency.
Now, let’s do the same thing again. However, this time we use Aiohttp.
```python
import time
import aiohttp
import asyncio
from aiohttp import ClientSession

async def fetch_status(session: ClientSession, url: str) -> int:
    # Use ClientSession to make a GET Request
    async with session.get(url) as response:
        return response.status

async def main() -> None:
    start = time.time()
    # Acquire new ClientSession
    async with aiohttp.ClientSession() as session:
        urls = ['http://python.org' for _ in range(10)]
        requests = [fetch_status(session, url) for url in urls]
        results = await asyncio.gather(*requests)
        print(results)
    end = time.time()
    print(f"Total time: {end-start:.4f} seconds")

# asyncio.run creates an event loop, runs main() and closes the loop
asyncio.run(main())
```

Running the code above takes only about 0.2 seconds.
7x faster than before. This is the power of concurrency.
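Here is a rough mental model for the speed-up. Run sequentially, the total time is about the sum of the individual latencies. Run concurrently, it is closer to the latency of the slowest request plus some overhead. The sketch below simulates this with asyncio.sleep(0.1) standing in for a request. That is an assumption: real requests also pay for connection setup and server load, so the ratio you observe will differ.

```python
import asyncio
import time


async def fake_request(latency: float) -> int:
    # Stand-in for a network round trip: the coroutine just waits
    await asyncio.sleep(latency)
    return 200


async def sequential(n: int, latency: float) -> float:
    start = time.perf_counter()
    for _ in range(n):
        await fake_request(latency)  # each call waits for the previous one
    return time.perf_counter() - start


async def concurrent(n: int, latency: float) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_request(latency) for _ in range(n)))
    return time.perf_counter() - start


seq = asyncio.run(sequential(10, 0.1))   # about 10 x 0.1 s
conc = asyncio.run(concurrent(10, 0.1))  # about 0.1 s
print(f"sequential: {seq:.2f} s, concurrent: {conc:.2f} s")
```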
So how does this work?
To make web requests, Aiohttp relies on the concept of sessions, where one session can keep multiple connections open. This is known as “connection pooling”, a technique for managing a pool of reusable network connections to a server [5]. It allows us to avoid the overhead of creating a new connection for each request.
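In aiohttp, this pool lives in the session’s connector, aiohttp.TCPConnector. Its limit parameter (100 by default) caps the number of simultaneous connections. To make the idea itself concrete, here is a toy, stdlib-only model of a pool. ToyConnectionPool is hypothetical and greatly simplified. It is not how aiohttp implements pooling, and its “connections” are just integers.

```python
import asyncio
import itertools


class ToyConnectionPool:
    """Hypothetical, simplified pool: NOT aiohttp's actual implementation."""

    def __init__(self, size: int) -> None:
        self._idle: asyncio.Queue = asyncio.Queue()
        self._ids = itertools.count(1)
        self._size = size
        self.created = 0  # how many "connections" were ever opened

    async def acquire(self) -> int:
        # Reuse an idle connection if one is available
        if not self._idle.empty():
            return self._idle.get_nowait()
        # Otherwise open a new one, as long as we are under the limit
        if self.created < self._size:
            self.created += 1
            return next(self._ids)
        # At the limit: wait until another request releases a connection
        return await self._idle.get()

    def release(self, conn: int) -> None:
        self._idle.put_nowait(conn)


async def fake_request(pool: ToyConnectionPool) -> int:
    conn = await pool.acquire()
    try:
        await asyncio.sleep(0.01)  # pretend to send a request over `conn`
        return conn
    finally:
        pool.release(conn)  # hand the connection back for reuse


async def main() -> ToyConnectionPool:
    pool = ToyConnectionPool(size=3)
    used = await asyncio.gather(*(fake_request(pool) for _ in range(10)))
    print(f"requests: {len(used)}, connections opened: {pool.created}")
    return pool


pool = asyncio.run(main())
```

Ten requests share three connections. Once the pool is full, new requests wait for an existing connection to be released instead of opening another one.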
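Once we have obtained a session, we can make our GET requests. We use our helper coroutine fetch_status to create one coroutine per URL. We then pass them all to asyncio.gather, which schedules them as tasks on the event loop, runs them concurrently, and waits until all of them have finished.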
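One property of asyncio.gather is worth knowing: it returns results in the order the awaitables were passed in, not the order in which they finish. Here is a network-free sketch, with asyncio.sleep simulating requests of different (made-up) latencies and a hypothetical fake_fetch helper.

```python
import asyncio


async def fake_fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # simulated network latency
    print(f"finished {name}")
    return name


async def main() -> list[str]:
    # The slowest request is passed first, the fastest last
    return await asyncio.gather(
        fake_fetch("slow", 0.3),
        fake_fetch("medium", 0.2),
        fake_fetch("fast", 0.1),
    )


results = asyncio.run(main())
print(results)  # input order, even though "fast" finished first
```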
Things will fail. They simply do.
A lot of things can go wrong when making a network request. Unreliable connections. Bad requests. Data errors. Some of these issues, like a server that stops responding, can leave our request hanging for a very long time (aiohttp’s default total timeout is five minutes). Thus, we need a way to time out.
Luckily for us, we can make use of Aiohttp’s ClientTimeout data structure.
```python
import aiohttp
import asyncio
from aiohttp import ClientSession

async def fetch_status(session: ClientSession, url: str) -> int:
    # Apply a timeout at request level (takes precedence over the session's)
    request_timeout = aiohttp.ClientTimeout(total=0.2)
    async with session.get(url, timeout=request_timeout) as response:
        return response.status

async def main() -> None:
    # Apply a timeout at session level
    session_timeout = aiohttp.ClientTimeout(total=1.0, connect=0.2)
    async with aiohttp.ClientSession(timeout=session_timeout) as session:
        try:
            result = await fetch_status(session, 'http://python.org')
            print(result)
        except asyncio.TimeoutError:
            print("Request timed out")

asyncio.run(main())
```

In the example above, we specify two timeouts: one at the session level and one at the request level. The session-level ClientTimeout applies to every request made with that session. The timeout passed to session.get takes precedence for that single request. If our request takes too long, for example, an asyncio.TimeoutError is raised, which we catch and handle.
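You can try the same timeout idea without a network by using the standard library’s asyncio.wait_for. This is a swapped-in stdlib technique, not aiohttp’s ClientTimeout. slow_request is a hypothetical coroutine that simply sleeps longer than the allowed time.

```python
import asyncio


async def slow_request() -> int:
    await asyncio.sleep(2)  # simulates a server that is too slow to answer
    return 200


async def main() -> str:
    try:
        # Give up after 0.2 seconds, similar in spirit to ClientTimeout(total=0.2)
        status = await asyncio.wait_for(slow_request(), timeout=0.2)
        return f"status {status}"
    except asyncio.TimeoutError:
        # wait_for cancels slow_request() and raises TimeoutError
        return "timed out"


outcome = asyncio.run(main())
print(outcome)
```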
But what if a single request fails? What about exception handling?
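Unfortunately, exception handling when running multiple requests with asyncio.gather is a bit clunky. By default, the first exception raised by any of the awaitables propagates to the code awaiting gather, and we don’t get the other results back. However, we can pass return_exceptions=True, which places every raised exception in the result list in place of a return value. This allows us to handle the exceptions accordingly.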
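Before the aiohttp version, here is a network-free sketch of how such a result list can be split into successes and failures. fake_fetch is hypothetical. It raises a ValueError for URLs that do not start with “http”, standing in for whatever error a real client would raise.

```python
import asyncio


async def fake_fetch(url: str) -> int:
    await asyncio.sleep(0.05)  # simulated latency
    if not url.startswith("http"):
        # Hypothetical stand-in for the error a real HTTP client would raise
        raise ValueError(f"unsupported scheme: {url}")
    return 200


async def main() -> tuple[list[int], list[BaseException]]:
    urls = ["http://python.org", "invalid://address.org", "https://python.org"]
    results = await asyncio.gather(
        *(fake_fetch(url) for url in urls), return_exceptions=True
    )
    # Exceptions come back in place of results, so we separate them
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    for url, r in zip(urls, results):
        print(url, "->", repr(r))
    return ok, failed


ok, failed = asyncio.run(main())
```

Because results stay in input order, zipping them with the input URLs tells you exactly which request failed.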
```python
import aiohttp
import asyncio
from aiohttp import ClientSession

async def fetch_status(session: ClientSession, url: str) -> int:
    async with session.get(url) as response:
        return response.status

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        urls = ['http://python.org', 'invalid://address.org']
        requests = [fetch_status(session, url) for url in urls]
        # Include raised exceptions in result list
        results = await asyncio.gather(*requests, return_exceptions=True)
        # Outputs e.g. [200, AssertionError()]; the exception type raised for
        # an unsupported URL scheme differs between aiohttp versions
        print(results)

asyncio.run(main())
```

Just slightly more control. Please.
Using asyncio.gather is convenient. But it has its drawbacks.
Exception handling is somewhat clunky, and additionally, we have to wait. We have to wait until all requests are completed before we can work with any of the results. So if just one bad request takes forever, we’ll most likely end up waiting forever too.
Fortunately, there is another way.
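We can use asyncio.wait, which takes an iterable of tasks and returns two sets: the tasks that are finished and the tasks that are still pending. Since Python 3.11, asyncio.wait no longer accepts bare coroutines, which is why we wrap each one with asyncio.create_task.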
```python
import aiohttp
import asyncio
from aiohttp import ClientSession

async def fetch_status(session: ClientSession, url: str, delay: int) -> int:
    await asyncio.sleep(delay)
    async with session.get(url) as response:
        return response.status

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        fetchers = [
            asyncio.create_task(fetch_status(session, 'http://python.org', 1)),
            asyncio.create_task(fetch_status(session, 'http://python.org', 1)),
        ]
        # Wait for all tasks to be completed
        done, pending = await asyncio.wait(fetchers)
        for done_task in done:
            result = await done_task
            print(result)

asyncio.run(main())
```

In the example above, we get the same effect as if we’d used asyncio.gather. We run our requests concurrently and wait until all tasks are completed, since asyncio.wait defaults to return_when=ALL_COMPLETED.
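Because done is an unordered set, it helps to remember which input each task belongs to. This minimal stdlib sketch uses a hypothetical fake_fetch coroutine and made-up example.com URLs. It keeps a dictionary from task to URL and reads finished results with task.result().

```python
import asyncio


async def fake_fetch(url: str, delay: float) -> int:
    await asyncio.sleep(delay)  # simulated latency, no real request
    return 200


async def main() -> dict[str, int]:
    delays = {"https://example.com/a": 0.2, "https://example.com/b": 0.1}
    # Remember which URL each task belongs to; `done` is an unordered set
    task_to_url = {
        asyncio.create_task(fake_fetch(url, delay)): url
        for url, delay in delays.items()
    }
    done, pending = await asyncio.wait(task_to_url.keys())
    # A finished task exposes its return value via .result()
    return {task_to_url[task]: task.result() for task in done}


statuses = asyncio.run(main())
print(statuses)
```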
However, with asyncio.wait we can specify a return_when parameter.
Let’s slightly modify the example and include longer-running requests. We also set return_when=FIRST_COMPLETED so that asyncio.wait returns as soon as any task finishes.
We call asyncio.wait on the pending tasks repeatedly inside a loop. Each call updates done and pending, and we print the results of finished tasks as soon as they are available.
```python
import aiohttp
import asyncio
from aiohttp import ClientSession

async def fetch_status(session: ClientSession, url: str, delay: int) -> int:
    await asyncio.sleep(delay)
    async with session.get(url) as response:
        return response.status

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # Create the initial pending tasks with different delays
        pending = [
            asyncio.create_task(fetch_status(session, 'http://python.org', 3)),
            asyncio.create_task(fetch_status(session, 'http://python.org', 1)),
            asyncio.create_task(fetch_status(session, 'http://python.org', 2)),
        ]
        # Loop as long as tasks are pending
        while pending:
            # Update both sets
            done, pending = await asyncio.wait(
                pending,
                return_when=asyncio.FIRST_COMPLETED,
            )
            print(f"Tasks done: {len(done)}")
            print(f"Tasks pending: {len(pending)}")
            # Print results that are already done
            for done_task in done:
                result = await done_task
                print(result)

asyncio.run(main())

# Output:
# Tasks done: 1
# Tasks pending: 2
# 200
# Tasks done: 1
# Tasks pending: 1
# 200
# Tasks done: 1
# Tasks pending: 0
# 200
```

This approach is less convenient and more verbose than asyncio.gather, but it allows for more fine-grained control.
As soon as one task is completed, we can proceed to work with its result. Moreover, we can handle each task individually, which includes exception handling and cancelling tasks.
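asyncio.wait also accepts a timeout parameter, which addresses the “one request takes forever” problem. After the timeout, it returns whatever is done and leaves the rest in pending, which we can then cancel. In this network-free sketch, a 10-second asyncio.sleep plays the stuck request (an assumption for illustration).

```python
import asyncio


async def fake_fetch(delay: float) -> int:
    await asyncio.sleep(delay)  # simulated latency; 10 s plays the stuck request
    return 200


async def main() -> tuple[list[int], int]:
    tasks = [asyncio.create_task(fake_fetch(d)) for d in (0.1, 0.2, 10)]
    # Stop waiting after 0.5 s instead of waiting for the slowest request
    done, pending = await asyncio.wait(tasks, timeout=0.5)
    for task in pending:
        task.cancel()  # give up on requests that are still running
    # Let the cancellations complete; CancelledError is collected, not raised
    await asyncio.gather(*pending, return_exceptions=True)
    return [t.result() for t in done], len(pending)


results, n_cancelled = asyncio.run(main())
print(results, n_cancelled)
```

asyncio.wait never cancels anything by itself when the timeout expires. Cancelling the leftovers is up to us.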
Conclusion
Aiohttp provides a solution to the issue of blocking libraries and allows for concurrent web requests and efficient acquisition and closure of HTTP sessions. This leads to improved performance and a more pythonic way of working.
However, Aiohttp is not the best solution for every scenario, even though it offers a significant performance improvement when many requests can run concurrently. For a handful of requests in an otherwise synchronous script, it adds complexity without much benefit. It’s also important to handle exceptions and timeouts appropriately when using Aiohttp and asyncio in general.
This blog post only scratches the surface of what can be accomplished with non-blocking libraries, and there are many more to discover.
References / Further Material:
- [1] The Zen of Python
- [2] Why You Should Use Context Managers in Python
- [3] https://www.geeksforgeeks.org/context-manager-in-python/
- [4] PEP 492 – Coroutines with async and await syntax: https://peps.python.org/pep-0492/
- [5] https://www.cockroachlabs.com/blog/what-is-connection-pooling/
- Fowler, Matthew. (2022). Python Concurrency with Asyncio. Manning Publications.
Level Up Coding
