
Python Concurrency

Concurrent Web Requests with Aiohttp: Get More Done in Less Time

Discovering Aiohttp for Faster, Concurrent Web Requests

Are you tired?

Tired of waiting for your requests to complete one by one?

Being stuck. Waiting. Only to be met and overwhelmed with the feeling of frustration and disappointment when the request finally times out? Have you tried using async/await everywhere, only to find out that most libraries are blocking anyway?

Fear not, as the answer to your problems lies in Aiohttp.

In the following sections, we will explore the beautifully concurrent world of Aiohttp, a popular asynchronous HTTP client/server library for Python.

We will discover how to make non-blocking web requests that run concurrently and improve the application's performance. By the end of this blog post, you’ll not only know how to use Aiohttp but also how to handle exceptions and how asynchronous context managers work.

So don’t go anywhere, take a seat, fire up your IDE, and let’s get started.

No more Blocking: Introducing Aiohttp

It’s all about concurrency. Allowing multiple tasks to make progress at the same time instead of waiting for each other. That’s why asynchronous programming and libraries like Python’s Asyncio exist in the first place.

However, one of the most common mistakes we tend to make (Yep, I did it too) is to apply the async/await syntax to every line of code we can get our hands on and hope for the best.

Well, most of the time the best is — nothing. Nothing happens at all. No concurrency. No sweet performance gains. But why?

Unfortunately, most libraries are blocking, meaning that they will block the main thread and event loop, rendering async/await basically ineffective. This is where non-blocking libraries like Aiohttp come into play. By using non-blocking sockets and utilizing asynchronous context managers, Aiohttp allows for efficient acquisition and closure of HTTP sessions, leading to improved performance and a more pythonic way of working [1].
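
To make the difference tangible without touching the network, here is a minimal sketch that uses `time.sleep` as a stand-in for a blocking library call and `asyncio.sleep` as a stand-in for a non-blocking one. The helper names (`blocking_task`, `non_blocking_task`, `timed`) are made up for this illustration.

```python
import asyncio
import time


async def blocking_task() -> None:
    # time.sleep blocks the thread, so the event loop cannot switch tasks
    time.sleep(0.1)


async def non_blocking_task() -> None:
    # asyncio.sleep hands control back to the event loop while waiting
    await asyncio.sleep(0.1)


async def timed(factory, count: int = 3) -> float:
    # Run `count` coroutines through gather and measure the elapsed time
    start = time.perf_counter()
    await asyncio.gather(*(factory() for _ in range(count)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    blocking = await timed(blocking_task)
    non_blocking = await timed(non_blocking_task)
    print(f"Blocking:     {blocking:.2f} seconds")
    print(f"Non-blocking: {non_blocking:.2f} seconds")
    return blocking, non_blocking


blocking_time, non_blocking_time = asyncio.run(main())
```

Even though both variants go through `asyncio.gather`, the blocking one should take roughly three times as long, because each `time.sleep` call holds up the event loop until it returns.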

Before diving deep into the inner workings of Aiohttp, let’s take a small detour and talk about asynchronous context managers first.

Managing Asynchrony: The Pythonic Way

It’s very common to deal with resources in a way that requires them to be opened and then to be closed. Think of a file for example.

We open it. We read it. We close it. Nothing fancy so far.

However, we need to be careful not to leak any resources. If an exception is raised for any reason, our resource might never be properly closed. To avoid leaking resources, we have several options to choose from.

First, we can wrap our code in a try/finally block, making sure the resource will be closed no matter what. Second, we can apply a more pythonic way of dealing with resources. Context managers [2].

# Use of a synchronous context manager
with open("example.txt", "r") as f:
    contents = f.read()
    print(contents)
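
For comparison, the try/finally option mentioned above could look roughly like the following sketch. It writes a throwaway file first (the temporary path and its content are assumptions made only to keep the example self-contained) and then closes the file by hand.

```python
import os
import tempfile

# Create a throwaway file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("hello")

# Manual resource handling: close the file even if read() raises
f = open(path, "r")
try:
    contents = f.read()
    print(contents)
finally:
    f.close()
```

It works, but the cleanup logic has to be repeated wherever the resource is used, which is exactly what the `with` statement takes off our hands.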

In Python, context managers are used to ensure that resources are properly closed even if an exception is raised during runtime [3]. However, traditional context managers only work with synchronous code.

With the introduction of asynchronous context managers [4], we can now manage resources asynchronously by using the async with syntax. Now, we can acquire and close resources like HTTP sessions more cleanly and in a more Pythonic way. This is why asynchronous context managers lie at the core of Aiohttp.

Let’s take a look at a super basic example to illustrate and understand the way asynchronous context managers work.

import asyncio

# Implement the context manager protocol
class AsyncContextManager:
    async def __aenter__(self):
        print("Entering async context...")
        return self
    
    async def __aexit__(self, exc_type, exc_value, traceback):
        print("Exiting async context...")
        return False

# Define the main coroutine
async def main():
    async with AsyncContextManager():
        print("Inside async context...")

asyncio.run(main())

In this example, we define an AsyncContextManager class that implements the async context manager protocol by defining the __aenter__ and __aexit__ methods.

When the async with block is executed, the __aenter__ method is called, which in this case simply prints a message to indicate that the async context has been entered. The code inside the async with block is then executed, which in this example just prints another message.

When the async with block is exited, the __aexit__ method is called, which also prints a message to indicate that the async context has been exited.

If an exception occurs inside the async with block, the __aexit__ method is called with the details of the exception, allowing the context manager to handle the exception if necessary.
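
Here is a small sketch of that last case. The class name `RecordingContextManager` and the `events` list are hypothetical helpers that only exist to make the order of calls visible.

```python
import asyncio

events: list[str] = []


class RecordingContextManager:
    async def __aenter__(self):
        events.append("enter")
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished without an exception
        events.append(f"exit: {exc_type.__name__ if exc_type else None}")
        # Returning False lets the exception propagate to the caller
        return False


async def main() -> None:
    try:
        async with RecordingContextManager():
            raise ValueError("something went wrong")
    except ValueError as exc:
        events.append(f"caught: {exc}")


asyncio.run(main())
print(events)
```

Because `__aexit__` returns `False`, the exception is not swallowed: the cleanup runs first, and the `ValueError` then reaches the surrounding `except` block. Returning `True` instead would suppress it.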

Making Concurrent Web Requests with Aiohttp

Now that we know about non-blocking libraries and resource handling with asynchronous context managers, it’s finally time to make some requests.

Non-blocking requests. Concurrently. Of course.

But, before we do any of that. Let’s do it the old-fashioned way first — synchronously.

import time
import requests


def fetch_status(url: str) -> int:
    response = requests.get(url)
    return response.status_code


def main() -> None:
    start = time.time()

    urls = ['http://python.org' for _ in range(10)]
    results = [fetch_status(url) for url in urls]
    print(results)

    end = time.time()
    print(f"Total time: {end-start:.4f} seconds")


main()

In this example, we make use of the requests library, which is blocking by default. We simply execute 10 requests sequentially and fetch the status code.

Running this code takes about 1.4 seconds.

Now, let’s do the same thing again. However, this time we make use of Aiohttp.

import time
import aiohttp
import asyncio

from aiohttp import ClientSession


async def fetch_status(session: ClientSession, url: str) -> int:
    # Use ClientSession to make a GET Request
    async with session.get(url) as response:
        return response.status


async def main() -> None:
    start = time.time()

    # Acquire new ClientSession
    async with aiohttp.ClientSession() as session:
        urls = ['http://python.org' for _ in range(10)]
        requests = [fetch_status(session, url) for url in urls]

        results = await asyncio.gather(*requests)
        print(results)

    end = time.time()
    print(f"Total time: {end-start:.4f} seconds")


asyncio.run(main())

Running the code above takes only about 0.2 seconds.

7x faster than before. This is the power of concurrency.

So how does this work?

In order to make web requests, Aiohttp relies on the concept of sessions, where one session can have multiple connections open. This is known as “connection pooling”, a technique for managing a pool of reusable network connections to a server [5]. This allows us to avoid the overhead of creating a new connection for each request.

Once we have obtained a session, we can make our GET requests. We utilize our helper coroutine fetch_status to create multiple requests and schedule them on the event loop by using asyncio.gather.
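
One detail worth knowing about asyncio.gather: the results come back in the order the awaitables were passed in, not in the order they finished. The following sketch shows this without any network access, using a hypothetical `fake_fetch` coroutine with `asyncio.sleep` as a stand-in for a request.

```python
import asyncio


async def fake_fetch(name: str, delay: float) -> str:
    # Stand-in for a network request: wait, then return an identifier
    await asyncio.sleep(delay)
    return name


async def main() -> list[str]:
    # The slowest coroutine is listed first, the fastest last
    coroutines = [
        fake_fetch("slow", 0.03),
        fake_fetch("medium", 0.02),
        fake_fetch("fast", 0.01),
    ]
    # gather wraps each coroutine in a task and runs them concurrently
    return await asyncio.gather(*coroutines)


results = asyncio.run(main())
print(results)
```

This is why we can safely match each status code in the result list to the URL at the same position.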

Things will fail. They simply do.

A lot of things can go wrong when making a network request. Unreliable connections. Bad requests. Data errors. Any of these issues can leave our request hanging far longer than we are willing to wait. Thus, we need a way to time out.

Luckily for us, we can make use of Aiohttp’s ClientTimeout data structure.

import aiohttp
import asyncio

from aiohttp import ClientSession


async def fetch_status(session: ClientSession, url: str) -> int:
    # Apply a timeout at request level
    request_timeout = aiohttp.ClientTimeout(total=0.2)
    async with session.get(url, timeout=request_timeout) as response:
        return response.status


async def main() -> None:
    # Apply a timeout at session level
    session_timeout = aiohttp.ClientTimeout(total=1.0, connect=0.2)
    async with aiohttp.ClientSession(timeout=session_timeout) as session:
        result = await fetch_status(session, 'http://python.org')
        print(result)


asyncio.run(main())

In the example above, we specify two timeouts: one at the session level and one at the request level. The timeout passed to session.get overrides the session-level timeout for that request. If the request takes too long, an asyncio.TimeoutError will be raised.
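
Raising the error is only half of the story, since we usually want to catch it and carry on. The sketch below shows one way to do that. To stay runnable without a network connection, it does not use Aiohttp at all: `slow_request` is a hypothetical stand-in, and `asyncio.wait_for` is used to trigger the same `asyncio.TimeoutError`.

```python
import asyncio
from typing import Optional


async def slow_request() -> int:
    # Hypothetical stand-in for a request that takes too long
    await asyncio.sleep(1.0)
    return 200


async def fetch_with_fallback() -> Optional[int]:
    try:
        # wait_for cancels slow_request and raises once 0.05 seconds pass
        return await asyncio.wait_for(slow_request(), timeout=0.05)
    except asyncio.TimeoutError:
        print("Request timed out")
        return None


result = asyncio.run(fetch_with_fallback())
```

The same `try`/`except asyncio.TimeoutError` pattern should carry over to the `session.get` call from the example above.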

But what if a single request fails? What about exception handling?

Unfortunately, exception handling when running multiple requests with asyncio.gather is a bit clunky. However, we can make use of the parameter return_exceptions=True which will include all exceptions raised in the result list. This allows us to handle the exceptions accordingly.

import aiohttp
import asyncio

from aiohttp import ClientSession


async def fetch_status(session: ClientSession, url: str) -> int:
    async with session.get(url) as response:
        return response.status


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        urls = ['http://python.org', 'invalid://address.org']
        requests = [fetch_status(session, url) for url in urls]
        
        # Include raised exceptions in result list
        results = await asyncio.gather(*requests, return_exceptions=True)
        # Example output: [200, AssertionError()] (the exception type depends on the aiohttp version)
        print(results)
        

asyncio.run(main())
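
Once the exceptions are part of the result list, we still have to tell them apart from the successful results. A minimal sketch of that step, using a hypothetical `fake_fetch` coroutine that raises a `ValueError` for non-HTTP URLs instead of making a real request:

```python
import asyncio


async def fake_fetch(url: str) -> int:
    # Hypothetical stand-in for fetch_status: fail for non-HTTP URLs
    await asyncio.sleep(0.01)
    if not url.startswith("http"):
        raise ValueError(f"Unsupported URL: {url}")
    return 200


async def main() -> tuple[list[int], list[BaseException]]:
    urls = ["http://python.org", "invalid://address.org"]
    results = await asyncio.gather(
        *(fake_fetch(url) for url in urls),
        return_exceptions=True,
    )
    # Separate successful results from raised exceptions
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    return successes, failures


successes, failures = asyncio.run(main())
print(f"Succeeded: {successes}")
print(f"Failed: {failures}")
```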

Just slightly more control. Please.

Using asyncio.gather is convenient. But it has its drawbacks.

Exception handling is somewhat clunky and, additionally, we have to wait. We have to wait until all requests are completed before we can proceed to work with any of the results. So if there is just one bad request that takes forever, we’ll most likely end up waiting forever.

Fortunately, there is another way.

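We can make use of asyncio.wait, which takes an iterable of tasks (or futures) and returns two sets: a set of tasks that are finished and a set of tasks that are still pending. Note that since Python 3.11, bare coroutines have to be wrapped in tasks first, for example with asyncio.create_task.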

import aiohttp
import asyncio

from aiohttp import ClientSession


async def fetch_status(session: ClientSession, url: str, delay: int) -> int:
    await asyncio.sleep(delay)
    async with session.get(url) as response:
        return response.status


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        fetchers = [
            asyncio.create_task(fetch_status(session, 'http://python.org', 1)),
            asyncio.create_task(fetch_status(session, 'http://python.org', 1)),
        ]

        # Wait for all tasks to be completed
        done, pending = await asyncio.wait(fetchers)

        for done_task in done:
            result = await done_task
            print(result)


asyncio.run(main())

In the example above, we get the same effect as if we’d use asyncio.gather. We run our requests concurrently and wait until all tasks are completed.

However, with asyncio.wait we can specify a return_when parameter.

Let’s slightly modify the example and include long-running requests. We also set return_when=asyncio.FIRST_COMPLETED, so that asyncio.wait returns as soon as at least one task has finished instead of waiting for all of them.

We loop as long as there are pending tasks and call asyncio.wait on them with each iteration. Once we have a result, we update done and pending and print out any results as soon as possible.

import aiohttp
import asyncio

from aiohttp import ClientSession


async def fetch_status(session: ClientSession, url: str, delay: int) -> int:
    await asyncio.sleep(delay)
    async with session.get(url) as response:
        return response.status


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # Create the initial collection of pending tasks with different delays
        pending = [
            asyncio.create_task(fetch_status(session, 'http://python.org', 3)),
            asyncio.create_task(fetch_status(session, 'http://python.org', 1)),
            asyncio.create_task(fetch_status(session, 'http://python.org', 2)),
        ]

        # Loop over the set as long as tasks are pending
        while pending:
            # Update both sets
            done, pending = await asyncio.wait(
                pending,
                return_when=asyncio.FIRST_COMPLETED,
            )

            print(f"Tasks done: {len(done)}")
            print(f"Tasks pending: {len(pending)}")
            
            # Print results that are already done
            for done_task in done:
                result = await done_task
                print(result)


asyncio.run(main())

# Output:
# Tasks done: 1
# Tasks pending: 2
# 200
# Tasks done: 1
# Tasks pending: 1
# 200
# Tasks done: 1
# Tasks pending: 0
# 200

While this approach is definitely less convenient and more verbose than asyncio.gather, it allows for more fine-grained control.

As soon as one task is completed we can proceed to work with its result. Moreover, we get the ability to handle each task individually, which also includes exception handling or the cancellation of a task.
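
To illustrate the cancellation part, here is a sketch in which we only care about the first result and cancel everything that is still pending. As before, `fake_fetch` is a hypothetical stand-in that sleeps instead of making a real request.

```python
import asyncio


async def fake_fetch(name: str, delay: float) -> str:
    # Hypothetical stand-in for fetch_status without network access
    await asyncio.sleep(delay)
    return name


async def main() -> tuple[list[str], int]:
    pending = {
        asyncio.create_task(fake_fetch("fast", 0.01)),
        asyncio.create_task(fake_fetch("slow", 5.0)),
    }
    done, pending = await asyncio.wait(
        pending,
        return_when=asyncio.FIRST_COMPLETED,
    )
    results = [task.result() for task in done]

    # We only need the first result, so cancel whatever is still running
    for task in pending:
        task.cancel()
    # Wait for the cancellations to be processed; exceptions are collected
    cancelled = await asyncio.gather(*pending, return_exceptions=True)
    return results, len(cancelled)


first_results, cancelled_count = asyncio.run(main())
print(first_results, cancelled_count)
```

Without the cancellation, the slow task would keep running in the background until it finishes or the event loop shuts down.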

Conclusion

Aiohttp provides a solution to the issue of blocking libraries and allows for concurrent web requests and efficient acquisition and closure of HTTP sessions. This leads to improved performance and a more pythonic way of working.

However, it is important to note that while Aiohttp offers a significant improvement in performance, it may not be the best solution for every scenario. Additionally, it’s important to handle exceptions and timeouts appropriately when using Aiohttp and Asyncio in general.

This blog post only scratches the surface of what can be accomplished with non-blocking libraries, and there are many more to discover.

References / Further Material:

[1] The Zen of Python: https://peps.python.org/pep-0020/
[2] Why You Should Use Context Managers in Python: https://towardsdatascience.com/why-you-should-use-context-managers-in-python-4f10fe231206
[3] https://www.geeksforgeeks.org/context-manager-in-python/
[4] https://peps.python.org/pep-0492/
[5] https://www.cockroachlabs.com/blog/what-is-connection-pooling/
Fowler, Matthew. (2022). Python Concurrency with Asyncio. Manning Publications.
