Intro to aiohttp

In the previous section, we got a taste of what asynchronous task execution looks like using the asyncio library.

The example before was quite simple, and perhaps not very exciting. Let’s now try building something bigger, like our own web scraper. A web scraper is a program that extracts data from websites by copying or downloading it programmatically.

For demonstration purposes, we’ll try downloading some PEPs and saving them to our local machine for further analysis.

The PEPs we want to download are the governance PEPs: PEP 8010 through PEP 8016.

Downloading contents synchronously

First, let us try doing this synchronously using the requests library. It can be installed using pip.

python3.7 -m pip install requests

Downloading an online resource using requests is straightforward.

import requests

response = requests.get("https://www.python.org/dev/peps/pep-8010/")
print(response.content)

It will print out the HTML content of PEP 8010. To save it locally to a file:

filename = "sync_pep_8010.html"

with open(filename, "wb") as pep_file:
    pep_file.write(content.encode('utf-8'))

The file sync_pep_8010.html will be created.
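
One optional check worth knowing about: requests does not raise an exception for HTTP error responses (such as a 404) on its own, so you can call raise_for_status() before saving. A minimal sketch of that check:

import requests

response = requests.get("https://www.python.org/dev/peps/pep-8010/")
# Raises requests.HTTPError if the server returned a 4xx or 5xx status.
response.raise_for_status()

with open("sync_pep_8010.html", "wb") as pep_file:
    pep_file.write(response.content)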

💡 Exercise

Let’s take the next 10-15 minutes to write a script that will programmatically download PEPs 8010 to 8016 using the requests library.

Solution

Here is an example solution. It is ok if yours does not look exactly the same.

import time

import requests


def download_pep(pep_number: int) -> bytes:

    url = f"https://www.python.org/dev/peps/pep-{pep_number}/"
    print(f"Begin downloading {url}")
    response = requests.get(url)
    print(f"Finished downloading {url}")
    return response.content


def write_to_file(pep_number: int, content: bytes) -> None:

    filename = f"sync_{pep_number}.html"

    with open(filename, "wb") as pep_file:
        print(f"Begin writing to {filename}")
        pep_file.write(content)
        print(f"Finished writing {filename}")


if __name__ == "__main__":
    s = time.perf_counter()

    for i in range(8010, 8017):  # range's end is exclusive, so 8017 includes PEP 8016
        content = download_pep(i)
        write_to_file(i, content)

    elapsed = time.perf_counter() - s
    print(f"Execution time: {elapsed:0.2f} seconds.")

# Begin downloading https://www.python.org/dev/peps/pep-8010/
# Finished downloading https://www.python.org/dev/peps/pep-8010/
# Begin writing to sync_8010.html
# Finished writing sync_8010.html
# Begin downloading https://www.python.org/dev/peps/pep-8011/
# Finished downloading https://www.python.org/dev/peps/pep-8011/
# Begin writing to sync_8011.html
# Finished writing sync_8011.html
# Begin downloading https://www.python.org/dev/peps/pep-8012/
# Finished downloading https://www.python.org/dev/peps/pep-8012/
# Begin writing to sync_8012.html
# Finished writing sync_8012.html
# Begin downloading https://www.python.org/dev/peps/pep-8013/
# Finished downloading https://www.python.org/dev/peps/pep-8013/
# Begin writing to sync_8013.html
# Finished writing sync_8013.html
# Begin downloading https://www.python.org/dev/peps/pep-8014/
# Finished downloading https://www.python.org/dev/peps/pep-8014/
# Begin writing to sync_8014.html
# Finished writing sync_8014.html
# Begin downloading https://www.python.org/dev/peps/pep-8015/
# Finished downloading https://www.python.org/dev/peps/pep-8015/
# Begin writing to sync_8015.html
# Finished writing sync_8015.html
# Execution time: 3.60 seconds.

In the above solution, we’re downloading the PEPs one at a time. From the previous section, we know that with asyncio we can run these tasks concurrently. requests itself is not an asyncio-compatible library. Enter aiohttp.

Downloading contents asynchronously

Install aiohttp if you have not already:

python3.7 -m pip install aiohttp

Here’s an example of downloading an online resource using aiohttp.

import asyncio
import aiohttp

async def download_pep(url):
    # A ClientSession manages the connection pool; async with ensures it is closed.
    async with aiohttp.ClientSession() as session:
        # Entering the session.get() context sends the request and receives the headers.
        async with session.get(url) as resp:
            # resp.read() asynchronously reads the response body as bytes.
            content = await resp.read()
            print(content)
            return content

asyncio.run(download_pep("https://www.python.org/dev/peps/pep-8010/"))

Writing the downloaded content to a new file can be done as its own coroutine.

async def write_to_file(pep_number, content):
    filename = f"async_{pep_number}.html"
    # Plain (blocking) file I/O is used here; for small files like these that's fine.
    with open(filename, "wb") as pep_file:
        pep_file.write(content)

Since we now have two coroutines, we can execute them like so:

async def web_scrape_task(pep_number):
    url = f"https://www.python.org/dev/peps/pep-{pep_number}/"

    downloaded_content = await download_pep(url)
    await write_to_file(pep_number, downloaded_content)


asyncio.run(web_scrape_task(8010))

💡 Exercise

The code looks more complex than when we were doing it synchronously with requests, but you’ve got this. Now that you know how to download an online resource using aiohttp, you can download multiple pages asynchronously.

Let’s take the next 10-15 minutes to write the script for downloading PEPs 8010 to 8016 using aiohttp.

Solution

Here is an example solution. It is ok if yours does not look exactly the same.

import asyncio
import time

import aiohttp


async def download_pep(pep_number: int) -> bytes:

    url = f"https://www.python.org/dev/peps/pep-{pep_number}/"
    print(f"Begin downloading {url}")
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            content = await resp.read()
            print(f"Finished downloading {url}")
            return content


async def write_to_file(pep_number: int, content: bytes) -> None:
    filename = f"async_{pep_number}.html"
    with open(filename, "wb") as pep_file:
        print(f"Begin writing to {filename}")
        pep_file.write(content)
        print(f"Finished writing {filename}")


async def web_scrape_task(pep_number: int) -> None:
    content = await download_pep(pep_number)
    await write_to_file(pep_number, content)


async def main() -> None:
    tasks = []
    for i in range(8010, 8017):  # range's end is exclusive, so 8017 includes PEP 8016
        tasks.append(web_scrape_task(i))
    # gather() schedules the coroutines concurrently and waits for all of them to finish.
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    s = time.perf_counter()

    asyncio.run(main())

    elapsed = time.perf_counter() - s
    print(f"Execution time: {elapsed:0.2f} seconds.")


# Begin downloading https://www.python.org/dev/peps/pep-8010/
# Begin downloading https://www.python.org/dev/peps/pep-8015/
# Begin downloading https://www.python.org/dev/peps/pep-8012/
# Begin downloading https://www.python.org/dev/peps/pep-8013/
# Begin downloading https://www.python.org/dev/peps/pep-8014/
# Begin downloading https://www.python.org/dev/peps/pep-8011/
# Finished downloading https://www.python.org/dev/peps/pep-8014/
# Begin writing to async_8014.html
# Finished writing async_8014.html
# Finished downloading https://www.python.org/dev/peps/pep-8012/
# Begin writing to async_8012.html
# Finished writing async_8012.html
# Finished downloading https://www.python.org/dev/peps/pep-8013/
# Begin writing to async_8013.html
# Finished writing async_8013.html
# Finished downloading https://www.python.org/dev/peps/pep-8010/
# Begin writing to async_8010.html
# Finished writing async_8010.html
# Finished downloading https://www.python.org/dev/peps/pep-8011/
# Begin writing to async_8011.html
# Finished writing async_8011.html
# Finished downloading https://www.python.org/dev/peps/pep-8015/
# Begin writing to async_8015.html
# Finished writing async_8015.html
# Execution time: 0.87 seconds.

While the code looks longer and more complex than our solution using requests, executing it asynchronously means the whole task takes less time to complete.
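
One refinement you may see in real-world aiohttp code is sharing a single ClientSession across all downloads instead of opening a new one per request. This is not required for the exercise; here is a minimal sketch under that assumption, reusing the same PEP range as above:

import asyncio

import aiohttp


async def download_pep(session: aiohttp.ClientSession, pep_number: int) -> bytes:
    url = f"https://www.python.org/dev/peps/pep-{pep_number}/"
    async with session.get(url) as resp:
        return await resp.read()


async def main() -> None:
    pep_numbers = range(8010, 8017)
    # A single session (and its connection pool) is shared by every download.
    async with aiohttp.ClientSession() as session:
        contents = await asyncio.gather(
            *(download_pep(session, n) for n in pep_numbers)
        )
    for pep_number, content in zip(pep_numbers, contents):
        with open(f"async_{pep_number}.html", "wb") as pep_file:
            pep_file.write(content)


asyncio.run(main())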

Why aiohttp

  • Web frameworks like Django and Flask were not designed around asyncio.

  • aiohttp provides a framework for both the web server and the client. Django, for example, is mainly a server-side framework, and you would use it in conjunction with requests when you need an HTTP client. A minimal server-side sketch follows below.
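
To illustrate the server side, here is a minimal sketch of an aiohttp web application. The route and handler names are just illustrative examples, not part of the scraper we built:

from aiohttp import web


async def handle(request):
    # A handler is simply a coroutine that receives a request and returns a response.
    return web.Response(text="Hello from aiohttp")


app = web.Application()
app.add_routes([web.get("/", handle)])

if __name__ == "__main__":
    # Serves on http://localhost:8080 by default.
    web.run_app(app)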

We’re not advocating for you to replace your existing web application with aiohttp. Each framework comes with its own benefits. Our goal in this tutorial is to learn something new together and become comfortable working with asyncio.