Intro to aiohttp¶
In the previous section, we got a taste of what asynchronous task execution looks like using the asyncio library.
The example before was quite simple, and perhaps not very exciting. Let’s now try building something bigger, like our own web scraper. A web scraper is a program that extracts data from websites, copying or downloading it programmatically.
For demonstration purposes, we’ll try downloading some PEPs and saving them to our local machine for further analysis.
The PEPs we want to download are the governance PEPs: PEPs 8010 through 8016.
Downloading contents synchronously¶
First, let us try doing this synchronously using the requests library. It can be installed using pip.
python3.7 -m pip install requests
Downloading an online resource using requests is straightforward.
import requests
response = requests.get("https://www.python.org/dev/peps/pep-8010/")
print(response.content)
It will print out the HTML content of PEP 8010. To save it locally to a file:
filename = "sync_pep_8010.html"
with open(filename, "wb") as pep_file:
    pep_file.write(response.content)
The file sync_pep_8010.html will be created.
💡 Exercise¶
Let’s take the next 10-15 minutes to write a script that will programmatically download PEPs 8010 to 8016 using the requests library.
Solution¶
Here is an example solution. It is ok if yours does not look exactly the same.
import time

import requests

def download_pep(pep_number: int) -> bytes:
    url = f"https://www.python.org/dev/peps/pep-{pep_number}/"
    print(f"Begin downloading {url}")
    response = requests.get(url)
    print(f"Finished downloading {url}")
    return response.content

def write_to_file(pep_number: int, content: bytes) -> None:
    filename = f"sync_{pep_number}.html"
    with open(filename, "wb") as pep_file:
        print(f"Begin writing to {filename}")
        pep_file.write(content)
        print(f"Finished writing {filename}")

if __name__ == "__main__":
    s = time.perf_counter()
    for i in range(8010, 8016):
        content = download_pep(i)
        write_to_file(i, content)
    elapsed = time.perf_counter() - s
    print(f"Execution time: {elapsed:0.2f} seconds.")
# Begin downloading https://www.python.org/dev/peps/pep-8010/
# Finished downloading https://www.python.org/dev/peps/pep-8010/
# Begin writing to sync_8010.html
# Finished writing sync_8010.html
# Begin downloading https://www.python.org/dev/peps/pep-8011/
# Finished downloading https://www.python.org/dev/peps/pep-8011/
# Begin writing to sync_8011.html
# Finished writing sync_8011.html
# Begin downloading https://www.python.org/dev/peps/pep-8012/
# Finished downloading https://www.python.org/dev/peps/pep-8012/
# Begin writing to sync_8012.html
# Finished writing sync_8012.html
# Begin downloading https://www.python.org/dev/peps/pep-8013/
# Finished downloading https://www.python.org/dev/peps/pep-8013/
# Begin writing to sync_8013.html
# Finished writing sync_8013.html
# Begin downloading https://www.python.org/dev/peps/pep-8014/
# Finished downloading https://www.python.org/dev/peps/pep-8014/
# Begin writing to sync_8014.html
# Finished writing sync_8014.html
# Begin downloading https://www.python.org/dev/peps/pep-8015/
# Finished downloading https://www.python.org/dev/peps/pep-8015/
# Begin writing to sync_8015.html
# Finished writing sync_8015.html
# Execution time: 3.60 seconds.
In the above solution, we’re downloading the PEPs one at a time. From the previous section, we know that using asyncio, we can run the same task asynchronously. requests itself, however, is not an asyncio library.
Enter aiohttp.
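Why not just call requests from inside a coroutine? Because requests.get is a blocking call: while it waits on the network, the event loop cannot run anything else. Here is a minimal sketch of the difference (our own illustration, using time.sleep to stand in for a blocking call and asyncio.sleep for a non-blocking one):

```python
import asyncio
import time

async def blocking_task(n: int) -> int:
    time.sleep(0.5)  # blocks the whole event loop, like requests.get would
    return n

async def nonblocking_task(n: int) -> int:
    await asyncio.sleep(0.5)  # yields control back to the event loop
    return n

async def run_four(task):
    # Schedule four of the given coroutines concurrently.
    return await asyncio.gather(*(task(i) for i in range(4)))

start = time.perf_counter()
asyncio.run(run_four(blocking_task))
blocking_elapsed = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(run_four(nonblocking_task))
nonblocking_elapsed = time.perf_counter() - start

print(f"blocking: {blocking_elapsed:0.2f}s, non-blocking: {nonblocking_elapsed:0.2f}s")
```

The four blocking tasks run back to back (roughly two seconds in total), while the four non-blocking ones overlap (roughly half a second). That is why we need an asyncio-native HTTP client here.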
Downloading contents asynchronously¶
Install aiohttp if you have not already:
python3.7 -m pip install aiohttp
Here’s an example of downloading an online resource using aiohttp.
import asyncio

import aiohttp

async def download_pep(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            content = await resp.read()
            print(content)
            return content

asyncio.run(download_pep("https://www.python.org/dev/peps/pep-8010/"))
Writing the downloaded content to a new file can be done as its own coroutine.
async def write_to_file(pep_number, content):
    filename = f"async_{pep_number}.html"
    with open(filename, "wb") as pep_file:
        pep_file.write(content)
Since we now have two coroutines, we can execute them like so:
async def web_scrape_task(pep_number):
    url = f"https://www.python.org/dev/peps/pep-{pep_number}/"
    downloaded_content = await download_pep(url)
    await write_to_file(pep_number, downloaded_content)

asyncio.run(web_scrape_task(8010))
💡 Exercise¶
The code looks more complex than when we were doing it synchronously using requests. But you got this. Now that you know how to download an online resource using aiohttp, you can download multiple pages asynchronously.
Let’s take the next 10-15 minutes to write the script for downloading PEPs 8010 to 8016 using aiohttp.
Solution¶
Here is an example solution. It is ok if yours does not look exactly the same.
import asyncio
import time

import aiohttp

async def download_pep(pep_number: int) -> bytes:
    url = f"https://www.python.org/dev/peps/pep-{pep_number}/"
    print(f"Begin downloading {url}")
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            content = await resp.read()
            print(f"Finished downloading {url}")
            return content

async def write_to_file(pep_number: int, content: bytes) -> None:
    filename = f"async_{pep_number}.html"
    with open(filename, "wb") as pep_file:
        print(f"Begin writing to {filename}")
        pep_file.write(content)
        print(f"Finished writing {filename}")

async def web_scrape_task(pep_number: int) -> None:
    content = await download_pep(pep_number)
    await write_to_file(pep_number, content)

async def main() -> None:
    tasks = []
    for i in range(8010, 8016):
        tasks.append(web_scrape_task(i))
    await asyncio.wait(tasks)

if __name__ == "__main__":
    s = time.perf_counter()
    asyncio.run(main())
    elapsed = time.perf_counter() - s
    print(f"Execution time: {elapsed:0.2f} seconds.")
# Begin downloading https://www.python.org/dev/peps/pep-8010/
# Begin downloading https://www.python.org/dev/peps/pep-8015/
# Begin downloading https://www.python.org/dev/peps/pep-8012/
# Begin downloading https://www.python.org/dev/peps/pep-8013/
# Begin downloading https://www.python.org/dev/peps/pep-8014/
# Begin downloading https://www.python.org/dev/peps/pep-8011/
# Finished downloading https://www.python.org/dev/peps/pep-8014/
# Begin writing to async_8014.html
# Finished writing async_8014.html
# Finished downloading https://www.python.org/dev/peps/pep-8012/
# Begin writing to async_8012.html
# Finished writing async_8012.html
# Finished downloading https://www.python.org/dev/peps/pep-8013/
# Begin writing to async_8013.html
# Finished writing async_8013.html
# Finished downloading https://www.python.org/dev/peps/pep-8010/
# Begin writing to async_8010.html
# Finished writing async_8010.html
# Finished downloading https://www.python.org/dev/peps/pep-8011/
# Begin writing to async_8011.html
# Finished writing async_8011.html
# Finished downloading https://www.python.org/dev/peps/pep-8015/
# Begin writing to async_8015.html
# Finished writing async_8015.html
# Execution time: 0.87 seconds.
While the code looks longer and more complex than our solution using requests, by executing it asynchronously, the tasks take less time to complete.
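As an aside, the fan-out in main() could also be written with asyncio.gather, which runs the awaitables concurrently and returns their results in order. A minimal sketch (our own, using asyncio.sleep to stand in for the downloads so it runs without network access):

```python
import asyncio
import time

async def fake_download(pep_number: int) -> str:
    await asyncio.sleep(0.5)  # stand-in for the real network call
    return f"pep-{pep_number}"

async def main() -> list:
    # gather() schedules all coroutines concurrently and returns
    # their results in the order the coroutines were passed in.
    return await asyncio.gather(*(fake_download(i) for i in range(8010, 8016)))

s = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - s
print(results)
print(f"Execution time: {elapsed:0.2f} seconds.")
```

All six "downloads" overlap, so the total time stays close to half a second rather than three seconds.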
Why aiohttp¶
aiohttp provides a framework for both the web server and the client. By comparison, Django is mainly the framework you’d use if you need a server, and you’d use it in conjunction with requests for the client side.
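To give a taste of the server side, here is a minimal aiohttp web server (a sketch; the handler name and route are our own choices):

```python
from aiohttp import web

async def handle(request: web.Request) -> web.Response:
    # Coroutine handlers let one server process serve many requests concurrently.
    return web.Response(text="Hello from aiohttp!")

app = web.Application()
app.add_routes([web.get("/", handle)])

if __name__ == "__main__":
    web.run_app(app)  # serves on http://localhost:8080 by default
```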
We’re not advocating for you to replace your existing web application with aiohttp. Each framework comes with its own benefits. Our goal in this tutorial is to learn something new together and get comfortable working with asyncio.