Python's `asyncio` module, with a use case
In this article, we will start with a simple use case and then, as we build our knowledge, extend it with more features. We will take a practical approach to asyncio and see how asynchronous programming can improve our code's performance.
From the documentation:
asyncio is a library to write concurrent code using the async/await syntax.
asyncio is used as a foundation for multiple Python asynchronous frameworks that provide high-performance network and web-servers, database connection libraries, distributed task queues, etc.
asyncio is often a perfect fit for IO-bound and high-level structured network code. More information can be found here.
Let's get started
Note: You will need Python 3.5 or above to follow along with the examples.
Here's the simple use case:
import requests
url = "https://www.google.com"
print(requests.get(url).text)
As you can see in the above example, I have used the requests package to fetch website content. In this case, I am getting content from Google.
The above code is synchronous: it is compiled to bytecode, which the Python Virtual Machine (PVM) then executes one instruction at a time. For more information on how code runs behind the scenes, you can refer to this blog.
Let's see what changes we need to make to the above example to make it asynchronous:
import requests
import asyncio

async def get(url: str) -> str:
    return requests.get(url).text

coroutine = get("https://www.google.com")
loop = asyncio.get_event_loop()
print(loop.run_until_complete(coroutine))
In the above code snippet:

- `import asyncio` - We introduced the `asyncio` module, through which the async code gets executed.
- `async def get(url: str) -> str` - We used the `async` keyword to make the function asynchronous.
- `coroutine = get("https://www.google.com")` - Don't be confused by this statement: it looks as though the code inside the `get` function is being executed, but it is not. Once a function is declared with the `async` keyword it is no longer a normal function but a coroutine, which you can verify by printing the variable's type. A coroutine is a specialized version of a Python generator function.
- `loop = asyncio.get_event_loop()` - Gets the event loop, which is used to run async code. The event loop is similar to a while loop that monitors the coroutines, taking feedback on what's idle and looking around for tasks that can be executed in the meantime.
- `loop.run_until_complete(coroutine)` - Uses the `run_until_complete` method of the event loop to run the `get` coroutine and retrieve its result.
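The coroutine behavior described above is easy to verify yourself. Below is a minimal sketch; the placeholder return value stands in for a real request so it runs without a network connection:

```python
import asyncio

async def get(url: str) -> str:
    return "fake response"  # placeholder body standing in for requests.get(url).text

# Calling an async function does NOT execute its body; it returns a coroutine object
coro = get("https://www.google.com")
print(type(coro))  # <class 'coroutine'>

# The body only runs once an event loop drives the coroutine
loop = asyncio.new_event_loop()
result = loop.run_until_complete(coro)
loop.close()
print(result)
```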
Even though we have used asyncio, the code is still synchronous.
Let's introduce some extra processing by fetching the same content multiple times instead of just once, as in the previous examples.
Note: Just for this example I am fetching the same content multiple times, but in a real-life scenario you would fetch different content, either by changing parameters or by using different URLs.
import time
import requests

def main():
    url = "https://www.google.com"
    for i in range(5):
        print("Started: %d" % i)
        data = requests.get(url).text
        print("Finished: %d" % i)

start = time.perf_counter()
main()
elapsed = time.perf_counter() - start
print(f"Executed in {elapsed:0.2f} seconds")
The above code fetches data from Google 5 times using a for loop, and it also measures how long the whole run takes. If you run the code you will get similar output, though the execution time may differ due to factors like network, memory, CPU, etc.
Started: 0
Finished: 0
Started: 1
Finished: 1
Started: 2
Finished: 2
Started: 3
Finished: 3
Started: 4
Finished: 4
Executed in 5.25 seconds
Now let's rewrite the above example with asyncio and see the difference:
import time
import requests
import asyncio

async def get(url: str) -> str:
    return requests.get(url).text

async def main():
    result = []
    for i in range(5):
        print("Started: %d" % i)
        data = await get('http://www.google.com')
        print("Finished: %d" % i)
        result.append(data)
    return None

loop = asyncio.get_event_loop()
start = time.perf_counter()
loop.run_until_complete(main())
elapsed = time.perf_counter() - start
print(f"Executed in {elapsed:0.2f} seconds")
The code is pretty much the same as the previous example, with a few changes:

- Added the `await` keyword, which tells the event loop to wait for the completion of the current statement before moving on to the next one.
- Added a `main` function, which is passed to the `run_until_complete` method of the event loop.
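To make the effect of await visible without the network, here is a stand-in sketch where asyncio.sleep plays the role of the slow request (the fake_get name and the 0.2-second delay are assumptions for illustration); each await must finish before the loop moves on:

```python
import asyncio
import time

async def fake_get(i: int) -> int:
    await asyncio.sleep(0.2)  # stand-in for a slow network call
    return i

async def main():
    results = []
    for i in range(3):
        results.append(await fake_get(i))  # each await completes before the next starts
    return results

loop = asyncio.new_event_loop()
start = time.perf_counter()
results = loop.run_until_complete(main())
elapsed = time.perf_counter() - start
loop.close()
print(results)
print(f"Executed in {elapsed:0.2f} seconds")  # roughly 0.6s: the three waits run back to back
```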
Let's run the code and see what's the difference
Started: 0
Finished: 0
Started: 1
Finished: 1
Started: 2
Finished: 2
Started: 3
Finished: 3
Started: 4
Finished: 4
Executed in 3.99 seconds
As you can see, it took only 3.99 seconds to run, approximately 24 percent faster than the synchronous version. However, if you run the same code multiple times it may not perform as well, since it is not actually running concurrently. Let's see why.
In the above code we are driving the coroutine ourselves; it would be better to let asyncio handle running the coroutines concurrently. Next, let's see how we can do that.
import time
import requests
import asyncio

async def get(url: str) -> str:
    return requests.get(url).text

async def call(i: int, url: str) -> str:
    print("Started i : %d" % i)
    data = await get(url)
    print("Finished i : %d" % i)
    return data

async def main():
    return await asyncio.gather(*(
        call(i, 'http://www.google.com')
        for i in range(5)
    ))

loop = asyncio.get_event_loop()
start = time.perf_counter()
loop.run_until_complete(main())
elapsed = time.perf_counter() - start
print(f"Executed in {elapsed:0.2f} seconds")
- Using the `asyncio.gather` method, we tell asyncio to run the coroutines and collect the results for us.
- In the above example I have used a generator expression, which is similar to a list comprehension.
- Introduced a `call` function that calls the `get` function with the `await` keyword; `call` is what `asyncio.gather` runs, so we can see which part of the code is executing.
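The speed-up from gather is easiest to see with a coroutine that genuinely yields to the event loop; asyncio.sleep below is an assumed stand-in for the real request:

```python
import asyncio
import time

async def fake_get(i: int) -> int:
    await asyncio.sleep(0.2)  # non-blocking sleep: yields control back to the event loop
    return i

async def main():
    # gather schedules all five coroutines at once and returns results in order
    return await asyncio.gather(*(fake_get(i) for i in range(5)))

loop = asyncio.new_event_loop()
start = time.perf_counter()
results = loop.run_until_complete(main())
elapsed = time.perf_counter() - start
loop.close()
print(results)
print(f"Executed in {elapsed:0.2f} seconds")  # ~0.2s, not 1.0s: the five waits overlap
```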
There is still one problem: the requests package blocks the event loop and does not let it start processing the next set of coroutines until the running one completes.
There are two ways to solve this: either use a different package, like aiohttp, that does not block the event loop, or update the code so that the requests package does not block it.
Let's look at the second approach, since examples of the first can easily be found online for whichever package you decide to use.
import time
import requests
import asyncio

async def get(url: str) -> str:
    # run_in_executor returns the Response object, so extract .text here
    response = await loop.run_in_executor(None, requests.get, url)
    return response.text

async def call(i: int, url: str) -> str:
    print("Started: %d" % i)
    data = await get(url)
    print("Finished: %d" % i)
    return data

async def main():
    return await asyncio.gather(*(
        call(i, 'http://www.google.com')
        for i in range(5)
    ))

loop = asyncio.get_event_loop()
start = time.perf_counter()
loop.run_until_complete(main())
elapsed = time.perf_counter() - start
print(f"Executed in {elapsed:0.2f} seconds")
By calling the event loop's run_in_executor method, we move the execution of requests.get out of the main thread into a different thread, which lets the event loop continue with the next set of steps.
Note: By default the event loop uses a ThreadPoolExecutor internally to run the passed function/method; for more info you can check here.
And when you run the code
Started: 0
Started: 1
Started: 2
Started: 3
Started: 4
Finished: 4
Finished: 1
Finished: 0
Finished: 3
Finished: 2
Executed in 0.59 seconds
We can see that all the coroutines started at the same time, but each finished in an arbitrary order; in terms of performance it is approximately 89 percent faster than the synchronous version.
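If you want control over the pool size, you can also pass an explicit ThreadPoolExecutor to run_in_executor instead of None. A sketch, with time.sleep standing in for the blocking requests.get call (the blocking_work name and the delay are assumptions for illustration):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_work(i: int) -> int:
    time.sleep(0.2)  # stand-in for requests.get: blocks its own thread, not the event loop
    return i

async def main():
    loop = asyncio.get_running_loop()
    executor = ThreadPoolExecutor(max_workers=5)  # explicit pool instead of the default
    futures = [loop.run_in_executor(executor, blocking_work, i) for i in range(5)]
    return await asyncio.gather(*futures)

loop = asyncio.new_event_loop()
start = time.perf_counter()
results = loop.run_until_complete(main())
elapsed = time.perf_counter() - start
loop.close()
print(results)
print(f"Executed in {elapsed:0.2f} seconds")  # ~0.2s: five blocking calls run in parallel threads
```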
Note: If you are using Python 3.7 or above, you can replace these two lines
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
with just
asyncio.run(main())
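Putting it together, here is the final example restructured around asyncio.run. The network call is swapped for a placeholder blocking function (blocking_get, an assumption) so the sketch runs without a connection, and asyncio.get_running_loop replaces the module-level loop variable:

```python
import asyncio
import time

def blocking_get(url: str) -> str:
    time.sleep(0.2)  # placeholder for requests.get(url).text
    return "contents of %s" % url

async def get(url: str) -> str:
    loop = asyncio.get_running_loop()  # Python 3.7+: the loop that asyncio.run created
    return await loop.run_in_executor(None, blocking_get, url)

async def main():
    return await asyncio.gather(*(get('http://www.google.com') for _ in range(5)))

start = time.perf_counter()
results = asyncio.run(main())  # creates, runs, and closes the event loop for us
elapsed = time.perf_counter() - start
print(f"Executed in {elapsed:0.2f} seconds")
```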
P.S.: Let me know if something can be improved or if I made any mistakes. Thanks for reading :) Happy learning !!