Speed up your server — 3 Link to heading

In the previous articles, we explored how to speed up a server by employing coroutines and multiple workers. Today, we will explore how to serve with parallel coroutines working concurrently.

Baseline Link to heading

As before, let’s start with the baseline. This time, we want to simulate both a CPU-intensive task and an IO-bound task. To do that, we will combine fibo() and sleep().

# baseline.py

from fastapi import FastAPI
import time

app = FastAPI()


def fibo(n: int) -> int:
    if n <= 1:
        return n
    return fibo(n - 2) + fibo(n - 1)


@app.get("/baseline/fibo_sleep/{n}")
def baseline_fibo_sleep(n: int):
    x = fibo(n) / 1_000_000.0
    time.sleep(x)
    return f"hello after {x}s of sleep"
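To get a feel for the numbers: fibo(20) is 6765, so each request performs a short burst of recursive CPU work and then sleeps for roughly 6.8 ms. A quick sanity check, using the same fibo() as above:

```python
def fibo(n: int) -> int:
    if n <= 1:
        return n
    return fibo(n - 2) + fibo(n - 1)


# fibo(20) == 6765, so the endpoint sleeps for 6765 / 1_000_000 ≈ 0.0068 s
x = fibo(20) / 1_000_000.0
print(f"sleep duration: {x}s")
```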

Let’s fire up the server and measure its latency.

ab -n1000 -c100 -e baseline.csv SERVER_IP:8000/baseline/fibo_sleep/20
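The -e flag makes ab write a CSV of latency percentiles ("Percentage served,Time in ms"). As a small sketch of how to pull a given percentile out of such a file, here is a parser run against a made-up sample in the same shape (the numbers below are illustrative, not measurements):

```python
import csv
import io

# Illustrative sample in the shape of ab's -e output; not real measurements.
sample = """\
Percentage served,Time in ms
0,12.3
50,48.0
95,90.1
99,120.4
100,130.2
"""


def percentile_from_ab_csv(text: str, pct: int) -> float:
    """Return the latency (ms) at the given percentile from an ab -e CSV."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the header line
    table = {int(float(p)): float(ms) for p, ms in rows}
    return table[pct]


print(percentile_from_ab_csv(sample, 95))  # 90.1
```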

Async Link to heading

We can easily implement an async version to take advantage of the IO-bound part of the work.

--- baseline.py
+++ async.py
@@ -1,5 +1,6 @@
 from fastapi import FastAPI
 import time
+import asyncio

 app = FastAPI()


@@ -15,3 +16,10 @@ def baseline_fibo_sleep(n: int):
     x = fibo(n) / 1_000_000.0
     time.sleep(x)
     return f"hello after {x}s of sleep"
+
+
+@app.get("/async/fibo_sleep/{n}")
+async def async_fibo_sleep(n: int):
+    x = fibo(n) / 1_000_000.0
+    await asyncio.sleep(x)
+    return f"hello after {x}s of sleep"

Let’s now compare with the baseline.

ab -n1000 -c100 -e async.csv SERVER_IP:8000/async/fibo_sleep/20

(Figure: latency percentiles, baseline vs async)

As we have seen before, coroutines help with IO-bound work but add overhead for CPU-bound work. We do see a slight improvement over the baseline, but async alone does not help much here because of the CPU-intensive fibo().
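To see why only the IO-bound half benefits, here is a minimal standalone sketch (independent of the server code) showing that awaited sleeps overlap on a single event loop, while recursive fibo() calls would still run one after another on that same loop:

```python
import asyncio
import time


async def io_task() -> None:
    await asyncio.sleep(0.05)  # stand-in for the IO-bound sleep


async def main() -> float:
    start = time.perf_counter()
    # ten 50 ms sleeps awaited concurrently: the event loop overlaps them
    await asyncio.gather(*(io_task() for _ in range(10)))
    return time.perf_counter() - start


elapsed = asyncio.run(main())
print(f"{elapsed:.3f}s")  # roughly 0.05s in total, not 0.5s
```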

Parallelization Link to heading

Let’s experiment with ProcessPoolExecutor to utilize multiple workers as before.

--- async.py
+++ par.py
@@ -1,8 +1,10 @@
 from fastapi import FastAPI
 import time
 import asyncio
+from concurrent.futures import ProcessPoolExecutor

 app = FastAPI()
+executor = ProcessPoolExecutor()


 def fibo(n: int) -> int:
@@ -23,3 +25,11 @@ async def async_fibo_sleep(n: int):
     x = fibo(n) / 1_000_000.0
     await asyncio.sleep(x)
     return f"hello after {x}s of sleep"
+
+
+@app.get("/par/fibo_sleep/{n}")
+async def par_fibo_sleep(n: int):
+    loop = asyncio.get_running_loop()
+    x = await loop.run_in_executor(executor, fibo, n) / 1_000_000.0
+    await asyncio.sleep(x)
+    return f"hello after {x}s of sleep"

Let’s again run the benchmark.

ab -n1000 -c100 -e par.csv SERVER_IP:8000/par/fibo_sleep/20

(Figure: latency percentiles, baseline vs async vs parallel)

Utilizing multiple workers definitely helps the throughput of the server. However, there is one thing to note: the graph of the parallel implementation looks similar to the baseline rather than to async. This is because we are only parallelizing the CPU-bound work, i.e., fibo(), but not the IO-bound work, i.e., sleep(). It would be even better if we could parallelize all of the work, right?
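The pattern of offloading only the CPU-bound call can be reproduced outside FastAPI. In this minimal sketch (the handle() and serve_many() names are made up for illustration), each coroutine awaits fibo() in the process pool and then sleeps on the event loop, so the fibo() calls of several concurrent requests can run on different cores while the sleeps stay serialized on one loop:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def fibo(n: int) -> int:
    if n <= 1:
        return n
    return fibo(n - 2) + fibo(n - 1)


async def handle(executor: ProcessPoolExecutor, n: int) -> float:
    loop = asyncio.get_running_loop()
    # CPU-bound part runs in a worker process; the event loop stays free
    x = await loop.run_in_executor(executor, fibo, n) / 1_000_000.0
    await asyncio.sleep(x)  # IO-bound part still runs on this loop
    return x


def serve_many(ns: list[int]) -> list[float]:
    with ProcessPoolExecutor() as executor:
        async def main() -> list[float]:
            return await asyncio.gather(*(handle(executor, n) for n in ns))
        return asyncio.run(main())


print(serve_many([10, 12]))  # fibo(10) == 55, fibo(12) == 144
```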

Parallel Coroutines Link to heading

Most server frameworks support parallel workers out of the box. FastAPI delegates serving to Uvicorn, so we can simply pass the --workers parameter to instantiate multiple identical servers. This creates multiple identical processes of the server running independently; the parent process simply hands each request off to one of the workers.

Each worker process has its own event loop to run coroutines concurrently. Hence, this allows us to parallelize not only fibo() but also sleep(). Let’s re-run the async implementation, but this time simply supply the --workers parameter with the number of cores the system has, say 6.

fastapi run --workers=6 async.py

(Figure: latency percentiles with 6 async workers vs earlier implementations)

Finally, we get the best result. By parallelizing all of the work, we reduce the p95 latency from ~90s to ~50s. Not only is this easier than our manual parallel implementation, but it is also more efficient!

Conclusion Link to heading

So, we will conclude our exploration here with the following lessons to squeeze every drop of performance from the server:

  • if any IO-bound task is present, use a coroutine
  • don’t manually parallelize the work yourself — your server framework most likely already supports parallelization out of the box

But more importantly, always benchmark your server!

References Link to heading

  • FastAPI (fastapi.tiangolo.com)

  • Uvicorn (www.uvicorn.org)

  • asyncio - Asynchronous I/O (docs.python.org)

  • concurrent.futures - Launching parallel tasks (docs.python.org)