
Staff Prep 09: async vs sync in Python — When Async Actually Helps

April 4, 2026 · 9 min read · PART 07 / 18

"Just make it async" is cargo-cult engineering. Async Python is a concurrency mechanism, not a parallelism mechanism. It helps with exactly one class of problem: waiting on I/O. Apply it to CPU-bound work and you've made things worse, not better. I'd argue this is the single most important distinction in Python backend architecture, and most of the async Python I read in interviews gets it wrong.

The event loop: one thread, cooperative multitasking

Python's asyncio event loop runs on a single OS thread and executes coroutines cooperatively. When a coroutine awaits something that isn't ready yet (a socket read, a timer, a database response), it suspends and hands control back to the loop, which then resumes another coroutine that is ready to run. No OS context switch, no thread overhead, pure user-space scheduling.

The upside is that thousands of concurrent I/O operations can be in flight at once, all on one thread, without the cost of thousands of OS threads. A single Uvicorn worker can juggle 10,000 slow HTTP requests, provided they spend their time waiting on I/O.
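To make the single-thread claim concrete, here is a minimal sketch that starts 10,000 simulated I/O waits and records which OS thread ran each one. The `slow_request` helper and the 0.5s delay are illustrative assumptions, with `asyncio.sleep` standing in for a real network or database call.

```python
import asyncio
import threading
import time

async def slow_request(thread_ids: set[int]) -> None:
    # Hypothetical request handler: note the OS thread, then "wait on I/O".
    thread_ids.add(threading.get_ident())
    await asyncio.sleep(0.5)  # simulated 500ms network or database wait

async def main(n: int) -> tuple[float, int]:
    thread_ids: set[int] = set()
    start = time.perf_counter()
    # All n coroutines are scheduled on the same loop and wait concurrently.
    await asyncio.gather(*(slow_request(thread_ids) for _ in range(n)))
    return time.perf_counter() - start, len(thread_ids)

elapsed, threads_used = asyncio.run(main(10_000))
print(f"{elapsed:.2f}s for 10,000 waits on {threads_used} thread(s)")
```

Run one after another, 10,000 half-second waits would take over 80 minutes. Here they overlap, so the total stays close to a single wait, and the set of thread IDs never grows past one.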

The catch is that while a coroutine is doing CPU-heavy work with no await, the event loop is frozen. Nothing else runs. A 200ms CPU computation blocks every other request for 200ms. I've seen this crater a service whose creator swore up and down that "it's all async, it should be fast."

python

import asyncio
import time

# Concurrent I/O: async shines here
async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.1)  # simulates 100ms database query
    return {"id": user_id, "name": f"User {user_id}"}

async def fetch_all_users_concurrent():
    start = time.perf_counter()
    # Run 10 "queries" concurrently — total time ~100ms, not 1000ms
    users = await asyncio.gather(*[fetch_user(i) for i in range(10)])
    elapsed = time.perf_counter() - start
    print(f"Fetched {len(users)} users in {elapsed:.2f}s")  # ~0.10s

asyncio.run(fetch_all_users_concurrent())

# CPU-bound: async does NOT help
import hashlib

async def hash_password_wrong(password: str) -> str:
    # This runs synchronously — blocks the event loop for its entire duration
    # All other requests wait while this runs
    return hashlib.pbkdf2_hmac("sha256", password.encode(), b"salt", 200000).hex()
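The freeze is easy to measure. A minimal sketch, assuming a busy-wait loop as a stand-in for any CPU-heavy code with no await (the `heartbeat` and `cpu_hog` names are illustrative): a heartbeat task that should tick every 10ms stalls for the full duration of the CPU work.

```python
import asyncio
import time

async def heartbeat(ticks: list[float]) -> None:
    # Should fire roughly every 10ms; a blocked loop delays the next tick.
    for _ in range(5):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def cpu_hog(seconds: float) -> None:
    # Stand-in for CPU-heavy work with no await: the loop cannot switch away.
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass

async def main() -> float:
    ticks: list[float] = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # yield once so the heartbeat records its first tick
    await cpu_hog(0.2)      # freezes the loop for ~200ms
    await hb
    # Largest interval between consecutive heartbeat ticks.
    return max(b - a for a, b in zip(ticks, ticks[1:]))

worst_gap = asyncio.run(main())
print(f"worst heartbeat gap: {worst_gap * 1000:.0f}ms")
```

In a server, every in-flight request is in the heartbeat's position: it cannot make progress until the CPU-bound coroutine finishes.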

The fundamental rule

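Use async when you're waiting on something external: database queries, HTTP calls, Redis, or any time you want multiple I/O operations in flight at once. File I/O is the exception: asyncio has no native async file API, so async file libraries typically delegate to a thread pool under the hood.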

Don't use async for CPU-bound computation like hashing, image processing, or ML inference, and don't call blocking sync libraries (almost anything without native async support) directly from a coroutine. If you need that work inside an async handler, offload it to a thread or process pool.

run_in_executor: offloading blocking work

python

import asyncio
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import hashlib

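# Thread pool: for blocking calls (sync library calls, file operations).
# pbkdf2_hmac is CPU-bound, but CPython's OpenSSL-backed hashlib releases the
# GIL while it runs, so a thread pool works here. A pure-Python CPU loop would not.
thread_pool = ThreadPoolExecutor(max_workers=10)

async def hash_password_safe(password: str) -> str:
    loop = asyncio.get_running_loop()
    # Offload to thread pool: event loop stays free
    # (fixed salt for illustration only; use a random per-user salt in practice)
    result = await loop.run_in_executor(
        thread_pool,
        lambda: hashlib.pbkdf2_hmac("sha256", password.encode(), b"salt", 200000).hex()
    )
    return result

# Process pool: for CPU-bound work. Each worker is a separate process with its
# own interpreter and GIL, so work runs in true parallel. Arguments and return
# values are pickled, so the function must be defined at module level.
process_pool = ProcessPoolExecutor(max_workers=4)

def cpu_bound_task(data: list) -> int:
    return sum(x * x for x in data)  # pure CPU work

async def run_cpu_task(data: list) -> int:
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(process_pool, cpu_bound_task, data)
    return result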

# FastAPI integration
from fastapi import FastAPI
app = FastAPI()

@app.post("/hash")
async def hash_endpoint(password: str):  # query param for brevity; real code should read a request body
    hashed = await hash_password_safe(password)
    return {"hash": hashed}
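The standard library also offers a shorter spelling for the thread-pool case. A minimal sketch, assuming Python 3.9+ (where `asyncio.to_thread` was added), with `time.sleep` as a hypothetical stand-in for a blocking sync library call such as a sync database driver: a heartbeat task keeps ticking on schedule because the blocking call runs on a worker thread instead of the event loop thread.

```python
import asyncio
import time

def legacy_sync_call(seconds: float) -> str:
    # Hypothetical blocking sync call (e.g. a sync DB driver query).
    time.sleep(seconds)
    return "done"

async def heartbeat(ticks: list[float]) -> None:
    # Should fire roughly every 10ms while the blocking call runs elsewhere.
    for _ in range(10):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def main() -> tuple[str, float]:
    ticks: list[float] = []
    hb = asyncio.create_task(heartbeat(ticks))
    # to_thread runs the call in the loop's default thread pool and awaits it.
    result = await asyncio.to_thread(legacy_sync_call, 0.2)
    await hb
    worst_gap = max(b - a for a, b in zip(ticks, ticks[1:]))
    return result, worst_gap

result, worst_gap = asyncio.run(main())
print(result, f"worst heartbeat gap: {worst_gap * 1000:.0f}ms")
```

`asyncio.to_thread` always uses the loop's default executor, so it cannot target a dedicated pool like `thread_pool` above; reach for `run_in_executor` when you need to size or isolate the pool yourself.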

Gather vs wait: choosing the right tool

python

import asyncio

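# asyncio.gather: run awaitables concurrently, collect all results in order.
# By default the first exception propagates to the caller, but the other
# awaitables are NOT cancelled; they keep running in the background.
# get_user, get_recent_orders, get_notifications: your own async query functions.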
async def fetch_dashboard_data(user_id: int):
    # These three queries run concurrently — total time = max of three, not sum
    user, orders, notifications = await asyncio.gather(
        get_user(user_id),
        get_recent_orders(user_id),
        get_notifications(user_id),
    )
    return {"user": user, "orders": orders, "notifications": notifications}

# gather with return_exceptions=True: collect errors instead of raising
async def fetch_with_partial_failure(user_id: int):
    results = await asyncio.gather(
        get_user(user_id),
        get_recent_orders(user_id),
        get_notifications(user_id),
        return_exceptions=True,  # errors become results, not exceptions
    )
    return {
        "user": results[0] if not isinstance(results[0], Exception) else None,
        "orders": results[1] if not isinstance(results[1], Exception) else [],
        "notifications": results[2] if not isinstance(results[2], Exception) else [],
    }

# asyncio.wait: more control via return_when
# (FIRST_COMPLETED, FIRST_EXCEPTION, ALL_COMPLETED) and timeout.
# It takes tasks and returns two sets, (done, pending), without raising.
async def fetch_with_timeout(user_id: int):
    tasks = [
        asyncio.create_task(get_user(user_id)),
        asyncio.create_task(get_orders_expensive(user_id)),
    ]
    done, pending = await asyncio.wait(tasks, timeout=1.0)
    for task in pending:
        task.cancel()  # cancel tasks that did not complete in time
    # done is a set, so result order is not guaranteed;
    # result() re-raises if that task failed
    return [task.result() for task in done]
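Because the snippets above depend on your own query functions, here is a self-contained sketch of the same two behaviors. The `fetch` coroutine is a hypothetical stand-in for a DB or HTTP call, with the delays and the failure chosen purely for illustration.

```python
import asyncio

async def fetch(name: str, delay: float, fail: bool = False) -> str:
    # Hypothetical stand-in for a DB/HTTP call.
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return name

async def gather_partial() -> list:
    # return_exceptions=True: failures come back as values, in input order.
    return await asyncio.gather(
        fetch("user", 0.01),
        fetch("orders", 0.02, fail=True),
        fetch("notifications", 0.03),
        return_exceptions=True,
    )

async def first_completed() -> tuple[str, int]:
    # wait(FIRST_COMPLETED) returns as soon as any task finishes.
    tasks = [
        asyncio.create_task(fetch("fast", 0.01)),
        asyncio.create_task(fetch("slow", 1.0)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Wait for the cancellations to finish instead of leaving them dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result(), len(pending)

results = asyncio.run(gather_partial())
print(results)
winner, cancelled = asyncio.run(first_completed())
print(winner, cancelled)
```

Rule of thumb: `gather` when you need every result, `wait` when you need to react to the first result, the first failure, or a deadline.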

When async hurts: the overhead cost

python

import time

# Sync: fast for CPU-bound sequences
def compute_sync(n: int) -> int:
    return sum(i * i for i in range(n))

# Async: no await inside, so no concurrency benefit
async def compute_async(n: int) -> int:
    # Runs start to finish on the event loop thread, blocking it the whole time.
    # Also adds a small fixed cost per call: creating and driving a coroutine object.
    return sum(i * i for i in range(n))

# compute_async is never faster than compute_sync, and it must be awaited.
# The per-call overhead is fixed and small, so for large n it is negligible;
# the real cost is that the event loop is blocked while the sum runs.

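# FastAPI: plain def routes run in a thread pool automatically,
# so blocking calls inside them do not freeze the event loop.
# Use def for blocking sync calls or light CPU work (the GIL still limits
# parallelism); use async def when the route awaits async I/O.
# Heavy CPU work belongs in a process pool or a separate worker.
from fastapi import Depends, FastAPI
app = FastAPI()

@app.post("/compute")
def compute_route(n: int):  # sync: FastAPI runs this in its thread pool
    return {"result": compute_sync(n)}

# get_db and User are assumed to be defined elsewhere
# (e.g. a dependency yielding an SQLAlchemy AsyncSession, and an ORM model)
@app.get("/user/{id}")
async def get_user_route(id: int, db=Depends(get_db)):  # async: awaiting DB
    return await db.get(User, id)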

Common async pitfalls

python

import asyncio
import time

# Pitfall 1: time.sleep() in async code — blocks entire event loop
async def bad_sleep():
    time.sleep(5)  # WRONG: blocks event loop for 5 seconds
    # All other requests are frozen

async def good_sleep():
    await asyncio.sleep(5)  # CORRECT: yields to event loop

# Pitfall 2: Calling sync DB driver from async context
import psycopg2  # sync driver

async def bad_db_query():
    conn = psycopg2.connect(...)  # blocks event loop during connection
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")  # blocks event loop during query

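# Fix: use an async driver (asyncpg, aiopg, databases, or psycopg 3's async API)
import asyncpg

# pool is created once at startup, e.g. pool = await asyncpg.create_pool(dsn)
async def good_db_query(pool):
    async with pool.acquire() as conn:
        rows = await conn.fetch("SELECT * FROM users")  # non-blocking
    return rows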

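# Pitfall 3: Missing await (silent bug)
# some_async_function is a placeholder for any coroutine function
async def silent_bug():
    result = some_async_function()  # missing await!
    # result is a coroutine object, not the actual result.
    # No exception is raised; Python only emits
    # "RuntimeWarning: coroutine ... was never awaited" when the object is garbage-collected.
    print(result)  # prints <coroutine object some_async_function at 0x...>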
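Pitfall 1 is easy to demonstrate with a timing sketch. The worker names and the 0.1s delay are illustrative assumptions: five concurrent workers that call `time.sleep` run strictly one after another, while five that `await asyncio.sleep` overlap.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable

async def blocking_worker() -> None:
    time.sleep(0.1)           # blocks the whole loop: workers run back to back

async def cooperative_worker() -> None:
    await asyncio.sleep(0.1)  # yields to the loop: all workers wait together

async def timed(worker: Callable[[], Awaitable[None]]) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(worker() for _ in range(5)))
    return time.perf_counter() - start

blocking = asyncio.run(timed(blocking_worker))
cooperative = asyncio.run(timed(cooperative_worker))
print(f"time.sleep:    {blocking:.2f}s")     # roughly 0.5s: five sleeps in sequence
print(f"asyncio.sleep: {cooperative:.2f}s")  # roughly 0.1s: five sleeps overlapped
```

The same shape applies to Pitfall 2: a sync driver call behaves like `time.sleep`, an async driver call like `asyncio.sleep`.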

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

You have a FastAPI endpoint that calls a function doing 500ms of CPU work (no I/O). What happens to all other concurrent requests while this runs? How do you fix it?
What is the difference between asyncio.gather and asyncio.wait? When would you choose one over the other?
A developer uses run_in_executor with a ThreadPoolExecutor for a CPU-intensive task. Why does this not help as much as expected in Python? What should they use instead?
FastAPI allows both async def and def route handlers. What does FastAPI do differently for each?
You see result = some_async_fn() (missing await) in a codebase. What is the runtime behavior? Will Python raise an error?

Next up: Part 10: Background Processing. In-process background tasks, Celery architecture, and when each is the right tool.
