
Python asyncio Tutorial: Async Web Scraping (2026)

Alex R. | Published Sun May 10 2026

Quick verdict: Python asyncio is built for I/O-bound concurrency — the perfect tool for web scraping, where most time is spent waiting for the network. Replacing 100 sequential requests.get calls with asyncio.gather + aiohttp typically cuts total time from ~30 seconds to ~1 second. The core primitives: async def to define coroutines, await to wait for one, asyncio.gather() to run many concurrently, asyncio.Semaphore to cap concurrency for rate limiting.

Why asyncio for Scraping

A web request spends 99% of its time waiting:

  • DNS lookup: ~20ms
  • TCP handshake: ~30ms
  • TLS handshake: ~50ms
  • Server processing: ~100-500ms
  • Body transfer: ~100ms
  • Total: ~300-700ms of which your CPU does ~2ms of work

Synchronous code wastes the wait time. asyncio uses it to start other requests. For I/O-bound work, expect 50-100x speedup over sequential. (For CPU-bound work, asyncio does nothing — use multiprocessing.)
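
For reference, here is the sequential baseline those numbers assume. A minimal sketch, with example.com standing in for a real target:

import time, requests

urls = [f"https://example.com/page/{i}" for i in range(100)]

start = time.perf_counter()
pages = [requests.get(u).text for u in urls]              # one request at a time, each ~300-700ms
print(f"sequential: {time.perf_counter() - start:.1f}s")  # roughly 30-70 seconds total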

The Basics: async/await

import asyncio

async def hello(name):
    print(f"hi {name}, waiting...")
    await asyncio.sleep(1)        # simulates I/O
    print(f"done with {name}")
    return f"result-{name}"

async def main():
    # Run three coroutines concurrently
    results = await asyncio.gather(
        hello("alice"),
        hello("bob"),
        hello("carol"),
    )
    print(results)

asyncio.run(main())

Output:

hi alice, waiting...
hi bob, waiting...
hi carol, waiting...
# (1 second passes — all three run concurrently)
done with alice
done with bob
done with carol
['result-alice', 'result-bob', 'result-carol']

Total wall time: ~1 second (not 3). The three coroutines wait concurrently because await asyncio.sleep(1) yields control to the event loop, which picks up the next ready coroutine.

aiohttp: Async HTTP Client

pip install aiohttp

import asyncio, aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    async with aiohttp.ClientSession() as session:
        # All 100 requests in flight at once
        results = await asyncio.gather(*(fetch(session, u) for u in urls))
    print(f"Got {len(results)} pages")

asyncio.run(main())

100 requests in ~1 second on a decent connection. Compare to sequential requests: ~30 seconds. aiohttp.ClientSession() must be created and used inside a coroutine (hence the async with block); reusing one session pools connections, which is critical for performance.
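
Session-level defaults are another reason to reuse one session. A sketch setting a default timeout and shared headers that every request through the session inherits (the User-Agent string is illustrative):

import asyncio, aiohttp

async def main():
    timeout = aiohttp.ClientTimeout(total=30)        # default timeout for every request
    headers = {"User-Agent": "my-scraper/1.0"}       # shared default headers
    async with aiohttp.ClientSession(timeout=timeout, headers=headers) as session:
        async with session.get("https://example.com") as r:
            print(r.status)

asyncio.run(main())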

httpx: Sync + Async in One Library

httpx is a drop-in replacement for requests with async support:

import asyncio, httpx

async def fetch(client, url):
    r = await client.get(url)
    return r.text

async def main():
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*(fetch(client, u) for u in urls))
    return results

asyncio.run(main())

API choice: aiohttp is older and more battle-tested. httpx is newer with a cleaner API and matching sync/async. Both work; pick the one that matches the rest of your stack.
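
The sync/async symmetry is the main draw: the same call shape works in both worlds (a quick sketch against example.com):

import asyncio, httpx

# Sync: reads just like requests
with httpx.Client() as client:
    print(client.get("https://example.com").status_code)

# Async: identical API, just awaited
async def main():
    async with httpx.AsyncClient() as client:
        r = await client.get("https://example.com")
        print(r.status_code)

asyncio.run(main())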

Rate Limiting With Semaphore

Launching 1,000 requests at once will get your IP banned. Cap concurrency with asyncio.Semaphore:

import asyncio, aiohttp

semaphore = asyncio.Semaphore(10)   # max 10 concurrent requests

async def fetch(session, url):
    async with semaphore:           # acquire slot
        async with session.get(url) as response:
            await asyncio.sleep(0.5)  # polite delay
            return await response.text()

async def main():
    urls = [f"https://example.com/page/{i}" for i in range(1000)]
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(fetch(session, u) for u in urls))
    return results

asyncio.run(main())

The semaphore lets only 10 coroutines past at once. Others wait at async with semaphore: until a slot frees up. 1,000 URLs at 10 concurrency + 0.5s delay completes in ~50 seconds vs ~500 seconds sequential.
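
If you only need to cap simultaneous connections, without the polite per-request delay, aiohttp's connector can do it at the transport level. A minimal sketch:

import asyncio, aiohttp

async def main():
    connector = aiohttp.TCPConnector(limit=10)   # at most 10 open connections for this session
    async with aiohttp.ClientSession(connector=connector) as session:
        ...   # same gather(...) fan-out as above; extra requests wait for a free connection

asyncio.run(main())

Unlike the semaphore, this throttles connections rather than coroutines, so it will not add the 0.5s delay between requests.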

Error Handling: Some Will Fail

Use return_exceptions=True with gather so one failure does not kill the batch:

results = await asyncio.gather(
    *(fetch(session, u) for u in urls),
    return_exceptions=True,         # exceptions become results, not raises
)

for url, r in zip(urls, results):
    if isinstance(r, Exception):
        print(f"FAIL {url}: {r}")
    else:
        print(f"OK   {url}: {len(r)} bytes")

Or wrap each fetch in try/except for inline retry logic:

async def fetch_with_retry(session, url, max_retries=3):
    for attempt in range(max_retries):
        try:
            async with session.get(url, timeout=10) as r:
                if r.status == 200:
                    return await r.text()
                if r.status in (429, 503):
                    await asyncio.sleep(2 ** attempt)
                    continue
                r.raise_for_status()
        except (aiohttp.ClientError, asyncio.TimeoutError):
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError(f"exhausted retries for {url}")   # e.g. persistent 429/503
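
Wiring the retry wrapper into a batch is the same gather call as before. A sketch assuming fetch_with_retry from above:

import asyncio, aiohttp

async def main(urls):
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            *(fetch_with_retry(session, u) for u in urls),
            return_exceptions=True,   # anything that survives all retries shows up as an exception object
        )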

Rotating Proxies in asyncio

aiohttp accepts a proxy per request:

import asyncio, aiohttp, random

def fresh_proxy():
    """SpyderProxy sticky session: fresh IP per session_id."""
    sid = random.randint(0, 100000)
    return f"http://USER-session-{sid}:[email protected]:8000"

async def fetch(session, url):
    async with session.get(url, proxy=fresh_proxy(), timeout=15) as r:
        return await r.text()

async def main():
    urls = [f"https://target.com/item/{i}" for i in range(100)]
    sem = asyncio.Semaphore(10)
    async with aiohttp.ClientSession() as session:
        async def bounded(u):
            async with sem:
                return await fetch(session, u)
        results = await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)
    return results

asyncio.run(main())

Each request gets a fresh sticky-session IP from Premium Residential. 100 URLs through 100 different IPs, all in flight at once (capped at 10 concurrent by the semaphore).

Common Patterns

Producer-consumer (queue-based)

import asyncio, aiohttp

NUM_WORKERS = 20

async def producer(queue, urls):
    for url in urls:
        await queue.put(url)
    for _ in range(NUM_WORKERS):
        await queue.put(None)   # sentinel to stop workers

async def worker(queue, session, results):
    while True:
        url = await queue.get()
        if url is None:
            break
        async with session.get(url) as r:
            results.append(await r.text())

async def main(urls):
    queue = asyncio.Queue(maxsize=100)
    results = []
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            producer(queue, urls),
            *(worker(queue, session, results) for _ in range(NUM_WORKERS)),
        )
    return results

Better for unknown-size streams or when each request might enqueue more URLs (crawling).
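
A minimal crawling sketch along those lines, where workers enqueue newly discovered links. The regex-based link extraction and example.com scoping are assumptions; use a real HTML parser in practice:

import asyncio, aiohttp, re

async def crawl(start_urls, max_pages=200, num_workers=10):
    queue = asyncio.Queue()
    seen, pages = set(), []
    for u in start_urls:
        seen.add(u)
        queue.put_nowait(u)

    async def worker(session):
        while True:
            url = await queue.get()
            try:
                async with session.get(url, timeout=15) as r:
                    html = await r.text()
                pages.append((url, html))
                if len(pages) < max_pages:
                    # naive absolute-link extraction; swap in a real HTML parser
                    for link in re.findall(r'href="(https://example\.com[^"]+)"', html):
                        if link not in seen:
                            seen.add(link)
                            queue.put_nowait(link)
            except (aiohttp.ClientError, asyncio.TimeoutError):
                pass                      # drop failed URLs; log or retry in real code
            finally:
                queue.task_done()

    async with aiohttp.ClientSession() as session:
        workers = [asyncio.create_task(worker(session)) for _ in range(num_workers)]
        await queue.join()               # done when every enqueued URL is processed
        for w in workers:
            w.cancel()
        await asyncio.gather(*workers, return_exceptions=True)
    return pages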

Common Gotchas

  • Calling sync code blocks the event loop. Never use requests.get, time.sleep, or any blocking I/O inside an async function. Use aiohttp/httpx and asyncio.sleep.
  • "RuntimeWarning: coroutine was never awaited" — you called an async function without await. Fix: await coro() or asyncio.gather(coro()).
  • One slow URL hangs the batch. Always set timeouts. aiohttp.ClientTimeout(total=30) per session, or timeout= per request.
  • SSL errors with aiohttp. Pass connector=aiohttp.TCPConnector(ssl=False) to skip cert verification (only for testing — see why ignoring SSL is dangerous).
  • Cannot mix asyncio with multiprocessing easily. If you need both CPU and I/O concurrency, use loop.run_in_executor(pool, cpu_bound_fn) to offload CPU work to a thread or process pool; see the sketch after this list.
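
A sketch of the executor offload (parse_page is a hypothetical CPU-bound function):

import asyncio
from concurrent.futures import ProcessPoolExecutor

def parse_page(html):
    # placeholder for heavy parsing / extraction work
    return len(html.split())

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # the event loop stays free to serve I/O while the pool does CPU work
        count = await loop.run_in_executor(pool, parse_page, "<html>a b c</html>")
    print(count)

if __name__ == "__main__":          # guard required for multiprocessing on spawn-based platforms
    asyncio.run(main())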

When NOT to Use asyncio

  • Pure CPU-bound work — use multiprocessing instead.
  • Few requests at low concurrency — the async complexity is not worth it for <20 URLs.
  • You need a sync interface elsewhere — using async-only libraries leaks asyncio.run() calls everywhere.
  • Selenium / Playwright is needed — Playwright has its own async API; do not try to wrap Selenium's sync API in asyncio.

Related: Mastering httpx, Python requests retry, Concurrency vs parallelism.