
Concurrency vs Parallelism for Web Scraping

Apr 25, 2026 · By Daniel K. · 12 min read

Concurrency vs parallelism is the most-confused pair of terms in software engineering. They are not the same. Concurrency is about structure — designing a program so multiple tasks can be in progress at the same time, interleaved on one or more CPUs. Parallelism is about execution — actually running those tasks at the literal same instant on multiple CPU cores. Concurrency without parallelism is what async I/O gives you. Parallelism without concurrency is what SIMD and GPU shaders give you. The two compose: a well-designed concurrent program scales naturally to parallel hardware. This guide explains the distinction precisely, walks through the implementations in Python (asyncio, threads, multiprocessing), Node.js (event loop, worker threads), and Go (goroutines), and shows how to apply each pattern to web scraping with rotating residential proxies.

If you finish this guide remembering one thing: concurrency is dealing with many things at once; parallelism is doing many things at once. The phrase is from Rob Pike's 2012 talk and remains the cleanest one-liner on the topic.

Quick Definitions

| Term | Definition | Example |
| --- | --- | --- |
| Concurrency | Structuring a program as multiple independent tasks that can be paused and resumed. | asyncio event loop with 100 in-flight HTTP requests on one CPU. |
| Parallelism | Executing multiple tasks simultaneously on multiple CPU cores. | multiprocessing pool spawning 8 worker processes that all crunch CPU. |
| Asynchrony | An execution model where a task yields control on I/O and resumes when ready. | JavaScript await fetch(url) — the function pauses, the event loop runs other code. |
| Multithreading | Multiple OS threads inside one process, sharing memory. | Python threading.Thread running 50 worker threads. |
| Multiprocessing | Multiple OS processes, each with its own memory space. | Python multiprocessing.Pool with 8 workers. |

Concurrency Without Parallelism

You can be concurrent without ever being parallel. JavaScript in the browser is the canonical example: a single-threaded event loop interleaves hundreds of in-flight network requests, timers, and DOM events. None of them run simultaneously — only one stack frame executes at a time — but the system is designed so any task can yield and let another make progress. The user-visible result is that the page stays responsive while 50 image downloads finish in the background.

This is the right model for I/O-bound work. A web scraper waiting on 1,000 HTTP responses spends 99.9% of its wall-clock time idle. Adding more CPU cores does not help — there is nothing to compute. Adding more concurrent in-flight requests does, because each idle slot can be filled with another pending response. asyncio in Python and the Node.js event loop both implement this pattern.

Parallelism Without Concurrency

The mirror image is parallelism without concurrency: SIMD instructions on a CPU, GPU shaders, vectorized NumPy operations. The program is structured as a single linear sequence — there is no notion of independent tasks — but at the silicon level, each operation runs on many lanes in parallel. The program is parallel without being concurrent because there is no scheduler choosing what runs next; the structure is fixed at compile time.

This is the right model for compute-bound work where you have a fixed pipeline (matrix multiply, image filter, hash function applied to N bytes) and want to push raw throughput. It does not apply to scraping.
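As a minimal sketch of this model (assuming NumPy is installed): the program below is one linear sequence with no tasks and no scheduler, yet NumPy dispatches the element-wise work to SIMD-capable loops that process many values per instruction.

```python
import numpy as np

# One linear program: no tasks, no scheduler, nothing to interleave.
rng = np.random.default_rng(0)
a = rng.random(1_000_000)
b = rng.random(1_000_000)

# Each statement is a single step of the program, but NumPy executes it
# across many SIMD lanes at once (and matmul may even use threaded BLAS).
dots = a * b        # element-wise multiply, vectorized
total = dots.sum()  # vectorized reduction
```

No scheduler ever decides what runs next; the parallelism is baked into the instruction stream.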

Concurrency and Parallelism Together

Most real systems combine both. A Python scraper with asyncio and a multiprocessing pool runs N processes (parallelism), and inside each process the asyncio event loop juggles M concurrent in-flight requests. The total in-flight count is N×M. For a 16-core machine running 16 processes with 64 concurrent requests each, that is 1,024 simultaneous HTTP requests in flight — and this is a routine target shape for production scraping with rotating residential proxies.

Concurrency and Parallelism in Python

Python is the language where the distinction matters most because the GIL (Global Interpreter Lock) shapes everything.

asyncio (concurrency, no parallelism)

Single-threaded event loop. Best for I/O-bound work like HTTP scraping, database queries, file I/O. Add SpyderProxy proxies via aiohttp:

```python
import asyncio
import aiohttp

PROXY = "http://USER:[email protected]:7777"

async def fetch(session, url):
    async with session.get(url, proxy=PROXY) as r:
        return await r.text()

async def main():
    urls = [f"https://httpbin.org/ip?n={i}" for i in range(100)]
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*[fetch(session, u) for u in urls])
    print(len(results), "responses")

asyncio.run(main())
```

One Python process. Hundreds of in-flight requests. Total memory: ~50 MB. This is the right tool for 90% of scraping workloads.

threading (concurrency, no parallelism for CPU work)

Threads in Python share memory but the GIL ensures only one thread executes Python bytecode at a time. For I/O-bound work this is fine — the GIL is released during I/O syscalls — but for CPU work, threading provides zero speedup:

```python
import requests
from concurrent.futures import ThreadPoolExecutor

PROXIES = {"http": "http://USER:[email protected]:7777",
           "https": "http://USER:[email protected]:7777"}

urls = [f"https://httpbin.org/ip?n={i}" for i in range(100)]

def fetch(url):
    return requests.get(url, proxies=PROXIES, timeout=10).text

with ThreadPoolExecutor(max_workers=64) as ex:
    results = list(ex.map(fetch, urls))
```

Use threading when async libraries are not available for your I/O code path (rare in 2026).

multiprocessing (real parallelism)

Each worker is an independent OS process with its own GIL. Use for CPU-bound work — parsing large HTML, regex extraction across many GB of text, image processing on scraped media. Memory cost is high (each process holds a full Python interpreter):

```python
import re
from multiprocessing import Pool

def parse(html):
    # CPU-bound work: BeautifulSoup parse, regex extraction, etc.
    return re.findall(r"<title>(.*?)</title>", html)

if __name__ == "__main__":
    # html_pages: your list of already-fetched HTML documents
    with Pool(processes=8) as pool:
        parsed = pool.map(parse, html_pages)
```

Combined (asyncio inside multiprocessing)

The serious scraping pattern: N processes (parallelism), each running asyncio (concurrency):

```python
import asyncio
from multiprocessing import Pool

def split_into_chunks(items, n):
    # Round-robin split of the URL list into n roughly equal chunks.
    return [items[i::n] for i in range(n)]

async def scrape_chunk(url_chunk):
    # Your asyncio + aiohttp fetch logic (see the asyncio example above).
    ...

def worker(url_chunk):
    return asyncio.run(scrape_chunk(url_chunk))  # asyncio inside each process

if __name__ == "__main__":
    # all_urls: your full URL list
    chunks = split_into_chunks(all_urls, n=8)
    with Pool(processes=8) as pool:
        results = pool.map(worker, chunks)
```

This is the topology most production scrapers use behind SpyderProxy Premium Residential at $2.75/GB.

Concurrency and Parallelism in Node.js

Node has one main event loop per process — the canonical concurrent runtime — and adds parallelism via worker threads (CPU work) or cluster mode (multiple processes binding to the same port).

Event loop (concurrency)

Native to the runtime. Use Promise.all() or async iteration to fan out hundreds of HTTP requests on one core:

```javascript
// spyderproxyAgent: an undici ProxyAgent configured with your proxy URL
const urls = Array.from({ length: 100 }, (_, i) => `https://httpbin.org/ip?n=${i}`);

const results = await Promise.all(
  urls.map((u) => fetch(u, { dispatcher: spyderproxyAgent }))
);
```

worker_threads (parallelism for CPU work)

Run CPU-bound work (heavy JSON parse, crypto, image work) on background threads:

```javascript
const { Worker } = require("worker_threads");

// html: the page source handed off for CPU-heavy parsing
const w = new Worker("./parser.js", { workerData: { html } });
w.on("message", (parsed) => { /* ... */ });
```

cluster (process-level parallelism)

Spawn N child processes that share an inbound socket. Standard for scaling Node web servers across cores; less common for scrapers (usually superseded by spawning multiple processes via a higher-level orchestrator like PM2 or BullMQ).

Concurrency and Parallelism in Go

Go was designed around the distinction. Goroutines are lightweight tasks scheduled by the Go runtime onto a pool of OS threads. By default, Go uses GOMAXPROCS equal to the number of cores, so goroutines naturally run in parallel when CPU is available, and yield to other goroutines on I/O. This is why Go is so popular for high-throughput scrapers and proxies:

```go
import "sync"

func scrapeAll(urls []string) {
    var wg sync.WaitGroup
    sem := make(chan struct{}, 64) // limit concurrency to 64 in flight

    for _, url := range urls {
        wg.Add(1)
        sem <- struct{}{}
        go func(u string) {
            defer wg.Done()
            defer func() { <-sem }()
            fetch(u) // your HTTP fetch through the proxy
        }(url)
    }
    wg.Wait()
}
```

Goroutines start at 2 KB of stack and grow as needed; you can spawn 100,000 of them on a laptop. Channels handle the structured-concurrency patterns asyncio handles in Python.

Decision Table: Which Model for Which Workload

| Workload | Right Model | Tools |
| --- | --- | --- |
| Web scraping, API polling, HTTP fan-out | Concurrency (async) | Python asyncio + aiohttp, Node Fetch, Go goroutines |
| HTML/JSON parsing of GB-scale dumps | Parallelism (processes) | Python multiprocessing, Node worker_threads |
| Image / video processing | Parallelism (processes or GPU) | multiprocessing, OpenCV, ffmpeg-pipe |
| Real-time chat servers (10K+ connections) | Concurrency (async event loop) | Node, Go, Tokio (Rust), asyncio |
| ML training / inference at scale | Parallelism (GPU + multi-node) | PyTorch DDP, JAX pjit, vLLM |
| Batch ETL with mixed I/O and CPU | Both (asyncio inside multiprocessing) | Python: asyncio.run() in pool workers |
| Cron jobs with one task | Neither | Plain synchronous code is fine |
| SIMD math kernels | Parallelism (no concurrency) | NumPy, BLAS, SSE/AVX intrinsics |

Concurrency, Parallelism, and Proxy Rotation

The reason scrapers care about this distinction: residential proxy economics scale with concurrency, not just request count. SpyderProxy Premium Residential at $2.75/GB allows unlimited concurrent sessions out of one set of credentials, but the residential IP you get per-request rotates by default. Common patterns:

  • Pure concurrency, fully rotating: 200 in-flight asyncio requests, every request gets a fresh residential IP. Best for stateless GET-heavy scraping (Amazon product pages, SERP scraping). Low ban risk per IP, high throughput.
  • Concurrency with sticky sessions: Use the session-XXX username flag to bind a residential IP for up to 24 hours. Useful for multi-step flows (login → search → checkout). Lower throughput per session but higher per-account success rate.
  • Parallelism per geography: 1 process per country, each running asyncio with country-locked username flags. Simplifies geo-targeted scraping and isolates failures per region.
  • Static IP per worker (account work): Use Static Residential at $3.90/day, one IP bound per long-running worker process. Best for account farms, Mercari, Ticketmaster queue camping, and any session that needs hours of IP stability.
  • Mobile carrier IPs for the hardest targets: LTE Mobile at $2/IP, paired one-process-per-IP. Lowest concurrency per IP but highest trust score on Instagram, TikTok, Mercari, PayPay.

Anti-Patterns and Common Mistakes

  • Using threads for CPU work in Python — the GIL prevents speedup. Use multiprocessing.
  • Spawning unlimited concurrent requests — your kernel runs out of file descriptors at ~1024 by default. Use a semaphore or async limiter.
  • Sharing mutable state across processes without IPC — multiprocessing has separate memory spaces; use Queue, Pipe, or shared memory primitives.
  • Blocking inside an async function — calling requests.get() inside an async function blocks the entire event loop. Use aiohttp or httpx.AsyncClient instead.
  • One Python process for everything — for CPU-heavy parsing, you are leaving 7 cores idle. Multiprocessing or Rust extension is the answer.
  • Single TCP connection for all requests — even with concurrency, if your HTTP client serializes through one socket you bottleneck on it. aiohttp.TCPConnector(limit=128) or httpx.Limits.
  • Confusing parallelism with speedup — parallelism only helps when work is CPU-bound. For I/O-bound scraping, concurrency alone gets you 100× before parallelism adds anything.
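The semaphore fix from the list above looks like this in asyncio. This is a minimal stdlib-only sketch where fetch stands in for a real aiohttp or httpx call; with aiohttp you would additionally cap sockets via aiohttp.TCPConnector(limit=128) on the session.

```python
import asyncio

MAX_IN_FLIGHT = 64  # illustrative cap; keep it under your fd limit

async def fetch(url: str) -> str:
    # Stand-in for a real aiohttp/httpx request.
    await asyncio.sleep(0.01)
    return url

async def bounded_fetch(sem: asyncio.Semaphore, url: str) -> str:
    async with sem:  # at most MAX_IN_FLIGHT coroutines pass this point
        return await fetch(url)

async def main(urls: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded_fetch(sem, u) for u in urls))

results = asyncio.run(main([f"https://example.com/{i}" for i in range(200)]))
```

All 200 tasks exist concurrently, but only 64 are ever past the semaphore at once.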

Real Numbers: 1,000-URL Scrape

Same 1,000 GET requests through SpyderProxy Premium Residential. Median target latency 250 ms.

| Approach | Wall Time | CPU Used | Memory |
| --- | --- | --- | --- |
| Synchronous (1 thread, no concurrency) | 4 min 23 s | 3% | 34 MB |
| ThreadPoolExecutor, 32 workers | 14.2 s | 11% | 87 MB |
| multiprocessing ×8 + asyncio ×64 each | 2.4 s | 87% | 410 MB |
| asyncio + aiohttp, 128 concurrent | 9.1 s | 16% | 54 MB |

The 4-min synchronous baseline collapses to 9 seconds with concurrency alone. Adding parallelism on top gets it to 2.4 seconds — but for I/O-bound work, the diminishing returns are obvious. For CPU-heavy parsing on top of fetching, the parallelism payoff is much larger.

For HTTP library choice in Node.js scraping, see our Axios vs Fetch API comparison. For Python-specific rotating proxy patterns, see rotating proxies with Python requests. For headless-browser concurrency patterns (Playwright, Puppeteer), see Puppeteer vs Playwright vs Selenium. For full scraping-stack overview see best proxies for web scraping. For the underlying protocol details when scaling concurrent connections, see SOCKS5 vs HTTP proxies.

Frequently Asked Questions

What is the difference between concurrency and parallelism?

Concurrency is the structure of a program — designing it so multiple tasks can be in progress at the same time, interleaved on one or more CPUs. Parallelism is the execution — actually running tasks at the literal same instant on multiple cores. Concurrency is about dealing with many things at once; parallelism is about doing many things at once. They compose: a concurrent program scales naturally to parallel hardware.

Is asyncio concurrent or parallel?

asyncio is concurrent but not parallel. The Python asyncio event loop runs on a single thread and interleaves coroutines as they yield on I/O. It scales to hundreds of in-flight requests on one core. To get parallelism, run multiple asyncio event loops in separate processes via multiprocessing.

Does Python have true parallelism?

Yes, via multiprocessing. Each Python process has its own GIL and runs in parallel on separate CPU cores. Threading in CPython does not provide CPU parallelism because the GIL serializes Python bytecode execution. Python 3.13 introduced an experimental no-GIL build, and Python 3.14+ continues to evolve this, but multiprocessing remains the reliable answer in 2026.

What is the GIL and why does it matter?

The Global Interpreter Lock is a mutex in CPython that ensures only one thread executes Python bytecode at a time. It exists to make CPython's memory management thread-safe. The GIL means CPU-bound Python threads do not run in parallel; for CPU work, use multiprocessing or compiled extensions (NumPy, Cython, Rust). For I/O-bound work, the GIL releases during syscalls and threading or asyncio works fine.

Is Node.js concurrent or parallel?

Node.js is concurrent on the main event loop and parallel via worker threads or cluster mode. The event loop interleaves async I/O on one thread (concurrency). worker_threads run CPU-bound JavaScript on background OS threads (parallelism). cluster mode spawns multiple Node processes that share a port (parallelism).

How does Go handle concurrency?

Go uses goroutines — lightweight tasks scheduled by the Go runtime onto a pool of OS threads sized to GOMAXPROCS (default: number of cores). Goroutines start with a 2 KB stack, can be spawned by the hundreds of thousands, and naturally run in parallel when CPU is available. Channels coordinate goroutines without explicit locks.

Should I use threads, async, or multiprocessing for web scraping?

For pure I/O-bound HTTP scraping, asyncio is the right choice in Python (or native Fetch in Node, or goroutines in Go). For CPU-heavy parsing on top of fetching, layer asyncio inside multiprocessing — N processes, each running an asyncio event loop. Threading is rarely the right answer in 2026; async libraries cover almost every I/O case.

How does concurrency interact with rotating proxy services?

Concurrency is the lever you pull to spend proxy bandwidth efficiently. SpyderProxy Premium Residential supports unlimited concurrent sessions per credential, so 200 simultaneous in-flight requests on one set of auth is normal. For sticky sessions (account work), pin one IP per worker via the session-XXX username flag. For LTE Mobile at $2/IP, scale parallelism (one process per IP) rather than concurrency per IP.

What is the practical difference for a web scraping project?

For 99% of scrapers, concurrency alone (asyncio + aiohttp, or Node Fetch) is the right answer. It collapses a 4-minute synchronous scrape to 10 seconds with no parallelism cost. Add multiprocessing on top only when CPU-heavy parsing or extraction becomes the bottleneck after fetching is fast.

Conclusion

Concurrency vs parallelism is not a contest — they solve different problems. Concurrency is how you structure programs that wait on many things; parallelism is how you execute many things at once. For I/O-heavy work like web scraping, concurrency on a single CPU is overwhelmingly the right starting point. For CPU-heavy work, parallelism is the lever. The most powerful real-world architecture combines both: N parallel processes, each running a concurrent event loop.

Whichever model fits your workload, pair it with the right proxy. Start with SpyderProxy Premium Residential at $2.75/GB or Budget Residential at $1.75/GB for general scraping concurrency, scale up to Static Residential at $3.90/day for parallel account workers, or LTE Mobile at $2/IP for the hardest mobile-only targets.

Scale Your Concurrent Scrapers With Real Residential IPs

Unlimited concurrent sessions across 130M+ residential IPs. SpyderProxy Premium Residential from $2.75/GB, Budget Residential from $1.75/GB, static ISP from $3.90/day, LTE mobile at $2/IP. SOCKS5 included.