Playwright is a modern browser-automation library from Microsoft that has become one of the best tools for scraping JavaScript-heavy websites. Unlike a plain HTTP client, Playwright drives a real browser (Chromium, Firefox, or WebKit), so it executes JavaScript, renders single-page apps, and sees exactly what a human visitor sees. This guide shows how to scrape with Playwright in Python — install it, launch a browser, wait for dynamic content, extract data, and route everything through residential proxies so you do not get blocked.
If your target ships data in the initial HTML, a lighter stack is fine — see how to build a web scraper in Python. Reach for Playwright when the content loads via JavaScript and a headless browser is required.
pip install playwright
playwright install chromium
The second command downloads the browser binaries Playwright drives. You can install all browsers or just the one you need (Chromium is the usual choice for scraping).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/products", wait_until="networkidle")
    for card in page.query_selector_all("article.product"):
        name = card.query_selector("h3").inner_text()
        price = card.query_selector(".price").inner_text()
        print(name, price)
    browser.close()
The key difference from the requests library is that page.goto loads and renders the page in a real browser. wait_until="networkidle" holds until there have been no network connections for at least 500 ms, so JavaScript-injected content is usually present before you read it.
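The extraction loop in the first script assumes every card contains both an h3 and a .price element. query_selector returns None when nothing matches, so one incomplete card would raise an AttributeError. A minimal sketch of a more defensive version follows; the function name extract_products and the dict shape are illustrative choices, not part of Playwright.

```python
def extract_products(page):
    """Collect name/price pairs from product cards, skipping incomplete ones.

    `page` is expected to be a Playwright Page that has already loaded the
    listing; the selectors are the ones used in the first script.
    """
    products = []
    for card in page.query_selector_all("article.product"):
        name_el = card.query_selector("h3")
        price_el = card.query_selector(".price")
        # query_selector returns None when the selector matches nothing
        if name_el is None or price_el is None:
            continue
        products.append(
            {
                "name": name_el.inner_text().strip(),
                "price": price_el.inner_text().strip(),
            }
        )
    return products
```

Inside the with sync_playwright() block, this would be called as extract_products(page) once page.goto has returned.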
Single-page apps often load data after the initial render. Do not guess with fixed sleeps — wait for the element you actually need:
# wait for a specific element before extracting
page.wait_for_selector("article.product", timeout=15000)

# or wait for a network response (e.g., an API call the page makes)
with page.expect_response(lambda r: "/api/products" in r.url) as resp:
    page.goto("https://example.com/products")
data = resp.value.json()
Waiting for the right signal makes scrapers far more reliable than arbitrary delays, and it is faster too.
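When the awaited element never appears, wait_for_selector raises Playwright's TimeoutError. One way to handle that, sketched here under the assumption that a missing element should be reported rather than crash the crawl, is a small wrapper. The name wait_for_products is hypothetical, and the import fallback exists only so the helper can be loaded on a machine without Playwright.

```python
try:
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
except ImportError:
    # Fallback so the helper can be imported where Playwright is not installed
    PlaywrightTimeoutError = TimeoutError


def wait_for_products(page, selector="article.product", timeout_ms=15000):
    """Return True once `selector` appears, False if the wait times out."""
    try:
        page.wait_for_selector(selector, timeout=timeout_ms)
        return True
    except PlaywrightTimeoutError:
        return False
```

A caller can then skip or retry a page when the function returns False instead of letting the exception end the run.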
Run a real crawl from a single address and the target will likely rate-limit or block your IP. Playwright accepts a proxy at launch, so every request from that browser exits through the proxy, here a rotating residential endpoint:
from playwright.sync_api import sync_playwright

PROXY = {
    "server": "http://pr.spyderproxy.com:7777",
    "username": "USER",
    "password": "PASS",
}

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True, proxy=PROXY)
    page = browser.new_page()
    page.goto("https://example.com", wait_until="networkidle")
    print(page.title())
    browser.close()
Because the proxy endpoint rotates the exit IP, you can spread a crawl across thousands of residential addresses without managing a proxy list yourself. See rotating proxies for the concept and best proxies for web scraping for picking the right pool.
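Hard-coding credentials in the script is fine for a demo but awkward in practice. As a sketch, the proxy dict can be derived from a single URL held in an environment variable. The variable name SCRAPER_PROXY_URL and both function names are assumptions made for illustration; only the server, username, and password keys come from the launch example.

```python
import os
from urllib.parse import unquote, urlsplit


def proxy_from_url(proxy_url):
    """Split 'scheme://user:pass@host:port' into a Playwright proxy dict."""
    parts = urlsplit(proxy_url)
    if not parts.hostname or not parts.port:
        raise ValueError("proxy URL must include a host and a port")
    proxy = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    # Credentials may be percent-encoded inside a URL, so decode them
    if parts.username:
        proxy["username"] = unquote(parts.username)
    if parts.password:
        proxy["password"] = unquote(parts.password)
    return proxy


def proxy_from_env(var="SCRAPER_PROXY_URL"):
    """Read the proxy URL from the environment; None means no proxy is set."""
    url = os.environ.get(var)
    return proxy_from_url(url) if url else None
```

The result can be passed straight to p.chromium.launch(headless=True, proxy=...); passing None launches without a proxy.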
A residential IP is the foundation, but a headless browser still has tells. Reduce them by applying stealth patches, setting a real user agent, and pacing requests like a human.
For heavily protected targets, combine all of the above — see how to bypass Cloudflare.
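Two of these measures, a realistic user agent and human-like pacing, can be sketched with Playwright's own context options and a randomized pause. Everything concrete here is an example value or a hypothetical name: the user-agent string, viewport, locale, timezone, the delay range, and the functions jittered_delay and visit_politely. Stealth patches depend on third-party packages and are not shown. The pause is a deliberate rate limit between page loads, not a substitute for waiting on content.

```python
import random
import time

# Example values only: use a user agent that matches the browser build you run
CONTEXT_OPTIONS = {
    "user_agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "viewport": {"width": 1366, "height": 768},
    "locale": "en-US",
    "timezone_id": "America/New_York",
}


def jittered_delay(base=2.0, spread=1.5, rng=random):
    """Return a pause in seconds drawn uniformly from [base, base + spread]."""
    return base + rng.uniform(0.0, spread)


def visit_politely(browser, urls, sleep=time.sleep):
    """Load each URL in one configured context, pausing between page loads."""
    context = browser.new_context(**CONTEXT_OPTIONS)
    page = context.new_page()
    titles = []
    for url in urls:
        page.goto(url, wait_until="domcontentloaded")
        titles.append(page.title())
        sleep(jittered_delay())  # randomized gap so requests are not evenly spaced
    context.close()
    return titles
```

Here browser is the object returned by p.chromium.launch(...), with or without the proxy argument.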
Pick the tool by the job: Playwright and Selenium drive real browsers for JavaScript-heavy sites (Playwright is newer, faster, and has cleaner async support); Scrapy is best for large static crawls without a browser. The full three-way comparison is in Puppeteer vs Playwright vs Selenium.
Playwright is well suited to web scraping, especially for JavaScript-heavy and single-page sites. It drives a real browser, so it renders content that a plain HTTP client cannot see, and it has reliable waiting, multiple browser engines, and clean Python and Node APIs. For static HTML at scale, a lighter tool like Scrapy is faster.
To use a proxy with Playwright, pass a proxy dict to launch with server, username, and password set. Every request from that browser then exits through the proxy. With a rotating residential endpoint, the exit IP changes automatically, so you do not manage a proxy list in code.
Playwright can be detected if you do not harden it. A headless browser exposes automation flags, and a datacenter IP is an instant giveaway. Route through residential proxies, apply stealth patches, set a real user agent, and pace requests like a human to reduce the chance of detection.
Headless is standard for scraping at scale because it uses fewer resources. Headed mode is useful for debugging and for the rare site that treats headless differently. Either way, anti-detection depends far more on your IP and fingerprint than on the headless flag.
Playwright and Selenium both drive real browsers. Playwright is newer, generally faster, has built-in waiting and network interception, and a cleaner async API, which makes it the better default for new scraping projects. Selenium has a larger ecosystem and longer history. The choice rarely affects whether you get blocked; proxies and fingerprints do.
Playwright also handles infinite-scroll and interactive pages: it can scroll, click, fill forms, and wait for the resulting content. Use wait_for_selector or expect_response after each action so you extract only once the new content has loaded.
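For infinite scroll, one possible approach is to scroll to the bottom, wait until the number of matching elements grows, and stop when a wait times out. The sketch below assumes new items are appended to the same page and match the article.product selector used earlier; the function name, the round limit, and the timeout are illustrative. The import fallback only lets the helper load where Playwright is not installed.

```python
try:
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
except ImportError:
    PlaywrightTimeoutError = TimeoutError

# JavaScript predicate: true once more than `seen` matching elements exist
MORE_ITEMS_JS = "([sel, seen]) => document.querySelectorAll(sel).length > seen"


def scroll_until_exhausted(page, selector="article.product",
                           max_rounds=20, timeout_ms=5000):
    """Scroll down until no new items appear; return the final item count."""
    count = page.locator(selector).count()
    for _ in range(max_rounds):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        try:
            # Wait for the item count to exceed what we have already seen
            page.wait_for_function(
                MORE_ITEMS_JS, arg=[selector, count], timeout=timeout_ms
            )
        except PlaywrightTimeoutError:
            break  # nothing new loaded in time: treat this as the end
        count = page.locator(selector).count()
    return count
```

After the function returns, the cards can be extracted in a single pass. The max_rounds cap keeps a feed that never ends from looping forever.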
Playwright is the modern answer to scraping JavaScript-heavy sites: install it, launch a browser, wait for the content you need, and extract. What turns a working script into one that survives real websites is routing through residential proxies and hardening the browser so anti-bot systems treat it like an ordinary visitor.
To keep your Playwright crawls unblocked, SpyderProxy residential proxies start at $1.75/GB with 10M+ IPs across 195+ countries, automatic rotation, and city-level targeting — drop the endpoint into your launch config and scale.