Quick verdict: For headless browsers in Python in 2026, default to Playwright — cleanest API, best maintained, supports Chromium + Firefox + WebKit. Selenium remains the legacy choice for large existing codebases. Pyppeteer (Python port of Node's Puppeteer) is largely abandoned — skip it. This tutorial covers: installing Playwright, a first scrape, proxies, stealth mode, the async pattern, and the gotchas you will hit.
pip install playwright
python -m playwright install chromium

The second command downloads the actual browser binary (~150 MB for Chromium). For Firefox or WebKit: python -m playwright install firefox or python -m playwright install webkit.
On Linux servers, also install system dependencies:

python -m playwright install-deps chromium

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    print(page.content()[:500])
    browser.close()

That is a working headless scrape in 7 lines. page.content() returns the full HTML after JavaScript execution completes. page.title() returns the document title.
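page.goto() also returns a Response object (or None for some same-page navigations), so you can fail fast on error pages instead of parsing a 404. A minimal sketch, duck-typed over the sync-API Page; the goto_checked name and the RuntimeError choice are my own:

```python
def goto_checked(page, url):
    """Navigate and raise if the server answered with an error status.

    `page` is a Playwright sync-API Page. goto() returns a Response,
    or None for e.g. fragment navigations, so guard for that too.
    """
    response = page.goto(url)
    if response is not None and response.status >= 400:
        raise RuntimeError(f"{url} returned HTTP {response.status}")
    return response
```

This turns silent error pages into loud failures, which is usually what you want in a scraping pipeline.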
import asyncio
from playwright.async_api import async_playwright

async def scrape(url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        title = await page.title()
        html = await page.content()
        await browser.close()
        return title, html

async def main():
    urls = ["https://example.com", "https://playwright.dev"]
    results = await asyncio.gather(*(scrape(u) for u in urls))
    for t, h in results:
        print(t)

asyncio.run(main())

Spawns a fresh browser per URL. For higher concurrency with one browser and multiple pages, see below.
The most common bug: scraping before JavaScript finishes loading content. Three wait strategies:
# 1. Wait for a specific selector
page.goto("https://target.com")
page.wait_for_selector("div.products-loaded")
products = page.query_selector_all("div.product")

# 2. Wait for network idle (no requests for 500ms)
page.goto("https://target.com", wait_until="networkidle")

# 3. Wait for a fixed time (last resort)
page.wait_for_timeout(3000)  # Playwright's idiom for time.sleep(3)

Prefer (1) over (2) over (3). Selector waits are most reliable; network idle works on JS-heavy sites; fixed sleeps are fragile.
page.goto("https://target.com/products")
page.wait_for_selector("li.product")

# Multiple elements
cards = page.query_selector_all("li.product")
products = []
for c in cards:
    title = c.query_selector("h3").inner_text()
    price = c.query_selector("span.price").inner_text()
    url = c.query_selector("a").get_attribute("href")
    products.append({"title": title, "price": price, "url": url})

# Single element
h1 = page.query_selector("h1.page-title").inner_text()

# Multiple text contents (faster)
all_titles = page.locator("h3.title").all_text_contents()

For CSS selectors, see the CSS selector cheat sheet. Playwright also supports filtering by visible text: page.locator("button", has_text="Submit").
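The query_selector loop above can also be written with the locator API, which is what Playwright's docs now favor. A sketch, duck-typed over the sync-API Page and assuming the same li.product / h3 / span.price markup; parse_price is a hypothetical helper of my own:

```python
import re

def parse_price(text):
    """Turn scraped text like '$19.99' or '1,299.50 USD' into a float.

    Hypothetical helper -- assumes dot-decimal prices; returns None
    when no number is found.
    """
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None

def extract_products(page):
    """Locator-based version of the extraction loop above.

    `page` is a Playwright sync-API Page; Locator.all() returns one
    sub-locator per matching li.product card.
    """
    products = []
    for card in page.locator("li.product").all():
        products.append({
            "title": card.locator("h3").inner_text(),
            "price": parse_price(card.locator("span.price").inner_text()),
            "url": card.locator("a").first.get_attribute("href"),
        })
    return products
```

Parsing prices to numbers at scrape time saves a cleanup pass later, and locators re-query the DOM on each use, which helps on pages that re-render.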
browser = p.chromium.launch(
    headless=True,
    proxy={
        "server": "http://gw.spyderproxy.com:8000",
        "username": "YOUR_USER",
        "password": "YOUR_PASS",
    },
)

Proxy is set at browser launch — all requests from that browser route through it. For rotating IPs, launch a fresh browser per scrape or use sticky-session syntax:
import random

def fresh_browser(p):
    sid = random.randint(0, 100000)
    return p.chromium.launch(
        headless=True,
        proxy={
            "server": "http://gw.spyderproxy.com:8000",
            "username": f"YOUR_USER-session-{sid}",
            "password": "YOUR_PASS",
        },
    )

Each session-{sid} gives a fresh sticky-session IP that stays consistent for up to 8 hours. Premium Residential ($2.75/GB) or LTE Mobile ($2/IP) are recommended for the kind of protected sites that require a headless browser in the first place.
Default Playwright leaves several headless tells: navigator.webdriver = true, inconsistent browser features like permissions.query, etc. Sites that fingerprint will flag you. Install the community stealth plugin:
pip install playwright-stealth

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    stealth_sync(page)  # patches detection signals
    page.goto("https://target.com")
    print(page.title())
    browser.close()

Stealth handles ~80% of common detection checks. For Cloudflare Turnstile and DataDome, you also need fresh residential IPs and slow-down delays. See the FlareSolverr guide for the heaviest cases.
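The slow-down delays can be as simple as a randomized pause between navigations, so your request timing does not look machine-generated. A sketch; the bounds here are arbitrary and should be tuned per site:

```python
import random

def human_delay_ms(low=800, high=2500):
    """Return a randomized pause length in milliseconds.

    Pass the result to page.wait_for_timeout() between navigations so
    requests do not fire at robotic fixed intervals. The 800-2500 ms
    bounds are an assumption, not a recommendation from any site.
    """
    return random.uniform(low, high)

# usage (inside a scraping loop):
#   page.goto(url)
#   page.wait_for_timeout(human_delay_ms())
```

Randomized jitter defeats the simplest rate-pattern checks; it does not replace residential IPs for the hard cases.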
# Full page screenshot
page.goto("https://target.com")
page.screenshot(path="screen.png", full_page=True)

# Just a specific element
el = page.query_selector("div.main-content")
el.screenshot(path="content.png")

# PDF (Chromium only)
page.pdf(path="page.pdf", format="A4")

page.goto("https://example.com/login")
page.fill("input[name=\"username\"]", "alice")
page.fill("input[name=\"password\"]", "secret123")
page.click("button[type=\"submit\"]")
page.wait_for_url("**/dashboard")

The ** in wait_for_url is a glob; it matches any URL ending in /dashboard. Use it for form submissions where the post-login redirect URL varies.
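Once logged in, you can persist the session instead of re-running the form on every job: Playwright's storage_state saves cookies and local storage to JSON, and new_context(storage_state=...) restores them. A sketch, duck-typed over the BrowserContext/Browser objects; the auth.json path is my choice:

```python
def save_session(context, path="auth.json"):
    """Persist cookies + localStorage from a logged-in BrowserContext."""
    context.storage_state(path=path)

def restore_session(browser, path="auth.json"):
    """Open a new context that starts out already logged in."""
    return browser.new_context(storage_state=path)
```

Re-using a saved session means fewer login requests, which both speeds up scrapes and avoids tripping login-rate alarms.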
import asyncio
from playwright.async_api import async_playwright

async def scrape_url(browser, url):
    page = await browser.new_page()
    await page.goto(url)
    title = await page.title()
    await page.close()
    return title

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        urls = [f"https://example.com/p/{i}" for i in range(20)]
        sem = asyncio.Semaphore(5)  # max 5 concurrent

        async def bounded(u):
            async with sem:
                return await scrape_url(browser, u)

        results = await asyncio.gather(*(bounded(u) for u in urls))
        await browser.close()
        print(results)

asyncio.run(main())

One browser, multiple pages, capped concurrency. More memory-efficient than launching N browsers, but pages share cookies/storage — use browser.new_context() per scrape if you need full isolation.
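When that isolation matters (separate logins, separate cookie jars), the same pattern works with one context per task instead of one page. A sketch of a context-isolated variant of scrape_url, duck-typed over the async-API Browser:

```python
import asyncio

async def scrape_isolated(browser, url):
    """Like scrape_url above, but with a private context per task.

    `browser` is a Playwright async-API Browser; each context gets its
    own cookies/storage, so concurrent logins cannot leak into each
    other. try/finally guarantees cleanup even if goto() raises.
    """
    context = await browser.new_context()
    try:
        page = await context.new_page()
        await page.goto(url)
        return await page.title()
    finally:
        await context.close()  # also closes the context's pages
```

Contexts are much cheaper than browsers, so this costs little over the shared-page version.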
Gotchas:
- Always close the browser: use with or try/finally, or orphaned Chromium processes will pile up.
- Timeouts: the default navigation timeout is 30 seconds; raise it for slow sites with page.goto(url, timeout=60000).
- Detection: out of the box, navigator.webdriver is true. Install playwright-stealth.
- Headless mode: stick with headless=True (the default in 1.x). Avoid headless=False on CI.

For greenfield Python projects in 2026, Playwright wins. Selenium's only remaining advantage is the install base.
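For flaky sites, combining the timeout bump with a retry loop covers most transient failures. A sketch, duck-typed over the sync-API Page; goto_with_retry and its defaults are my own, and a real version should catch playwright.sync_api.TimeoutError rather than bare Exception:

```python
def goto_with_retry(page, url, attempts=3, timeout_ms=60000):
    """Retry navigation up to `attempts` times before giving up.

    `page` is a Playwright sync-API Page. Exception is caught here only
    to keep the sketch dependency-free; in real code, narrow it to
    playwright.sync_api.TimeoutError.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return page.goto(url, timeout=timeout_ms)
        except Exception as exc:
            last_error = exc
    raise last_error
```

Re-raising the last error preserves the real failure for your logs instead of swallowing it.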
Related: What is a headless browser, Puppeteer vs Playwright vs Selenium, Cheerio vs Puppeteer.