Selenium is the long-established way to drive a real browser from Python, which makes it a solid choice for scraping JavaScript-heavy sites that a plain HTTP client cannot read. Selenium loads a page in Chrome (or Firefox), runs its JavaScript, and lets you find and extract elements just as a user would see them. This guide covers the full workflow — install, load a page, find elements, wait for dynamic content, run headless, and the part that trips people up most: routing Selenium through an authenticated residential proxy so you do not get blocked.
If the site serves data in the initial HTML, the lighter requests + BeautifulSoup approach is faster. Use Selenium (or Playwright) when you need a real headless browser to render JavaScript.
pip install selenium
Modern Selenium (4.6+) includes Selenium Manager, which downloads the matching browser driver automatically — no more manual chromedriver downloads.
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")  # run Chrome without a visible window

driver = webdriver.Chrome(options=options)  # Selenium Manager resolves the driver
try:
    driver.get("https://example.com/products")
    # Each product card is an <article class="product"> element
    for card in driver.find_elements(By.CSS_SELECTOR, "article.product"):
        name = card.find_element(By.CSS_SELECTOR, "h3").text
        price = card.find_element(By.CSS_SELECTOR, ".price").text
        print(name, price)
finally:
    driver.quit()  # always close the browser, even if scraping fails
find_elements returns a list (empty if nothing matches); find_element returns the first match and raises NoSuchElementException if there is none. Selectors use the By class, and both CSS and XPath work (see our CSS selector cheat sheet and XPath cheat sheet).
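Because find_element raises when a card lacks a field, one missing price can abort the whole loop. A minimal sketch of a more forgiving extraction follows. The helper names text_or_none and extract_products are hypothetical, and the selectors are the ones from the example above. It passes the string "css selector", which is the value of By.CSS_SELECTOR, so the functions work with any object that exposes find_elements.

```python
CSS = "css selector"  # same value as selenium's By.CSS_SELECTOR


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    matches = parent.find_elements(CSS, selector)  # empty list when nothing matches
    return matches[0].text.strip() if matches else None


def extract_products(driver):
    """Collect one dict per product card; missing fields become None."""
    return [
        {"name": text_or_none(card, "h3"), "price": text_or_none(card, ".price")}
        for card in driver.find_elements(CSS, "article.product")
    ]
```

Calling extract_products(driver) after the page has loaded returns a list of dicts that can go straight into a CSV writer or a DataFrame.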
Never scrape immediately after get() on a JavaScript site — the data may not be there yet. Use an explicit wait for the element you need instead of a blind sleep:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 seconds for the first product card to appear in the DOM
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "article.product"))
)
Explicit waits make scrapers both more reliable and faster than fixed delays, because execution continues the moment the content appears instead of sleeping for a worst-case interval.
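To see why polling beats a fixed sleep, here is a simplified stand-in for the loop an explicit wait runs. It is an illustration only, not Selenium's implementation: wait_until is a hypothetical name, and the 0.5 second poll interval is chosen to mirror WebDriverWait's default.

```python
import time


def wait_until(condition, timeout=15.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # proceed the moment the content is there
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)  # short pause, then check again
```

If the content shows up after one second, this returns after about one second; a time.sleep(15) would always cost the full fifteen, and a time.sleep(1) would fail on any slower load.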
This is the step that matters at scale. A no-auth proxy is a one-line flag:
options.add_argument("--proxy-server=http://pr.spyderproxy.com:7777")
But Chrome's flag does not accept a username and password, and commercial proxies are authenticated. The cleanest fix is selenium-wire, which supports proxy credentials directly:
pip install selenium-wire
from seleniumwire import webdriver  # selenium-wire's drop-in driver

seleniumwire_options = {
    "proxy": {
        "http": "http://USER:PASS@pr.spyderproxy.com:7777",
        "https": "http://USER:PASS@pr.spyderproxy.com:7777",
    }
}
driver = webdriver.Chrome(seleniumwire_options=seleniumwire_options)
driver.get("https://api.ipify.org")
print(driver.page_source)  # shows the residential exit IP
With a rotating residential endpoint, each session can exit from a different real household IP — see rotating proxies and best proxies for web scraping.
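Hardcoding proxy credentials in a script is easy to leak, so one option is to read them from the environment. The sketch below assumes hypothetical variable names (PROXY_USER, PROXY_PASS, PROXY_HOST, PROXY_PORT) and defaults to the host and port used earlier in this guide. The no_proxy entry is an optional key that selenium-wire's README describes for bypassing the proxy on local addresses.

```python
import os


def build_seleniumwire_options(env=os.environ):
    """Build selenium-wire proxy options from environment variables.

    The variable names are hypothetical; rename them to match your setup.
    """
    user = env["PROXY_USER"]
    password = env["PROXY_PASS"]
    host = env.get("PROXY_HOST", "pr.spyderproxy.com")
    port = env.get("PROXY_PORT", "7777")
    proxy_url = f"http://{user}:{password}@{host}:{port}"
    return {
        "proxy": {
            "http": proxy_url,
            "https": proxy_url,
            "no_proxy": "localhost,127.0.0.1",  # keep local traffic off the proxy
        }
    }
```

The returned dict is meant to be passed as seleniumwire_options when constructing selenium-wire's webdriver.Chrome.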
Selenium is mature, widely documented, and supports every major browser, which makes it a safe choice for browser-based scraping. For new projects, Playwright is often faster with cleaner waiting; for large static crawls without a browser, use Scrapy. The full comparison is in Puppeteer vs Playwright vs Selenium.
Selenium is a good fit for JavaScript-heavy sites where you need a real browser to render content. It loads pages in Chrome or Firefox, runs their JavaScript, and lets you extract what a user sees. For static HTML, requests plus BeautifulSoup is lighter and faster; for large crawls, Scrapy scales better.
Chrome's --proxy-server flag does not accept a username and password, so for authenticated commercial proxies use selenium-wire, which takes proxy credentials directly in its options. Pass the http and https proxy URLs with user:pass and Selenium routes through them.
Use an explicit wait: WebDriverWait with expected_conditions to wait for a specific element to appear, rather than a fixed time.sleep. This is both more reliable on JavaScript sites and faster, because it proceeds the instant the element is present.
Selenium scrapers usually get blocked for two reasons: a datacenter IP and visible automation flags. Route through residential proxies, use undetected-chromedriver to hide automation signals, set a current user agent, and pace requests like a human. The IP is the biggest single factor.
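Pacing requests like a human mostly means avoiding a fixed, machine-regular interval. A minimal sketch, assuming a hypothetical paced helper and delay bounds picked only for illustration:

```python
import random
import time


def paced(urls, min_delay=2.0, max_delay=6.0, sleep=time.sleep, rng=random):
    """Yield URLs one at a time, pausing a random interval between them."""
    for index, url in enumerate(urls):
        if index > 0:  # no pause before the first request
            sleep(rng.uniform(min_delay, max_delay))
        yield url
```

A scraping loop can then iterate with for url in paced(product_urls): and call driver.get(url) inside it. The sleep and rng parameters exist so the delays can be stubbed out in tests.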
undetected-chromedriver is a drop-in replacement for the standard Chrome driver that patches the most common automation tells (like navigator.webdriver) so a Selenium-controlled browser looks more like a normal one. It helps against anti-bot fingerprinting, but it is not a substitute for residential IPs.
Selenium can run headless: use the new headless mode (--headless=new) for scraping at scale, since running without a visible window is lighter on resources. A few sites treat headless differently, but detection depends far more on your IP and fingerprint than on whether the browser is headless.
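The flags discussed in this guide can be collected in one place so that headless mode, an unauthenticated proxy, and the user agent are toggled per run. This is a sketch under stated assumptions: chrome_arguments and make_driver are hypothetical names, and the selenium import sits inside make_driver so the flag logic can be checked without a browser installed.

```python
def chrome_arguments(headless=True, proxy=None, user_agent=None):
    """Return the Chrome command-line flags for a scraping session."""
    args = []
    if headless:
        args.append("--headless=new")
    if proxy:
        args.append(f"--proxy-server={proxy}")  # unauthenticated proxies only
    if user_agent:
        args.append(f"--user-agent={user_agent}")
    return args


def make_driver(**kwargs):
    """Start Chrome with the chosen flags (requires selenium and Chrome)."""
    from selenium import webdriver  # imported lazily; only needed to launch

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(**kwargs):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Calling make_driver(headless=False) while debugging and make_driver() in production keeps the rest of the scraper unchanged.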
Selenium gives you a real browser in a few lines of Python: install, get the page, wait for the element, extract. The make-or-break details are routing through an authenticated residential proxy (selenium-wire makes that painless) and hiding automation tells so anti-bot systems treat you like a real visitor.
To keep your Selenium scrapers running without bans, SpyderProxy residential proxies start at $1.75/GB with 10M+ IPs across 195+ countries, automatic rotation, and city-level targeting.