Web Scraping With Selenium and Python: Full Guide + Proxies

Daniel K. | Published Sun May 31 2026 | 10 min read

Selenium is the long-established way to drive a real browser from Python, which makes it a solid choice for scraping JavaScript-heavy sites that a plain HTTP client cannot read. Selenium loads a page in Chrome (or Firefox), runs its JavaScript, and lets you find and extract elements just as a user would see them. This guide covers the full workflow — install, load a page, find elements, wait for dynamic content, run headless, and the part that trips people up most: routing Selenium through an authenticated residential proxy so you do not get blocked.

If the site serves data in the initial HTML, the lighter requests + BeautifulSoup approach is faster. Use Selenium (or Playwright) when you need a real headless browser to render JavaScript.
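One quick way to choose between the two approaches is to fetch the page without a browser and check whether the data is already in the raw HTML. The sketch below uses only the standard library; `fetch_raw_html`, `needs_browser`, and the marker string are illustrative names chosen for this example, not part of any library.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Download the page as a plain HTTP client would (no JavaScript is run)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def needs_browser(raw_html: str, marker: str) -> bool:
    """Return True when the expected marker is missing from the initial HTML.

    A missing marker suggests the content is rendered by JavaScript,
    so a real browser (Selenium or Playwright) is probably needed.
    """
    return marker not in raw_html
```

For example, `needs_browser(fetch_raw_html("https://example.com/products"), 'class="product"')` returning True would be a hint to reach for Selenium; a False result means the lighter approach is likely enough.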

1. Install Selenium

pip install selenium

Modern Selenium (4.6+) includes Selenium Manager, which downloads the matching browser driver automatically — no more manual chromedriver downloads.
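If you are unsure whether an existing environment is new enough, a small check like the following can help. It is a minimal sketch: the helper names are made up for this example, and it only compares the major and minor version numbers against 4.6.

```python
def has_selenium_manager(version: str) -> bool:
    """Return True if this Selenium version ships Selenium Manager (4.6+)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (4, 6)


def installed_selenium_version():
    """Return the installed Selenium version string, or None if it is missing."""
    try:
        import selenium
    except ImportError:
        return None
    return selenium.__version__
```

Calling `has_selenium_manager(installed_selenium_version())` after confirming the version is not None tells you whether a manual driver download is still needed.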

2. A Basic Selenium Scrape

from selenium import webdriver
from selenium.webdriver.common.by import By

# Run Chrome without a visible window
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/products")
    # Each product card is an <article class="product"> element
    for card in driver.find_elements(By.CSS_SELECTOR, "article.product"):
        name = card.find_element(By.CSS_SELECTOR, "h3").text
        price = card.find_element(By.CSS_SELECTOR, ".price").text
        print(name, price)
finally:
    # Always close the browser, even if a lookup fails
    driver.quit()

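find_elements returns a list of every match (empty if nothing matches), while find_element returns the first match and raises NoSuchElementException if there is none. Selectors use the By class; CSS and XPath both work (see our CSS selector cheat sheet and XPath cheat sheet).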
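Real product cards often have a missing field, and a single failed find_element call would stop the whole loop. One way to handle that, sketched here with a hypothetical helper named `text_or_default`, is to use find_elements and fall back to a default when the list is empty.

```python
def text_or_default(parent, locator, default=""):
    """Return the stripped text of the first match under `parent`, or `default`.

    `locator` is a (strategy, value) tuple such as (By.CSS_SELECTOR, ".price").
    find_elements returns an empty list when nothing matches, so a card
    with a missing field does not raise an exception.
    """
    matches = parent.find_elements(*locator)
    return matches[0].text.strip() if matches else default


# Usage inside the scraping loop (assumes `driver` and `By` from the example above):
# for card in driver.find_elements(By.CSS_SELECTOR, "article.product"):
#     price = text_or_default(card, (By.CSS_SELECTOR, ".price"), default="n/a")
```

The helper accepts either the driver or an element as `parent`, since both expose find_elements.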

3. Wait for Dynamic Content

Never scrape immediately after get() on a JavaScript site — the data may not be there yet. Use an explicit wait for the element you need instead of a blind sleep:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "article.product"))
)

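Explicit waits make scrapers more reliable and faster than fixed delays, because they continue the moment the content appears. Note that presence_of_element_located only checks that the element is in the DOM; if you need it to be rendered as well, use visibility_of_element_located instead.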
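WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value when the wait should end, so you can also wait for conditions the built-in helpers do not cover. The following is a minimal sketch of a custom condition; `min_elements` is a name invented for this example, and the count of 10 in the usage comment is an arbitrary assumption.

```python
def min_elements(locator, count):
    """Build a wait condition that passes once `count` elements match `locator`.

    WebDriverWait.until() calls the returned function with the driver
    repeatedly until it returns a truthy value or the timeout expires.
    """
    def condition(driver):
        found = driver.find_elements(*locator)
        # Return the elements themselves so until() hands them back to the caller
        return found if len(found) >= count else False

    return condition


# Usage (assumes `driver`, `By`, and `WebDriverWait` from the snippets above):
# cards = WebDriverWait(driver, 15).until(
#     min_elements((By.CSS_SELECTOR, "article.product"), 10)
# )
```

This is useful on pages that add results in batches, where the first matching element appears well before the list is complete.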

4. Route Selenium Through a Residential Proxy

This is the step that matters at scale. A no-auth proxy is a one-line flag:

options.add_argument("--proxy-server=http://pr.spyderproxy.com:7777")

But Chrome's flag does not accept a username and password, and commercial proxies are authenticated. The cleanest fix is selenium-wire, which supports proxy credentials directly:

pip install selenium-wire

from seleniumwire import webdriver  # selenium-wire's drop-in driver

seleniumwire_options = {
    "proxy": {
        "http":  "http://USER:PASS@pr.spyderproxy.com:7777",
        "https": "http://USER:PASS@pr.spyderproxy.com:7777",
    }
}
driver = webdriver.Chrome(seleniumwire_options=seleniumwire_options)
driver.get("https://api.ipify.org")
print(driver.page_source)  # shows the residential exit IP

With a rotating residential endpoint, each session can exit from a different real household IP — see rotating proxies and best proxies for web scraping.
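To keep credentials out of the source file, the options dictionary can be built from environment variables. This is a sketch under stated assumptions: the variable names PROXY_USER and PROXY_PASS and the helper name are made up for this example, and the default host and port are the ones shown in the no-auth flag earlier in this section.

```python
import os


def proxy_options_from_env(env=os.environ, host="pr.spyderproxy.com", port=7777):
    """Build a selenium-wire options dict from PROXY_USER / PROXY_PASS.

    The variable names are assumptions for this sketch. A KeyError is
    raised if either one is not set.
    """
    user = env["PROXY_USER"]
    password = env["PROXY_PASS"]
    proxy_url = f"http://{user}:{password}@{host}:{port}"
    # The same upstream proxy handles both plain HTTP and HTTPS traffic
    return {"proxy": {"http": proxy_url, "https": proxy_url}}


# Usage (assumes selenium-wire is installed):
# from seleniumwire import webdriver
# driver = webdriver.Chrome(seleniumwire_options=proxy_options_from_env())
```

Passing `env` as a parameter keeps the function easy to test with a plain dictionary.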

5. Avoid Detection

  • Residential IPs first. Datacenter IP ranges are widely known, so anti-bot systems can flag them before the page even loads; a residential IP is the single biggest factor.
  • Hide automation flags. Vanilla Selenium leaks navigator.webdriver and other tells. The undetected-chromedriver package patches the most common ones automatically.
  • Set a real user agent. See best user agents for scraping.
  • Pace like a human. The full discipline is in how to avoid detection while scraping, and for protected targets see how to bypass Cloudflare.
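Two of the points above, pacing and user agents, can be sketched in a few lines of standard-library Python. The helper names and the 2 to 6 second delay range are assumptions for illustration; tune them to the target site.

```python
import random
import time


def human_pause(min_seconds=2.0, max_seconds=6.0, rng=random):
    """Sleep for a random interval and return the delay that was used."""
    delay = rng.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


def user_agent_argument(user_agents, rng=random):
    """Pick one user agent string and format it as a Chrome command-line flag."""
    return f"--user-agent={rng.choice(user_agents)}"


# Usage (assumes `options` and `driver` from the earlier examples, and a
# hypothetical list of page URLs and user agent strings):
# options.add_argument(user_agent_argument(MY_USER_AGENTS))
# for url in urls:
#     driver.get(url)
#     human_pause()
```

Randomized gaps avoid the perfectly regular request rhythm that fixed delays produce.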

When to Use Selenium vs the Alternatives

Selenium is mature, widely documented, and supports every major browser, which makes it a safe choice for browser-based scraping. For new projects, Playwright is often faster with cleaner waiting; for large static crawls without a browser, use Scrapy. The full comparison is in Puppeteer vs Playwright vs Selenium.

Frequently Asked Questions

Is Selenium good for web scraping?

Yes, for JavaScript-heavy sites where you need a real browser to render content. Selenium loads pages in Chrome or Firefox, runs their JavaScript, and lets you extract what a user sees. For static HTML, requests plus BeautifulSoup is lighter and faster; for large crawls, Scrapy scales better.

How do I use an authenticated proxy with Selenium?

Chrome's --proxy-server flag does not accept a username and password, so for authenticated commercial proxies use selenium-wire, which takes proxy credentials directly in its options. Pass the http and https proxy URLs with user:pass and Selenium routes through them.

How do I make Selenium wait for content to load?

Use an explicit wait: WebDriverWait with expected_conditions to wait for a specific element to appear, rather than a fixed time.sleep. This is both more reliable on JavaScript sites and faster, because it proceeds the instant the element is present.

Why does my Selenium scraper get blocked?

Usually two reasons: a datacenter IP and visible automation flags. Route through residential proxies, use undetected-chromedriver to hide automation signals, set a current user agent, and pace requests like a human. The IP is the biggest single factor.

What is undetected-chromedriver?

It is a drop-in replacement for the standard Chrome driver that patches the most common automation tells (like navigator.webdriver) so a Selenium-controlled browser looks more like a normal one. It helps against anti-bot fingerprinting, but it is not a substitute for residential IPs.

Can I run Selenium headless for scraping?

Yes. Use the new headless mode (--headless=new) when scraping at scale; running without a visible window uses fewer resources than a full desktop browser session. A few sites treat headless differently, but detection depends far more on your IP and fingerprint than on whether the browser is headless.

Conclusion

Selenium gives you a real browser in a few lines of Python: install, get the page, wait for the element, extract. The make-or-break details are routing through an authenticated residential proxy (selenium-wire makes that painless) and hiding automation tells so anti-bot systems treat you like a real visitor.

To keep your Selenium scrapers running without bans, SpyderProxy residential proxies start at $1.75/GB with 10M+ IPs across 195+ countries, automatic rotation, and city-level targeting.
