spyderproxy

How to Scrape Glassdoor (2026): Reviews, Salaries, Jobs

By Alex R. | Published: Sun May 10 2026

Quick verdict: Glassdoor uses DataDome at the edge + a soft login wall on full reviews/salaries. Public preview data (top of page) is scrape-able with rotating residential proxies + Playwright. Full review text and detailed salary data require an account. Use Playwright (not requests) because Glassdoor is heavily JS-rendered. Plan for ~30-40% block rate without LTE mobile proxies.

What You Can Scrape

| Data type | Login needed? | Difficulty |
|---|---|---|
| Job listings (search results) | No | Medium (DataDome on /Job/) |
| Company overview (size, industry, HQ) | No | Easy |
| Average salary by role (preview) | No | Medium |
| Detailed salary breakdown (full distribution) | Yes | Hard |
| Top 3 reviews (preview) | No | Medium |
| Full reviews (paginated) | Yes | Hard (account ban risk) |
| Interview questions | Partial | Hard |

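Scraping public, no-login data is usually defended under hiQ v. LinkedIn, where the Ninth Circuit held that accessing publicly available pages likely does not violate the CFAA. That is narrower than "legal": Glassdoor's ToS still prohibits scraping, and hiQ itself was later found to have breached LinkedIn's user agreement. Scraping behind a login is far riskier: account creation under false pretenses, ToS violations, and possible CFAA exposure because the data sits behind an authentication gate. For commercial use, Glassdoor offers a paid API for partners.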

Never scrape reviewers' personal information, even when it is visible on the page. Collecting it creates GDPR and CCPA exposure.
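One way to enforce that rule is to drop reviewer-identifying fields before a record is ever stored. The sketch below assumes a dictionary-per-review schema; every field name in `PII_FIELDS` and the sample record are hypothetical and should be replaced with whatever your own extractor produces.

```python
# Hypothetical field names; adapt them to your own record schema.
PII_FIELDS = {"reviewer_name", "reviewer_profile_url", "reviewer_location", "avatar_url"}


def strip_reviewer_pii(review: dict) -> dict:
    """Return a copy of the review without reviewer-identifying fields."""
    return {key: value for key, value in review.items() if key not in PII_FIELDS}


# Made-up sample record, for illustration only.
raw = {
    "headline": "Great team, slow promotions",
    "rating": "4.0",
    "pros": "Supportive manager",
    "reviewer_name": "J. Doe",
    "reviewer_location": "Austin, TX",
}
clean = strip_reviewer_pii(raw)
print(sorted(clean))
```

Filtering at ingestion, rather than at export, means the personal data never lands on disk in the first place.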

Anti-Bot Setup

Glassdoor uses DataDome (the same protection as Reddit and Hermes). DataDome scores requests at four layers; see bypass DataDome for the full breakdown. For Glassdoor specifically:

  • Residential or LTE mobile proxies are mandatory. Datacenter IPs are blocked at the first request.
  • Real browser is required. Plain requests fails the TLS fingerprint check.
  • Slow it down. 3-7s between actions, scroll the page, mimic mouse movement.
  • Sticky sessions help. Reuse the same IP for a session (Glassdoor sets cookies; switching IP mid-session is suspicious).
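The sticky-session and pacing points can be sketched as two small helpers. This is a minimal sketch, not a provider-specific implementation: the `-session-<id>` username suffix follows the convention used in this guide's scraper code, and the helper names `sticky_proxy` and `human_pause` are made up for illustration. Check your proxy provider's documentation for the exact session syntax.

```python
import random
import time


def sticky_proxy(user: str, password: str, host: str, port: int, session_id: int) -> dict:
    """Build a Playwright-style proxy dict that pins one exit IP per session_id.

    Assumes the provider keys sticky sessions on a "-session-<id>" username suffix.
    """
    return {
        "server": f"http://{host}:{port}",
        "username": f"{user}-session-{session_id}",
        "password": password,
    }


def human_pause(low: float = 3.0, high: float = 7.0, sleep=time.sleep) -> float:
    """Sleep for a random interval between low and high seconds and return it."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


proxy = sticky_proxy("your_user", "your_pass", "gw.spyderproxy.com", 8000, session_id=42)
print(proxy["username"])  # your_user-session-42
```

Reusing the same `session_id` for every request in a browsing session keeps the exit IP stable; changing it is what triggers a rotation.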

Playwright Setup

pip install playwright
python -m playwright install chromium

from playwright.sync_api import sync_playwright
import json
import random
import time

PROXY_USER = "your_user"
PROXY_PASS = "your_pass"
PROXY_HOST = "gw.spyderproxy.com"
PROXY_PORT = 8000

def scrape_glassdoor_jobs(query, location, max_pages=3):
    session_id = random.randint(0, 100000)
    jobs = []

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={
                "server": f"http://{PROXY_HOST}:{PROXY_PORT}",
                "username": f"{PROXY_USER}-session-{session_id}",
                "password": PROXY_PASS,
            },
        )
        ctx = browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                       "(KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = ctx.new_page()

        url = f"https://www.glassdoor.com/Job/{location}-{query}-jobs.htm"
        page.goto(url, wait_until="domcontentloaded", timeout=45000)
        time.sleep(random.uniform(3, 6))

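        seen_urls = set()
        for _ in range(max_pages):
            # Scrolling appends cards to the same list, so each pass re-reads
            # earlier cards; keep only jobs whose URL has not been seen yet.
            for job in extract_jobs_from_page(page):
                if job["url"] not in seen_urls:
                    seen_urls.add(job["url"])
                    jobs.append(job)

            # Scroll to load more (Glassdoor uses infinite scroll on jobs)
            page.evaluate("window.scrollBy(0, 1500)")
            time.sleep(random.uniform(2, 4))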

        browser.close()

    return jobs


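def extract_jobs_from_page(page):
    """Read job cards from the rendered DOM via CSS selectors."""
    # Class names may carry a build-specific suffix, so match on the stable
    # prefix with [class*=...] rather than the exact class. Verify these
    # selectors against the live page; they change over time.
    cards = page.query_selector_all("li[class*='JobsList_jobListItem']")
    out = []
    for c in cards:
        title_el = c.query_selector("a[class*='JobCard_jobTitle']")
        company_el = c.query_selector("span[class*='EmployerProfile_compactEmployerName']")
        location_el = c.query_selector("div[class*='JobCard_location']")
        salary_el = c.query_selector("div[class*='JobCard_salaryEstimate']")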
        if not title_el:
            continue
        out.append({
            "title": title_el.inner_text().strip(),
            "company": company_el.inner_text().strip() if company_el else None,
            "location": location_el.inner_text().strip() if location_el else None,
            "salary_estimate": salary_el.inner_text().strip() if salary_el else None,
            "url": title_el.get_attribute("href"),
        })
    return out


if __name__ == "__main__":
    jobs = scrape_glassdoor_jobs("python-developer", "San-Francisco-CA", max_pages=3)
    print(json.dumps(jobs[:5], indent=2))
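The scraper returns `salary_estimate` as display text and `url` as whatever the `href` attribute holds, which may be a relative path. A possible post-processing step is sketched below. The salary string format ("$120K - $160K (Glassdoor est.)") is an assumption about how the card renders, and `parse_salary_estimate` and `absolute_job_url` are hypothetical helper names, so treat this as a starting point to adjust against real output.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.glassdoor.com"

# Assumed display formats: "$120K - $160K (Glassdoor est.)", "$95K (Employer est.)".
_SALARY_RE = re.compile(r"\$([\d,]+(?:\.\d+)?)\s*([KkMm]?)")
_MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_salary_estimate(text):
    """Return (low, high) as integers, or None if no dollar amount is found."""
    if not text:
        return None
    amounts = [
        round(float(number.replace(",", "")) * _MULTIPLIER[suffix.lower()])
        for number, suffix in _SALARY_RE.findall(text)
    ]
    if not amounts:
        return None
    return min(amounts), max(amounts)


def absolute_job_url(href):
    """Resolve a relative href against the Glassdoor origin; pass through None."""
    return urljoin(BASE_URL, href) if href else None


print(parse_salary_estimate("$120K - $160K (Glassdoor est.)"))
print(absolute_job_url("/job-listing/example.htm"))
```

The parser does not distinguish hourly from annual figures; if the text contains a pay period, keep the raw string alongside the parsed numbers.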

Scraping Company Reviews (Preview)

Without login, Glassdoor shows the top 3 reviews per company. Bigger samples need an account.

def scrape_company_preview(company_url):
    """Public preview: top reviews + ratings."""
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={
                "server": f"http://{PROXY_HOST}:{PROXY_PORT}",
                "username": f"{PROXY_USER}-session-{random.randint(0,99999)}",
                "password": PROXY_PASS,
            },
        )
        page = browser.new_page()
        page.goto(company_url, wait_until="domcontentloaded")
        time.sleep(random.uniform(3, 6))

        def text_of(root, selector):
            """Return the inner text of the first match under root, or None."""
            el = root.query_selector(selector)
            return el.inner_text().strip() if el else None

        # Single-quoted Python strings so the double quotes inside the
        # attribute selectors do not terminate the string.
        reviews = page.query_selector_all('[data-test="employer-review"]')

        result = {
            "rating": text_of(page, '[data-test="rating"]'),
            "review_count": text_of(page, '[data-test="reviewCount"]'),
            "reviews": [{
                "headline": text_of(r, "h2"),
                "rating": text_of(r, '[data-test="review-rating"]'),
                "pros": text_of(r, '[data-test="pros"]'),
                "cons": text_of(r, '[data-test="cons"]'),
            } for r in reviews[:3]],
        }
        browser.close()
    return result

The Login Wall

After 3-5 page views as an anonymous user, Glassdoor often shows a "Get more reviews" interstitial that requires login. Hitting it kills your scrape. Mitigations:

  • Rotate sessions aggressively: new IP + new browser context every 3-5 pages
  • Clear cookies between rotations (Playwright: ctx.clear_cookies() or new context)
  • Pull from SERP previews: sometimes the data you need already appears in Google's search-result snippet
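The rotation idea can be sketched as a planner that cuts a URL list into batches of 3-5 pages and assigns each batch its own session ID, plus a loop that launches a fresh browser per batch. This is one possible arrangement under stated assumptions: `plan_sessions` and `scrape_with_rotation` are hypothetical names, and `proxy_for_session` and `handle_page` are callables you supply (a session ID to proxy dict mapping, and your own extractor).

```python
import random
import time
from itertools import islice


def plan_sessions(urls, min_pages=3, max_pages=5, rng=random):
    """Yield (session_id, batch) pairs, each batch holding 3-5 URLs by default."""
    it = iter(urls)
    while True:
        batch = list(islice(it, rng.randint(min_pages, max_pages)))
        if not batch:
            return
        yield rng.randint(0, 99_999), batch


def scrape_with_rotation(urls, proxy_for_session, handle_page):
    """Visit URLs with a fresh browser (new IP, empty cookie jar) per batch."""
    # Imported lazily so the planner above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    results = []
    with sync_playwright() as p:
        for session_id, batch in plan_sessions(urls):
            browser = p.chromium.launch(headless=True, proxy=proxy_for_session(session_id))
            try:
                page = browser.new_page()
                for url in batch:
                    page.goto(url, wait_until="domcontentloaded")
                    results.append(handle_page(page))
                    time.sleep(random.uniform(3, 6))
            finally:
                browser.close()
    return results


# Dry run of the planner only; placeholder URLs, no network access.
for session_id, batch in plan_sessions([f"https://example.com/page-{i}" for i in range(8)]):
    print(session_id, len(batch))
```

Launching a new browser per batch is heavier than reusing one and clearing cookies, but it guarantees that no cookie or cache state leaks across sessions.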

Logging in to scrape more is a ToS violation. It risks an account ban and ties the scraping to your identity, since the account carries your personal data. Not recommended.

Proxy Recommendation

Glassdoor + DataDome is a tough target. Recommended:

  • Default: Premium Residential ($2.75/GB) with sticky sessions. Good for most companies/searches.
  • Hard targets / high volume: LTE Mobile ($2/IP). Lowest DataDome score impact.
  • Avoid: Datacenter, ISP/Static Residential (DataDome blocks both quickly).
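To budget for the per-GB residential plan, a rough estimate can combine the block rate with page weight. The sketch below assumes each attempt is blocked independently (so a page needs 1 / (1 - block rate) attempts on average) and that a blocked attempt costs as much bandwidth as a successful one, which makes the result an upper bound. The 2.5 MB page weight is a placeholder assumption; measure your own traffic before relying on it.

```python
def expected_attempts(block_rate: float) -> float:
    """Mean requests per successful page when each attempt fails independently."""
    if not 0 <= block_rate < 1:
        raise ValueError("block_rate must be in [0, 1)")
    return 1 / (1 - block_rate)


def residential_cost_usd(pages: int, mb_per_page: float, block_rate: float,
                         usd_per_gb: float = 2.75) -> float:
    """Upper-bound bandwidth cost, counting blocked attempts at full page weight."""
    total_mb = pages * mb_per_page * expected_attempts(block_rate)
    return total_mb / 1024 * usd_per_gb


# 10,000 pages at an assumed 2.5 MB each, with a 35% block rate.
print(f"{expected_attempts(0.35):.2f} attempts per page")
print(f"${residential_cost_usd(10_000, 2.5, 0.35):.2f}")
```

Blocking images, fonts, and media in the browser reduces `mb_per_page` and is usually the cheapest lever on a per-GB plan.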

Alternatives

  • Glassdoor Partner API — the official, paid path. Reliable but requires application approval.
  • Indeed — owned by the same company; sometimes overlapping data is easier to scrape from Indeed (see Indeed scraping guide).
  • Levels.fyi — for tech salary data specifically; cleaner schema, less anti-bot.

Related: scrape Indeed, bypass DataDome, scrape LinkedIn safely.