Quick verdict: Glassdoor uses DataDome at the edge + a soft login wall on full reviews/salaries. Public preview data (top of page) is scrape-able with rotating residential proxies + Playwright. Full review text and detailed salary data require an account. Use Playwright (not requests) because Glassdoor is heavily JS-rendered. Plan for ~30-40% block rate without LTE mobile proxies.
| Data type | Login needed? | Difficulty |
|---|---|---|
| Job listings (search results) | No | Medium — DataDome on /Job/ |
| Company overview (size, industry, HQ) | No | Easy |
| Average salary by role (preview) | No | Medium |
| Detailed salary breakdown (full distribution) | Yes | Hard |
| Top 3 reviews (preview) | No | Medium |
| Full reviews (paginated) | Yes | Hard — account ban risk |
| Interview questions | Partial | Hard |
Public data (no login) follows the hiQ v. LinkedIn precedent: legal in the US, but contractually prohibited by Glassdoor's ToS. Scraping behind a login is far riskier: account creation under false pretenses, clear ToS violations, and possible CFAA exposure because the data sits behind an access gate. For commercial use, Glassdoor offers a paid API for partners.
Never scrape personal information of reviewers (even if visible). GDPR / CCPA risk.
Glassdoor uses DataDome (the same protection as Reddit and Hermes). DataDome scores requests at four layers; see bypass DataDome for the full breakdown. For Glassdoor specifically:
`requests` fails the TLS fingerprint check. Install Playwright and its Chromium build:

```shell
pip install playwright
python -m playwright install chromium
```
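Before the full script, it helps to be able to tell a DataDome challenge page from real content, so you rotate the session instead of parsing a block page. A minimal guard — the marker strings are assumptions based on DataDome's usual captcha-delivery domain and cookie name, not guaranteed for Glassdoor:

```python
def looks_blocked(html: str) -> bool:
    """Heuristic: does this HTML look like a DataDome challenge page?"""
    markers = (
        "captcha-delivery.com",  # assumed: DataDome typically serves its challenge from this domain
        "datadome",              # assumed: cookie/script name that shows up in block pages
    )
    lowered = html.lower()
    return any(m in lowered for m in markers)
```

Call it on `page.content()` right after navigation and bail out (new proxy session, new context) rather than scraping a challenge page.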
```python
from playwright.sync_api import sync_playwright
import time, random, json

PROXY_USER = "your_user"
PROXY_PASS = "your_pass"
PROXY_HOST = "gw.spyderproxy.com"
PROXY_PORT = 8000

def scrape_glassdoor_jobs(query, location, max_pages=3):
    session_id = random.randint(0, 100000)
    jobs = []
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={
                "server": f"http://{PROXY_HOST}:{PROXY_PORT}",
                "username": f"{PROXY_USER}-session-{session_id}",
                "password": PROXY_PASS,
            },
        )
        ctx = browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                       "(KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = ctx.new_page()
        url = f"https://www.glassdoor.com/Job/{location}-{query}-jobs.htm"
        page.goto(url, wait_until="domcontentloaded", timeout=45000)
        time.sleep(random.uniform(3, 6))
        for _ in range(max_pages):
            page_jobs = extract_jobs_from_page(page)
            jobs.extend(page_jobs)
            # Scroll to load more (Glassdoor uses infinite scroll on jobs)
            page.evaluate("window.scrollBy(0, 1500)")
            time.sleep(random.uniform(2, 4))
        browser.close()
    return jobs

def extract_jobs_from_page(page):
    """Pull job cards out of the rendered listing page."""
    # Glassdoor's class names are CSS-module suffixed (e.g. JobsList_jobListItem__xxxxx),
    # so match on the stable prefix rather than the exact class.
    cards = page.query_selector_all("li[class*='JobsList_jobListItem']")
    out = []
    for c in cards:
        title_el = c.query_selector("a[class*='JobCard_jobTitle']")
        company_el = c.query_selector("span[class*='EmployerProfile_compactEmployerName']")
        location_el = c.query_selector("div[class*='JobCard_location']")
        salary_el = c.query_selector("div[class*='JobCard_salaryEstimate']")
        if not title_el:
            continue
        out.append({
            "title": title_el.inner_text().strip(),
            "company": company_el.inner_text().strip() if company_el else None,
            "location": location_el.inner_text().strip() if location_el else None,
            "salary_estimate": salary_el.inner_text().strip() if salary_el else None,
            "url": title_el.get_attribute("href"),
        })
    return out

if __name__ == "__main__":
    jobs = scrape_glassdoor_jobs("python-developer", "San-Francisco-CA", max_pages=3)
    print(json.dumps(jobs[:5], indent=2))
```

Without login, Glassdoor shows the top 3 reviews per company. Bigger samples need an account.
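The jobs URL in the script expects pre-slugged values like `python-developer` and `San-Francisco-CA`. A small helper for building them — the URL pattern itself is inferred from the example above, not a documented format:

```python
import re

def slugify(text: str) -> str:
    """Collapse runs of non-alphanumerics into single hyphens for Glassdoor-style URL slugs."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text.strip()).strip("-")

# slugify("python developer")  -> "python-developer"
# slugify("San Francisco, CA") -> "San-Francisco-CA"
```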
```python
def _text(el, selector):
    """inner_text of the first match under el, or None."""
    node = el.query_selector(selector)
    return node.inner_text() if node else None

def scrape_company_preview(company_url):
    """Public preview: top 3 reviews + overall rating."""
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={
                "server": f"http://{PROXY_HOST}:{PROXY_PORT}",
                "username": f"{PROXY_USER}-session-{random.randint(0, 99999)}",
                "password": PROXY_PASS,
            },
        )
        page = browser.new_page()
        page.goto(company_url, wait_until="domcontentloaded")
        time.sleep(random.uniform(3, 6))
        reviews = page.query_selector_all("[data-test='employer-review']")
        result = {
            "rating": _text(page, "[data-test='rating']"),
            "review_count": _text(page, "[data-test='reviewCount']"),
            "reviews": [{
                "headline": _text(r, "h2"),
                "rating": _text(r, "[data-test='review-rating']"),
                "pros": _text(r, "[data-test='pros']"),
                "cons": _text(r, "[data-test='cons']"),
            } for r in reviews[:3]],
        }
        browser.close()
    return result
```

After 3-5 page views as an anonymous user, Glassdoor often shows a "Get more reviews" interstitial that requires login. Hitting it kills your scrape. Mitigations:
- Reset identity between companies: clear cookies (`ctx.clear_cookies()`) or open a fresh browser context on a new proxy session.

Logging in to scrape more is a ToS violation, risks account bans, and creates personal-data exposure (your account is tied to the scraping). Not recommended.
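The rotation mechanic can be as simple as never reusing a proxy session ID across companies. A sketch building on the `-session-` username convention the scripts above already use (the gateway format is taken from those examples, not from any provider's docs):

```python
import itertools

class SessionRotator:
    """Hand out a fresh sticky-session proxy username per company."""
    def __init__(self, user: str):
        self.user = user
        self._counter = itertools.count(1)

    def next_username(self) -> str:
        # A new session ID makes the gateway assign a new exit IP,
        # so each company page sees a first-time visitor.
        return f"{self.user}-session-{next(self._counter)}"
```

Launch a new browser (or at least a new context after `ctx.clear_cookies()`) with each username, so cookies and exit IP rotate together.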
Glassdoor + DataDome is a tough target. Recommended: stick to the public preview data, run Playwright through rotating residential (or LTE mobile) proxies with a fresh session per company, keep human-like delays between actions, and stay out of the logged-in area.
Related: scrape Indeed, bypass DataDome, scrape LinkedIn safely.