Quick verdict: Email scraping is the automated extraction of email addresses from web pages using HTTP scrapers and regex patterns. Scraping public-facing addresses (from About pages, contact directories, conference attendee lists) is generally legal in the US under hiQ v. LinkedIn. Sending unsolicited email to those addresses is where compliance risk starts — CAN-SPAM, GDPR, and CCPA apply.
This explainer covers what email scraping actually is at the technical level, the four common use cases, the legal landscape across US/EU/California, the tools people use, ethical alternatives, and the risks of doing it wrong. For the implementation tutorial, see our companion Python email scraping guide.
An email scraper has three components:
A typical extraction pattern is `[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}` (note the escaped dot before the top-level domain).

Optionally, a fourth stage validates the extracted emails (DNS MX lookups, syntax checks via the email-validator package), and a fifth stage deduplicates the results and saves them to CSV or a database.
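The extraction and deduplication stages can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production scraper: it uses the pattern above with the dot before the top-level domain escaped, the function names `extract_emails` and `to_csv` are hypothetical, and lowercasing addresses before deduplicating is an assumption that suits most mail providers but is not guaranteed by the email standards. Fetching and MX validation are left out so the example runs offline.

```python
import csv
import io
import re

# Pattern from the text, with the dot before the TLD escaped.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return unique addresses in first-seen order, lowercased for deduplication."""
    seen: set[str] = set()
    result: list[str] = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()  # assumption: treat addresses as case-insensitive
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


def to_csv(emails: list[str]) -> str:
    """Serialize addresses as a one-column CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["email"])
    writer.writerows([e] for e in emails)
    return buffer.getvalue()


sample_html = '<a href="mailto:Press@example.com">press@example.com</a> or sales@example.org'
emails = extract_emails(sample_html)
print(emails)  # ['press@example.com', 'sales@example.org']
```

Keeping first-seen order rather than sorting makes it easier to trace each address back to where it appeared on the page.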
For sites with anti-bot defenses, the HTTP fetcher typically needs residential proxies with rotating IPs, because requests from datacenter IP ranges tend to be blocked quickly.
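As a rough sketch of what "rotating IPs" means in the fetcher, the snippet below cycles through a pool of proxies using only the standard library. The proxy URLs are hypothetical placeholders (a real provider supplies its own hosts and credentials), and the helper names `next_opener` and `fetch` are illustrative assumptions, not any vendor's API.

```python
import itertools
import urllib.request

# Hypothetical proxy endpoints; replace with the ones your provider issues.
PROXY_POOL = [
    "http://proxy-a.example.net:8000",
    "http://proxy-b.example.net:8000",
]
_proxy_cycle = itertools.cycle(PROXY_POOL)


def next_opener() -> tuple[str, urllib.request.OpenerDirector]:
    """Build an opener that routes HTTP and HTTPS traffic through the next proxy."""
    proxy = next(_proxy_cycle)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return proxy, urllib.request.build_opener(handler)


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a page through the next proxy in the pool (performs network I/O)."""
    _, opener = next_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Round-robin rotation is the simplest policy; many commercial proxy services rotate the exit IP on their side instead, in which case a single endpoint is enough.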
| Use case | What's collected | Compliance risk |
|---|---|---|
| Sales prospecting | B2B contact emails from company sites, directories | Medium-high if sending cold email to EU/CA contacts |
| Recruiter sourcing | Candidate emails from portfolios, GitHub, conference lists | Medium for personal-data emails |
| Security audits (defensive) | Your own organization's exposed emails — for risk assessment | Low (you have authority over your own domain) |
| Research / journalism | Communication-pattern analysis, breach impact studies | Low if subjects are not contacted (academic work typically also needs IRB approval) |
The leading case is hiQ Labs v. LinkedIn (Ninth Circuit, 2022). The court held that scraping publicly accessible web data, including emails on publicly visible profiles, likely does not violate the Computer Fraud and Abuse Act (CFAA). Violating LinkedIn-style Terms of Service can still create civil liability (for example, breach of contract), but not criminal liability under the CFAA.
Where the analysis changes: if you SEND email to scraped addresses, CAN-SPAM applies. CAN-SPAM doesn't ban cold email outright, but requires:
Violation penalties: up to $50,120 per email under FTC enforcement (the cap is adjusted annually for inflation).
GDPR treats personal email addresses (e.g., jane.doe@example.com) as personal data even if they appear on a public website. Article 6 requires a "lawful basis" for processing. Most scraped-and-emailed campaigns rely on "legitimate interest", which requires:
Cold sales email to EU contacts almost never satisfies the balance test. Penalties: up to €20 million or 4% of global annual turnover, whichever is higher.
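The "whichever is higher" rule is easy to misread, so here is the arithmetic as a tiny sketch. The function name `gdpr_max_fine_eur` is hypothetical, and the result is only the statutory ceiling; actual fines are set case by case by regulators.

```python
def gdpr_max_fine_eur(annual_turnover_eur: int) -> int:
    """Ceiling of the fine: the greater of EUR 20 million or 4% of annual turnover."""
    return max(20_000_000, annual_turnover_eur * 4 // 100)


print(gdpr_max_fine_eur(100_000_000))    # 20000000 (4% would be only 4 million)
print(gdpr_max_fine_eur(1_000_000_000))  # 40000000 (4% exceeds the 20 million figure)
```

In other words, the €20 million figure is the ceiling for any company with turnover under €500 million, and the percentage takes over above that.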
Generic business emails (e.g., info@example.com) are typically NOT personal data under GDPR, because they identify the company rather than a person, so they're substantially safer to email.
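One way to act on this distinction is to separate role-based addresses from named ones before anything is sent. The sketch below is a heuristic only, not a legal determination: the `ROLE_LOCAL_PARTS` set is a hypothetical, non-exhaustive list, and the function names are illustrative.

```python
# Hypothetical, non-exhaustive set of role-based local parts.
ROLE_LOCAL_PARTS = {"info", "sales", "support", "contact", "hello", "press", "admin", "office"}


def is_role_address(email: str) -> bool:
    """True when the part before '@' looks like a shared role mailbox."""
    local_part = email.split("@", 1)[0].lower()
    return local_part in ROLE_LOCAL_PARTS


def partition(emails: list[str]) -> tuple[list[str], list[str]]:
    """Split addresses into (role-based, likely personal), preserving order."""
    role = [e for e in emails if is_role_address(e)]
    personal = [e for e in emails if not is_role_address(e)]
    return role, personal


role, personal = partition(["info@example.com", "jane.doe@example.com"])
print(role)      # ['info@example.com']
print(personal)  # ['jane.doe@example.com']
```

Anything that lands in the second list should be treated as personal data by default, since a heuristic like this will miss unusual role names and cannot tell when a role mailbox is actually read by one identifiable person.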
CCPA grants consumers the right to know what personal data businesses collect about them and to request deletion. If you scrape and store California-resident emails, you must:
CASL (Canada's anti-spam law) is even stricter: consent is required before sending commercial email, and outside a few narrow implied-consent cases that means express opt-in. Cold email to Canadian addresses without prior consent is presumptively illegal.
| Tool | Type | Best for |
|---|---|---|
| Hunter.io | SaaS — domain + name → email | Sales prospecting at scale |
| Apollo.io | SaaS — B2B contact database | Same, with sequencing built in |
| Snov.io / RocketReach | SaaS | Specific industries / regions |
| Custom Python scraper | Open-source / DIY | Niche industries SaaS doesn't cover |
| Browser extensions (e.g., Mailtastic) | Browser plugin | Manual page-by-page extraction |
The SaaS tools handle compliance and verification but cost $50-$500/month. A custom scraper using residential proxies is more cost-effective for one-off jobs or industry verticals the SaaS tools don't cover well.
If your goal is contacting prospects without legal exposure, three approaches work better than scraping + cold email: