When should I use CSS selectors vs XPath?

Use CSS by default — simpler syntax, faster in browsers, easier to read. Switch to XPath when you need: text matching (contains(text(),...)), parent/ancestor traversal, or complex sibling logic. Rule of thumb: ~70% of scraping is CSS, ~30% is XPath when CSS can't express what you need.

Does :contains() work in standard CSS?

No. :contains() is a jQuery/Sizzle extension that never made it into the CSS spec, and native browsers reject it. BeautifulSoup's soupsieve supports :-soup-contains() (note the prefix). Playwright has its own text= filter. lxml's cssselect translator accepts :contains() as a non-standard extension, but for lxml/Scrapy the XPath form contains(text(),...) is the more portable choice.

What's the difference between :nth-child and :nth-of-type?

:nth-child(2) selects an element if it's the 2nd child of its parent, regardless of element type. :nth-of-type(2) selects the 2nd element of THAT specific tag type among its siblings. Example: 'p:nth-of-type(1)' = the first <p> among its siblings; 'p:nth-child(1)' = a <p> only if it's the 1st child of its parent.

How do I select a parent element with CSS?

Use the :has() pseudo-class (CSS4, supported in Chrome 105+, Safari 15.4+, Firefox 121+, Playwright, soupsieve 2.5+). Example: 'div.card:has(.badge.sale)' selects div.card containing a .badge.sale element. Before :has(), CSS couldn't go upward — XPath was the only way.

Can CSS selectors find elements by text content?

Not in standard CSS. Workarounds: jQuery/Cheerio use :contains('text'); BeautifulSoup uses :-soup-contains('text'); Playwright has page.locator('button', has_text='Submit'). For Selenium and lxml, use XPath contains(text(), 'foo') instead.

What does the > combinator do in CSS?

Direct child: 'div > a' selects <a> elements that are immediate children of a <div>. Without it, 'div a' selects any <a> inside any <div> at any depth. Use > when you need to avoid matching grandchildren and deeper descendants.

How fast are CSS selectors compared to XPath?

In browsers: CSS is 2-5x faster (heavily optimized in browser engines). BeautifulSoup only supports CSS (select() runs through soupsieve) and has no XPath API, so there is nothing to compare there. In lxml directly: XPath is native (CSS gets translated to XPath first). Overall: CSS for browser-based scraping (Playwright/Selenium), XPath for libxml2-based scraping where text matching matters.

How do I test CSS selectors in the browser?

DevTools console: $$('your.css.selector') returns all matches as an array. $('your.css.selector') returns the first match. These are DevTools shortcuts (not jQuery), unless the page itself defines $. Inspect the result before pasting into your scraper code. For element highlighting, use the inspector's 'Find' (Ctrl/Cmd+F).

CSS Selector Cheat Sheet (2026): Web Scraping Reference

Alex R.

Sun May 10 2026

Quick verdict: CSS selectors are the standard query language for HTML. They are simpler than XPath for class/id selection and faster in browsers. Use CSS for ~70% of scraping needs; reach for XPath when you need text matching (contains(text(),...)), parent traversal (ancestor::), or sibling logic that CSS cannot express cleanly. This cheat sheet covers everything from .class to :has().

Basic Selectors

Selector	Matches
`*`	Any element
`tag`	`<tag>` elements
`#id`	Element with id="id"
`.class`	Element whose class list includes "class" (whole token, not substring)
`tag.class`	Both: `<tag class="class">`
`tag#id`	Both
`tag, tag2`	Either (comma = OR)

Examples:

p              # all <p> elements
.warning       # any element with the class "warning"
#main          # element with id="main"
div.card       # <div class="card">
a, button      # all <a> or <button> elements

Combinators (Tree Relationships)

Combinator	Meaning
`a b`	Descendant: `b` anywhere inside `a`
`a > b`	Direct child: `b` is immediate child of `a`
`a + b`	Adjacent sibling: `b` immediately after `a`
`a ~ b`	General sibling: `b` anywhere after `a` at same level

Examples:

nav a              # any <a> inside any <nav>
ul > li            # <li> directly under <ul> (not nested)
h2 + p             # <p> immediately following <h2>
label ~ input      # <input> sibling after <label> (any distance)
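
To make the difference between the descendant and child combinators concrete, here is a minimal sketch using only Python's standard library. It relies on ElementTree's limited XPath subset rather than a CSS engine, so ".//a" stands in for "nav a" and "./a" for "nav > a". The markup in DOC is hypothetical and must be well-formed XML for this parser to accept it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup used only for this illustration.
DOC = """
<nav>
  <a href="/home">Home</a>
  <ul>
    <li><a href="/docs">Docs</a></li>
    <li><a href="/blog">Blog</a></li>
  </ul>
</nav>
"""

nav = ET.fromstring(DOC)

# "nav a": descendant combinator, matches <a> at any depth below <nav>.
descendants = [a.get("href") for a in nav.findall(".//a")]

# "nav > a": child combinator, matches only immediate children of <nav>.
children = [a.get("href") for a in nav.findall("./a")]

print(descendants)  # ['/home', '/docs', '/blog']
print(children)     # ['/home']
```

With a real CSS engine such as soupsieve or a browser, the same two result sets would come from the selectors "nav a" and "nav > a".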

Attribute Selectors

Selector	Matches
`[attr]`	Has the attribute (any value)
`[attr="val"]`	Exactly `val`
`[attr*="val"]`	Contains `val`
`[attr^="val"]`	Starts with `val`
`[attr$="val"]`	Ends with `val`
`[attr~="val"]`	Whitespace-separated list contains `val`
`[attr|="val"]`	Equal to `val` or starts with `val-` (i18n langs)

Examples:

a[href]                    # links with href
a[href^="https://"]        # links starting with https://
img[src$=".png"]           # images ending .png
div[data-id*="user-"]      # divs whose data-id contains "user-"
input[type="email"]        # email inputs
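
The operators in the table above boil down to plain string tests. The sketch below spells them out in Python as a way to reason about what a selector will match. The function name attr_matches is hypothetical, and the sketch ignores the case-insensitivity rules HTML applies to some attributes, so treat it as an approximation rather than how any selector engine is implemented.

```python
def attr_matches(actual, op=None, val=""):
    """Approximate the CSS attribute operators for one attribute value.

    actual is the attribute's value, or None when the attribute is absent.
    """
    if actual is None:
        return False                      # attribute missing: nothing matches
    if op is None:
        return True                       # [attr]
    if op == "=":
        return actual == val              # [attr="val"]
    if op == "|=":
        return actual == val or actual.startswith(val + "-")
    if val == "":
        return False                      # *=, ^=, $=, ~= never match an empty value
    if op == "*=":
        return val in actual              # substring
    if op == "^=":
        return actual.startswith(val)     # prefix
    if op == "$=":
        return actual.endswith(val)       # suffix
    if op == "~=":
        return val in actual.split()      # whole whitespace-separated token
    raise ValueError(f"unknown operator: {op!r}")


print(attr_matches("https://example.com", "^=", "https://"))  # True
print(attr_matches("btn btn-primary", "~=", "btn"))           # True
print(attr_matches("btn-primary", "~=", "btn"))               # False
print(attr_matches("btn-primary", "*=", "btn"))               # True
```

The ~= case is also how .class behaves: .btn matches class="btn btn-primary" but not class="btn-primary", which is why [class*="btn"] and .btn can return different results.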

Pseudo-Classes

Pseudo-class	Meaning
`:first-child`	First child of its parent
`:last-child`	Last child
`:nth-child(N)`	Nth child (1-indexed). `:nth-child(2)` = 2nd
`:nth-child(2n)`	Even children
`:nth-child(odd)`	Odd children
`:nth-of-type(N)`	Nth of THAT element type
`:first-of-type`	First of that type among siblings
`:not(sel)`	Not matching sel
`:has(sel)`	Contains an element matching sel (CSS4, supported in browsers + Playwright)
`:is(s1, s2)`	Matches any of the listed selectors
`:where(...)`	Like :is but zero specificity
`:empty`	Has no children (no text either)
`:checked`	Checked input
`:disabled`	Disabled input

Examples:

tr:nth-child(odd)              # zebra rows
li:first-child                 # first item in any list
button:not(.disabled)          # buttons without the class "disabled" (use :not(:disabled) for the disabled state)
div.card:has(.badge.sale)      # cards containing a sale badge
ul li:nth-of-type(3)           # 3rd <li> among its siblings
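
Since :nth-child and :nth-of-type are easy to mix up, the following sketch emulates both for a single parent using the standard library. The helper names nth_child and nth_of_type and the sample markup are hypothetical, and the helpers only cover the plain integer form, not formulas such as 2n or odd.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the first child is an <h2>, followed by two <p> elements.
DOC = "<div><h2>Title</h2><p>first</p><p>second</p><span>note</span></div>"
parent = ET.fromstring(DOC)


def nth_child(parent, tag, n):
    """Emulate 'tag:nth-child(n)': the n-th child overall, if it has this tag."""
    kids = list(parent)
    if 1 <= n <= len(kids) and kids[n - 1].tag == tag:
        return kids[n - 1]
    return None


def nth_of_type(parent, tag, n):
    """Emulate 'tag:nth-of-type(n)': the n-th child counting only this tag."""
    same_type = [kid for kid in parent if kid.tag == tag]
    return same_type[n - 1] if 1 <= n <= len(same_type) else None


print(nth_child(parent, "p", 1))           # None: the 1st child is the <h2>
print(nth_child(parent, "p", 2).text)      # first
print(nth_of_type(parent, "p", 1).text)    # first
print(nth_of_type(parent, "p", 2).text)    # second
```

The practical takeaway: when a container mixes headings, paragraphs and other tags, :nth-of-type is usually the safer way to say "the second paragraph".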

The :has() Selector (CSS4)

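For years CSS could not select an element based on its descendants. :has() fills that gap: it selects an element based on what it contains, which covers most cases that used to need an XPath predicate such as //div[.//img] or an ancestor:: step: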

div:has(img)                  # divs containing an <img>
article:has(h2 + p)            # articles with h2 followed by p
form:has(input:invalid)        # forms with any invalid input
tr:has(td:contains("$99"))     # NOTE: :contains() is jQuery, not standard CSS

Browser support: Chrome 105+, Safari 15.4+, Firefox 121+. Playwright supports it. BeautifulSoup's soup.select() with soupsieve 2.5+ supports it.
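
Where :has() is not available, the same result can be approximated by selecting the candidate containers first and then filtering on their descendants. Here is a minimal standard-library sketch of that fallback for "div.card:has(.badge.sale)". The helper has_classes and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two of the three cards contain a sale badge,
# one of them nested one level deeper.
DOC = """
<section>
  <div class="card"><span class="badge sale">Sale</span><h3>Lamp</h3></div>
  <div class="card"><h3>Chair</h3></div>
  <div class="card"><p><span class="badge sale">Sale</span></p><h3>Desk</h3></div>
</section>
"""
root = ET.fromstring(DOC)


def has_classes(el, *names):
    """True if the element's class list contains every given class name."""
    return set(names) <= set(el.get("class", "").split())


# Emulates "div.card:has(.badge.sale)": keep each card that has at least
# one descendant (at any depth) carrying both classes.
sale_cards = [
    div
    for div in root.iter("div")
    if has_classes(div, "card")
    and any(has_classes(d, "badge", "sale") for d in div.iter() if d is not div)
]

titles = [card.findtext("h3") for card in sale_cards]
print(titles)  # ['Lamp', 'Desk']
```

The filter runs in Python rather than in the selector engine, so it is slower on large documents, but it works with any parser.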

Text Matching: Where CSS Falls Short

Standard CSS has NO text-content selector. The pseudo-class :contains("text") is a jQuery/Sizzle extension, NOT real CSS. Library support varies:

Library	`:contains()`
Native browser	No (use Playwright text= filter)
Playwright	Not as `:contains()`; use Playwright's own `:has-text()` pseudo-class or the `has_text=` filter
BeautifulSoup (soupsieve)	Yes (`:-soup-contains()`)
jQuery / Cheerio	Yes
lxml	Only as a non-standard cssselect extension; XPath `contains(text(),...)` is the usual choice

For native CSS, you cannot select by text content. Either use XPath, or post-filter in your scraping language.
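
Post-filtering can be as simple as selecting by structure first and checking the text in Python afterwards. The sketch below uses only the standard library. The markup and the helper full_text are hypothetical. It also shows why the choice between "own text" and "all descendant text" matters: the first item only matches when nested text is included.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "Sold Out" appears once inside a nested <b>
# and once directly in the <li> text.
DOC = """
<ul>
  <li class="product">Desk <b>Sold Out</b></li>
  <li class="product">Lamp</li>
  <li class="product">Sold Out: Chair</li>
</ul>
"""
root = ET.fromstring(DOC)


def full_text(el):
    """Concatenate the element's own text and all descendant text."""
    return "".join(el.itertext())


# Step 1: select by structure (the part CSS handles well).
products = root.findall(".//li[@class='product']")

# Step 2: filter by text in Python.
sold_out = [full_text(li).strip() for li in products if "Sold Out" in full_text(li)]

# Stricter variant: look only at the text directly inside the <li>,
# before its first child element.
own_text_only = [li.text.strip() for li in products if "Sold Out" in (li.text or "")]

print(sold_out)       # ['Desk Sold Out', 'Sold Out: Chair']
print(own_text_only)  # ['Sold Out: Chair']
```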

Python Examples

BeautifulSoup with CSS selectors

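from bs4 import BeautifulSoup
import requests

r = requests.get("https://example.com", timeout=10)
r.raise_for_status()  # stop early on 4xx/5xx responses
soup = BeautifulSoup(r.text, "lxml")  # the "lxml" parser needs the lxml package installed

# Single result: select_one() returns None when nothing matches
title_el = soup.select_one("h1.page-title")
title = title_el.get_text(strip=True) if title_el else None

# Multiple
links = [a["href"] for a in soup.select("nav a[href]")]

# Combinators
rows = soup.select("table.data > tbody > tr:not(.header)")

# Text contains (soupsieve extension, not standard CSS)
sold_out = soup.select(".product:-soup-contains('Sold Out')")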

Playwright with CSS selectors

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")

    # Get text
    title = page.locator("h1.title").text_content()

    # Multiple elements
    prices = page.locator("span.price").all_text_contents()

    # CSS4 :has()
    sale_cards = page.locator("div.card:has(.badge.sale)").all()

    # Playwright has_text filter (not standard CSS)
    submit = page.locator("button", has_text="Submit")

    browser.close()

CSS vs XPath

Goal	CSS	XPath
By id	`#x`	`//*[@id="x"]`
By class	`.x`	`//*[contains(concat(" ", normalize-space(@class), " "), " x ")]` (plain `contains(@class, "x")` also matches `xy`)
Direct child	`div > a`	`//div/a`
Adjacent sibling	`h2 + p`	`//h2/following-sibling::*[1][self::p]`
Text contains	(jQuery extension only)	`//a[contains(text(), "Sign up")]`
Parent	`div:has(> span)` (CSS4; impossible before)	`//span/parent::div`
Ancestor	`form:has(x)` (CSS4)	`//x/ancestor::form`

For scraping, use CSS by default; switch to XPath for text matching or complex tree traversal. See the XPath cheat sheet for the XPath equivalent.

Performance Notes

Browser: CSS is 2-5x faster than XPath in browser engines (heavily optimized).
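BeautifulSoup: select() runs CSS through soupsieve in Python. BeautifulSoup has no XPath support, so use lxml directly for XPath; there XPath is native and CSS is translated to XPath first.
Selector cost (a rough ordering, not CSS specificity): id (#x) is fastest, then tag (a), then class (.x), then attribute ([href]).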
Avoid * at the start of long chains — full traversal.

Testing in Browser DevTools

$$("div.card")            // all matches (array)
$("h1")                   // first match
document.querySelectorAll("nav a[href]")   // standard DOM API, returns a NodeList

$() and $$() are DevTools shortcuts (not jQuery), although a page that defines its own $ (for example by loading jQuery) overrides the shortcut. Use them to iterate on selectors before pasting into your scraper.