spyderproxy

CSS Selector Cheat Sheet (2026): Web Scraping Reference

By Alex R. | Published Sun May 10 2026

Quick verdict: CSS selectors are the standard query language for HTML. They are simpler than XPath for class/id selection and faster in browsers. Use CSS for ~70% of scraping needs; reach for XPath when you need text matching (contains(text(),...)), parent traversal (ancestor::), or sibling logic that CSS cannot express cleanly. This cheat sheet covers everything from .class to :has().

Basic Selectors

Selector      Matches
*             Any element
tag           <tag> elements
#id           Element with id="id"
.class        Element whose class list includes "class"
tag.class     Both: <tag class="class">
tag#id        Both: <tag id="id">
tag, tag2     Either (comma = OR)

Examples:

p              # all <p> elements
.warning       # any element with class="warning"
#main          # element with id="main"
div.card       # <div class="card">
a, button      # all <a> or <button> elements
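
The selectors above can be tried out in BeautifulSoup with a minimal sketch like the one below. The HTML snippet is invented for illustration, and the built-in "html.parser" backend is assumed so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<div id="main">
  <p class="warning">Low stock</p>
  <p>Ships in 2 days</p>
  <div class="card featured">Card A</div>
  <span class="card">Card B</span>
  <a href="/buy">Buy</a>
  <button>Add to cart</button>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# tag: every <p>
paragraphs = [p.get_text() for p in soup.select("p")]

# .class: any element whose class list includes "warning"
warnings = [el.get_text() for el in soup.select(".warning")]

# #id: the single element with id="main"
main = soup.select_one("#main")

# tag.class: only <div> elements carrying the "card" class
div_cards = [el.get_text() for el in soup.select("div.card")]

# comma: either <a> or <button>, returned in document order
clickable = [el.name for el in soup.select("a, button")]

print(paragraphs, warnings, main.name, div_cards, clickable)
```

Note that div.card skips the span even though it has the same class, and that "card featured" still matches .card because class matching works per token.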

Combinators (Tree Relationships)

Combinator    Meaning
a b           Descendant: b anywhere inside a
a > b         Direct child: b is an immediate child of a
a + b         Adjacent sibling: b immediately after a
a ~ b         General sibling: b anywhere after a at the same level

Examples:

nav a              # any <a> inside any <nav>
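
To see how the four combinators differ, here is a small sketch using BeautifulSoup. The markup and the texts() helper are hypothetical and exist only for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<nav>
  <a href="/">Home</a>
  <ul><li><a href="/docs">Docs</a></li></ul>
</nav>
<article>
  <h2>Title</h2>
  <p>Intro</p>
  <div>Ad</div>
  <p>Body</p>
</article>
"""

soup = BeautifulSoup(HTML, "html.parser")


def texts(selector):
    """Return the text of every element matching the selector."""
    return [el.get_text() for el in soup.select(selector)]


descendants = texts("nav a")    # any depth inside <nav>
children = texts("nav > a")     # immediate children of <nav> only
adjacent = texts("h2 + p")      # the <p> right after an <h2>
siblings = texts("h2 ~ p")      # every later <p> sibling of an <h2>

print(descendants, children, adjacent, siblings)
```

The descendant selector finds both links, while the child selector drops the one nested inside the list. Likewise, h2 + p stops at the first paragraph, and h2 ~ p also reaches the paragraph after the intervening div.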

Attribute Selectors

Selector         Matches
[attr]           Has the attribute (any value)
[attr="val"]     Value is exactly val
[attr*="val"]    Value contains val as a substring
[attr^="val"]    Value starts with val
[attr$="val"]    Value ends with val
[attr~="val"]    Whitespace-separated list contains the word val
[attr|="val"]    Value is exactly val or starts with val- (language codes such as en-US)

Examples:

a[href]                    # links with href
a[href^="https://"]        # links starting with https://
img[src$=".png"]           # images ending .png
div[data-id*="user-"]      # divs whose data-id contains "user-"
input[type="email"]        # email inputs
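
As a rough check of the operators above, the following sketch runs each one against a small invented document. The attribute names data-id and data-tags are assumptions chosen for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<a href="https://example.com/a">secure</a>
<a href="http://example.com/b">plain</a>
<a name="anchor">no href</a>
<img src="/img/logo.png" alt="logo">
<img src="/img/photo.jpg" alt="photo">
<div data-id="user-42" data-tags="sale new"></div>
<input type="email" name="e">
<input type="text" name="t">
<p lang="en-US">Hello</p>
<p lang="english">Not a language-prefix match</p>
"""

soup = BeautifulSoup(HTML, "html.parser")

with_href = len(soup.select("a[href]"))                                   # presence
https_links = [a["href"] for a in soup.select('a[href^="https://"]')]     # prefix
png_images = [i["src"] for i in soup.select('img[src$=".png"]')]          # suffix
user_divs = [d["data-id"] for d in soup.select('div[data-id*="user-"]')]  # substring
emails = [i["name"] for i in soup.select('input[type="email"]')]          # exact
on_sale = len(soup.select('[data-tags~="sale"]'))                         # word in list
english = [p.get_text() for p in soup.select('p[lang|="en"]')]            # val or val-

print(with_href, https_links, png_images, user_divs, emails, on_sale, english)
```

The last selector shows why |= exists: it accepts "en-US" but rejects "english", which a plain ^= prefix match would let through.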

Pseudo-Classes

Pseudo-class       Meaning
:first-child       First child of its parent
:last-child        Last child of its parent
:nth-child(N)      Nth child (1-indexed); :nth-child(2) is the 2nd child
:nth-child(2n)     Even children (same as :nth-child(even))
:nth-child(odd)    Odd children (same as :nth-child(2n+1))
:nth-of-type(N)    Nth sibling of the same element type
:first-of-type     First of its element type among siblings
:not(sel)          Does not match sel
:has(sel)          Contains an element matching sel (CSS4; supported in browsers and Playwright)
:is(s1, s2)        Matches any of the listed selectors
:where(...)        Like :is() but with zero specificity
:empty             Has no children, including text
:checked           Checked checkbox or radio input, or selected option
:disabled          Disabled form control

Examples:

tr:nth-child(odd)              # zebra rows
li:first-child                 # first item in any list
button:not(.disabled)          # buttons without the "disabled" class (not the disabled attribute)
div.card:has(.badge.sale)      # cards containing a sale badge
ul li:nth-of-type(3)           # 3rd <li> sibling
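
The difference between :nth-child() and :nth-of-type() is easy to get wrong, so here is a minimal sketch that exercises both, along with :not() and the first/last variants. The markup and the texts() helper are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<table>
  <tr><td>r1</td></tr>
  <tr><td>r2</td></tr>
  <tr><td>r3</td></tr>
  <tr><td>r4</td></tr>
</table>
<ul>
  <li>one</li>
  <li class="disabled">two</li>
  <li>three</li>
</ul>
<section><h3>Heading</h3><p>first p</p><p>second p</p></section>
"""

soup = BeautifulSoup(HTML, "html.parser")


def texts(selector):
    """Return the text of every element matching the selector."""
    return [el.get_text() for el in soup.select(selector)]


odd_rows = texts("tr:nth-child(odd)")
even_rows = texts("tr:nth-child(2n)")
first_item = texts("li:first-child")
last_item = texts("li:last-child")
enabled = texts("li:not(.disabled)")

# :nth-child counts all sibling elements; :nth-of-type counts only <p> siblings.
second_child_p = texts("section p:nth-child(2)")
second_p = texts("section p:nth-of-type(2)")

print(odd_rows, even_rows, first_item, last_item, enabled, second_child_p, second_p)
```

In the section element, the heading occupies position 1, so p:nth-child(2) picks the first paragraph while p:nth-of-type(2) picks the second.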

The :has() Selector (CSS4)

    The killer modern feature CSS used to lack: :has() selects an element based on what it contains, covering most cases that would otherwise need XPath's parent or ancestor axes:

    div:has(img)                  # divs containing an <img>
    article:has(h2 + p)            # articles with h2 followed by p
    form:has(input:invalid)        # forms with any invalid input
    tr:has(td:contains("$99"))     # NOTE: :contains() is jQuery, not standard CSS

    Browser support: Chrome 105+, Safari 15.4+, Firefox 121+. Playwright supports it. BeautifulSoup's soup.select() with soupsieve 2.5+ supports it.
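
A minimal sketch of :has() in BeautifulSoup follows, assuming a soupsieve version with :has() support is installed. The product-card markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<div class="card"><span class="badge sale">-20%</span><h3>Mug</h3></div>
<div class="card"><span class="badge">New</span><h3>Cup</h3></div>
<div class="card"><h3>Plate</h3></div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def titles(selector):
    """Return the <h3> text of every card matching the selector."""
    return [card.h3.get_text() for card in soup.select(selector)]


# Cards containing a sale badge anywhere inside them
on_sale = titles("div.card:has(.badge.sale)")

# Cards with a badge as a direct child (relative selector with >)
badged = titles("div.card:has(> .badge)")

# Cards with no badge at all: combine :not() with :has()
plain = titles("div.card:not(:has(.badge))")

print(on_sale, badged, plain)
```

The selector returns the card itself rather than the badge, which is what makes :has() useful for picking a container and then extracting other fields from it.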

    Text Matching: Where CSS Falls Short

    Standard CSS has NO text-content selector. The pseudo-class :contains("text") is a jQuery/Sizzle extension, NOT real CSS. Library support varies:

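    Library                      :contains() support
    Native browser               No (use Playwright's text filters, or filter in JavaScript)
    Playwright                   Via its own :has-text() pseudo-class or has_text= filter, not :contains()
    BeautifulSoup (soupsieve)    Yes, spelled :-soup-contains()
    jQuery / Cheerio             Yes
    lxml                         Non-standard :contains() via the cssselect package; XPath contains(text(), ...) is the usual route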

    For native CSS, you cannot select by text content. Either use XPath, or post-filter in your scraping language.
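
Post-filtering can be sketched as follows: select candidates with plain CSS, then filter on text in Python. The select_by_text() helper and the markup are hypothetical; the soupsieve-specific :-soup-contains() call is shown only for comparison and assumes a soupsieve version that provides it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
HTML = """
<div class="product"><h3>Mug</h3><span class="stock">Sold Out</span></div>
<div class="product"><h3>Cup</h3><span class="stock">In stock</span></div>
<div class="product"><h3>Plate</h3><span class="stock">sold out</span></div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def select_by_text(root, selector, needle):
    """Select with standard CSS, then keep elements whose text contains needle.

    The comparison is case-insensitive, which CSS-level extensions
    such as :-soup-contains() do not offer.
    """
    needle = needle.casefold()
    return [el for el in root.select(selector)
            if needle in el.get_text().casefold()]


# Portable approach: works with any CSS engine
sold_out = [p.h3.get_text() for p in select_by_text(soup, ".product", "sold out")]

# soupsieve extension: case-sensitive substring match
exact_case = [p.h3.get_text()
              for p in soup.select('.product:-soup-contains("Sold Out")')]

print(sold_out, exact_case)
```

The Python filter catches both spellings, while the selector-level match only finds the exact-case one. That trade-off is one reason to post-filter even when the library offers a text pseudo-class.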

    Python Examples

    BeautifulSoup with CSS selectors

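    from bs4 import BeautifulSoup
    import requests

    # Fetch the page; fail fast on HTTP errors or a hung connection
    r = requests.get("https://example.com", timeout=10)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, "lxml")  # the "lxml" parser needs the lxml package

    # Single result: select_one() returns None when nothing matches
    title_el = soup.select_one("h1.page-title")
    title = title_el.get_text(strip=True) if title_el else None

    # Multiple
    links = [a["href"] for a in soup.select("nav a[href]")]

    # Combinators (matches only if the markup contains a <tbody>)
    rows = soup.select("table.data > tbody > tr:not(.header)")

    # Text contains (soupsieve extension, not standard CSS)
    badges = soup.select(".product:-soup-contains('Sold Out')")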

    Playwright with CSS selectors

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com")

        # Get text
        title = page.locator("h1.title").text_content()

        # Multiple elements
        prices = page.locator("span.price").all_text_contents()

        # CSS4 :has()
        sale_cards = page.locator("div.card:has(.badge.sale)").all()

        # Playwright has_text filter (not standard CSS)
        submit = page.locator("button", has_text="Submit")

        # Release the browser process when done
        browser.close()

    CSS vs XPath

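    Goal                CSS                        XPath
    By id               #x                         //*[@id="x"]
    By class            .x                         //*[contains(concat(" ", normalize-space(@class), " "), " x ")]
    Direct child        div > a                    //div/a
    Adjacent sibling    h2 + p                     //h2/following-sibling::*[1][self::p]
    Text contains       (jQuery extension only)    //a[contains(text(), "Sign up")]
    Parent              div:has(> span) (CSS4)     //span/parent::div
    Ancestor            form:has(x) (CSS4)         //x/ancestor::form

    The shorter //*[contains(@class, "x")] is common but also matches class names that merely contain "x", such as "xy".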

    For scraping, use CSS by default; switch to XPath for text matching or complex tree traversal. See the XPath cheat sheet for the XPath equivalent.
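
The equivalences can be checked side by side with a short sketch. It assumes both BeautifulSoup (for CSS) and lxml (for XPath) are installed, and the markup is invented for illustration.

```python
from bs4 import BeautifulSoup
from lxml import html as lxml_html

# Hypothetical markup, made up for this example.
HTML = """
<html><body>
<div id="x" class="box"><a href="/signup">Sign up now</a><span>inner</span></div>
<div class="boxy"><h2>Title</h2><p>Lead</p><p>More</p></div>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")   # CSS side
tree = lxml_html.fromstring(HTML)           # XPath side

# By id
css_id = [el.name for el in soup.select("#x")]
xp_id = [el.tag for el in tree.xpath('//*[@id="x"]')]

# By class: a plain contains() also matches "boxy"; the token-safe form does not
css_class = len(soup.select(".box"))
xp_loose = len(tree.xpath('//*[contains(@class, "box")]'))
xp_exact = len(tree.xpath(
    '//*[contains(concat(" ", normalize-space(@class), " "), " box ")]'))

# Direct child
css_child = [a.get("href") for a in soup.select("div > a")]
xp_child = list(tree.xpath("//div/a/@href"))

# Adjacent sibling: the very next element, and only if it is a <p>
css_adj = [p.get_text() for p in soup.select("h2 + p")]
xp_adj = [p.text for p in tree.xpath("//h2/following-sibling::*[1][self::p]")]

# Parent of a <span>
css_parent = [d.get("id") for d in soup.select("div:has(> span)")]
xp_parent = [d.get("id") for d in tree.xpath("//span/parent::div")]

# Text contains: XPath only
xp_text = [a.text for a in tree.xpath('//a[contains(text(), "Sign up")]')]

print(css_class, xp_loose, xp_exact, xp_text)
```

The class comparison is the one most likely to bite in practice: the loose XPath form counts two elements here, while the CSS selector and the token-safe XPath both count one.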

    Performance Notes

    • Browser: CSS is roughly 2-5x faster than XPath in browser engines, which optimize selector matching heavily.
    • lxml: CSS selectors are translated to XPath (via cssselect), so the two run at similar speed. BeautifulSoup's select() has no XPath support.
    • Selector type: id (#x) is fastest, then tag (a), then class (.x), then attribute ([href]).
    • Avoid * at the start of long chains, since it forces a full traversal.

    Testing in Browser DevTools

    $$("div.card")            // all matches (Array)
    $("h1")                   // first match
    document.querySelectorAll("nav a[href]")   // standard DOM API, returns a NodeList

    $() and $$() are DevTools console shortcuts (not jQuery). Use them to iterate on selectors before pasting them into your scraper.
