Combinators (Tree Relationships)

| Combinator | Meaning |
| --- | --- |
| a b | Descendant: b anywhere inside a |
| a > b | Direct child: b is an immediate child of a |
| a + b | Adjacent sibling: b immediately after a |
| a ~ b | General sibling: b anywhere after a at the same level |
Examples:
nav a # any <a> inside any <nav>
ul > li # <li> directly under <ul>

Attribute Selectors

| Selector | Matches |
| --- | --- |
| [attr] | Has the attribute (any value) |
| [attr="val"] | Value is exactly val |
| [attr*="val"] | Value contains val |
| [attr^="val"] | Value starts with val |
| [attr$="val"] | Value ends with val |
| [attr~="val"] | Whitespace-separated list contains val |
| [attr\|="val"] | Value equals val or starts with val- (i18n language codes) |
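The matching rules in the attribute table can be spelled out in plain Python. This is only a sketch of the semantics, not how any selector engine is implemented; the function name `attr_matches` is hypothetical.

```python
def attr_matches(value: str, op: str, val: str) -> bool:
    """Return True if an attribute value satisfies [attr<op>"val"]."""
    if op == "=":
        return value == val
    if op == "*=":
        # Substring match; an empty val matches nothing.
        return val != "" and val in value
    if op == "^=":
        return val != "" and value.startswith(val)
    if op == "$=":
        return val != "" and value.endswith(val)
    if op == "~=":
        # val must be one whole whitespace-separated token.
        return val in value.split()
    if op == "|=":
        # Exact match, or val followed by a hyphen (e.g. "en" vs "en-US").
        return value == val or value.startswith(val + "-")
    raise ValueError(f"unknown operator: {op}")


print(attr_matches("https://example.com/a.png", "^=", "https://"))  # True
print(attr_matches("btn btn-primary", "~=", "btn"))                 # True
print(attr_matches("btn-primary", "~=", "btn"))                     # False
print(attr_matches("en-US", "|=", "en"))                            # True
```

The `~=` case is the one that most often surprises: it compares whole tokens, so `btn` does not match `btn-primary`, whereas `*=` would.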
Examples:
a[href] # links with href
a[href^="https://"] # links starting with https://
img[src$=".png"] # images ending .png
div[data-id*="user-"] # divs whose data-id contains "user-"
input[type="email"] # email inputs

Pseudo-Classes

| Pseudo-class | Meaning |
| --- | --- |
| :first-child | First child of its parent |
| :last-child | Last child of its parent |
| :nth-child(N) | Nth child (1-indexed). :nth-child(2) = 2nd |
| :nth-child(2n) | Even children |
| :nth-child(odd) | Odd children |
| :nth-of-type(N) | Nth of THAT element type |
| :first-of-type | First of that type among siblings |
| :not(sel) | Not matching sel |
| :has(sel) | Contains an element matching sel (CSS4, supported in browsers + Playwright) |
| :is(s1, s2) | Matches any of the listed selectors |
| :where(...) | Like :is but zero specificity |
| :empty | Has no children (no text either) |
| :checked | Checked input |
| :disabled | Disabled input |
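The :nth-child() rows in the table are instances of the general an+b form from the CSS specification: a child at 1-based position i matches when i = a*n + b for some integer n >= 0. A minimal sketch of that rule follows; the helper name `nth_matches` is hypothetical.

```python
def nth_matches(index: int, a: int, b: int) -> bool:
    """True if the 1-based position `index` is matched by :nth-child(an+b)."""
    if a == 0:
        # No step: only the single position b matches, e.g. :nth-child(2).
        return index == b
    n, remainder = divmod(index - b, a)
    # index must sit exactly on the a*n + b sequence, with n >= 0.
    return remainder == 0 and n >= 0


positions = range(1, 8)
print([i for i in positions if nth_matches(i, 2, 0)])   # 2n     -> [2, 4, 6]
print([i for i in positions if nth_matches(i, 2, 1)])   # odd    -> [1, 3, 5, 7]
print([i for i in positions if nth_matches(i, 0, 2)])   # 2      -> [2]
print([i for i in positions if nth_matches(i, -1, 3)])  # -n+3   -> [1, 2, 3]
```

Here `odd` is shorthand for 2n+1 and `even` for 2n; a negative a, as in -n+3, selects the first few children.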
Examples:
tr:nth-child(odd) # zebra rows
li:first-child # first item in any list
button:not(.disabled) # buttons without the .disabled class
div.card:has(.badge.sale) # cards containing a sale badge
ul li:nth-of-type(3) # each <li> that is the 3rd <li> among its siblings

The :has() Selector (CSS4)

The killer modern feature CSS used to lack. :has() selects parents based on what they contain, making it the CSS counterpart of XPath's ancestor traversal:
div:has(img) # divs containing an <img>
article:has(h2 + p) # articles with an h2 immediately followed by a p
form:has(input:invalid) # forms with any invalid input
tr:has(td:contains("$99")) # NOTE: :contains() is jQuery, not standard CSS

Browser support: Chrome 105+, Safari 15.4+, Firefox 121+. Playwright supports it. BeautifulSoup's soup.select() with soupsieve 2.5+ supports it.
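To make the "select the parent by what it contains" idea concrete, the sketch below does by hand what div:has(img) and div:has(> img) express in one selector. It uses only the standard library's ElementTree on a small, well-formed snippet invented for this illustration; in a real scraper you would more likely pass the selector straight to a library that supports :has().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup used only for this illustration.
DOC = """
<body>
  <div id="a"><img src="x.png"/></div>
  <div id="b"><p>no image here</p></div>
  <div id="c"><span><img src="y.png"/></span></div>
</body>
"""

root = ET.fromstring(DOC.strip())

# div:has(img): keep each div with an img anywhere below it.
has_img = [d.get("id") for d in root.iter("div") if d.find(".//img") is not None]

# div:has(> img): keep each div with an img as a direct child.
has_child_img = [d.get("id") for d in root.iter("div") if d.find("img") is not None]

print(has_img)        # ['a', 'c']
print(has_child_img)  # ['a']
```

The result is the div elements themselves, not the images inside them, which is exactly what distinguishes :has() from a plain descendant selector such as div img.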
Text Matching: Where CSS Falls Short

Standard CSS has NO text-content selector. The pseudo-class :contains("text") is a jQuery/Sizzle extension, NOT real CSS. Library support varies:

| Library | :contains() |
| --- | --- |
| Native browser | No (use Playwright's text= filter) |
| Playwright | Text matching through Playwright's own extensions (:has-text(), text=), not :contains() |
| BeautifulSoup (soupsieve) | Yes (:-soup-contains()) |
| jQuery / Cheerio | Yes |
| lxml | Only as a non-standard extension of its cssselect backend; XPath contains(text(), ...) is the portable option |

With native CSS you cannot select by text content. Either use XPath or post-filter in your scraping language.
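Post-filtering can be as simple as selecting by tag or class first and then checking the text in Python. The sketch below uses only the standard library on a small, well-formed snippet made up for this example; the helper `text_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only.
DOC = """
<nav>
  <a href="/login">Log in</a>
  <a href="/signup"><b>Sign up</b> free</a>
  <a href="/about">About</a>
</nav>
"""

root = ET.fromstring(DOC.strip())


def text_of(el: ET.Element) -> str:
    """Concatenate all text inside el, including text in nested tags."""
    return "".join(el.itertext())


# Step 1: structural selection (every <a>). Step 2: filter on text content.
signup_links = [a.get("href") for a in root.iter("a") if "Sign up" in text_of(a)]

print(signup_links)  # ['/signup']
```

Because `itertext()` walks nested elements, the link is found even though "Sign up" sits inside a `<b>` rather than directly in the `<a>`.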
Python Examples

BeautifulSoup with CSS selectors

from bs4 import BeautifulSoup
import requests
r = requests.get("https://example.com")
soup = BeautifulSoup(r.text, "lxml")
# Single result
title = soup.select_one("h1.page-title").text
# Multiple
links = [a["href"] for a in soup.select("nav a[href]")]
# Combinators
rows = soup.select("table.data > tbody > tr:not(.header)")
# Text contains (soupsieve)
badges = soup.select(".product:-soup-contains('Sold Out')")

Playwright with CSS selectors

from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    # Get text
    title = page.locator("h1.title").text_content()
    # Multiple elements
    prices = page.locator("span.price").all_text_contents()
    # CSS4 :has()
    sale_cards = page.locator("div.card:has(.badge.sale)").all()
    # Playwright has_text filter (not standard CSS)
    submit = page.locator("button", has_text="Submit")
    browser.close()

CSS vs XPath

| Goal | CSS | XPath |
| --- | --- | --- |
| By id | #x | //*[@id="x"] |
| By class | .x | //*[contains(@class, "x")] |
| Direct child | div > a | //div/a |
| Adjacent sibling | h2 + p | //h2/following-sibling::*[1][self::p] |
| Text contains | (jQuery extension only) | //a[contains(text(), "Sign up")] |
| Parent | (impossible in pre-CSS4) | //span/parent::div |
| Ancestor | :has() (CSS4) | //x/ancestor::form |
For scraping, use CSS by default; switch to XPath for text matching or complex tree traversal. See the XPath cheat sheet for the XPath equivalent.
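One row of the comparison deserves a caution: contains(@class, "x") is a substring test, while the CSS selector .x matches a whole class token, so the two are not strictly equivalent. The sketch below mimics both in plain Python, plus the commonly used token-safe XPath idiom; all three function names are hypothetical.

```python
def css_class_match(class_attr: str, name: str) -> bool:
    """Mimic the CSS selector .name: name must be a whole class token."""
    return name in class_attr.split()


def xpath_contains_match(class_attr: str, name: str) -> bool:
    """Mimic XPath contains(@class, "name"): a plain substring test."""
    return name in class_attr


def xpath_token_match(class_attr: str, name: str) -> bool:
    """Mimic contains(concat(" ", normalize-space(@class), " "), " name ")."""
    padded = " " + " ".join(class_attr.split()) + " "
    return f" {name} " in padded


for cls in ("card wide", "card-list wide"):
    print(cls, css_class_match(cls, "card"),
          xpath_contains_match(cls, "card"),
          xpath_token_match(cls, "card"))
# card wide True True True
# card-list wide False True False
```

When a class name is a prefix of other class names on the page, the padded form is the safer XPath translation of a CSS class selector.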
Browser: CSS is roughly 2-5x faster than XPath in browser engines, where CSS matching is heavily optimized.
BeautifulSoup: select() accepts CSS only; BeautifulSoup itself has no XPath support. For XPath, use lxml directly, where CSS and XPath run at similar speed.
Selector type: as a rule of thumb, id (#x) is fastest, then tag (a), then class (.x), then attribute ([href]).
Avoid * at the start of long chains, since it forces a full traversal.

$$("div.card") // all matches (Array)
$("h1") // first match
document.querySelectorAll("nav a[href]") // all matches (NodeList)

$() and $$() are DevTools console shortcuts (not jQuery). Use them to iterate on selectors before pasting them into your scraper.
Related: XPath cheat sheet, PyQuery tutorial, Cheerio vs Puppeteer.