spyderproxy

XPath Cheat Sheet (2026): Selectors, Functions & Examples

Daniel K. | Published Sun May 10 2026

Quick verdict: XPath wins over CSS selectors when you need text matching (contains(text(), ...)), upward DOM traversal (parent::, ancestor::), or sibling logic (following-sibling::). CSS wins for class/id selection (.foo, #bar) and is faster in browsers. For scraping, learn both: expect to reach for XPath roughly 30% of the time, for the jobs CSS cannot do.

Basics

Expression   Meaning
/            Root of the document
//           Anywhere in the document
.            Current node
..           Parent of the current node
*            Any element
@            Attribute
text()       Text content of a node
node()       Any node (element, text, or comment)
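The basics above can be exercised on a toy document. A minimal sketch using lxml (the library introduced in the quickstart later in this article), with a made-up snippet:

```python
from lxml import html

# Hypothetical markup, used only to exercise the basic expressions above.
doc = html.fromstring(
    '<html><body>'
    '<div id="top"><a href="/a">First</a><a href="/b">Second</a></div>'
    '</body></html>'
)

hrefs = doc.xpath('//a/@href')           # // = anywhere, @ = attribute
texts = doc.xpath('//div/a/text()')      # text() = text content
children = doc.xpath('//div/*')          # * = any element
parent_tag = doc.xpath('//a/..')[0].tag  # .. = parent
```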

Common Selectors

# Every <a> tag anywhere
//a

# <a> tags inside a <div>, at any depth
//div//a

Predicates (Filters in [])

Predicates filter nodes:

# Multiple conditions (AND)
//a[@href and @target="_blank"]

# OR
//a[@target="_blank" or @rel="noopener"]

# Negation
//a[not(@target="_blank")]

# Comparison
//tr[position() > 1]
//product[@price >= 100 and @price < 200]

# Text predicate
//button[text()="Submit"]
//div[contains(., "$99")]   # . = string value of node (text + descendants)
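The predicate patterns above can be checked against a small invented page; a sketch with lxml:

```python
from lxml import html

# Made-up markup to exercise the predicate examples above.
doc = html.fromstring("""
<body>
  <a href="/x" target="_blank">Out</a>
  <a href="/y">In</a>
  <button>Submit</button>
  <table><tr><th>h</th></tr><tr><td>r1</td></tr><tr><td>r2</td></tr></table>
</body>
""")

blank = doc.xpath('//a[@href and @target="_blank"]')   # AND of two conditions
plain = doc.xpath('//a[not(@target="_blank")]')        # negation
submit = doc.xpath('//button[text()="Submit"]')        # exact text match
rows = doc.xpath('//tr[position() > 1]')               # skip the header row
```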

Axes (Tree Traversal)

The killer XPath feature CSS does not have. Format: axis::node-test[predicate].

Axis                   Selects
parent::               Parent node
ancestor::             All ancestors
ancestor-or-self::     Self plus all ancestors
child::                Direct children (the default axis)
descendant::           All descendants
descendant-or-self::   Self plus all descendants (this is what // means)
following::            Everything after the current node in document order
following-sibling::    Siblings after the current node
preceding::            Everything before the current node in document order
preceding-sibling::    Siblings before the current node
self::                 The current node
attribute::            Attributes (shorthand: @)

Examples:

# From a <span>, get the parent <div>
//span[@class="price"]/parent::div

# From a label, get the next input (sibling)
//label[text()="Email"]/following-sibling::input[1]

# All <p> tags after the <h2>
//h2[@id="news"]/following-sibling::p

# The ancestor <table> of a <td>
//td[contains(text(), "Total")]/ancestor::table[1]

# Self with predicate (rare but legal)
//div[@class="card"]/self::*[contains(., "Sale")]
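The parent and sibling axes can be demonstrated on a hypothetical product card; a sketch with lxml, where the markup and names are invented for illustration:

```python
from lxml import html

# Invented card, used only to demonstrate the axes above.
doc = html.fromstring("""
<div class="card">
  <label>Email</label>
  <input name="email" value="a@b.c"/>
  <span class="price">$10</span>
</div>
""")

# Up one level: from the price <span> to its parent <div>
card = doc.xpath('//span[@class="price"]/parent::div')[0]

# Sideways: from the label to the next <input> sibling
name = doc.xpath('//label[text()="Email"]/following-sibling::input[1]/@name')[0]

# Up any number of levels: the nearest ancestor with class "card"
anc = doc.xpath('//span[@class="price"]/ancestor::div[@class="card"]')[0]
```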

String & Number Functions

Function                   Use
contains(s1, s2)           True if s1 contains s2
starts-with(s1, s2)        True if s1 starts with s2
ends-with(s1, s2)          XPath 2.0+ only (not in Python lxml)
normalize-space(s)         Trim and collapse whitespace
string-length(s)           Length of s
substring(s, start, len)   Substring (1-indexed)
substring-before(s, sep)   Text before the separator
substring-after(s, sep)    Text after the separator
translate(s, from, to)     Character-by-character mapping (poor man's lowercase)
lower-case(s)              XPath 2.0+ only
count(nodeset)             Number of matched nodes
position()                 Index of the current node in the match set
last()                     Index of the last node in the match set
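A few of these functions in action; a sketch with lxml on an invented list:

```python
from lxml import html

# Invented list to exercise some of the functions above.
doc = html.fromstring(
    '<ul>'
    '<li>  spaced  text </li>'
    '<li>prefix-item</li>'
    '<li>item-suffix</li>'
    '</ul>'
)

n = doc.xpath('count(//li)')                        # returns a number, not a node-set
trimmed = doc.xpath('normalize-space(//li[1])')     # trim + collapse whitespace
starts = doc.xpath('//li[starts-with(text(), "prefix")]')
last_li = doc.xpath('//li[last()]/text()')[0]
```

Note that count() and normalize-space() return a number and a string directly, not a list of nodes.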

Case-insensitive matching (XPath 1.0 lacks lower-case):

//a[contains(translate(text(), "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), "submit")]

This is hideous, but it is the XPath 1.0 way. lxml in Python is built on libxml2, which implements XPath 1.0 only; for lower-case() and other 2.0+ functions you need a different engine (for example, the elementpath package in Python).
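The translate() trick from above, run through lxml on some made-up links:

```python
from lxml import html

# Made-up links with mixed-case text to test case-insensitive matching.
doc = html.fromstring(
    '<div><a href="/1">SUBMIT</a><a href="/2">Submit now</a><a href="/3">cancel</a></div>'
)

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# XPath 1.0 has no lower-case(), so map characters with translate()
expr = f'//a[contains(translate(text(), "{UPPER}", "{LOWER}"), "submit")]'
matches = [a.get("href") for a in doc.xpath(expr)]
```

Both "SUBMIT" and "Submit now" match; "cancel" does not.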

Real Scraping Examples

Extract all article titles where the publication date is 2026:

//article[contains(.//time/@datetime, "2026")]//h2/text()

Extract data-href from cards that contain a "Sold" badge:

//div[@class="card" and .//span[@class="badge sold"]]/@data-href

Get the price next to a label "Total:":

//*[text()="Total:"]/following-sibling::*[1]/text()

Get the value of the input that follows a label:

//label[text()="Email"]/following::input[1]/@value

(following:: is broader than following-sibling:: — it catches inputs in different parent nodes too.)
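The difference matters when the label and input live under different parents. A sketch with lxml on an invented form:

```python
from lxml import html

# Invented form: label and input have different parents, so
# following-sibling:: misses the input but following:: still finds it.
doc = html.fromstring("""
<form>
  <div><label>Email</label></div>
  <div><input name="email" value="a@b.c"/></div>
</form>
""")

siblings = doc.xpath('//label[text()="Email"]/following-sibling::input')
anywhere_after = doc.xpath('//label[text()="Email"]/following::input[1]/@value')
```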

Python lxml Quickstart

from lxml import html
import requests

r = requests.get("https://example.com")
tree = html.fromstring(r.content)

# Single result
title = tree.xpath("//h1/text()")[0]

# Multiple results
links = tree.xpath("//a/@href")

# Element nodes (not just text)
cards = tree.xpath('//div[contains(@class, "card")]')
for c in cards:
    title = c.xpath(".//h2/text()")[0]   # NOTE the leading dot
    print(title)

Critical: when XPathing within an element, prefix with . to scope to the element. Without the dot, //h2 means "search the whole document," not "search inside this element."
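The pitfall is easy to reproduce; a sketch with two made-up cards:

```python
from lxml import html

# Two invented cards; shows why the leading dot matters when
# running XPath relative to an element.
doc = html.fromstring(
    '<div><div class="card"><h2>One</h2></div>'
    '<div class="card"><h2>Two</h2></div></div>'
)

first_card = doc.xpath('//div[@class="card"]')[0]

scoped = first_card.xpath('.//h2/text()')    # only inside this card
unscoped = first_card.xpath('//h2/text()')   # whole document!
```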

XPath vs CSS Selectors

Need               XPath                              CSS
By id              //*[@id="x"]                       #x
By class           //*[contains(@class, "x")]         .x
By attribute       //a[@href]                         a[href]
Descendant         //div//a                           div a
Direct child       /div/a                             div > a
nth-child          //ul/li[3]                         ul li:nth-child(3)
Adjacent sibling   //h2/following-sibling::p[1]       h2 + p
Text match         //a[contains(text(), "Sign up")]   (impossible)
Parent             //span[@class="x"]/parent::div     (impossible)
Ancestor           //x/ancestor::form[1]              (impossible)

For scraping, expect to reach for XPath roughly 30-40% of the time, whenever text matching or upward traversal is needed.

Testing XPath in Browser DevTools

Open DevTools console:

$x("//h1/text()")

$x() is built into Chrome and Firefox DevTools. Returns matching nodes. Use it to iterate on selectors before pasting into your scraper.

Related: PyQuery (jQuery-like CSS scraping), Cheerio vs Puppeteer, Scrape text from any website.