Quick verdict: XPath wins over CSS selectors when you need text matching (contains(text(),...)), traversal up the DOM (parent::, ancestor::), or sibling logic (following-sibling::). CSS wins for class/id selection (.foo, #bar) and is faster in browsers. For scraping, learn both — you will use XPath ~30% of the time when CSS cannot do the job.
| Expression | Meaning |
|---|---|
| `/` | Root |
| `//` | Anywhere in the document |
| `.` | Current node |
| `..` | Parent |
| `*` | Any element |
| `@` | Attribute |
| `text()` | Text content of a node |
| `node()` | Any node (element + text + comment) |
# Every <a> tag anywhere
//a
# <a> tags inside <div class="results">
//div[@class="results"]//a
# Element with specific id
//*[@id="main"]
# Element with class containing "btn-primary"
//*[contains(@class, "btn-primary")]
# <a> with href attribute
//a[@href]
# <a> whose href starts with "/blog/"
//a[starts-with(@href, "/blog/")]
# <h1> with exact text
//h1[text()="Welcome"]
# <h1> containing "Welcome" (substring)
//h1[contains(text(), "Welcome")]
# Third <li> in any <ul>
//ul/li[3]
# Last <li>
//ul/li[last()]
# All but the first <li>
//ul/li[position() > 1]
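As a quick sanity check, the path expressions above can be run with Python's lxml against a made-up fragment (the markup and hrefs here are invented for illustration):

```python
from lxml import html

# Hypothetical fragment to exercise the path expressions above
doc = html.fromstring("""
<div class="results">
  <ul>
    <li><a href="/blog/a">A</a></li>
    <li><a href="/blog/b">B</a></li>
    <li><a href="/about">About</a></li>
  </ul>
</div>
""")

all_links = doc.xpath("//a")                                 # every <a> anywhere
blog = doc.xpath('//a[starts-with(@href, "/blog/")]/@href')  # href prefix test
third = doc.xpath("//ul/li[3]/a/text()")                     # third <li>
rest = doc.xpath("//ul/li[position() > 1]/a/text()")         # all but the first
print(len(all_links), blog, third, rest)
# 3 ['/blog/a', '/blog/b'] ['About'] ['B', 'About']
```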
Predicates (Filters in [])
Predicates filter nodes:
# Multiple conditions (AND)
//a[@href and @target="_blank"]
# OR
//a[@target="_blank" or @rel="noopener"]
# Negation
//a[not(@target="_blank")]
# Comparison
//tr[position() > 1]
//product[@price >= 100 and @price < 200]
# Text predicate
//button[text()="Submit"]
//div[contains(., "$99")] # . = string value of node (text + descendants)
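The position and string-value predicates can be verified the same way; this invented table fragment mirrors the examples above:

```python
from lxml import html

# Made-up table to exercise the predicates above
doc = html.fromstring("""
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>$99</td></tr>
  <tr><td>Gadget</td><td>$150</td></tr>
</table>
""")

data_rows = doc.xpath("//tr[position() > 1]")  # skip the header row
# "." is the node's string value: all descendant text joined together
hit = doc.xpath('//tr[contains(., "$99")]/td[1]/text()')
print(len(data_rows), hit)  # 2 ['Widget']
```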
Axes (Tree Traversal)
The killer XPath feature CSS does not have. Format: axis::node-test[predicate].
| Axis | Selects |
|---|---|
| `parent::` | Parent node |
| `ancestor::` | All ancestors |
| `ancestor-or-self::` | Self + all ancestors |
| `child::` | Direct children (default axis) |
| `descendant::` | All descendants |
| `descendant-or-self::` | Self + all descendants (this is what `//` means) |
| `following::` | Everything after current in document order |
| `following-sibling::` | Siblings after current |
| `preceding::` | Everything before current in document order |
| `preceding-sibling::` | Siblings before current |
| `self::` | The current node |
| `attribute::` | Attributes (shorthand: `@`) |
Examples:
# From a <span class="price">, get the parent <div>
//span[@class="price"]/parent::div
# From a <label>, get the next <input> (sibling)
//label[text()="Email"]/following-sibling::input[1]
# All <p> tags after the <h2 id="news">
//h2[@id="news"]/following-sibling::p
# The <table> ancestor of a <td>
//td[contains(text(), "Total")]/ancestor::table[1]
# Self with predicate (rare but legal)
//div[@class="card"]/self::*[contains(., "Sale")]

String & Number Functions
| Function | Use |
|---|---|
| `contains(s1, s2)` | True if s1 contains s2 |
| `starts-with(s1, s2)` | True if s1 starts with s2 |
| `ends-with(s1, s2)` | XPath 2.0+ only (Python lxml: no) |
| `normalize-space(s)` | Trim + collapse whitespace |
| `string-length(s)` | Length |
| `substring(s, start, len)` | 1-indexed substring |
| `substring-before(s, sep)` | Text before separator |
| `substring-after(s, sep)` | Text after separator |
| `translate(s, from, to)` | Character-by-character map (poor man's lowercase) |
| `lower-case(s)` | XPath 2.0+ only |
| `count(nodeset)` | Number of matched nodes |
| `position()` | Index of current node in match set |
| `last()` | Index of last node in match set |
Case-insensitive matching (XPath 1.0 lacks lower-case):
//a[contains(translate(text(), "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), "submit")]
This is hideous, but it is the XPath 1.0 way. lxml in Python is built on libxml2, which implements XPath 1.0 only, so 2.0+ functions such as lower-case() and ends-with() are unavailable there.
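The translate() trick works as advertised in lxml; here is a minimal sketch against an invented pair of links:

```python
from lxml import html

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical links with inconsistent casing
doc = html.fromstring('<p><a href="/go">SUBMIT</a> <a href="/x">Cancel</a></p>')

# translate() lowercases the link text before the substring test
expr = f'//a[contains(translate(text(), "{UPPER}", "{LOWER}"), "submit")]/@href'
matches = doc.xpath(expr)
print(matches)  # ['/go']
```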
Real Scraping Examples
Extract all article titles where the publication date is 2026:
//article[contains(.//time/@datetime, "2026")]//h2/text()
Extract data-href from cards that contain a "Sold" badge:
//div[@class="card" and .//span[@class="badge sold"]]/@data-href
Get the price next to a label "Total:":
//*[text()="Total:"]/following-sibling::*[1]/text()
Get the value of the input that follows a label:
//label[text()="Email"]/following::input[1]/@value
(following:: is broader than following-sibling:: — it catches inputs in different parent nodes too.)
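The difference is easy to see in lxml with a made-up form where the label and input live in separate wrapper divs:

```python
from lxml import html

# Hypothetical form: label and input sit in different parents
doc = html.fromstring("""
<form>
  <div><label>Email</label></div>
  <div><input name="email" value="a@b.c"></div>
</form>
""")

# following-sibling:: stays inside the label's parent, so it finds nothing
sib = doc.xpath('//label[text()="Email"]/following-sibling::input')
# following:: walks everything after the label in document order,
# crossing parent boundaries
val = doc.xpath('//label[text()="Email"]/following::input[1]/@value')
print(sib, val)  # [] ['a@b.c']
```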
Python lxml Quickstart
from lxml import html
import requests

r = requests.get("https://example.com")
tree = html.fromstring(r.content)

# Single result
title = tree.xpath("//h1/text()")[0]
# Multiple results
links = tree.xpath("//a/@href")
# Element nodes (not just text)
cards = tree.xpath('//div[contains(@class, "card")]')
for c in cards:
    title = c.xpath(".//h2/text()")[0]  # NOTE the leading dot
    print(title)
Critical: when XPathing within an element, prefix with . to scope to the element. Without the dot, //h2 means "search the whole document," not "search inside this element."
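The trap is easy to demonstrate with two invented cards; note how the absolute path ignores the element it is called on:

```python
from lxml import html

# Two hypothetical cards to show the scoping difference
doc = html.fromstring("""
<div><div class="card"><h2>First</h2></div>
<div class="card"><h2>Second</h2></div></div>
""")

cards = doc.xpath('//div[@class="card"]')
unscoped = cards[1].xpath("//h2/text()")  # absolute: searches the whole document
scoped = cards[1].xpath(".//h2/text()")   # relative: inside this card only
print(unscoped, scoped)  # ['First', 'Second'] ['Second']
```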
XPath vs CSS Selectors
| Need | XPath | CSS |
|---|---|---|
| By id | `//*[@id="x"]` | `#x` |
| By class | `//*[contains(@class, "x")]` | `.x` |
| By attribute | `//a[@href]` | `a[href]` |
| Descendant | `//div//a` | `div a` |
| Direct child | `//div/a` | `div > a` |
| nth-child | `//ul/li[3]` | `ul li:nth-child(3)` |
| Adjacent sibling | `//h2/following-sibling::p[1]` | `h2 + p` |
| Text match | `//a[contains(text(), "Sign up")]` | (impossible) |
| Parent | `//span[@class="x"]/parent::div` | (impossible) |
| Ancestor | `//x/ancestor::form[1]` | (impossible) |
For scraping, expect to reach for XPath roughly 30-40% of the time, whenever text matching or upward traversal is needed.
Testing XPath in Browser DevTools
Open DevTools console:
$x("//h1/text()")
$x() is built into Chrome and Firefox DevTools. Returns matching nodes. Use it to iterate on selectors before pasting into your scraper.
Related: PyQuery (jQuery-like CSS scraping), Cheerio vs Puppeteer, Scrape text from any website.