spyderproxy

PyQuery Tutorial: HTML Parsing in Python

Alex R. | Published Mon May 04 2026

Quick verdict: PyQuery brings jQuery-style chaining to Python HTML parsing. Same use case as BeautifulSoup, different ergonomics. Pick PyQuery if you're porting a jQuery scraper or prefer chained selector syntax; pick BeautifulSoup for new Python-idiomatic code; pick raw lxml for max throughput. Performance is comparable across all three for typical scraping workloads.

This guide covers PyQuery's API, when to pick it over alternatives, how its speed compares with BeautifulSoup, lxml, and selectolax, and 8 working scraping examples.

Install

pip install pyquery

# Debian/Ubuntu: headers needed only if lxml has to build from source
sudo apt install libxml2-dev libxslt1-dev

Basic Usage

from pyquery import PyQuery as pq
import requests

r = requests.get("https://example.com")
doc = pq(r.text)

# All h2 elements
print(doc("h2").text())

# All links
for a in doc("a"):
    href = pq(a).attr("href")
    text = pq(a).text()
    print(text, "—>", href)

# Chained selectors (jQuery-style)
doc(".article").find(".title").each(lambda i, el: print(pq(el).text()))

PyQuery vs BeautifulSoup vs lxml

Library               Syntax style      Speed (relative)   Best for
PyQuery               jQuery chaining   ~1.0×              Porting jQuery code, readable selector chains
BeautifulSoup + lxml  Pythonic methods  ~1.0×              New Python projects, default choice
lxml direct           XPath / CSS       ~2-3×              High-throughput scraping
selectolax            CSS selectors     ~5-10×             Maximum-throughput batch processing

8 Working Examples

1. Extract all article titles

doc = pq(html)
titles = [pq(t).text() for t in doc("article h2.title")]

2. Get attribute value

img_src = doc("img.hero").attr("src")
all_links = [pq(a).attr("href") for a in doc("a")]

3. Multi-class selector

# Both classes required
items = doc(".product.featured")

# Either class
items = doc(".product, .featured")

4. Filter by attribute

# Links to external sites
external = doc("a[href^='http']").not_("a[href*='example.com']")

# Inputs of type "email"
emails = doc("input[type='email']")

5. XPath

# Every h2 inside a div with class 'main'. PyQuery has no .xpath()
# method; query the underlying lxml tree via doc.root instead.
elems = doc.root.xpath("//div[@class='main']//h2")
titles = [pq(e).text() for e in elems]

6. Iteration with .each()

def process(i, el):
    e = pq(el)
    print(i, e.find(".title").text(), e.find(".price").text())

doc(".product-card").each(process)

7. Modify HTML

doc("a").attr("rel", "nofollow")
doc("script").remove()
print(doc.outer_html())

8. Through a residential proxy

proxies = {"https": "http://USER:[email protected]:8080"}
r = requests.get("https://target.com", proxies=proxies, timeout=20)
doc = pq(r.text)
items = [pq(x).text() for x in doc(".item-title")]

For scaled scraping behind anti-bot defenses, use a rotating residential proxy at the request layer. PyQuery doesn't care which proxy is in use — it just parses what it receives.

When to Pick PyQuery

  • You're porting a jQuery-based scraper from Node or browser-side and want the same selector syntax.
  • Your team is more familiar with jQuery than with Python's iteration patterns.
  • You need both reading AND writing/modifying the DOM (HTML transformation pipelines).
  • You like chained method calls more than nested function calls.

Pick BeautifulSoup instead if: you're starting fresh in Python and want the most idiomatic syntax, you need its more lenient HTML parsing for malformed pages (html5lib parser), or you're following a tutorial that uses it.