Quick verdict: Four working ways to search HTML by class in BeautifulSoup. soup.find_all(class_='card') is shortest, soup.select('.card') is most flexible (full CSS selector syntax), attrs={'class': 'card'} is best when combining with other attributes, and regex matching handles dynamic Tailwind-style class names. Pick by what your HTML actually looks like, not by Stack Overflow voting.
This guide covers the four methods, the gotchas around multi-class elements and case sensitivity, performance comparisons (lxml vs html.parser), and how to handle the JavaScript-rendered pages where naive class search returns nothing.
1. find_all(class_=) — the canonical shorthand
from bs4 import BeautifulSoup
import requests
html = requests.get('https://example.com').text
soup = BeautifulSoup(html, 'lxml')
# Find all <div> elements carrying the class "card"
cards = soup.find_all('div', class_='card')
for card in cards:
    print(card.get_text(strip=True))
Note the trailing underscore on class_ — that's because class is a reserved Python keyword.
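If you only need the first match, find() takes the same arguments and returns a single Tag (or None instead of an empty list):
# First matching element only; None if nothing matches
first_card = soup.find('div', class_='card')
if first_card is not None:
    print(first_card.get_text(strip=True))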
2. select() with CSS selectors — most flexible
# All elements with class "card"
cards = soup.select('.card')
# Only <div> elements with class "card"
cards = soup.select('div.card')
# Cards inside a section
cards = soup.select('section .card')
# Cards with BOTH classes
cards = soup.select('.card.highlighted')
# Partial match: any class starting with "card-"
cards = soup.select('[class^="card-"]')
If you've ever used jQuery or DevTools, the syntax is identical. Most scrapers default to select() because the same selector copies cleanly between browser console testing and Python code.
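select_one() is the single-result counterpart: it returns the first match or None. A quick sketch with an illustrative selector:
# First element matching the selector, or None
title = soup.select_one('section.hero h1')
if title:
    print(title.get_text(strip=True))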
3. attrs={'class': ...} — when combining attributes
# Elements with class="card" AND data-id="12"
match = soup.find_all('div', attrs={'class': 'card', 'data-id': '12'})
Less common, but the cleanest option when you need to filter by class plus a custom data attribute. (Python keyword arguments can't contain hyphens, so data-id has to go through attrs.)
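attrs also accepts True as a value, which matches any element where that attribute merely exists:
# class="card" elements that carry a data-id of any value
match = soup.find_all('div', attrs={'class': 'card', 'data-id': True})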
4. Regex matching — for dynamic class names
import re
# All Tailwind blue-shade classes
elems = soup.find_all(class_=re.compile(r'^bg-blue-\d+$'))
# Anything containing "product"
elems = soup.find_all(class_=re.compile(r'product'))
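A behavior worth knowing: BeautifulSoup tests the regex against each class token individually, so an anchored pattern still matches elements that carry other classes too. A tiny demo with invented markup:
demo = BeautifulSoup('<div class="card bg-blue-500">x</div>', 'lxml')
# Matches: the pattern is tried against "card" and "bg-blue-500" separately
demo.find_all(class_=re.compile(r'^bg-blue-\d+$'))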
Multiple Classes: Any vs All
| Goal | Code |
|---|---|
| Match elements with ANY of these classes | find_all(class_=['card', 'product']) |
| Match elements with ALL of these classes | select('.card.product') |
| Match exactly one specific class set | find_all(attrs={'class': 'card highlighted'}) (note: order matters) |
| Class containing "card" | select('[class*="card"]') |
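To make the any/all distinction concrete, here is a self-contained demo on invented markup:
from bs4 import BeautifulSoup

demo = BeautifulSoup('<div class="card product">A</div><div class="card">B</div>', 'lxml')
demo.find_all(class_=['card', 'product'])  # ANY: returns both divs
demo.select('.card.product')               # ALL: returns only the first div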
Common Pitfalls
- Case sensitivity. HTML classes are case-sensitive in BeautifulSoup: class_='Card' won't match class="card". CSS treats class names case-sensitively too.
- Multi-class elements. An element with class="card highlighted promo" has THREE class tokens. class_='card' matches it (because 'card' is one of its tokens). class_='card highlighted promo' matches only the exact attribute string in that exact order, so class_='promo card highlighted' finds nothing.
- Whitespace. HTML splits the class attribute on whitespace, so class="  card  " still has just one token, 'card'.
- JavaScript-rendered pages. If you scrape a React/Vue/Angular site, the classes you see in DevTools won't appear in the requests.get() output. Use Playwright to render first (see the sketch after this list).
- Tailwind / utility CSS. Class names like bg-blue-500 px-4 py-2 rounded change between page versions. Use regex matching or a more stable structural locator (a parent ID or data-attribute).
- Encoding gotcha. If your HTML uses HTML entities like &quot;, BeautifulSoup decodes them. Match against the decoded class names, not the raw HTML.
Performance: Parser Choice Matters
BeautifulSoup's parser choice has a 5–10× impact on speed for class searches:
| Parser | Install | Relative speed (100 KB page, 50 class searches) |
|---|---|---|
| html.parser (default) | Built-in | 1.0× (baseline) |
| lxml | pip install lxml | ~7× faster |
| html5lib | pip install html5lib | ~0.5× (slower but most lenient) |
For production scrapers behind a residential proxy pool, lxml is the standard choice. html5lib is for malformed HTML where lxml fails.
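To reproduce the comparison on your own pages, a rough timing sketch (page.html stands in for a saved sample page):
import timeit
from bs4 import BeautifulSoup

with open('page.html', encoding='utf-8') as f:  # hypothetical local sample
    html = f.read()

for parser in ('html.parser', 'lxml', 'html5lib'):
    secs = timeit.timeit(lambda: BeautifulSoup(html, parser).select('.card'), number=50)
    print(f'{parser}: {secs:.2f}s for 50 parse-and-search runs')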
Class Search With Proxies for Scaled Scraping
BeautifulSoup itself doesn't make HTTP requests — it parses HTML you've already fetched. So adding proxies happens in the request layer:
import requests
from bs4 import BeautifulSoup
proxies = {
    'http': 'http://USER:[email protected]:8080',
    'https': 'http://USER:[email protected]:8080',
}
target_urls = ['https://example.com/page-1', 'https://example.com/page-2']  # placeholder list
# Each request goes through the gateway, which rotates exit IPs in the residential pool
for url in target_urls:
    r = requests.get(url, proxies=proxies, timeout=20)
    soup = BeautifulSoup(r.text, 'lxml')
    cards = soup.select('.product-card')
    # ... extract data
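If your provider gives you a list of individual endpoints instead of one rotating gateway, a minimal round-robin sketch (addresses are placeholders):
import itertools

proxy_list = ['http://p1.example.com:8080', 'http://p2.example.com:8080']
proxy_cycle = itertools.cycle(proxy_list)
for url in target_urls:
    proxy = next(proxy_cycle)  # each request uses the next proxy in the cycle
    r = requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=20)
    soup = BeautifulSoup(r.text, 'lxml')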
For high-volume scraping, see our full guide on rotating proxies with Python requests. For Playwright-based scraping of dynamic sites, see scraping behind login walls.