
BeautifulSoup Search by Class: Python Guide

Alex R. | Published Fri May 01 2026

Quick verdict: Four working ways to search HTML by class in BeautifulSoup. soup.find_all(class_='card') is shortest, soup.select('.card') is most flexible (full CSS selector syntax), attrs={'class': 'card'} is best when combining with other attributes, and regex matching handles dynamic Tailwind-style class names. Pick by what your HTML actually looks like, not by Stack Overflow voting.

This guide covers the four methods, the gotchas around multi-class elements and case sensitivity, performance comparisons (lxml vs html.parser), and how to handle the JavaScript-rendered pages where naive class search returns nothing.

The 4 Ways to Search by Class

1. find_all(class_=) — the canonical shorthand

from bs4 import BeautifulSoup
import requests

html = requests.get('https://example.com').text
soup = BeautifulSoup(html, 'lxml')

# Find all <div> elements with class "card"
cards = soup.find_all('div', class_='card')
for card in cards:
    print(card.get_text(strip=True))

Note the trailing underscore on class_ — that's because class is a reserved Python keyword.
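To see the shorthand in action, here's a minimal, self-contained check (the HTML snippet is invented for illustration); class_='card' is exactly equivalent to attrs={'class': 'card'}:

```python
from bs4 import BeautifulSoup

html = '<div class="card">A</div><div class="panel">B</div>'
soup = BeautifulSoup(html, 'html.parser')

# class_='card' is shorthand for attrs={'class': 'card'}
by_keyword = soup.find_all('div', class_='card')
by_attrs = soup.find_all('div', attrs={'class': 'card'})

assert by_keyword == by_attrs
print([tag.get_text() for tag in by_keyword])  # ['A']
```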

2. select() with CSS selectors — most flexible

# All elements with class "card"
cards = soup.select('.card')

# All <div> elements with class "card"
cards = soup.select('div.card')

# Cards inside a section
cards = soup.select('section .card')

# Cards with two classes
cards = soup.select('.card.highlighted')

# Partial match: any class starting with "card-"
cards = soup.select('[class^="card-"]')

If you've ever used jQuery or DevTools, the syntax is identical. Most scrapers default to select() because the same selector copies cleanly between browser console testing and Python code.
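A quick sketch of how the selectors behave in practice, against a throwaway snippet (the HTML is invented for illustration). Note that select() always returns a list, while select_one() returns the first match in document order, or None:

```python
from bs4 import BeautifulSoup

html = '''
<section>
  <div class="card">First</div>
  <div class="card highlighted">Second</div>
</section>
<div class="card">Outside</div>
'''
soup = BeautifulSoup(html, 'html.parser')

assert len(soup.select('.card')) == 3            # all three cards
assert len(soup.select('section .card')) == 2    # only the nested ones
assert len(soup.select('.card.highlighted')) == 1
assert soup.select_one('.card').get_text() == 'First'  # first match or None
```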

3. attrs={'class': ...} — when combining attributes

# Elements with class="card" AND data-id="12"
match = soup.find_all('div', attrs={'class': 'card', 'data-id': '12'})

Less common but the cleanest option when you need to filter by class plus a custom data attribute.
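The same filter can also be written as a CSS attribute selector, which is handy if you prefer staying in select() everywhere. A minimal sketch with an invented snippet, showing both spellings return the same element:

```python
from bs4 import BeautifulSoup

html = '''
<div class="card" data-id="12">match</div>
<div class="card" data-id="99">wrong id</div>
<div class="panel" data-id="12">wrong class</div>
'''
soup = BeautifulSoup(html, 'html.parser')

via_attrs = soup.find_all('div', attrs={'class': 'card', 'data-id': '12'})
via_css = soup.select('div.card[data-id="12"]')

assert via_attrs == via_css  # same single element either way
assert via_attrs[0].get_text() == 'match'
```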

4. Regex matching — for dynamic class names

import re

# All Tailwind blue-shade classes
elems = soup.find_all(class_=re.compile(r'^bg-blue-\d+$'))

# Anything containing "product"
elems = soup.find_all(class_=re.compile(r'product'))
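Worth noting: for multi-valued attributes like class, BeautifulSoup tests the regex against each class token separately, so anchored patterns like ^bg-blue-\d+$ still match an element buried in utility-class soup. A small check with an invented Tailwind-style snippet:

```python
import re
from bs4 import BeautifulSoup

html = '<button class="bg-blue-500 px-4 py-2 rounded">Buy</button>'
soup = BeautifulSoup(html, 'html.parser')

# The regex runs against each class token separately, so the
# ^ and $ anchors apply per token, not to the whole attribute string
elems = soup.find_all(class_=re.compile(r'^bg-blue-\d+$'))
assert len(elems) == 1
assert elems[0].get_text() == 'Buy'
```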

Multiple Classes: Any vs All

Match elements with ANY of these classes: find_all(class_=['card', 'product'])
Match elements with ALL of these classes: select('.card.product')
Match exactly one specific class set: find_all(attrs={'class': 'card highlighted'}) (note: order matters)
Class containing "card": select('[class*="card"]')

Common Pitfalls

  1. Case sensitivity. HTML classes are case-sensitive in BeautifulSoup. class_='Card' won't match class='card'. CSS treats class names case-sensitively too.
  2. Multi-class elements. An element with class='card highlighted promo' has THREE class tokens. class_='card' matches it (because 'card' is one of its tokens). class_='card highlighted promo' only matches via BeautifulSoup's exact-string fallback: the attribute must contain exactly those tokens in exactly that order, so class='promo card highlighted' silently fails to match. For order-independent ALL matching, use select('.card.highlighted.promo').
  3. Whitespace. HTML normalizes whitespace inside class attributes. class=' card ' still has just one token, 'card'.
  4. JavaScript-rendered pages. If you scrape a React/Vue/Angular site, classes from DevTools won't appear in requests.get() output. Use Playwright to render first.
  5. Tailwind / utility CSS. Class names like bg-blue-500 px-4 py-2 rounded change between page versions. Use regex matching or a more stable structural locator (parent ID, data-attribute).
  6. Encoding gotcha. If your HTML uses HTML entities like &quot;, BeautifulSoup decodes them. Match against decoded class names, not raw HTML.
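Pitfalls 1-3 are easy to verify with a throwaway snippet (invented for illustration): matching is case-sensitive, token matching ignores surrounding whitespace, and select() combinators don't care about token order:

```python
from bs4 import BeautifulSoup

html = '<div class=" card   highlighted promo ">x</div>'
soup = BeautifulSoup(html, 'html.parser')

# Pitfall 1: class matching is case-sensitive
assert soup.find_all(class_='Card') == []

# Pitfall 3: extra whitespace in the attribute is normalized away;
# 'card' is still one clean token
assert len(soup.find_all(class_='card')) == 1

# select() matches ALL listed classes regardless of their order in the HTML
assert len(soup.select('.promo.card')) == 1
```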

Performance: Parser Choice Matters

BeautifulSoup's parser choice has a 5–10× impact on speed for class searches:

Relative speed (100 KB page, 50 class searches):

html.parser (default): built-in, 1.0× (baseline)
lxml (pip install lxml): ~7× faster
html5lib (pip install html5lib): ~0.5× (slower but most lenient)

For production scrapers behind a residential proxy pool, lxml is the standard choice. html5lib is for malformed HTML where lxml fails.
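If you want to measure this on your own pages, a minimal timing sketch with the standard-library timeit module (the toy HTML and the exact ratios are illustrative; swap 'html.parser' for 'lxml' or 'html5lib' once installed to compare):

```python
import timeit
from bs4 import BeautifulSoup

# Toy page; real-world ratios depend on your HTML and machine
html = '<div class="card">x</div>' * 2000

def run(parser):
    soup = BeautifulSoup(html, parser)
    for _ in range(50):  # 50 class searches, mirroring the table above
        soup.find_all(class_='card')

# Swap in 'lxml' or 'html5lib' here after installing them
seconds = timeit.timeit(lambda: run('html.parser'), number=1)
print(f"html.parser: {seconds:.3f}s")
```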

Class Search With Proxies for Scaled Scraping

BeautifulSoup itself doesn't make HTTP requests — it parses HTML you've already fetched. So adding proxies happens in the request layer:

import requests
from bs4 import BeautifulSoup

proxies = {
    'http': 'http://USER:[email protected]:8080',
    'https': 'http://USER:[email protected]:8080',
}

# Each request rotates through the residential pool
for url in target_urls:
    r = requests.get(url, proxies=proxies, timeout=20)
    soup = BeautifulSoup(r.text, 'lxml')
    cards = soup.select('.product-card')
    # ... extract data

For high-volume scraping, see our full guide on rotating proxies with Python requests. For Playwright-based scraping of dynamic sites, see scraping behind login walls.