Python BeautifulSoup: Select element by Class (CSS Selectors)

BeautifulSoup is a popular Python library used for web scraping and data extraction. It provides an easy way to parse HTML and XML documents and extract information from them. One of the most common tasks in web scraping is finding elements by their assigned class. In this tutorial, we will show you how to select elements by class in BeautifulSoup by passing CSS selectors to the select() method.


Selecting Elements by Class in BeautifulSoup using select()

Before we start, let’s first understand what CSS selectors are. CSS selectors are patterns used to select elements in an HTML or XML document. They allow you to select elements based on their tag name, attributes, classes, and other criteria. For example, you can select all <p> elements with a specific class using the following CSS selector:

p.class_name

In this selector, p is the tag name, and '.class_name' is the class selector. This selector will match all <p> elements with the class 'class_name'.

With this basic premise in mind, let’s take a look at an actual example. Here we define a small HTML snippet as a string and feed it directly into BeautifulSoup.

from bs4 import BeautifulSoup

html = """
<div class="example">
    <p class="first">First paragraph</p>
    <p class="second">Second paragraph</p>
    <p class="first">Another first paragraph</p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# select all <p> elements with class "first"
elements = soup.select('p.first')

# print the text content of each matching element
for elem in elements:
    print(elem.text)

Output:

First paragraph
Another first paragraph

In the above example, we use the p.first selector to select all <p> elements with the class first. The select() method returns a list of all matching elements; we then loop through that list and print the text content of each element.
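As a small sketch of how the selector shape changes the result, the snippet below compares p.first with the bare class selector .first, and also shows select_one(), which returns only the first match (or None when nothing matches). The <span> element and the class name third are made up for illustration and are not part of the example above.

```python
from bs4 import BeautifulSoup

# Illustrative HTML: the <span> is an assumption added to show the difference
html = """
<div class="example">
    <p class="first">First paragraph</p>
    <span class="first">A span</span>
    <p class="second">Second paragraph</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag name plus class: only <p> elements with class "first"
p_only = [elem.get_text() for elem in soup.select("p.first")]

# Class only: any element with class "first", regardless of tag
any_tag = [elem.get_text() for elem in soup.select(".first")]

# select_one() returns a single element instead of a list
first_match = soup.select_one("p.second")

# When nothing matches, select_one() returns None ("third" is a made-up class)
no_match = soup.select_one("p.third")

print(p_only)                  # ['First paragraph']
print(any_tag)                 # ['First paragraph', 'A span']
print(first_match.get_text())  # Second paragraph
print(no_match)                # None
```

Because select_one() can return None, it is worth checking the result before reading attributes or text from it.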


Selecting Elements with Multiple Classes

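To select elements that carry several classes at once, chain the class names in the selector with the dot (.) character and no spaces in between. (A space would change the meaning: in CSS it selects descendants rather than adding a second class condition.) Here is an example: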

# select all <p> elements with both classes "first" and "second"
elements = soup.select('p.first.second')

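In this example, we use the dot (.) character to chain the class names first and second. This selects only the <p> elements that have both classes. In the HTML from the previous section no <p> element carries both classes, so this call returns an empty list there.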

You can also select elements that match any one of several classes by separating complete selectors with the comma (,) character. Here is an example:

# select all <p> elements with classes "first" or "second"
elements = soup.select('p.first, p.second')

In this example, we use the comma (,) character to select all <p> elements that have either the class first or the class second.
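To make the difference between the two forms concrete, here is a minimal sketch that runs both selectors against the same document. The HTML is illustrative, and the variable names both and either are our own.

```python
from bs4 import BeautifulSoup

# Illustrative HTML: the first <p> carries both classes
html = """
<div class="example">
    <p class="first second">First paragraph</p>
    <p class="second">Second paragraph</p>
    <p class="first">Another first paragraph</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Chained classes: the element must have "first" AND "second"
both = [elem.get_text() for elem in soup.select("p.first.second")]

# Comma-separated selectors: the element may match "p.first" OR "p.second"
either = [elem.get_text() for elem in soup.select("p.first, p.second")]

print(both)    # ['First paragraph']
print(either)  # ['First paragraph', 'Second paragraph', 'Another first paragraph']
```

Note that the first paragraph matches both halves of the comma-separated selector but still appears only once in the result, and the matches come back in document order.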


Example

Here is an example illustrating the concept of multi-class selection using select().

from bs4 import BeautifulSoup

html = """
<div class="example">
    <p class="first second">First paragraph</p>
    <p class="second">Second paragraph</p>
    <p class="first">Another first paragraph</p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# select elements with both classes "first" and "second"
elements = soup.select('.first.second')

# print the text content of each matching element
for elem in elements:
    print(elem.text)

Output:

First paragraph

This is because only the first <p> element has both the classes “first” and “second”.
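As a brief follow-up sketch, the order in which classes are chained in the selector should not affect the result, and the matched element exposes its classes as a Python list. The variable names below are our own.

```python
from bs4 import BeautifulSoup

html = """
<div class="example">
    <p class="first second">First paragraph</p>
    <p class="second">Second paragraph</p>
    <p class="first">Another first paragraph</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The order of the chained classes does not matter
same_a = [elem.get_text() for elem in soup.select(".first.second")]
same_b = [elem.get_text() for elem in soup.select(".second.first")]

# The class attribute is multi-valued, so BeautifulSoup returns it as a list
classes = soup.select_one(".first.second")["class"]

print(same_a)   # ['First paragraph']
print(same_b)   # ['First paragraph']
print(classes)  # ['first', 'second']
```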


This marks the end of the Python BeautifulSoup: Select element by Class (CSS Selectors) Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the article content can be asked in the comments section below.
