Beautiful Soup Tutorial #3: Web Scraping Craigslist (One Page)

Let’s assume we want to scrape the titles of jobs available in Boston from Craigslist. For now, we will work on one page of results only.

Searching the website gives us the URL of the results page: https://boston.craigslist.org/search/sof

If you have not already, please review the previous Beautiful Soup tutorials:

Tutorial Contents
- Beautiful Soup Tutorial #1: Install BeautifulSoup, Requests & LXML
- Beautiful Soup Tutorial #2: Extracting URLs

In the current tutorial, the only new part is:

titles = soup.find_all('a', {'class': 'result-title'})

for title in titles:
    print(title.text)


This finds all <a> tags whose class attribute includes ‘result-title’ and prints the text of each one.
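To see what this lookup returns without hitting the network, here is a minimal sketch that runs the same search on a small hand-written HTML snippet. The snippet and the variable names are assumptions made for illustration, and it uses Python's built-in 'html.parser' instead of lxml so that it runs without extra dependencies. Note that find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption for illustration, not the live page).
sample_html = """
<a href="/gbs/sof/d/senior-ui-developer/6255464367.html" class="result-title hdrlnk">Senior UI Developer</a>
<a href="/gbs/sof/d/example-job/0000000000.html" class="result-title hdrlnk">Example Job</a>
<a href="/about" class="footer-link">About</a>
"""

# 'html.parser' ships with Python, so this sketch does not need lxml.
soup = BeautifulSoup(sample_html, "html.parser")

# Same search as in the tutorial; find_all is the current name of findAll.
titles = soup.find_all("a", {"class": "result-title"})

# A tag with several classes matches if any one of them equals 'result-title'.
title_texts = [title.text for title in titles]
print(title_texts)

# The class attribute comes back as a list of class names.
print(titles[0]["class"])
```

The third link is skipped because none of its classes is 'result-title', while the first two match even though they also carry a second class.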

How can you know the class name? Open the URL in your browser, move the cursor over a job title, right-click, and select “Inspect”. The browser then shows the HTML code for that element, like this:

<a href="/gbs/sof/d/senior-ui-developer/6255464367.html" data-id="6255464367" class="result-title hdrlnk">Senior UI Developer</a>


You can try the same for addresses. The following code will extract all <span> tags whose class name is ‘result-hood’.

addresses = soup.find_all('span', {'class': 'result-hood'})

for address in addresses:
    print(address.text)
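The extracted text often needs a little cleanup before it is useful. The sketch below assumes, purely for illustration, that each 'result-hood' span wraps the name in parentheses with a leading space; the sample markup is hypothetical, so check the real page in the inspector before relying on this shape.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the parentheses around each name are an assumption.
sample_html = """
<span class="result-hood"> (Cambridge)</span>
<span class="result-hood"> (Boston)</span>
<span class="result-price">$100</span>
"""

# 'html.parser' is used here so the sketch runs without lxml installed.
soup = BeautifulSoup(sample_html, "html.parser")

addresses = soup.find_all("span", {"class": "result-hood"})

# get_text(strip=True) trims surrounding whitespace;
# strip("()") then removes the parentheses at both ends.
cleaned = [address.get_text(strip=True).strip("()") for address in addresses]
print(cleaned)
```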


Copy the code below, run it on your machine, and let me know if you have questions.

from bs4 import BeautifulSoup
import requests

url = "https://boston.craigslist.org/search/sof"

# Getting the webpage, creating a Response object.
response = requests.get(url)

# Extracting the source code of the page.
data = response.text

# Passing the source code to Beautiful Soup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')

# Extracting all the <a> tags whose class name is 'result-title' into a list.
titles = soup.find_all('a', {'class': 'result-title'})

# Extracting the text from the <a> tags, i.e. the job titles.
for title in titles:
    print(title.text)
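If you plan to reuse the script, one possible arrangement is to separate downloading from parsing, so the parsing step can be tried on any HTML string. This is only a sketch: the function names fetch_html and extract_titles are hypothetical, the 10-second timeout is an arbitrary choice, and 'html.parser' stands in for lxml so the example has fewer dependencies.

```python
from bs4 import BeautifulSoup
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


def extract_titles(html):
    """Return the text of every <a> tag whose class includes 'result-title'."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", {"class": "result-title"})
    return [link.get_text(strip=True) for link in links]
```

With these in place, extract_titles(fetch_html(url)) should give the same titles that the script above prints, assuming the page still uses the 'result-title' class.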


