So let’s assume we want to scrape the titles of jobs available in Boston from Craigslist. For now, we will work on one page only.
We will simply search the website, and get the URL: https://boston.craigslist.org/search/sof
If you have not already, please review the previous Beautiful Soup tutorials:
- Beautiful Soup Tutorial #1: Install BeautifulSoup, Requests & LXML
- Beautiful Soup Tutorial #2: Extracting URLs
In the current tutorial, the only new part is:
    titles = soup.find_all('a', {'class': 'result-title'})

    for title in titles:
        print(title.text)
This finds every <a> tag whose class attribute includes ‘result-title’.
How do you know the class name? Open the URL in your browser, hover over a job title, right-click, and select “Inspect”. The browser's developer tools will then show the HTML for that element, like this:
    <a href="/gbs/sof/d/senior-ui-developer/6255464367.html" data-id="6255464367" class="result-title hdrlnk">Senior UI Developer</a>
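Notice that the anchor above carries two classes, `result-title` and `hdrlnk`. As a minimal sketch, the snippet below shows that searching on `result-title` alone still matches it, because Beautiful Soup checks each class in the attribute individually. The HTML string is the tag above plus one made-up link for contrast, and the sketch uses Python's built-in `html.parser` (instead of lxml) so it runs without extra installs.

```python
from bs4 import BeautifulSoup

# Sample HTML: the first link is the one copied from the page;
# the second is a made-up link without the class, for contrast.
html = """
<a href="/gbs/sof/d/senior-ui-developer/6255464367.html" data-id="6255464367" class="result-title hdrlnk">Senior UI Developer</a>
<a href="/about" class="footer-link">About</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Matching on one class is enough even though the tag has two.
titles = soup.find_all("a", {"class": "result-title"})
matched_titles = [title.text for title in titles]

# The class attribute is stored as a list of class names.
first_classes = titles[0]["class"]

for text in matched_titles:
    print(text)
print(first_classes)
```

Only the first link is printed; the second is skipped because none of its classes is `result-title`.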
You can try the same for addresses. The following code will extract all <span> tags whose class name is ‘result-hood’.
    addresses = soup.find_all('span', {'class': 'result-hood'})

    for address in addresses:
        print(address.text)
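If you want each title next to its address, looping over the two lists separately may not line up, since a listing might have no ‘result-hood’ span at all. One way to handle this, sketched below, is to search inside each listing's container instead. The `result-row` class name and the sample HTML are assumptions made for illustration; inspect the live page to confirm the real container tag and class.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the 'result-row' container class is an assumption.
html = """
<ul>
  <li class="result-row">
    <a class="result-title hdrlnk" href="#">Senior UI Developer</a>
    <span class="result-hood"> (Boston)</span>
  </li>
  <li class="result-row">
    <a class="result-title hdrlnk" href="#">Backend Engineer</a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

listings = []
for row in soup.find_all("li", {"class": "result-row"}):
    # find() returns the first match inside this row, or None if there is none.
    title = row.find("a", {"class": "result-title"})
    hood = row.find("span", {"class": "result-hood"})
    listings.append((
        title.text.strip() if title else None,
        hood.text.strip() if hood else None,
    ))

for title_text, hood_text in listings:
    print(title_text, "-", hood_text)
```

Searching within each row keeps a title and its address together, and a missing address shows up as `None` rather than shifting the rest of the list.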
Copy the code below, run it on your machine, and let me know if you have questions.
    from bs4 import BeautifulSoup
    import requests

    url = "https://boston.craigslist.org/search/sof"

    # Get the webpage; this creates a Response object.
    response = requests.get(url)

    # Extract the source code of the page.
    data = response.text

    # Pass the source code to Beautiful Soup to create a BeautifulSoup object.
    soup = BeautifulSoup(data, 'lxml')

    # Collect all <a> tags whose class name is 'result-title' into a list.
    titles = soup.find_all('a', {'class': 'result-title'})

    # Print the text of each <a> tag, i.e. the job title.
    for title in titles:
        print(title.text)
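If you plan to reuse the script, it can help to separate downloading from parsing and to fail loudly on HTTP errors. The following is only a sketch of one way to do that: the function names `fetch_html` and `extract_titles` and the 10-second timeout are my own choices for illustration, and it uses the built-in `html.parser` so it also runs without lxml.

```python
from bs4 import BeautifulSoup
import requests


def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    # The 10-second timeout is an arbitrary choice for this sketch.
    response = requests.get(url, timeout=10)
    # Raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.text


def extract_titles(html):
    """Return the text of every <a> tag whose class includes 'result-title'."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", {"class": "result-title"})
    return [link.text.strip() for link in links]
```

You would then call `extract_titles(fetch_html(url))` with the URL above. If the list comes back empty, the class name probably no longer matches the page's markup, so inspect the page again.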