Booking.com is a travel fare aggregator website and travel metasearch engine for lodging reservations. The website has more than 29,094,365 listings in 230 countries and territories worldwide.
Websites like Booking.com contain a lot of data that can be scraped and processes that can be automated.
In this Selenium tutorial, you will learn how to automate an accommodation search and scrape the results using Python with Selenium.
We could use the Booking API for all of this, but the goal of this tutorial is to help you learn Selenium in a practical way, so you can build something useful and learn at the same time.
Let’s start working!
Prepare Workspace
For this tutorial, we will be using Python 3.7.1 and Selenium. You will also need Firefox or Google Chrome in order to run the Selenium WebDriver.
Create Virtual Environment
Although it is optional, it is recommended that you create a virtual environment for this project using virtualenv:
```
virtualenv bookingSelenium
```
Inside your bookingSelenium folder, activate the virtual environment using:
```
source bin/activate
```
Install Selenium
All you need for this project is Selenium. You can install Selenium with any Python package manager, such as pip:
```
pip install selenium
```
Scraping Process
There are several ways in which you can scrape websites. Since we are working with Selenium, we can handle JavaScript on the pages and scrape them in a very direct way. Let’s look at the steps we will be taking in this scraping process:
- Let your Selenium WebDriver enter the domain (booking.com).
- Perform a search on the main page with the parameters that the script receives.
- When the search results are ready, scrape all the data from those links.
- When you reach the number of results needed, stop scraping and export the results to JSON format.
Prepare WebDriver
What is WebDriver?
Selenium is a browser automation tool that controls web browser instances and makes it easy to do repetitive tasks. The Python Selenium API has the WebDriver class, which helps you write instructions for the browser in Python.
A WebDriver object is just a Python object that is linked to a browser process and lets the programmer control the browser state through Python code.
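To make this concrete, here is a minimal sketch of the idea. It is not part of the tutorial’s script, and it assumes Firefox and geckodriver are already set up (covered in the next section):

```python
from selenium.webdriver import Firefox

# One WebDriver object controls one browser instance.
driver = Firefox()                  # launches a Firefox process
driver.get('https://example.com')   # the browser navigates like a user would
print(driver.title)                 # reads state back, e.g. the page title
driver.quit()                       # closes the browser process
```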
Download GeckoDriver
What are Gecko and GeckoDriver? Gecko is a web browser engine used in browsers such as Firefox. GeckoDriver acts as the link between your Selenium scripts and the Firefox browser.
Download the GeckoDriver build compatible with your operating system at: https://github.com/mozilla/geckodriver/releases
If you are on an Arch Linux derived distribution, you can use the package manager to install the geckodriver package:
```
sudo pacman -S geckodriver
```
After the installation, you need to know where geckodriver is located. On Linux, you can use the which command to find the location of any script or program on your system:
```
$ which geckodriver
/usr/bin/geckodriver
```
The directory containing geckodriver needs to be in the system PATH so the driver can be found:

```
export PATH=$PATH:/usr/bin
```
Now we can use the geckodriver in our script.
Import Required Classes
To use the Selenium WebDriver class, import these classes from the selenium package:
```python
import selenium
from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
```
Let’s go through the imports needed to use the Selenium API one by one:
- from selenium.webdriver import Firefox specifies that the browser you want to automate will be an instance of the Firefox web browser. To use Chrome instead, import Chrome from selenium.webdriver (see the sketch after this list).
- from selenium.webdriver.common.by import By helps you locate elements in a webpage by tag name, class name, CSS selector, XPath, and more.
- from selenium.webdriver.firefox.options import Options can hold a list of arguments that will be passed to your Firefox WebDriver.
- from selenium.webdriver.support import expected_conditions as EC allows you to define conditions for the browser to wait on.
- from selenium.webdriver.support.wait import WebDriverWait allows you to define implicit and explicit waits.
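As a side note, driving Chrome instead of Firefox looks roughly like this; a minimal sketch, assuming chromedriver is installed and reachable (it is not used anywhere else in this tutorial):

```python
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

# Hypothetical Chrome equivalent of the Firefox setup used in this tutorial.
options = Options()
options.add_argument('--headless')  # run Chrome without a visible window
driver = Chrome(executable_path='chromedriver', options=options)
```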
1. Browse Website with Selenium WebDriver
```python
def prepare_driver(url):
    '''Returns a Firefox Webdriver.'''
    options = Options()
    options.add_argument('-headless')
    driver = Firefox(executable_path='geckodriver', options=options)
    driver.get(url)
    wait = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
        (By.ID, 'ss')))
    return driver
```
• Headless: In this tutorial, we will use the browser in headless mode. This way the browser runs normally, but without any visible graphical user interface. Though not useful for surfing the web, it comes into its own with automation.
In order for Firefox to run in headless mode, we need to create an Options object and add the -headless argument to it.
• GeckoDriver Path: Specify the GeckoDriver location (the one you downloaded in the Prepare WebDriver section of this tutorial) by passing it in the executable_path argument. With this, our WebDriver is ready and waiting for instructions.
• URL: Our Selenium WebDriver object is just like a normal browser, so it can do everything a normal browser does. One of the most common tasks is visiting a URL, which can be done with a single line of code:
```python
driver.get(url)
```
The get() method tells our WebDriver to visit a URL and nothing more.
Our WebDriver will be visiting booking.com and from there we’ll start the scraping process.
• Wait: We need to wait until the main page’s search bar is available before continuing. For that we use the WebDriverWait class, which defines an explicit wait in our WebDriver. How do we know that we need to wait for the element with the ID ss? Well, since we need the search bar ready to make the search, that is the element we tell our WebDriver to wait for. We know that it has the ss ID because we performed a simple “Inspect element” on it (this will be explained in detail later).
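For reference, this is the difference between the two kinds of waits WebDriverWait was imported for, as a minimal sketch using the ss locator from this page:

```python
# Implicit wait: applies globally to every find_element* call on this driver.
driver.implicitly_wait(10)

# Explicit wait: blocks until one specific condition is met,
# or raises a TimeoutException after 10 seconds.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, 'ss')))
```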
2. Perform a Search in Booking.com Homepage
How can you tell your WebDriver where to click or where to insert text? The WebDriver class has the find_element functions, which allow you to find any element on the current page.
There are several different functions depending on how you look for elements on the page. You can look for elements using their class name, ID, tag name, XPath selector, link text, partial link text, name, or CSS selector:
- find_element_by_id
- find_element_by_name
- find_element_by_xpath
- find_element_by_link_text
- find_element_by_partial_link_text
- find_element_by_tag_name
- find_element_by_class_name
- find_element_by_css_selector
To know which of these functions is best to use, you need to take a look at the page’s HTML code. This will tell you the most precise way to find the element you want.
You just need to right-click on the search form and click “Inspect element”. This will lead you to the element’s HTML code.
The relevant line is the search form’s HTML code; let’s take a closer look:
```html
<input type="search" name="ss" id="ss"
       class="c-autocomplete__input sb-searchbox__input sb-destination__input"
       placeholder="Where are you going?" value="" autocomplete="off"
       data-component="search/destination/input-placeholder" data-sb-id="main"
       data-input="" aria-autocomplete="both" aria-label="Type your destination">
```
Here, you can see that the element’s ID is ss. Knowing this, you can use one of the find_element functions to tell your WebDriver which element it needs to locate.
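As an aside, the same element can usually be reached through several of the locator functions. A quick sketch using the search bar we just inspected:

```python
# Three equivalent ways to locate the same search bar.
search_by_id = driver.find_element_by_id('ss')
search_by_css = driver.find_element_by_css_selector('#ss')
search_by_xpath = driver.find_element_by_xpath('//input[@id="ss"]')
```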
Let’s define a function that will perform a search in the Booking.com main page:
```python
def fill_form(driver, search_argument):
    '''Receives a search_argument to insert it in the
    search bar and then clicks the search button.'''

    search_field = driver.find_element_by_id('ss')
    search_field.send_keys(search_argument)
    # We look for the search button and click it
    driver.find_element_by_class_name('sb-searchbox__button')\
        .click()
    wait = WebDriverWait(driver, timeout=10).until(
        EC.presence_of_all_elements_located(
            (By.CLASS_NAME, 'sr-hotel__title')))
```
Pass your WebDriver and a string as arguments; this string will be the city in which you want to look for accommodations. Let’s review each line inside this function:
- search_field = driver.find_element_by_id('ss') makes use of the driver.find_element_by_id() function, which looks for an element with the ID passed as an argument; as we know, IDs are unique, so it will return the search bar.
- search_field.send_keys(search_argument) since you already have the search bar selected, you have to tell your WebDriver to put some text in it. Here we use the send_keys(string) function, which takes a string as an argument and puts it in the search form.
- driver.find_element_by_class_name('sb-searchbox__button').click() once you have inserted the city you want to search for, you need to click the “Search” button to perform the search. Here we use the find_element_by_class_name() function, which receives a string representing the class of the element we are looking for, and then we call the click() function, which simply performs a click on the selected element.
- wait = WebDriverWait(driver, timeout=10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'sr-hotel__title'))) here you are telling your WebDriver to wait until the elements with the class name sr-hotel__title (the one containing the accommodation titles) appear.
After this function completes, your WebDriver will be on the search results page for the city you searched.
3. Scrape the Results
Since you have already performed your search, you can start to visit each hotel link and extract the data you need.
For the accommodations we’ll be extracting:
- Name
- Location
- Popular Facilities
- Review Score
Create a function that will extract a predetermined number of accommodation links and then scrape the data you want from them.
Let’s define two more functions: one will extract the links and the other will scrape the data from each link.
Extract accommodation links
You need to know how to extract all of the accommodation links from the search results page. Fortunately, Selenium has the find_elements functions, which work just like the find_element functions but find all the elements with the specified feature instead of just one.
The syntax of the find_elements functions is very similar; the only word that changes is “element” to “elements”:
- find_elements_by_name
- find_elements_by_xpath
- find_elements_by_link_text
- find_elements_by_partial_link_text
- find_elements_by_tag_name
- find_elements_by_class_name
- find_elements_by_css_selector
A find_elements_by_id() function would rarely be useful, since IDs are meant to be unique; there should never be two elements with the same ID on a page.
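As a quick illustration, the find_elements functions return a plain Python list of elements, which may be empty if nothing matches; a small sketch using the class name from the search results page:

```python
# find_elements_* returns a list of WebElement objects (possibly empty).
titles = driver.find_elements_by_class_name('sr-hotel__title')
print(len(titles))        # how many matching elements were found
for title in titles:
    print(title.text)     # the visible text of each element
```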
Using the find_elements functions, you can now extract the accommodation links from the search results page. Inspecting one of the accommodation titles, we find that they all share a common class: sr-hotel__title.
```html
<h3 class="sr-hotel__title">
  <a class="hotel_name_link url" href="/hotel/es/ramblashotel.html?label=gen173nr-1FCAEoggI46AdIM1gEaPEBiAEBmAExuAEZyAEM2AEB6AEB-AECiAIBqAID&sid=1e154445674ab2efff732c570110d2bc&ucfs=1&srpvid=cda75884d26c016e&srepoch=1546950921&hpos=1&hapos=1&dest_id=-372490&dest_type=city&sr_order=popularity&from=searchresults;highlight_room=#hotelTmpl" target="_blank" rel="noopener">
    <span class="sr-hotel__name" data-et-click="">
      Ramblas Hotel
    </span>
    <span class="invisible_spoken">Opens in new window</span>
  </a>
</h3>
```
Use find_elements_by_class_name to select all of the h3 elements, and then find the anchor tag inside each one to extract the accommodation URL:
```python
accommodations_titles = driver.find_elements_by_class_name('sr-hotel__title')
```
```python
def scrape_results(driver, n_results):
    '''Returns the data from n_results amount of results.'''

    accommodations_urls = list()
    accommodations_data = list()

    # Get the accommodations links
    for accommodation_title in driver.find_elements_by_class_name('sr-hotel__title'):
        accommodations_urls.append(accommodation_title.find_element_by_class_name(
            'hotel_name_link').get_attribute('href'))

    # Scrape only the first n_results links
    for url in accommodations_urls[:n_results]:
        url_data = scrape_accommodation_data(driver, url)
        accommodations_data.append(url_data)

    return accommodations_data
```
This scrapes just the number of results passed as an argument to your function. Here, scrape_accommodation_data(driver, url) visits each accommodation link, extracts the data you want, and returns it as a Python dictionary.
Scrape Data from Accommodation Links
As we said earlier, the data that we’re going to scrape from each accommodation is the following:
- Name
- Location
- Popular Facilities
- Review Score
We will need to use the find_element and find_elements functions in order to achieve this.
First, create a Python dictionary so you can store the data there.
```python
accommodation_fields = dict()
```
Then, tell your WebDriver to visit the accommodation URL:

```python
driver.get(accommodation_url)
time.sleep(10)
```
Here, time.sleep(10) tells Python to wait 10 seconds so the webpage can load completely. We could use WebDriverWait, but we are going to scrape several similar pages where the elements WebDriverWait would watch for are always present, so the time library is the simpler option.
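If you prefer an explicit wait here anyway, a minimal sketch would look like this (it assumes the hp_hotel_name element used below is present on every accommodation page):

```python
# Alternative to time.sleep(10): wait until the hotel name element appears.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, 'hp_hotel_name')))
```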
The next code is what we will use to extract each piece of information we want from the accommodations:

```python
# Get the accommodation name
accommodation_fields['name'] = driver.find_element_by_id('hp_hotel_name')\
    .text.replace('Hotel', '').strip()

# Get the accommodation score
accommodation_fields['score'] = driver.find_element_by_class_name(
    'bui-review-score--end').find_element_by_class_name(
    'bui-review-score__badge').text

# Get the accommodation location
accommodation_fields['location'] = driver.find_element_by_id('showMap2')\
    .find_element_by_class_name('hp_address_subtitle').text

# Get the most popular facilities
accommodation_fields['popular_facilities'] = list()
facilities = driver.find_element_by_class_name('hp_desc_important_facilities')
for facility in facilities.find_elements_by_class_name('important_facility'):
    accommodation_fields['popular_facilities'].append(facility.text)
```
Let’s explain the code piece by piece:

- accommodation_fields['name'] = driver.find_element_by_id('hp_hotel_name').text.replace('Hotel', '').strip()

  To find the accommodation name, use the find_element_by_id(id) function, then read its text attribute and remove the word “Hotel” from it.

- accommodation_fields['score'] = driver.find_element_by_class_name('bui-review-score--end').find_element_by_class_name('bui-review-score__badge').text

  The accommodation score is located in a kind of floating element. Here, use the find_element_by_class_name(class_name) function to find the outer element, and then the inner element that actually contains the score.

- accommodation_fields['location'] = driver.find_element_by_id('showMap2').find_element_by_class_name('hp_address_subtitle').text

  The accommodation’s location is just below its name; if you inspect the HTML code, you will find that the containing element has a unique ID that you can use to find it.

- for facility in facilities.find_elements_by_class_name('important_facility'): accommodation_fields['popular_facilities'].append(facility.text)

  For the facilities, you need to extract all the elements with the class name important_facility; that is why we use the find_elements_by_class_name(class_name) function. We iterate over the list it returns and extract the text from each element.
Let’s see the complete code for this function:
```python
def scrape_accommodation_data(driver, accommodation_url):
    '''Visits an accommodation page and extracts the data.'''

    if driver is None:
        driver = prepare_driver(accommodation_url)

    driver.get(accommodation_url)
    time.sleep(12)

    accommodation_fields = dict()

    # Get the accommodation name
    accommodation_fields['name'] = driver.find_element_by_id('hp_hotel_name')\
        .text.replace('Hotel', '').strip()

    # Get the accommodation score
    accommodation_fields['score'] = driver.find_element_by_class_name(
        'bui-review-score--end').find_element_by_class_name(
        'bui-review-score__badge').text

    # Get the accommodation location
    accommodation_fields['location'] = driver.find_element_by_id('showMap2')\
        .find_element_by_class_name('hp_address_subtitle').text

    # Get the most popular facilities
    accommodation_fields['popular_facilities'] = list()
    facilities = driver.find_element_by_class_name('hp_desc_important_facilities')
    for facility in facilities.find_elements_by_class_name('important_facility'):
        accommodation_fields['popular_facilities'].append(facility.text)

    return accommodation_fields
```
Run the Script
Since you have all the functions you need for your scraping process, it is time to tell your script the order in which they need to be executed.
```python
if __name__ == '__main__':

    try:
        driver = prepare_driver(domain)
        fill_form(driver, 'Barcelona')
        accommodations_data = scrape_results(driver, 10)
        accommodations_data = json.dumps(accommodations_data, indent=4)
        with open('booking_data.json', 'w') as f:
            f.write(accommodations_data)
    finally:
        driver.quit()
```
Here, call all your functions to get the data you want from the accommodations. Then, using the json Python module, convert it to a JSON string and write it into a file.
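To visualize what json.dumps produces, here is a hypothetical record run through it; the values are illustrative, not real scraper output:

```python
import json

# Hypothetical scraped record; real values depend on the page contents.
accommodations_data = [
    {'name': 'Ramblas Hotel',
     'score': '8.4',
     'location': 'Barcelona, Spain',
     'popular_facilities': ['Free WiFi', 'Non-smoking rooms']},
]

# indent=4 pretty-prints the JSON string before it is written to disk.
print(json.dumps(accommodations_data, indent=4))
```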
Also, here you can change any of the function parameters if you want; you can search for another city or scrape a different number of accommodations.
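For example, changing the search could look like this (the city name and result count are arbitrary choices):

```python
fill_form(driver, 'Madrid')                      # search a different city
accommodations_data = scrape_results(driver, 5)  # scrape fewer results
```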
Complete Code of Selenium Web Scraping Tutorial
```python
import selenium
import json
import time

from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

domain = 'https://www.booking.com'


def prepare_driver(url):
    '''Returns a Firefox Webdriver.'''
    options = Options()
    # options.add_argument('-headless')
    driver = Firefox(executable_path='geckodriver', options=options)
    driver.get(url)
    wait = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
        (By.ID, 'ss')))
    return driver


def fill_form(driver, search_argument):
    '''Inserts the search_argument in the search bar and clicks the search button.'''
    search_field = driver.find_element_by_id('ss')
    search_field.send_keys(search_argument)
    # We look for the search button and click it
    driver.find_element_by_class_name('sb-searchbox__button')\
        .click()
    wait = WebDriverWait(driver, timeout=10).until(
        EC.presence_of_all_elements_located(
            (By.CLASS_NAME, 'sr-hotel__title')))


def scrape_results(driver, n_results):
    '''Returns the data from n_results amount of results.'''

    accommodations_urls = list()
    accommodations_data = list()

    # Get the accommodations links
    for accommodation_title in driver.find_elements_by_class_name('sr-hotel__title'):
        accommodations_urls.append(accommodation_title.find_element_by_class_name(
            'hotel_name_link').get_attribute('href'))

    # Scrape only the first n_results links
    for url in accommodations_urls[:n_results]:
        url_data = scrape_accommodation_data(driver, url)
        accommodations_data.append(url_data)

    return accommodations_data


def scrape_accommodation_data(driver, accommodation_url):
    '''Visits an accommodation page and extracts the data.'''

    if driver is None:
        driver = prepare_driver(accommodation_url)

    driver.get(accommodation_url)
    time.sleep(12)

    accommodation_fields = dict()

    # Get the accommodation name
    accommodation_fields['name'] = driver.find_element_by_id('hp_hotel_name')\
        .text.replace('Hotel', '').strip()

    # Get the accommodation score
    accommodation_fields['score'] = driver.find_element_by_class_name(
        'bui-review-score--end').find_element_by_class_name(
        'bui-review-score__badge').text

    # Get the accommodation location
    accommodation_fields['location'] = driver.find_element_by_id('showMap2')\
        .find_element_by_class_name('hp_address_subtitle').text

    # Get the most popular facilities
    accommodation_fields['popular_facilities'] = list()
    facilities = driver.find_element_by_class_name('hp_desc_important_facilities')
    for facility in facilities.find_elements_by_class_name('important_facility'):
        accommodation_fields['popular_facilities'].append(facility.text)

    return accommodation_fields


if __name__ == '__main__':

    try:
        driver = prepare_driver(domain)
        fill_form(driver, 'Barcelona')
        accommodations_data = scrape_results(driver, 10)
        accommodations_data = json.dumps(accommodations_data, indent=4)
        with open('booking_data.json', 'w') as f:
            f.write(accommodations_data)
    finally:
        driver.quit()
```
After the script finishes its execution, you will have a booking_data.json file in your working folder.
I hope this tutorial has helped you learn more about Selenium, Python and web scraping in general.
Hello! My name is Oswaldo; I’m a Mathematics student from Venezuela. I’m a Python programmer interested in Web Scraping, Machine learning and Mobile Development.
I like maths, coding and problem solving!