Extracting YouTube Comments with YouTube API & Python

YouTube is the world’s largest video-sharing site with about 1.9 billion monthly active users. People use it to share info, teach, entertain, advertise and much more.

YouTube therefore holds a large amount of data that can be used for research and analysis. For example, extracted video comments can feed sentiment analysis and other natural language processing tasks. The YouTube Data API lets you search for videos matching specific criteria and read data about them, such as their comments.

In this tutorial, you will learn how to extract comments from YouTube videos and store them in a CSV file using Python. It covers setting up a project on Google Console, enabling the YouTube Data API, and finally writing the script that interacts with the API.

YouTube Data API

Project Setup

In order to access the YouTube Data API, you need to have a project on Google Console. This is because you need to obtain authorization credentials to make API calls in your application.

Head over to the Google Console and create a new project. One thing to note is that you will need a Google account to access the console.

Click Select a project, then New Project, where you can enter the name of the project.

 

Enter the project name and click Create. It will take a couple of seconds for the project to be created.

 

API Activation

Now that you have created the project, you need to enable the YouTube Data API.

Click Enable APIs and Services in order to enable the necessary API.

 

Type the word “youtube” in the search box, then click the card labelled YouTube Data API v3.

 

Finally, click Enable.

 

Credentials Setup

Now that you have enabled the YouTube Data API, you need to set up the necessary credentials.

Click Create Credentials.

On the next page, click Cancel.

 

Click the OAuth consent screen tab and fill in the application name and email address.

 

Scroll down and click Save.

 

Select the Credentials tab, click Create Credentials and select OAuth client ID.

 

 

Select the application type Other, enter the name “YouTube Comment Extractor”, and click the Create button.

Click OK to dismiss the resulting dialog.

Click the file download button (Download JSON) to the right of the client ID.

Finally, move the downloaded file to your working directory and rename it client_secret.json.

 

Client Installation

Now that you have set up the credentials to access the API, you need to install the Google API client library. You can do so by running:

pip install google-api-python-client

You also need to install additional libraries, which handle authentication:

pip install google-auth google-auth-oauthlib google-auth-httplib2

 

Client Setup

Since the Google API client can be used to access all Google APIs, you need to restrict its scope to YouTube.

First, you need to specify the credential file you downloaded earlier.

 

Next, you need to restrict access by specifying the scope.
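The two steps above amount to a handful of module-level constants. The following is a minimal sketch, assuming the credentials file was saved as client_secret.json as described earlier. The constant names are illustrative, and the youtube.force-ssl scope is an assumption of this sketch rather than something taken from the original listing.

```python
# Path to the OAuth client file downloaded from the console
# (renamed to client_secret.json in the Credentials Setup step).
CLIENT_SECRETS_FILE = "client_secret.json"

# Restrict the client to the YouTube Data API only. This scope value is an
# assumption for this sketch; use the narrowest scope that covers your calls.
SCOPES = ["https://www.googleapis.com/auth/youtube.force-ssl"]

# Name and version of the API the client library should build a service for.
API_SERVICE_NAME = "youtube"
API_VERSION = "v3"
```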

 

Now that you have successfully defined the scope, you need to build a service that will be responsible for interacting with the API. The following function grabs the constants defined before, builds and returns the service that will interact with the API.
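A function along these lines could look as follows. This is a sketch under stated assumptions, not necessarily the original listing: the constants are repeated so the block stands alone, and it uses InstalledAppFlow.run_local_server, which opens the browser and receives the authorization response on a temporary local port. Newer releases of google-auth-oauthlib may not provide the console-based flow (print a URL, paste a code back) that the steps below describe, so what you see on screen may differ slightly.

```python
CLIENT_SECRETS_FILE = "client_secret.json"
SCOPES = ["https://www.googleapis.com/auth/youtube.force-ssl"]  # assumed scope
API_SERVICE_NAME = "youtube"
API_VERSION = "v3"


def get_authenticated_service():
    """Run the OAuth flow and return a YouTube Data API service object."""
    # Imported inside the function so the module can still be loaded
    # before the client libraries have been installed.
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    # Build the flow from the downloaded client file and the chosen scopes.
    flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRETS_FILE, SCOPES)

    # Opens a browser window for consent; port=0 picks any free local port.
    credentials = flow.run_local_server(port=0)

    # The service object is what every later API call goes through.
    return build(API_SERVICE_NAME, API_VERSION, credentials=credentials)
```

Calling get_authenticated_service() once at the bottom of the script is enough to check that the client is wired up correctly.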

 

Now add the following lines and run your script to make sure the client has been set up properly.

 

When you run the script you will be presented with an authorization URL. Copy it and open it in your browser.

 

Select your desired account.

 

Grant your script the requested permissions.

 

Confirm your choice.

 

Copy the code from the browser and paste it back into the terminal or command prompt.

At this point, your script should exit successfully, indicating that you have set up your client properly.

Cache Credentials

If you run the script again, you will notice that you have to go through the entire authorization process. This can be quite annoying if you have to run your script multiple times. To avoid it, cache the credentials so that they are reused every time you run the script. Make the following changes to the get_authenticated_service function.

This change caches the retrieved credentials in a file using Python’s pickle format. The authorization flow is only launched if the stored file does not exist, or if the credentials in the stored file are invalid or have expired.

If you run the script again you will notice that a file named token.pickle is created. Once this file is created, running the script again does not launch the authorization flow.
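The caching described above can be sketched as follows. The helper names load_cached_credentials and save_credentials are hypothetical and only serve to keep the file handling separate from the OAuth calls; the constants and the use of run_local_server are the same assumptions as before.

```python
import os
import pickle

CLIENT_SECRETS_FILE = "client_secret.json"
SCOPES = ["https://www.googleapis.com/auth/youtube.force-ssl"]  # assumed scope
API_SERVICE_NAME = "youtube"
API_VERSION = "v3"
TOKEN_FILE = "token.pickle"


def load_cached_credentials(path=TOKEN_FILE):
    """Return the pickled credentials, or None if no cache file exists."""
    if os.path.exists(path):
        with open(path, "rb") as token:
            return pickle.load(token)
    return None


def save_credentials(credentials, path=TOKEN_FILE):
    """Pickle the credentials so the next run can reuse them."""
    with open(path, "wb") as token:
        pickle.dump(credentials, token)


def get_authenticated_service():
    # Imported lazily so the helpers above work without the client libraries.
    from google.auth.transport.requests import Request
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    credentials = load_cached_credentials()
    if not credentials or not credentials.valid:
        if credentials and credentials.expired and credentials.refresh_token:
            # Expired but refreshable: renew without user interaction.
            credentials.refresh(Request())
        else:
            # No usable cache: run the full authorization flow.
            flow = InstalledAppFlow.from_client_secrets_file(
                CLIENT_SECRETS_FILE, SCOPES)
            credentials = flow.run_local_server(port=0)
        save_credentials(credentials)
    return build(API_SERVICE_NAME, API_VERSION, credentials=credentials)
```

Only unpickle a token file that your own script wrote, and keep it out of version control, since it grants access to your account.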

Search Videos by Keyword

The next step is to receive the keyword from the user.

 

You need to use the keyword received from the user, together with the service, to search for videos that match the keyword. You’ll need to implement a function that does the searching.
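A search function of this kind might be sketched as below. It assumes the service object returned by get_authenticated_service and passes its keyword arguments straight to the search.list endpoint; returning (title, video ID) pairs in addition to printing them is a choice made for this sketch.

```python
def search_videos_by_keyword(service, **kwargs):
    """Run one search request and return (title, video_id) pairs."""
    results = service.search().list(**kwargs).execute()

    videos = []
    for item in results.get("items", []):
        title = item["snippet"]["title"]
        # Present because the search is restricted to type="video".
        video_id = item["id"]["videoId"]
        print(f"{title} ({video_id})")
        videos.append((title, video_id))
    return videos
```

With the keyword read via input("Enter a keyword: "), a call such as search_videos_by_keyword(service, q=keyword, part="id,snippet", type="video") runs the search. Restricting type to "video" keeps channels and playlists out of the results, so every item carries a videoId.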

 

If you run the script again and enter async python as the keyword, you will get the following output.

 

Navigate Multiple Pages of Search Results

The number of results will vary depending on the keyword. Note that the results returned are restricted to the first page. The YouTube API automatically paginates results to make them easier to consume. If the results for a query span multiple pages, you can move from one page to the next by passing the nextPageToken value from each response as the pageToken parameter of the following request. For this tutorial you only need to get results from the first three pages.

Currently, the search_videos_by_keyword function only returns results from the first page, so you need to modify it. To keep the logic separate, create a new function that fetches videos from the first three pages.

The get_videos function does a couple of things. First, it fetches the first page of results for the keyword. Then it keeps fetching pages as long as there are more results and the maximum number of pages has not been reached.
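The paging logic just described can be sketched like this. The function is named get_videos here, matching the name used later in the tutorial, and the max_pages parameter is an assumption added so the three-page limit is not hard-coded.

```python
def get_videos(service, max_pages=3, **kwargs):
    """Collect search results from up to max_pages pages."""
    videos = []
    pages_fetched = 0

    while pages_fetched < max_pages:
        results = service.search().list(**kwargs).execute()
        videos.extend(results.get("items", []))
        pages_fetched += 1

        # A missing nextPageToken means this was the last page.
        next_token = results.get("nextPageToken")
        if not next_token:
            break
        kwargs["pageToken"] = next_token

    return videos
```

Each request returns at most one page, so the loop issues no more than max_pages API calls per keyword.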

 

Get Video Comments

Now that you have gotten the videos that matched the keyword you can proceed to extract the comments for each video.

When dealing with comments in the YouTube API, there are a couple of distinctions you have to make.

First of all, there is the comment thread. This is basically the entire box. A comment thread is made up of one or more comments: a single parent (top-level) comment, indicated by the arrow, plus any replies to it. For this tutorial you only need to get the parent comment from each comment thread.

 

Like before, you will need to put this logic into a function.

 

The part you really need to take note of is the following snippet:

 

Since you need to obtain all the top-level comments of a video, you need to keep checking whether there is more data to load and fetch it until there is none left. Besides some minor modifications, it is quite similar to the logic used in the get_videos function.
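A sketch of such a function is shown below, assuming the commentThreads.list endpoint with part="snippet". The name get_video_comments is illustrative, and the response fields used are the ones the API returns for a thread's top-level comment.

```python
def get_video_comments(service, **kwargs):
    """Return the text of every top-level comment on one video."""
    comments = []

    while True:
        results = service.commentThreads().list(**kwargs).execute()

        for item in results.get("items", []):
            # Each thread holds exactly one top-level (parent) comment.
            top_level = item["snippet"]["topLevelComment"]["snippet"]
            comments.append(top_level["textDisplay"])

        # Keep going until the API stops handing out a next-page token.
        next_token = results.get("nextPageToken")
        if not next_token:
            break
        kwargs["pageToken"] = next_token

    return comments
```

A typical call would be get_video_comments(service, part="snippet", videoId=video_id, textFormat="plainText"). Videos with comments disabled make the request fail, so you may want to catch that error and skip the video.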

Modify the search_videos_by_keyword function so that it calls the function you have just added.

 

If you run the script and use async python as the keyword, you should end up with the following output.

You will note that some videos had multiple top-level comments, while others had one and others none.

 

Now that you’ve obtained the comments, you need to join them into a single list so that you can write the results to a file.

Modify the search_videos_by_keyword function again as follows.

Here, you are creating a list that will hold all the comments and populating it from the contents of the other lists using its extend method.
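The joining step can be shown in isolation. The helper name flatten_comments is hypothetical; in the tutorial this logic lives inside search_videos_by_keyword.

```python
def flatten_comments(comments_by_video):
    """Merge one list of comments per video into a single flat list."""
    all_comments = []
    for comments in comments_by_video:
        # extend appends every element; append would nest the whole list.
        all_comments.extend(comments)
    return all_comments
```

Videos without comments contribute an empty list, which extend simply ignores.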

 

Store Comments in CSV File

Now you need to write all the comments into a CSV file. Like before, you will put this logic in a separate function.
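A minimal sketch of such a function, assuming each comment is a plain string and the file has a single Comment column (the header and the path parameter are assumptions of this sketch):

```python
import csv


def write_to_csv(comments, path="comments.csv"):
    """Write one comment per row to a CSV file."""
    # newline="" lets the csv module control line endings; utf-8 keeps
    # emoji and other non-ASCII characters intact.
    with open(path, "w", newline="", encoding="utf-8") as comments_file:
        writer = csv.writer(comments_file)
        writer.writerow(["Comment"])
        for comment in comments:
            writer.writerow([comment])
```

The csv module quotes fields that contain commas or line breaks, so multi-line comments survive a round trip.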

Modify the search_videos_by_keyword function and add a call to write_to_csv at the bottom.

If you run the script, the comments found will be stored in a file called comments.csv. Its contents will be similar to the following format:

Note: All Google APIs are rate limited, so you should try not to make too many API calls.

 

Complete Project Code

Here is the final Python code for using the YouTube API to search for a keyword and extract comments from the resulting videos.

 


 

 


8 Replies to “Extracting YouTube Comments with YouTube API & Python”

  1. Very useful, and thanks.
    But I get an error:
    {
    "error": {
    "errors": [
    {
    "domain": "usageLimits",
    "reason": "dailyLimitExceededUnreg",
    "message": "Daily Limit for Unauthenticated Use Exceeded. Continued use requires signup.",
    "extendedHelp": "https://code.google.com/apis/console"
    }
    ],
    "code": 403,
    "message": "Daily Limit for Unauthenticated Use Exceeded. Continued use requires signup."
    }
    }
    How to solve this?

    1. @ro mao – exactly as the error message said: “Daily Limit for Unauthenticated Use Exceeded.”
      Check this answer.

  2. This code shows me invalid grant access error.

  3. any way to not have to do that Oauth setup each time you run the script?

  4. This was genuinely so helpful to me for a work project I was doing — I could not figure out the oauth bit for my LIFE and it prevented me from getting to all the fun stuff. Bless you!

    1. Emma, glad the tutorial helped you. All the best!

  5. Really helpful post and easy to follow. I’m running into some character encoding issues when writing to CSV. I think this is being triggered by emoji content in the comments. Any recommended approaches for handling this?

    1. Hi Tom! Thanks for your comment! Are you using Python 2 or 3? If you are not already on Python 3, please try it and let us know whether that solves the issue. Of course, Python 2 has ways to handle encoding, but Python 3 takes care of most of this by default. All the best!
