YouTube is the world’s largest video-sharing site, with about 1.9 billion monthly active users. People use it to share information, teach, entertain, advertise, and much more.
As a result, YouTube holds an enormous amount of data that you can use for research and analysis. For example, extracted video comments can feed sentiment analysis and other natural language processing tasks. The YouTube Data API lets you search for videos matching specific criteria.
In this tutorial, you will learn how to extract comments from YouTube videos and store them in a CSV file using Python. It covers setting up a project on the Google Console, enabling the YouTube Data API, and finally writing the script that interacts with the API.
YouTube Data API
Project Setup
In order to access the YouTube Data API, you need a project on the Google Console, because that is where you obtain the authorization credentials your application uses to make API calls.
Head over to the Google Console and create a new project. One thing to note is that you will need a Google account to access the console.
Click Select a project, then New Project, where you will get to enter the name of the project.
Enter the project name and click Create. It will take a couple of seconds for the project to be created.
API Activation
Now that you have created the project, you need to enable the YouTube Data API.
Click Enable APIs and Services in order to enable the necessary API.
Type the word “youtube” in the search box, then click the card with YouTube Data API v3 text.
Finally, click Enable.
Credentials Setup
Now that you have enabled the YouTube Data API, you need to set up the necessary credentials.
Click Create Credentials.
On the next page, click Cancel.
Click the OAuth consent screen tab and fill in the application name and email address.
Scroll down and click Save.
Select the Credentials tab, click Create Credentials and select OAuth client ID.
Select the application type Other, enter the name “YouTube Comment Extractor”, and click the Create button.
Click OK to dismiss the resulting dialog.
Click the download button (Download JSON) to the right of the client ID. Finally, move the downloaded file to your working directory and rename it client_secret.json.
Client Installation
Now that you have set up the credentials to access the API, you need to install the Google API client library. You can do so by running:
pip install google-api-python-client
You also need to install additional libraries that handle authentication:
pip install google-auth google-auth-oauthlib google-auth-httplib2
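To confirm that the libraries installed correctly, you can run a quick import check before continuing (a simple sanity check, not part of the final script):

# quick_check.py - verify that the client libraries import correctly
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow

print('Google API client libraries imported successfully')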
Client Setup
Since the Google API client can be used to access all Google APIs, you need to restrict its scope to YouTube.
First, you need to specify the credential file you downloaded earlier.
CLIENT_SECRETS_FILE = "client_secret.json"
Next, you need to restrict access by specifying the scope.
SCOPES = ['https://www.googleapis.com/auth/youtube.force-ssl']
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'
Now that you have defined the scope, you need to build the service that will be responsible for interacting with the API. The following function uses the constants defined above to build and return that service.
import os

import google.oauth2.credentials
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google_auth_oauthlib.flow import InstalledAppFlow

def get_authenticated_service():
    flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRETS_FILE, SCOPES)
    credentials = flow.run_console()
    return build(API_SERVICE_NAME, API_VERSION, credentials=credentials)
Now add the following lines and run your script to make sure the client has been set up properly.
if __name__ == '__main__':
    # When running locally, disable OAuthlib's HTTPS verification. When
    # running in production *do not* leave this option enabled.
    os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'
    service = get_authenticated_service()
When you run the script you will be presented with an authorization URL. Copy it and open it in your browser.
Grant your script the requested permissions.
Confirm your choice.
Copy and paste the code from the browser back in the Terminal / Command Prompt.
At this point, your script should exit successfully indicating that you have properly setup your client.
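A note on compatibility: recent releases of google-auth-oauthlib have dropped the console-based flow, so flow.run_console() may raise an AttributeError on newer installations. If that happens, a browser-based local-server flow is a drop-in alternative (a minimal sketch under that assumption):

def get_authenticated_service():
    flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRETS_FILE, SCOPES)
    # Opens your browser and runs a temporary local web server to receive the
    # authorization response, instead of asking you to paste a code.
    credentials = flow.run_local_server(port=0)
    return build(API_SERVICE_NAME, API_VERSION, credentials=credentials)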
Cache Credentials
If you run the script again, you will notice that you have to go through the entire authorization process once more. This can be quite annoying if you run the script frequently. To avoid it, cache the credentials so that they are reused every time the script runs. Make the following changes to the get_authenticated_service function.
import os
import pickle

import google.oauth2.credentials
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

...

def get_authenticated_service():
    credentials = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            credentials = pickle.load(token)
    # Check if the credentials are invalid or do not exist
    if not credentials or not credentials.valid:
        # Check if the credentials have expired
        if credentials and credentials.expired and credentials.refresh_token:
            credentials.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                CLIENT_SECRETS_FILE, SCOPES)
            credentials = flow.run_console()

        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(credentials, token)

    return build(API_SERVICE_NAME, API_VERSION, credentials=credentials)
What you have added is caching: the retrieved credentials are stored in a file using Python’s pickle module. The authorization flow is only launched if the stored file does not exist, or if the stored credentials are invalid and cannot be refreshed.
If you run the script again you will notice that a file named token.pickle is created. Once this file is created, running the script again does not launch the authorization flow.
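If you ever need to go through the authorization flow again, for example after changing SCOPES, deleting the cached token file is enough (a small optional helper; the file name matches the one used above):

import os

# Remove the cached credentials so the next run starts a fresh authorization flow.
if os.path.exists('token.pickle'):
    os.remove('token.pickle')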
Search Videos by Keyword
The next step is to receive the keyword from the user.
keyword = input('Enter a keyword: ')
You need to use the keyword received from the user, together with the service, to search for videos that match the keyword. You’ll implement a function that does the searching.
def search_videos_by_keyword(service, **kwargs):
    results = service.search().list(**kwargs).execute()
    for item in results['items']:
        print('%s - %s' % (item['snippet']['title'], item['id']['videoId']))

...

keyword = input('Enter a keyword: ')
search_videos_by_keyword(service, q=keyword, part='id,snippet', eventType='completed', type='video')
If you run the script again and use async python as the keyword, you will get output similar to the following.
Hacking Livestream #64: async/await in Python 3 - CD8s0qwjpoQ
Asynchronous input with Python and Asyncio - DYhAoM1Kny0
In Python Threads != Async - GMewz5Pf2lU
4_05 You Might Not Want Async (in Python) - IBA89nFEQ8U
Python, Asynchronous Programming - qJJtGNL9VnM
The size of the results will vary depending on the keyword. Note that the results returned are restricted to the first page. The YouTube API automatically paginates results to make them easier to consume. If the results for a query span multiple pages, you can navigate between pages using the pageToken parameter. For this tutorial you only need results from the first three pages.
Currently, the search_videos_by_keyword function only returns results from the first page, so you need to modify it. To keep the logic separate, you will create a new function that fetches videos from the first three pages.
def get_videos(service, **kwargs):
    final_results = []
    results = service.search().list(**kwargs).execute()

    i = 0
    max_pages = 3
    while results and i < max_pages:
        final_results.extend(results['items'])

        # Check if another page exists
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.search().list(**kwargs).execute()
            i += 1
        else:
            break

    return final_results

def search_videos_by_keyword(service, **kwargs):
    results = get_videos(service, **kwargs)
    for item in results:
        print('%s - %s' % (item['snippet']['title'], item['id']['videoId']))

...

keyword = input('Enter a keyword: ')
search_videos_by_keyword(service, q=keyword, part='id,snippet', eventType='completed', type='video')
The get_videos function does a couple of things. First, it fetches the first page of results that correspond to the keyword. Then it keeps fetching results as long as another page exists and the maximum number of pages has not been reached.
Get Video Comments
Now that you have gotten the videos that matched the keyword you can proceed to extract the comments for each video.
When dealing with comments in the YouTube API, there are a couple of distinctions you have to make.
First of all, there is the comment thread. A comment thread is made up of one or more comments: a single top-level (parent) comment and any replies to it. For this tutorial, you only need the top-level comment from each comment thread.
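To make that concrete, here is a simplified sketch of a single item returned by commentThreads().list(part='snippet'), showing only the fields used in this tutorial (the real response contains many more):

comment_thread = {
    'snippet': {
        'topLevelComment': {          # the parent comment of the thread
            'snippet': {
                'textDisplay': 'The comment text shown to users',
            }
        },
        'totalReplyCount': 2,         # replies exist but are not fetched here
    }
}

# This is the path used below to pull out the parent comment:
text = comment_thread['snippet']['topLevelComment']['snippet']['textDisplay']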
Like before, you will need to put this logic into a function.
def get_video_comments(service, **kwargs):
    comments = []
    results = service.commentThreads().list(**kwargs).execute()

    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comments.append(comment)

        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break

    return comments
The part you really need to take note of is the following snippet:
if 'nextPageToken' in results:
    kwargs['pageToken'] = results['nextPageToken']
    results = service.commentThreads().list(**kwargs).execute()
else:
    break
Since you need to obtain all the top-level comments of a video, you continuously check whether there is more data to be loaded and fetch it until none is left. Apart from some minor modifications, it is quite similar to the logic used in the get_videos function.
Modify the search_videos_by_keyword function so that it calls the function you have just added.
def search_videos_by_keyword(service, **kwargs):
    results = get_videos(service, **kwargs)
    for item in results:
        title = item['snippet']['title']
        video_id = item['id']['videoId']
        comments = get_video_comments(service, part='snippet', videoId=video_id, textFormat='plainText')
        print(comments)
If you run the script and use async python as the keyword, you should end up with the following output.
['TIL: for/else Nice', 'You weren’t able to figure it out today, but I enjoyed the journey a lot. Keep up the great work.', 'Start @ 3:35', "AFAIK await is still just like yield from and coroutines are just like generators, they just made yield from only compatible with generators and await - with coroutines.\nSeconding David Beazley recommendation, his presentations are amazing. He shows how to run a coroutine at https://youtu.be/E-1Y4kSsAFc?t=774 Other presentations (some are about async) are at dabeaz.com/talks.html\nAlso, if you want to read sources of an async library, I'd recommend David's Curio or more production-ready and less experimental Trio. Asyncio creates too many abstractions and entities to be easily comprehended."]
['good job Hoff i like the Asyncio video', "I came here after watching a video on Hall PC, the Windows NT and OS/2 Shoutout from 1993 and they described this as being a feature in Windows 3.11 NT and OS/2 that year. Before they had this you usually had to wait until after an hour glass ended before you could use your other application you had opened. I didn't think it actually had other applications other than in system programming. Very interesting stuff, btw I really don't feel confident in writing my own operating system.", 'Is there a previous video, or are you just referencing offscreen stuff at the beginning?']
['Skip first 20 minutes']
[]
[]
[]
[]
.....
You will note that some videos have multiple top-level comments, while others have only one or none at all.
Now that you’ve obtained the comments, you need to join them into a single list so that you can write the results to a file.
Modify the search_videos_by_keyword function again as follows.
def search_videos_by_keyword(service, **kwargs):
    results = get_videos(service, **kwargs)
    final_result = []
    for item in results:
        title = item['snippet']['title']
        video_id = item['id']['videoId']
        comments = get_video_comments(service, part='snippet', videoId=video_id, textFormat='plainText')
        final_result.extend([(video_id, title, comment) for comment in comments])
Here, you create a list that will hold all the comments and populate it with its extend method, using the contents of the per-video lists. Each entry is a (video_id, title, comment) tuple.
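If the extend call with a list comprehension looks unfamiliar, here is a minimal standalone illustration of the same pattern, using made-up values rather than real API data:

final_result = []

# Pretend these came from two different videos
comments_video_1 = ['first comment', 'second comment']
comments_video_2 = ['another comment']

# extend() appends each element of the given list, so final_result becomes a
# flat list of (video_id, title, comment) tuples rather than a list of lists
final_result.extend([('vid1', 'Title 1', c) for c in comments_video_1])
final_result.extend([('vid2', 'Title 2', c) for c in comments_video_2])

print(final_result)
# [('vid1', 'Title 1', 'first comment'), ('vid1', 'Title 1', 'second comment'),
#  ('vid2', 'Title 2', 'another comment')]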
Store Comments in CSV File
Now you need to write all the comments into a CSV file. Like before, you will put this logic in a separate function.
import csv

def write_to_csv(comments):
    with open('comments.csv', 'w') as comments_file:
        comments_writer = csv.writer(comments_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        comments_writer.writerow(['Video ID', 'Title', 'Comment'])
        for row in comments:
            comments_writer.writerow(list(row))
Modify the search_videos_by_keyword function and add a call to write_to_csv at the bottom.
If you run the script, the comments found will be stored in a file called comments.csv. Its contents will be similar to the following format:
Video ID,Title,Comment
CD8s0qwjpoQ,Hacking Livestream #64: async/await in Python 3,TIL: for/else Nice
CD8s0qwjpoQ,Hacking Livestream #64: async/await in Python 3,"You weren’t able to figure it out today, but I enjoyed the journey a lot. Keep up the great work."
CD8s0qwjpoQ,Hacking Livestream #64: async/await in Python 3,Start @ 3:35
CD8s0qwjpoQ,Hacking Livestream #64: async/await in Python 3,"AFAIK await is still just like yield from and coroutines are just like generators, they just made yield from only compatible with generators and await - with coroutines.
Seconding David Beazley recommendation, his presentations are amazing. He shows how to run a coroutine at https://youtu.be/E-1Y4kSsAFc?t=774 Other presentations (some are about async) are at dabeaz.com/talks.html
Also, if you want to read sources of an async library, I'd recommend David's Curio or more production-ready and less experimental Trio. Asyncio creates too many abstractions and entities to be easily comprehended."
DYhAoM1Kny0,Asynchronous input with Python and Asyncio,good job Hoff i like the Asyncio video
DYhAoM1Kny0,Asynchronous input with Python and Asyncio,"I came here after watching a video on Hall PC, the Windows NT and OS/2 Shoutout from 1993 and they described this as being a feature in Windows 3.11 NT and OS/2 that year. Before they had this you usually had to wait until after an hour glass ended before you could use your other application you had opened. I didn't think it actually had other applications other than in system programming. Very interesting stuff, btw I really don't feel confident in writing my own operating system."
DYhAoM1Kny0,Asynchronous input with Python and Asyncio,"Is there a previous video, or are you just referencing offscreen stuff at the beginning?"
GMewz5Pf2lU,In Python Threads != Async,Skip first 20 minutes
2ukHDGLr9SI,Getting started with event loops: the magic of select,Thank you so much for the video! What terminal are you using? It looks so easy to change the size of the window
2ukHDGLr9SI,Getting started with event loops: the magic of select,need socket.setblocking(False) ?
2ukHDGLr9SI,Getting started with event loops: the magic of select,"Thank you for the tutorial. I am having some difficulty in getting the code to work.
.....
Note: all Google APIs are rate limited, so try not to make too many API calls.
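If you do hit a quota or rate-limit error, the client raises googleapiclient.errors.HttpError. One simple mitigation is to retry a request with a delay; below is a rough sketch (the retry count, delay, and helper name are arbitrary choices, not part of the script above):

import time

from googleapiclient.errors import HttpError

def execute_with_retry(request, retries=3, delay=5):
    # Execute an API request object, retrying a few times if the API
    # responds with an error such as a quota or rate-limit failure.
    for attempt in range(retries):
        try:
            return request.execute()
        except HttpError as error:
            print('Attempt %d failed: %s' % (attempt + 1, error))
            time.sleep(delay)
    raise RuntimeError('Request failed after %d attempts' % retries)

# Example usage with the service built earlier:
# results = execute_with_retry(service.search().list(q=keyword, part='id,snippet', type='video'))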
Complete Project Code
Here is the final Python code for using the YouTube API to search for a keyword and extract comments from the resulting videos.
import csv
import os
import pickle

import google.oauth2.credentials
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret.
CLIENT_SECRETS_FILE = "client_secret.json"

# This OAuth 2.0 access scope allows for full read/write access to the
# authenticated user's account and requires requests to use an SSL connection.
SCOPES = ['https://www.googleapis.com/auth/youtube.force-ssl']
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'

def get_authenticated_service():
    credentials = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            credentials = pickle.load(token)
    # Check if the credentials are invalid or do not exist
    if not credentials or not credentials.valid:
        # Check if the credentials have expired
        if credentials and credentials.expired and credentials.refresh_token:
            credentials.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                CLIENT_SECRETS_FILE, SCOPES)
            credentials = flow.run_console()

        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(credentials, token)

    return build(API_SERVICE_NAME, API_VERSION, credentials=credentials)

def get_video_comments(service, **kwargs):
    comments = []
    results = service.commentThreads().list(**kwargs).execute()

    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comments.append(comment)

        # Check if another page exists
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break

    return comments

def write_to_csv(comments):
    with open('comments.csv', 'w') as comments_file:
        comments_writer = csv.writer(comments_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        comments_writer.writerow(['Video ID', 'Title', 'Comment'])
        for row in comments:
            # convert the tuple to a list and write to the output file
            comments_writer.writerow(list(row))

def get_videos(service, **kwargs):
    final_results = []
    results = service.search().list(**kwargs).execute()

    i = 0
    max_pages = 3
    while results and i < max_pages:
        final_results.extend(results['items'])

        # Check if another page exists
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.search().list(**kwargs).execute()
            i += 1
        else:
            break

    return final_results

def search_videos_by_keyword(service, **kwargs):
    results = get_videos(service, **kwargs)
    final_result = []
    for item in results:
        title = item['snippet']['title']
        video_id = item['id']['videoId']
        comments = get_video_comments(service, part='snippet', videoId=video_id, textFormat='plainText')

        # make a tuple consisting of the video id, title, comment and add the
        # result to the final list
        final_result.extend([(video_id, title, comment) for comment in comments])

    write_to_csv(final_result)

if __name__ == '__main__':
    # When running locally, disable OAuthlib's HTTPS verification. When
    # running in production *do not* leave this option enabled.
    os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'
    service = get_authenticated_service()
    keyword = input('Enter a keyword: ')
    search_videos_by_keyword(service, q=keyword, part='id,snippet', eventType='completed', type='video')
Course: REST API: Data Extraction with Python
Working with APIs is a skill requested in many jobs. Why?
APIs are the official way to extract data and perform other automation tasks that big websites allow. If an API lets you extract the data you need from a website, you do not need regular web scraping.
Join our new course, REST APIs: Data Extraction and Automation with Python, for 90% OFF using this coupon:
https://www.udemy.com/course/rest-api-data-extraction-automation-python/?couponCode=REST-API-BLOG-YT
Very useful, thanks.
But I get an error:
{
  "error": {
    "errors": [
      {
        "domain": "usageLimits",
        "reason": "dailyLimitExceededUnreg",
        "message": "Daily Limit for Unauthenticated Use Exceeded. Continued use requires signup.",
        "extendedHelp": "https://code.google.com/apis/console"
      }
    ],
    "code": 403,
    "message": "Daily Limit for Unauthenticated Use Exceeded. Continued use requires signup."
  }
}
How to solve this?
@ro mao – exactly as the error message said: “Daily Limit for Unauthenticated Use Exceeded.”
Check this answer.
This code shows me an invalid grant access error.
any way to not have to do that Oauth setup each time you run the script?
This was genuinely so helpful to me for a work project I was doing — I could not figure out the oauth bit for my LIFE and it prevented me from getting to all the fun stuff. Bless you!
Emma, glad the tutorial helped you. All the best!
Really helpful post and easy to follow. I’m running into some character encoding issues when writing to CSV; I think this is being triggered by emoji content in the comments. Any recommended approaches for handling this?
Hi Tom! Thanks for your comment! Are you using Python 2 or 3? If you are on Python 2, please try Python 3 and let us know if that solves the issue. Python 2 has ways to handle encoding, but Python 3 takes care of most of this by default. All the best!
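If the error persists on Python 3 (it can still happen on Windows, where the default file encoding is not UTF-8), one option is to open the output file with an explicit encoding, a small optional tweak to the write_to_csv function from the tutorial:

def write_to_csv(comments):
    # encoding='utf-8' handles emoji and other non-ASCII characters;
    # newline='' is the setting recommended by the csv module docs.
    with open('comments.csv', 'w', encoding='utf-8', newline='') as comments_file:
        comments_writer = csv.writer(comments_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        comments_writer.writerow(['Video ID', 'Title', 'Comment'])
        for row in comments:
            comments_writer.writerow(list(row))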