Scraping Tweets and Performing Sentiment Analysis

Sentiment Analysis is a special case of text classification in which users' opinions or sentiments regarding a product are classified into predefined categories such as positive, negative, or neutral. Public sentiment can then inform corporate decision making about whether a product is being liked or disliked by the public.

Both rule-based and statistical techniques have been developed for sentiment analysis, and with advances in machine learning and natural language processing, sentiment analysis techniques have improved considerably.

In this tutorial, you will see how Sentiment Analysis can be performed on live Twitter data. The tutorial is divided into two major sections: Scraping Tweets from Twitter and Performing Sentiment Analysis.

Scraping Tweets from Twitter

Following are the steps that you need to perform before you can scrape tweets from Twitter:

Creating a Twitter Developer Account

The first thing that you need to do is create a Twitter developer account. To do so, go to the Twitter developer website and sign up.

To create a developer account, you will have to verify your cell phone number and answer a few basic questions about why you need a developer account. Next, you will have to agree to Twitter's terms of service. Finally, you will receive an email for the verification of your account. Go to your email and confirm your account. Your application for the developer account will then be reviewed by Twitter.

Creating a New Twitter Application

Once you have created your developer account with Twitter, follow these steps to create a new Twitter Application:

  1. Log in to your account and go to this link to create a new Twitter application. Click the “Create an app” button at the top right.
  2. You will be presented with a form where you have to enter the App Name, Application description, and Website URL. For the sake of this tutorial, I named my application “twitter-scraping-xyz”. For the Website URL, you can enter any placeholder; I used “www.google.com”. Add any description for the app and click the “Create” button at the bottom. Leave the rest of the fields empty; they are not necessary.

Getting API Keys and Access Tokens

To connect to your Twitter application from a client application such as Python, you will need the consumer API keys and access tokens. To get them, go to the application page and click on the “Keys and tokens” menu at the top. You will see the API Key and API Secret Key on the page. For the Access Token and Access Token Secret, you will have to click the “Create” button.

Before proceeding to the next section, you should have the consumer API key, consumer API secret key, access token, and access token secret.

Connecting Python Client Application to Twitter Server

Now that you have everything, you can connect to the Twitter server and fetch live tweets. We will use the Tweepy library to connect to the Twitter server and scrape live tweets. The library can be installed using the following command:
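The standard pip invocation installs Tweepy:

```shell
pip install tweepy
```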

To connect to the Twitter application server from a Python client, use the consumer API key, consumer API secret, access token, and access token secret. Execute the following script:
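A minimal sketch of that script; the four strings below are placeholders, not real credentials, so substitute the values from your own application's “Keys and tokens” page:

```python
# Placeholder credentials; replace each string with the corresponding value
# from your application's "Keys and tokens" page.
consumer_key = "YOUR_CONSUMER_API_KEY"
consumer_secret = "YOUR_CONSUMER_API_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
```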

In this script, you store the consumer API key, consumer API secret, access token, and access token secret in corresponding string variables. You will use these variables to connect with the Twitter application. It is important to remember that your application will have its own values for the consumer API key and secret as well as the access token and access token secret; use those values in your application.

It is also pertinent to mention that we imported OAuthHandler from the tweepy library. The OAuthHandler takes the consumer API key and consumer API secret as arguments. The consumer API key and secret tell our client application which Twitter application to connect to, while the access tokens define the rights to access that application. Execute the following script:

Scraping Tweets

We have successfully connected to the Twitter API. The next step is to fetch tweets. Execute the following script to do so:

In the script above, you first specify that if no response is received from the server within 15 seconds, the application should time out. Next, you create an empty list all_tweets, which will contain the scraped tweets. In the search_query variable, you specify the string “microsoft”, which means that you want to search for tweets containing the word “microsoft”.

Next, execute a loop that uses tweepy’s  Cursor  object to fetch tweets. The Cursor  object takes several parameters which are as follows:

  • The first parameter is the type of operation you want to perform. We want to search tweets; therefore, specify api.search  as the first parameter.
  • The second parameter is the search query. In addition to the search query, append -filter:retweets, which tells the API not to fetch retweets.
  • The third parameter is the language where we specify “en” since we only want English tweets.
  • Finally, the  result_type  parameter is set to “recent” since we only need recent tweets.
  • The items method sets the number of tweets to return. Here we return only the 200 most recent tweets.

Once you execute the script above, the 200 most recent tweets containing the string “microsoft” will be stored in the all_tweets list, and with that, we end the first part of the article.

Performing Sentiment Analysis

We have scraped live tweets from Twitter. In this section, you will learn how to create a sentiment analysis model using an existing dataset and then use that model to predict sentiments for the 200 tweets that you scraped in the last section.

The process of creating a sentiment analysis model is very similar to the one I explained in my previous article Twitter Sentiment Analysis Using TF-IDF.

Follow these steps to perform sentiment analysis on scraped tweets:

Installing Required Libraries

In this tutorial, we will use multiple libraries that you have to install beforehand. To install them, use pip in your Terminal or CMD as follows:
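Assuming the stack used in the rest of this section (Numpy and Pandas for data handling, NLTK for text processing, and scikit-learn for TF-IDF and the classifier), the command would be:

```shell
pip install numpy pandas nltk scikit-learn
```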

Note: If you are on Linux or Mac, you might need to use sudo before pip to avoid permissions issues.

Importing Libraries

Since you will be using Python for developing a sentiment analysis model, you need to import the required libraries. The following script does that:
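The imports are straightforward:

```python
import numpy as np   # numerical arrays
import pandas as pd  # tabular data handling
import nltk          # natural language processing utilities
import re            # regular expressions for text cleaning
```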

In the script above, we import “Numpy”, “Pandas”, “NLTK” and “re” libraries.

Loading the Dataset

To create your sentiment analysis model, you can use the Twitter dataset that contains tweets about six United States airlines. The dataset is freely available at this Github Link.

Execute the following script to load the dataset:

Data Preprocessing

As we did in our previous article Twitter Sentiment Analysis Using TF-IDF, we will divide the data into feature and label sets and then remove special characters and empty spaces from the tweets. Execute the following script to do so:
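A sketch of the preprocessing step. The column positions (10 for the tweet text, 1 for the sentiment label) assume the standard layout of the airline Tweets.csv, and the regular expressions mirror the cleaning described in the earlier TF-IDF article:

```python
import re

def preprocess_tweet(tweet):
    """Remove special characters, stray single characters, and extra spaces."""
    processed = re.sub(r'\W', ' ', str(tweet))             # non-word chars -> space
    processed = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed)  # drop lone single letters
    processed = re.sub(r'\s+', ' ', processed).strip()     # collapse whitespace
    return processed.lower()

# Feature and label sets; assumes `tweets` is the DataFrame loaded earlier.
# X = [preprocess_tweet(t) for t in tweets.iloc[:, 10].values]  # tweet text
# y = tweets.iloc[:, 1].values                                  # sentiment label
```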

TF-IDF for Text to Numeric Conversion

You can use the TF-IDF scheme to convert text to numbers. The following script does that:
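A sketch of the conversion using scikit-learn's TfidfVectorizer on a toy corpus. The earlier TF-IDF article used NLTK's stop word list; the built-in English list is used here to keep the example self-contained, and the parameter values are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the preprocessed airline tweets
corpus = ["i love this airline",
          "worst flight i have ever had",
          "the flight was fine"]

# max_features caps the vocabulary size; max_df drops terms that appear in
# more than 80% of documents; stop_words removes common English words
tfidfconverter = TfidfVectorizer(max_features=2000, max_df=0.8,
                                 stop_words='english')
processed_features = tfidfconverter.fit_transform(corpus).toarray()

print(processed_features.shape)  # (number of tweets, vocabulary size)
```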

Training the Sentiment Analysis Model

Finally, to train the sentiment analysis model, execute the following script. It is important to mention that here we did not split our data into training and test sets, since we will be testing the performance of our algorithm on the scraped tweets.
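A self-contained sketch of the training step on a toy corpus; the original trains on the full airline dataset, and the classifier choice (a random forest, as in the earlier TF-IDF article) and its parameters are assumptions:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for the preprocessed tweets and their sentiment labels
corpus = ["i love this airline", "great flight and friendly crew",
          "worst flight ever", "terrible service and long delays"]
labels = ["positive", "positive", "negative", "negative"]

tfidfconverter = TfidfVectorizer(stop_words='english')
processed_features = tfidfconverter.fit_transform(corpus).toarray()

# No train/test split: the scraped tweets will serve as the test set
text_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
text_classifier.fit(processed_features, labels)
```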

 

Predicting Sentiment for the Scraped Tweets

Before you predict the sentiment of the scraped tweets, you need to remove special characters and empty spaces from them, as you did with the training dataset. The following script preprocesses the scraped tweets, converts the tweet text to a corresponding numeric representation using the TF-IDF approach, and then predicts the sentiment of each tweet using the sentiment analysis model that we trained in the previous step:
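A self-contained sketch of the prediction loop. It assumes the names from the previous steps (all_tweets, tfidfconverter, text_classifier); toy stand-ins are defined here so the sketch runs on its own:

```python
import re
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

def preprocess_tweet(tweet):
    """Same cleaning that was applied to the training tweets."""
    processed = re.sub(r'\W', ' ', str(tweet))
    processed = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed)
    processed = re.sub(r'\s+', ' ', processed).strip()
    return processed.lower()

# Toy stand-ins for the model trained in the previous step
corpus = ["i love this airline", "great flight and friendly crew",
          "worst flight ever", "terrible service and long delays"]
labels = ["positive", "positive", "negative", "negative"]
tfidfconverter = TfidfVectorizer(stop_words='english')
text_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
text_classifier.fit(tfidfconverter.fit_transform(corpus).toarray(), labels)

# Toy stand-in for the scraped tweets
all_tweets = ["I love flying with this airline!", "Worst flight ever..."]

for tweet in all_tweets:
    processed_tweet = preprocess_tweet(tweet)
    # Transform with the SAME fitted vectorizer used during training
    sentiment = text_classifier.predict(
        tfidfconverter.transform([processed_tweet]).toarray())
    print(processed_tweet, ":", sentiment[0])
```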

In the output, you will see each of the 200 scraped tweets containing the word “microsoft” along with its predicted sentiment.

Conclusion

Sentiment analysis is one of the most important tasks in corporate decision making. Being aware of the public sentiment about a product can play a crucial role in the success or failure of that product. In this tutorial, you saw how to scrape live tweets from Twitter and perform sentiment analysis on them.


4 Replies to “Scraping Tweets and Performing Sentiment Analysis”

  1. Thanks, brother!
    Plz I want to save the CSV file to my computer, and section 1 contains no save. Can you add the script.

    1. @shitu – Yes, that’s because we are using the scraped tweets in the same section. Saving the tweets and loading them again in the second section would be redundant. Anyway, here is how you can create a CSV from the scraped tweets:

      import pandas as pd
      df = pd.DataFrame(all_tweets)
      df.to_csv('filename.csv', index=False)

  2. slm i have problem with X = tweets.iloc[:, 10].values
    y = tweets.iloc[:, 1].values

    IndexError: single positional indexer is out-of-bounds plz help
    but i realize that my dataset does not have column name. plz help me on how i can scrap tweets with columns name like the one u used eg (tweet_id,tweet,etc) thanks

  3. NameError Traceback (most recent call last)
    in
         59 processed_tweet = processed_tweet.lower()
         60
    ---> 61 sentiment = text_classifier.predict(tfidfconverter.transform([processed_tweet]).toarray())
         62 print(processed_tweet, ":", sentiment)
         63

    NameError: name 'text_classifier' is not defined

    plz how can i solve this problem Usman plz mail me via: shituabdullahi4u (at) gmail (dot) com

