Sentiment Analysis is a special case of text classification in which users' opinions or sentiments regarding a product are classified into predefined categories such as positive, negative, and neutral. Public sentiment can then inform corporate decision making about a product that is being liked or disliked by the public.
Both rule-based and statistical techniques have been developed for sentiment analysis. With the advancements in machine learning and natural language processing, sentiment analysis techniques have improved considerably.
In this tutorial, you will see how Sentiment Analysis can be performed on live Twitter data. The tutorial is divided into two major sections: Scraping Tweets from Twitter and Performing Sentiment Analysis.
Scraping Tweets from Twitter
Following are the steps that you need to perform before you can scrape tweets from Twitter:
Creating a Twitter Developer Account
The first thing that you need to do is create a Twitter developer account. To do so, go to the Twitter developer website and create your account.
To create a developer account, you will have to verify your cell phone number and answer a few basic questions about why you need a developer account. Next, you will have to agree to the terms of service. Finally, you will receive a verification email; go to your inbox and confirm your account. Your application for the developer account will then be reviewed by Twitter.
Creating a New Twitter Application
Once you have created your developer account with Twitter, follow these steps to create a new Twitter Application:
- Log into your account and go to this link to create a new Twitter application. Click on the “Create an app” button on the top right.
- You will be presented with a form where you have to enter the App Name, Application description, and Website URL. For the sake of this tutorial, I named my application “twitter-scraping-xyz”. For the website URL, you can add any placeholder; I used “www.google.com”. Add any description for the app and click the “Create” button at the bottom. Leave the remaining fields blank; they are not necessary.
Getting API Keys and Access Tokens
To connect to your Twitter application from a client such as a Python script, you will need consumer API keys and access tokens. To get them, go to the application page and click on the “Keys and tokens” menu at the top. You will see the API key and API secret key on the page. For the access token and access token secret, you will have to click on the “Create” button.
Before proceeding to the next section, you should have the consumer API key, the consumer API secret, the access token, and the access token secret.
Connecting Python Client Application to Twitter Server
Now that you have everything you need, you can connect to the Twitter server and fetch live tweets. The library we will use to connect to the Twitter server and scrape live tweets is Tweepy. It can be installed with the following command:
```bash
python -m pip install tweepy
```
To connect to the Twitter Application server from a Python client, use the consumer API key, consumer API secret, Access token, and Access token secret. Execute the following script:
```python
import tweepy
import re
from tweepy import OAuthHandler

# Replace these placeholders with your own application's credentials
consumer_api_key = 'YOUR_CONSUMER_API_KEY'
consumer_api_secret = 'YOUR_CONSUMER_API_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'
```
In this script, you store the consumer API key, consumer API secret, access token, and access token secret in corresponding string variables; you will use these variables to connect with the Twitter application. Remember that your application will have its own values for the consumer API key and secret as well as for the access token and access token secret, and those are the values you should use in your script.
It is also pertinent to mention that we imported OAuthHandler from the tweepy library. OAuthHandler takes the consumer API key and consumer API secret as arguments. The consumer API key and secret tell our client which application to connect with, while the access tokens define the rights to access the application. Execute the following script:
```python
authorizer = OAuthHandler(consumer_api_key, consumer_api_secret)
authorizer.set_access_token(access_token, access_token_secret)
```
Scraping Tweets
We have successfully connected to the Twitter API. The next step is to fetch tweets. Execute the following script to do so:
```python
api = tweepy.API(authorizer, timeout=15)

all_tweets = []
search_query = 'microsoft'

for tweet_object in tweepy.Cursor(api.search,
                                  q=search_query + " -filter:retweets",
                                  lang='en',
                                  result_type='recent').items(200):
    all_tweets.append(tweet_object.text)
```
In the script above, you first create the API object with timeout=15, which means that a request that receives no response within 15 seconds will time out. Next, create an empty list all_tweets, which will contain the scraped tweets. The search_query is set to “microsoft”, which means that you want to search for tweets containing the word “microsoft”.
Next, execute a loop that uses tweepy’s Cursor object to fetch tweets. The Cursor object takes several parameters which are as follows:
- The first parameter is the type of operation you want to perform. We want to search tweets; therefore, specify api.search as the first parameter.
- The second parameter is the search query. In addition to the search term, we append -filter:retweets, which tells the API not to return retweets.
- The third parameter is the language, where we specify “en” since we only want English tweets.
- Finally, the result_type parameter is set to “recent” since we only need recent tweets.
- The items() method sets the number of tweets to return. Here we fetch only the 200 most recent tweets.
Once you execute the script above, the 200 most recent tweets containing the string “microsoft” will be stored in the all_tweets list.
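If you want to keep the scraped tweets for later use, you can also save them to a CSV file with pandas. A short sketch; the filename scraped_tweets.csv is an arbitrary choice:

```python
import pandas as pd

# Save the scraped tweets to disk so they can be reloaded later.
df = pd.DataFrame({'tweet': all_tweets})
df.to_csv('scraped_tweets.csv', index=False)
```

With that, we end the first part of the article.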
Performing Sentiment Analysis
We have scraped live tweets from Twitter. In this section, you will learn how to create a sentiment analysis model using an existing dataset and then use that model to predict sentiments for the 200 tweets that you scraped in the last section.
The process of creating a sentiment analysis model is very similar to the one I explained in my previous article, Twitter Sentiment Analysis Using TF-IDF.
Follow these steps to perform sentiment analysis on scraped tweets:
Installing Required Libraries
In this tutorial, we will use multiple libraries that you have to install beforehand. To install them, use pip in your Terminal or CMD as follows:
```bash
pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install nltk
pip install scikit-learn
```
Note: If you are on Linux or Mac, you might need to use sudo before pip to avoid permissions issues.
Importing Libraries
Since you will be using Python for developing a sentiment analysis model, you need to import the required libraries. The following script does that:
```python
import numpy as np
import pandas as pd
import re
import nltk

nltk.download('stopwords')
from nltk.corpus import stopwords
```
In the script above, we import the NumPy, Pandas, NLTK, and re libraries, and download the NLTK stopwords corpus, which we will use later to filter out common English words.
Loading the Dataset
To create your sentiment analysis model, you can use the Twitter dataset that contains tweets about six United States airlines. The dataset is freely available at this GitHub link.
Execute the following script to load the dataset:
```python
tweets = pd.read_csv("https://raw.githubusercontent.com/kolaveridi/kaggle-Twitter-US-Airline-Sentiment-/master/Tweets.csv")
```
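Since the preprocessing script in the next section selects columns by position, it is worth confirming where the tweet text and the sentiment label live in the dataframe. In this dataset the text column sits at index 10 and airline_sentiment at index 1, but it is safest to verify that on your own copy:

```python
# Inspect the dataset to confirm column positions before slicing by index.
print(tweets.shape)
print(list(tweets.columns))
print(tweets[['airline_sentiment', 'text']].head())
```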
Data Preprocessing
As we did in our previous article, Twitter Sentiment Analysis Using TF-IDF, we will divide the data into feature and label sets and then remove special characters and extra spaces from the tweets. Execute the following script to do so:
```python
X = tweets.iloc[:, 10].values
y = tweets.iloc[:, 1].values

processed_tweets = []

for tweet in range(0, len(X)):
    # Remove all the special characters
    processed_tweet = re.sub(r'\W', ' ', str(X[tweet]))

    # Remove all single characters
    processed_tweet = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed_tweet)

    # Remove single characters from the start
    processed_tweet = re.sub(r'^[a-zA-Z]\s+', ' ', processed_tweet)

    # Substitute multiple spaces with a single space
    processed_tweet = re.sub(r'\s+', ' ', processed_tweet, flags=re.I)

    # Remove the prefixed 'b'
    processed_tweet = re.sub(r'^b\s+', '', processed_tweet)

    # Convert to lowercase
    processed_tweet = processed_tweet.lower()

    processed_tweets.append(processed_tweet)
```
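Since the same cleaning steps are reused later on the scraped tweets, you may prefer to wrap them in a helper function. This is an optional refactor; the name clean_tweet is my own, not part of the original code:

```python
def clean_tweet(raw_tweet):
    """Apply the same cleaning steps used on the training data."""
    tweet = re.sub(r'\W', ' ', str(raw_tweet))      # remove special characters
    tweet = re.sub(r'\s+[a-zA-Z]\s+', ' ', tweet)   # remove single characters
    tweet = re.sub(r'^[a-zA-Z]\s+', ' ', tweet)     # remove a single character at the start
    tweet = re.sub(r'\s+', ' ', tweet, flags=re.I)  # collapse multiple spaces
    tweet = re.sub(r'^b\s+', '', tweet)             # remove a prefixed 'b'
    return tweet.lower()

processed_tweets = [clean_tweet(t) for t in X]
```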
TF-IDF for Text to Numeric Conversion
You can use the TF-IDF (term frequency-inverse document frequency) scheme to convert the text to its numeric representation. In the script below, max_features=2000 keeps only the 2,000 most frequent words, min_df=5 discards words that appear in fewer than five tweets, and max_df=0.7 discards words that appear in more than 70% of the tweets:
```python
from sklearn.feature_extraction.text import TfidfVectorizer

tfidfconverter = TfidfVectorizer(max_features=2000, min_df=5, max_df=0.7,
                                 stop_words=stopwords.words('english'))
X = tfidfconverter.fit_transform(processed_tweets).toarray()
```
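To confirm that the conversion worked, you can inspect the shape of the resulting feature matrix; the exact numbers depend on the dataset and the vectorizer settings:

```python
# Each row is one tweet; each column is one of the (at most 2000) TF-IDF features.
print(X.shape)
```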
Training the Sentiment Analysis Model
Finally, to train the sentiment analysis model, execute the following script. It is important to mention that here we did not split our data into training and test sets, since we will be testing the performance of our algorithm on the scraped tweets:
```python
from sklearn.ensemble import RandomForestClassifier

text_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
text_classifier.fit(X, y)
```
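If you would still like a rough estimate of how well the classifier generalizes, cross-validation provides one without setting aside a test set. A sketch using scikit-learn's cross_val_score; the five-fold split is an arbitrary choice:

```python
from sklearn.model_selection import cross_val_score

# Estimate generalization accuracy with 5-fold cross-validation.
scores = cross_val_score(text_classifier, X, y, cv=5)
print("Mean cross-validation accuracy:", scores.mean())
```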
Predicting Sentiment for the Scraped Tweets
Before you predict the sentiment of the scraped tweets, you need to remove special characters and extra spaces from them, as you did with the training dataset. The following script preprocesses the scraped tweets, converts each tweet to its numeric representation using the TF-IDF vectorizer fitted earlier (note the call to transform, not fit_transform), and then predicts its sentiment using the model trained in the previous step:
```python
for tweet in all_tweets:
    # Remove all the special characters
    processed_tweet = re.sub(r'\W', ' ', tweet)

    # Remove all single characters
    processed_tweet = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed_tweet)

    # Remove single characters from the start
    processed_tweet = re.sub(r'^[a-zA-Z]\s+', ' ', processed_tweet)

    # Substitute multiple spaces with a single space
    processed_tweet = re.sub(r'\s+', ' ', processed_tweet, flags=re.I)

    # Remove the prefixed 'b'
    processed_tweet = re.sub(r'^b\s+', '', processed_tweet)

    # Convert to lowercase
    processed_tweet = processed_tweet.lower()

    sentiment = text_classifier.predict(
        tfidfconverter.transform([processed_tweet]).toarray())
    print(processed_tweet, ":", sentiment)
```
In the output, you will see each of the 200 scraped tweets containing the word “microsoft”, along with its predicted sentiment.
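Rather than reading 200 individual lines, you may also want an aggregate view of the predictions. The sketch below tallies the predicted labels with Counter from Python's standard library, reusing the clean_tweet helper sketched earlier:

```python
from collections import Counter

# Clean all scraped tweets, predict their sentiment in one batch,
# and count how often each label occurs.
cleaned = [clean_tweet(t) for t in all_tweets]
predictions = text_classifier.predict(tfidfconverter.transform(cleaned).toarray())
print(Counter(predictions))
```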
Conclusion
Sentiment analysis is one of the most important tasks in corporate decision making. Being aware of public sentiment about a product can play a crucial role in its success or failure. In this tutorial, you saw how to scrape live tweets from Twitter and perform sentiment analysis on them.
I am a Machine Learning and Data Science expert, currently pursuing my PhD in Computer Science at Normandy University, France.