Sentiment Analysis is a special case of text classification in which users' opinions or sentiments regarding a product are classified into predefined categories such as positive, negative, and neutral. Public sentiment can then inform corporate decision making about a product that is being liked or disliked by the public.
Both rule-based and statistical techniques have been developed for sentiment analysis. With the advancements in machine learning and natural language processing, sentiment analysis techniques have improved considerably.
In this tutorial, you will see how Sentiment Analysis can be performed on live Twitter data. The tutorial is divided into two major sections: Scraping Tweets from Twitter and Performing Sentiment Analysis.
Scraping Tweets from Twitter
Following are the steps that you need to perform before you can scrape tweets from Twitter:
Creating a Twitter Developer Account
The first thing that you need to do is create a Twitter developer account. To do so, go to the Twitter developer website and create your account.
To create a developer account, you will have to verify your cell phone number and answer a few basic questions about why you need a developer account. Next, you will have to agree to the terms of service. Finally, you will receive a verification email; go to your inbox and confirm your account. Your application for the developer account will then be reviewed by Twitter.
Creating a New Twitter Application
Once you have created your developer account with Twitter, follow these steps to create a new Twitter Application:
- Log into your account and go to this link to create a new Twitter application. Click on the “Create an app” button on the top right.
- You will be presented with a form where you have to enter the App Name, Application description, and Website URL. For the sake of this tutorial, I named my application “twitter-scraping-xyz”. For the website URL, you can add any placeholder; I used “www.google.com”. Add any description for the app and click the “Create” button at the bottom. Leave the remaining fields blank; they are not necessary.
Getting API Keys and Access Tokens
To connect to your Twitter application from a client such as a Python script, you will need consumer API keys and access tokens. To get them, go to the application page and click on the “Keys and tokens” menu at the top. You will see the API key and API secret key on the page. For the access token and access token secret, you will have to click on the “Create” button.
Before proceeding to the next section, you should have the consumer API key, the consumer API secret, the access token, and the access token secret.
Connecting Python Client Application to Twitter Server
Now that you have everything you need, you can connect to the Twitter server and fetch live tweets. The library we will use to connect to the Twitter server and scrape live tweets is Tweepy. It can be installed with the following command:
```bash
python -m pip install tweepy
```
To connect to the Twitter Application server from a Python client, use the consumer API key, consumer API secret, Access token, and Access token secret. Execute the following script:
```python
import tweepy
import re
from tweepy import OAuthHandler

# Replace these placeholders with your own application's credentials
consumer_api_key = 'YOUR_CONSUMER_API_KEY'
consumer_api_secret = 'YOUR_CONSUMER_API_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'
```
In this script, you store the consumer API key, consumer API secret, access token, and access token secret in corresponding string variables; you will use these variables to connect with the Twitter application. Remember that your application will have its own values for the consumer API key and secret as well as for the access token and access token secret, and those are the values you should use in your script.
It is also pertinent to mention that we imported OAuthHandler from the tweepy library. OAuthHandler takes the consumer API key and consumer API secret as arguments. The consumer API key and secret tell our client which application to connect with, while the access tokens define the rights to access the application. Execute the following script:
```python
authorizer = OAuthHandler(consumer_api_key, consumer_api_secret)
authorizer.set_access_token(access_token, access_token_secret)
```
Scraping Tweets
We have successfully connected to the Twitter API. The next step is to fetch tweets. Execute the following script to do so:
```python
api = tweepy.API(authorizer, timeout=15)

all_tweets = []
search_query = 'microsoft'

for tweet_object in tweepy.Cursor(api.search,
                                  q=search_query + " -filter:retweets",
                                  lang='en',
                                  result_type='recent').items(200):
    all_tweets.append(tweet_object.text)
```
In the script above, you first create the API object with timeout=15, which means that a request that receives no response within 15 seconds will time out. Next, create an empty list all_tweets, which will contain the scraped tweets. The search_query is set to “microsoft”, which means that you want to search for tweets containing the word “microsoft”.
Next, execute a loop that uses tweepy’s Cursor object to fetch tweets. The Cursor object takes several parameters which are as follows:
- The first parameter is the type of operation you want to perform. We want to search tweets; therefore, specify api.search as the first parameter.
- The second parameter is the search query. In addition to the search term, we append -filter:retweets, which tells the API not to return retweets.
- The third parameter is the language, where we specify “en” since we only want English tweets.
- Finally, the result_type parameter is set to “recent” since we only need recent tweets.
- The items() method sets the number of tweets to return. Here we fetch only the 200 most recent tweets.
Once you execute the script above, the 200 most recent tweets containing the string “microsoft” will be stored in the all_tweets list.
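If you want to keep the scraped tweets for later use, you can also save them to a CSV file with pandas. A short sketch; the filename scraped_tweets.csv is an arbitrary choice:

```python
import pandas as pd

# Save the scraped tweets to disk so they can be reloaded later.
df = pd.DataFrame({'tweet': all_tweets})
df.to_csv('scraped_tweets.csv', index=False)
```

With that, we end the first part of the article.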
Performing Sentiment Analysis
We have scraped live tweets from Twitter. In this section, you will learn how to create a sentiment analysis model using an existing dataset and then use that model to predict sentiments for the 200 tweets that you scraped in the last section.
The process of creating a sentiment analysis model is very similar to the one I explained in my previous article, Twitter Sentiment Analysis Using TF-IDF.
Follow these steps to perform sentiment analysis on scraped tweets:
Installing Required Libraries
In this tutorial, we will use multiple libraries that you have to install beforehand. To install them, use pip in your Terminal or CMD as follows:
```bash
pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install nltk
pip install scikit-learn
```
Note: If you are on Linux or Mac, you might need to use sudo before pip to avoid permissions issues.
Importing Libraries
Since you will be using Python for developing a sentiment analysis model, you need to import the required libraries. The following script does that:
```python
import numpy as np
import pandas as pd
import re
import nltk

nltk.download('stopwords')
from nltk.corpus import stopwords
```
In the script above, we import the NumPy, Pandas, NLTK, and re libraries, and download the NLTK stopwords corpus, which we will use later to filter out common English words.
Loading the Dataset
To create your sentiment analysis model, you can use the Twitter dataset that contains tweets about six United States airlines. The dataset is freely available at this GitHub link.
Execute the following script to load the dataset:
```python
tweets = pd.read_csv("https://raw.githubusercontent.com/kolaveridi/kaggle-Twitter-US-Airline-Sentiment-/master/Tweets.csv")
```
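Since the preprocessing script in the next section selects columns by position, it is worth confirming where the tweet text and the sentiment label live in the dataframe. In this dataset the text column sits at index 10 and airline_sentiment at index 1, but it is safest to verify that on your own copy:

```python
# Inspect the dataset to confirm column positions before slicing by index.
print(tweets.shape)
print(list(tweets.columns))
print(tweets[['airline_sentiment', 'text']].head())
```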
Data Preprocessing
As we did in our previous article, Twitter Sentiment Analysis Using TF-IDF, we will divide the data into feature and label sets and then remove special characters and extra spaces from the tweets. Execute the following script to do so:
```python
X = tweets.iloc[:, 10].values
y = tweets.iloc[:, 1].values

processed_tweets = []

for tweet in range(0, len(X)):
    # Remove all the special characters
    processed_tweet = re.sub(r'\W', ' ', str(X[tweet]))

    # Remove all single characters
    processed_tweet = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed_tweet)

    # Remove single characters from the start
    processed_tweet = re.sub(r'^[a-zA-Z]\s+', ' ', processed_tweet)

    # Substitute multiple spaces with a single space
    processed_tweet = re.sub(r'\s+', ' ', processed_tweet, flags=re.I)

    # Remove the prefixed 'b'
    processed_tweet = re.sub(r'^b\s+', '', processed_tweet)

    # Convert to lowercase
    processed_tweet = processed_tweet.lower()

    processed_tweets.append(processed_tweet)
```
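Since the same cleaning steps are reused later on the scraped tweets, you may prefer to wrap them in a helper function. This is an optional refactor; the name clean_tweet is my own, not part of the original code:

```python
def clean_tweet(raw_tweet):
    """Apply the same cleaning steps used on the training data."""
    tweet = re.sub(r'\W', ' ', str(raw_tweet))      # remove special characters
    tweet = re.sub(r'\s+[a-zA-Z]\s+', ' ', tweet)   # remove single characters
    tweet = re.sub(r'^[a-zA-Z]\s+', ' ', tweet)     # remove a single character at the start
    tweet = re.sub(r'\s+', ' ', tweet, flags=re.I)  # collapse multiple spaces
    tweet = re.sub(r'^b\s+', '', tweet)             # remove a prefixed 'b'
    return tweet.lower()

processed_tweets = [clean_tweet(t) for t in X]
```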
TF-IDF for Text to Numeric Conversion
You can use the TF-IDF (term frequency-inverse document frequency) scheme to convert the text to its numeric representation. In the script below, max_features=2000 keeps only the 2,000 most frequent words, min_df=5 discards words that appear in fewer than five tweets, and max_df=0.7 discards words that appear in more than 70% of the tweets:
```python
from sklearn.feature_extraction.text import TfidfVectorizer

tfidfconverter = TfidfVectorizer(max_features=2000, min_df=5, max_df=0.7,
                                 stop_words=stopwords.words('english'))
X = tfidfconverter.fit_transform(processed_tweets).toarray()
```
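To confirm that the conversion worked, you can inspect the shape of the resulting feature matrix; the exact numbers depend on the dataset and the vectorizer settings:

```python
# Each row is one tweet; each column is one of the (at most 2000) TF-IDF features.
print(X.shape)
```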
Training the Sentiment Analysis Model
Finally, to train the sentiment analysis model, execute the following script. It is important to mention that here we did not split our data into training and test sets, since we will be testing the performance of our algorithm on the scraped tweets:
```python
from sklearn.ensemble import RandomForestClassifier

text_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
text_classifier.fit(X, y)
```
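If you would still like a rough estimate of how well the classifier generalizes, cross-validation provides one without setting aside a test set. A sketch using scikit-learn's cross_val_score; the five-fold split is an arbitrary choice:

```python
from sklearn.model_selection import cross_val_score

# Estimate generalization accuracy with 5-fold cross-validation.
scores = cross_val_score(text_classifier, X, y, cv=5)
print("Mean cross-validation accuracy:", scores.mean())
```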
Predicting Sentiment for the Scraped Tweets
Before you predict the sentiment of the scraped tweets, you need to remove special characters and extra spaces from them, as you did with the training dataset. The following script preprocesses the scraped tweets, converts each tweet to its numeric representation using the TF-IDF vectorizer fitted earlier (note the call to transform, not fit_transform), and then predicts its sentiment using the model trained in the previous step:
```python
for tweet in all_tweets:
    # Remove all the special characters
    processed_tweet = re.sub(r'\W', ' ', tweet)

    # Remove all single characters
    processed_tweet = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed_tweet)

    # Remove single characters from the start
    processed_tweet = re.sub(r'^[a-zA-Z]\s+', ' ', processed_tweet)

    # Substitute multiple spaces with a single space
    processed_tweet = re.sub(r'\s+', ' ', processed_tweet, flags=re.I)

    # Remove the prefixed 'b'
    processed_tweet = re.sub(r'^b\s+', '', processed_tweet)

    # Convert to lowercase
    processed_tweet = processed_tweet.lower()

    sentiment = text_classifier.predict(
        tfidfconverter.transform([processed_tweet]).toarray())
    print(processed_tweet, ":", sentiment)
```
In the output, you will see each of the 200 scraped tweets containing the word “microsoft”, along with its predicted sentiment.
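Rather than reading 200 individual lines, you may also want an aggregate view of the predictions. The sketch below tallies the predicted labels with Counter from Python's standard library, reusing the clean_tweet helper sketched earlier:

```python
from collections import Counter

# Clean all scraped tweets, predict their sentiment in one batch,
# and count how often each label occurs.
cleaned = [clean_tweet(t) for t in all_tweets]
predictions = text_classifier.predict(tfidfconverter.transform(cleaned).toarray())
print(Counter(predictions))
```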
Conclusion
Sentiment analysis is one of the most important tasks in corporate decision making. Being aware of public sentiment about a product can play a crucial role in its success or failure. In this tutorial, you saw how to scrape live tweets from Twitter and perform sentiment analysis on them.
I am a Machine Learning and Data Science expert, currently pursuing my PhD in Computer Science at Normandy University, France.