Twitter has long been a popular source for data mining. Many data scientists and analytics companies collect tweets and analyze them to understand people's opinions on various topics.
In this tutorial, you will learn how to use the Twitter API and the Python Tweepy library to search for a word or phrase, extract tweets that include it, and print the results.
Note: This tutorial differs from our other Twitter API tutorial. This one uses the Twitter Streaming API, which fetches live tweets. The other one uses the Cursor method to search existing tweets. With the cursor, you can specify the language and a tweet limit, and you can also filter out retweets.
First of all, you must install the Python Tweepy library. You can do this by running:
```shell
pip install tweepy
```
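Note that on Mac or Linux you might need to prefix the command with sudo to avoid permission issues. Also note that the code in this tutorial uses the Tweepy 3.x streaming classes (StreamListener), which were removed in Tweepy 4.0. If you install the latest release, the imports below will fail, so pin an older version with pip install "tweepy<4".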
Next, import the modules you need:
```python
import tweepy
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
from tweepy import Stream
import time
```
Specify the phrase you want to search for:
```python
phrase_to_search = 'Global Warming'
```
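Tracking a phrase does not mean matching the exact phrase. According to Twitter's v1.1 filter documentation, a phrase made of several space-separated terms matches a tweet when all of the terms appear in it, in any order and ignoring case. So 'Global Warming' can also match a tweet about global markets warming up. The function below is a simplified, hypothetical approximation of that rule, useful for checking tweets you have already collected. It only splits on non-word characters and ignores the finer points of Twitter's real tokenization (URLs, mentions, and so on).

```python
import re


def matches_track(text: str, phrase: str) -> bool:
    """Approximate Twitter's `track` rule: every term of the phrase must
    appear in the text, in any order, case-insensitively.

    Hypothetical helper; Twitter's real tokenizer differs in details.
    """
    # Split the tweet into lowercase word tokens (letters, digits, underscore)
    tokens = set(re.findall(r"\w+", text.lower()))
    # Every space-separated term of the phrase must be one of those tokens
    return all(term in tokens for term in phrase.lower().split())


print(matches_track("Warming of the globe? No: GLOBAL warming!", "Global Warming"))  # True
print(matches_track("Global markets are warming up", "Global Warming"))  # True
print(matches_track("Global news today", "Global Warming"))  # False
```

The second example shows why you may want to post-filter the stream with an exact phrase check if word order matters to you.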
Go to the Twitter Developers page and create a new app. Once the app is created, generate a new access token and access token secret.
Enter your consumer API key and secret, access token, and access token secret:
```python
consumer_key = 'your_api_key'
consumer_secret = 'your_api_secret_key'
access_token = 'your_access_token'
access_secret = 'your_access_secret'
```
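Hard-coding keys in the script makes them easy to leak, for example by committing the file to a public repository. A common alternative is to read them from environment variables, as in the sketch below. The variable names (TWITTER_CONSUMER_KEY and so on) and the load_credentials helper are hypothetical, so use whatever names suit your setup.

```python
import os

# Hypothetical environment variable names, one per credential
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(environ=os.environ):
    """Return a dict of credentials read from environment variables.

    Raises KeyError listing every variable that is missing or empty, so a
    misconfigured setup fails before any connection is attempted.
    """
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

You would then write creds = load_credentials() and set consumer_key = creds["consumer_key"], and so on for the other three values.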
Now set up the streaming logic. The following listener receives data from Twitter's Streaming API:
```python
# Raw messages (JSON strings) collected from the stream
g = []

class StdOutListener(StreamListener):
    def on_data(self, data):
        # Streaming API: called for every live message received
        print(data)
        g.append(data)
        # Pause for 2 seconds after each message
        time.sleep(2)
        # Returning True keeps the stream open
        return True

    # Print the HTTP status code if an error happens
    def on_error(self, status):
        print(status)
```
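The data argument passed to on_data is the raw JSON string sent by the Streaming API, which is why the listener above prints long blobs of JSON. Not every message is a tweet. The stream can also deliver control messages such as limit and delete notices.

The sketch below turns one raw message into a few readable fields. It assumes the Twitter v1.1 tweet payload:

- text holds the tweet text.
- user.screen_name holds the author's handle.
- extended_tweet.full_text holds the full text of long tweets.
- retweeted_status is present on retweets.

extract_tweet is a hypothetical helper, not part of Tweepy.

```python
import json
from typing import Optional


def extract_tweet(raw: str) -> Optional[dict]:
    """Parse one raw Streaming API message into a small dict.

    Returns None for messages that are not tweets (e.g. "limit" or
    "delete" notices). Field names follow the Twitter v1.1 tweet JSON.
    """
    msg = json.loads(raw)
    if "text" not in msg:  # control messages carry no tweet text
        return None
    # Long tweets keep their untruncated text in extended_tweet.full_text
    text = msg.get("extended_tweet", {}).get("full_text", msg["text"])
    return {
        "user": msg.get("user", {}).get("screen_name"),
        "text": text,
        "is_retweet": "retweeted_status" in msg,
    }


sample = '{"text": "Global warming is...", "user": {"screen_name": "alice"}}'
print(extract_tweet(sample))
# {'user': 'alice', 'text': 'Global warming is...', 'is_retweet': False}
print(extract_tweet('{"limit": {"track": 12}}'))  # None
```

Inside on_data you could call extract_tweet(data) and print only the text instead of the whole payload.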
Start collecting tweets and filter them by the given phrase:
```python
def call_api(stream, phrase):
    # Start streaming tweets that match the phrase; this call blocks
    try:
        stream.filter(track=[phrase])
    except Exception as e:
        print(e)
        # If the stream is already connected, disconnect it and reconnect
        if "Stream object already connected!" in str(e):
            stream.disconnect()
            print("connecting again")
            stream.filter(track=[phrase])
```
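call_api reconnects at most once, and only for the "already connected" error. Twitter's v1.1 streaming guidelines recommended exponential backoff for retries after HTTP errors. You start with a 5-second wait and double it on each attempt, up to 320 seconds. For HTTP 420 (rate limited) responses, you start at 1 minute. The generator below is a sketch of such a schedule. backoff_delays is a hypothetical helper, and its defaults are adjustable.

```python
def backoff_delays(attempts: int, start: float = 5.0, cap: float = 320.0):
    """Yield exponential backoff delays in seconds: start, 2*start, ... capped at cap.

    Defaults follow the 5 s / 320 s HTTP-error schedule; pass start=60.0
    for rate-limit (HTTP 420) responses.
    """
    delay = start
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2


print(list(backoff_delays(8)))
# [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 320.0]
print(list(backoff_delays(3, start=60.0)))
# [60.0, 120.0, 240.0]
```

A retry loop would call time.sleep(delay) for each yielded value before reconnecting. Note that call_api as written catches exceptions itself, so it would need to re-raise them for such a loop to notice failures.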
Add authentication for Twitter's Streaming API and the main block:
```python
if __name__ == '__main__':
    listener = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    stream = Stream(auth, listener)
```
Call the API with the phrase defined earlier.
```python
    call_api(stream, phrase_to_search)
```
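As written, call_api blocks and collects tweets until you interrupt the script. To collect for a fixed period instead, one approach with Tweepy 3.x is to have on_data return False once a deadline has passed, which tells the stream to disconnect. The class below sketches only that deadline logic. It does not depend on Tweepy, so it can be tested offline. DeadlineCollector and its clock parameter are hypothetical names. In the real listener you would put the same check at the top of StdOutListener.on_data.

```python
import time
from typing import Callable, List


class DeadlineCollector:
    """Collect raw messages until `duration` seconds have elapsed.

    on_data mirrors the Tweepy 3.x listener contract: return True to keep
    streaming, False to ask the stream to disconnect.
    """

    def __init__(self, duration: float, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.t_end = clock() + duration  # absolute deadline
        self.collected: List[str] = []

    def on_data(self, data: str) -> bool:
        if self.clock() >= self.t_end:
            return False  # deadline reached: stop the stream
        self.collected.append(data)
        return True


# Usage with a fake clock so the example runs instantly
now = [0.0]
collector = DeadlineCollector(10.0, clock=lambda: now[0])
print(collector.on_data('{"text": "first"}'))  # True
now[0] = 11.0
print(collector.on_data('{"text": "late"}'))   # False
print(len(collector.collected))                # 1
```

The deadline is only checked when a message arrives, so on a quiet stream the disconnect can happen later than t_end.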
Full Project Code
```python
import tweepy
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
from tweepy import Stream
import time

phrase_to_search = 'Global Warming'

consumer_key = 'your_api_key'
consumer_secret = 'your_api_secret_key'
access_token = 'your_access_token'
access_secret = 'your_access_secret'

# Raw messages (JSON strings) collected from the stream
g = []

class StdOutListener(StreamListener):
    def on_data(self, data):
        # Streaming API: called for every live message received
        print(data)
        g.append(data)
        # Pause for 2 seconds after each message
        time.sleep(2)
        # Returning True keeps the stream open
        return True

    # Print the HTTP status code if an error happens
    def on_error(self, status):
        print(status)

def call_api(stream, phrase):
    # Start streaming tweets that match the phrase; this call blocks
    try:
        stream.filter(track=[phrase])
    except Exception as e:
        print(e)
        # If the stream is already connected, disconnect it and reconnect
        if "Stream object already connected!" in str(e):
            stream.disconnect()
            print("connecting again")
            stream.filter(track=[phrase])

if __name__ == '__main__':
    listener = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    stream = Stream(auth, listener)
    call_api(stream, phrase_to_search)
```
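The script keeps every raw message in the g list, which is lost when the process exits. If you want to analyze the tweets later, one option is to write them to a JSON Lines file, with one JSON object per line. save_jsonl and load_jsonl below are hypothetical helpers, and the file name is up to you.

```python
import json
from typing import Iterable, List


def save_jsonl(raw_messages: Iterable[str], path: str) -> int:
    """Append each raw JSON message to `path`, one per line; return the count written."""
    count = 0
    with open(path, "a", encoding="utf-8") as f:
        for raw in raw_messages:
            # Re-serialize so each record occupies exactly one line
            f.write(json.dumps(json.loads(raw)) + "\n")
            count += 1
    return count


def load_jsonl(path: str) -> List[dict]:
    """Read the records back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because stopping the script with Ctrl+C raises KeyboardInterrupt, which call_api does not catch, you could wrap the call as try: call_api(stream, phrase_to_search) followed by finally: save_jsonl(g, "tweets.jsonl"). That way the collected tweets are saved when you stop collecting.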