Downloading Data From Twitter Using the Streaming API

James Thorn
6 min read · Jun 9, 2019

If you have been living in a Faraday cage for the last 10 years and have never heard of it, Twitter is a very popular micro-blogging service where users post short messages called tweets, which generally express opinions about different topics.

Over the last decade Twitter has become a very popular social networking application, and a lot of interest has therefore grown around how to efficiently collect data from the platform.

In this post we will cover how to use the Streaming API to get Tweets that contain certain words or hashtags, and how to efficiently handle the objects returned by the API.

How to get data from Twitter

There are two main ways to download data from Twitter:

  • Using the REST API to get historical data, followers, friends, or the timeline of a specific user.
  • Using the Streaming API to download data that is being produced in real time.

When we use the Streaming API, we gather the data that is being produced at the moment we run the collection script. We can download tweets containing specific keywords or hashtags. For example, if we are searching for tweets with the hashtag #UCL (UEFA Champions League) and somebody posts the following message:

“Football Tonight everybody, happy Tuesday #UCL #BarcelonaVsLiverpool #GoReds”

Then the message will be added to our dataset. As we will see later, this API gives us not only the text of the message but also a lot of additional information, like the user who posted it, the timestamp, and much more.

How to use Twitter’s Streaming API

To use any of the APIs provided by Twitter, we first need to collect a set of Twitter API keys, which will be used to connect to the API. The steps are the following:

  1. Go to https://apps.twitter.com/ and log in with your Twitter credentials.
  2. Click “create an app” (you might first have to apply for a Twitter developer account).
  3. Fill in the form to create the application.
  4. Go to “Keys and Tokens” tab to collect your tokens.
  5. Create an Access token and access token secret.
Screen where the creation of the Access Token and Access Token Secret takes place.

I suggest you copy these to a safe location that can be easily accessed, as they will have to be used in any application that aims to collect data from any Twitter API.

Connecting to the API and downloading data

We will be using a Python library called Tweepy to connect to the Twitter API and download the data. First, we must install Tweepy, which can be done by following the instructions from this link:
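Tweepy is published on PyPI, so in most environments installation simply comes down to:

pip install tweepy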

Now that Tweepy is installed, we can write the script to download the data. Open up a text editor and copy the following code:
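A minimal sketch of such a script, assuming Tweepy 3.x and its classic StreamListener interface, might look like this:

import tweepy

# Twitter API credentials
consumer_key = "ENTER YOUR CONSUMER KEY"
consumer_secret = "ENTER YOUR CONSUMER SECRET"
access_token = "ENTER YOUR ACCESS TOKEN"
access_token_secret = "ENTER YOUR ACCESS TOKEN SECRET"

class StdOutListener(tweepy.StreamListener):
    """Listener that prints the raw JSON of every incoming tweet."""
    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status_code):
        # Returning False disconnects the stream when an error arrives
        print(status_code)
        return False

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = tweepy.Stream(auth, StdOutListener())
stream.filter(track=["#UCL"])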

In the code above, where it says “ENTER YOUR ACCESS TOKEN” and so on, replace each placeholder string with your respective token.

To download tweets containing certain key words, change the last line to incorporate the words or hashtags you want to look for. Specifically, change the list passed to the track argument so that it contains those words or hashtags.

In the above example, we would be downloading tweets with hashtags concerning Brexit and the USA if we replaced [“#UCL”] with [“#Brexit”, ”#USA”]. For more sophisticated filters, follow the guidelines in Twitter’s developer documentation (linked at the end of this post).

These guidelines can be used to search for hot topics, or to narrow results to certain geographies, specific times, or URLs. Usernames can also be added to this list, apart from specific words, to search for active trends or for tweets mentioning a certain user, as shown in the sketch below.
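For instance, still assuming Tweepy 3.x and reusing the stream object from the script above, keyword, user, and geography filters can be combined in a single call. The user ID and the bounding box below are illustrative placeholders:

# Track keywords, follow a user by ID, and restrict to a bounding box
# (longitude/latitude pairs: south-west corner, then north-east corner)
stream.filter(
    track=["#Brexit", "#USA"],
    follow=["123456"],                      # hypothetical user ID
    locations=[-0.51, 51.28, 0.33, 51.69],  # rough bounding box around London
)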

By running this code, we get a JSON object as output for each tweet, which looks like the following text:

{"created_at":"Fri May 10 04:26:55 +0000 2019","id":1126705151463444480,"id_str":"1126705151463444480","text":"Question Time last night the most biased BBC political programme I\u2019ve ever seen. Farage, in his 34th appearance, si\u2026 https:\/\/t.co\/6RReO5nO5C","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":true…

This is not the full JSON object, for the sake of space. All the fields, along with their descriptions, can be found in Twitter’s Tweet object documentation (see the developer page linked at the end of this post).

Some of the most important fields are:

  • text, which contains the text included in the tweet.
  • created_at, which is a timestamp of when the tweet was created.
  • user, which contains information about the user that created the tweet, like the username and user id.
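For example, parsing one line of the stream output with Python’s standard json module exposes these fields directly. The string below is a shortened stand-in for a real line, and example_user is a made-up username:

import json

# Stand-in for one line read from the stream output
raw_line = '{"created_at": "Fri May 10 04:26:55 +0000 2019", "text": "Question Time last night...", "user": {"screen_name": "example_user"}}'

tweet = json.loads(raw_line)
print(tweet["created_at"])           # timestamp of the tweet
print(tweet["text"])                 # text of the tweet
print(tweet["user"]["screen_name"])  # username of the author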

Save the document as Twitter_Downloader.py and run the following command to save the output of the script in a .txt file called twitter_data.txt.

python Twitter_Downloader.py > twitter_data.txt

There will be no output on the CLI, but the .txt file should be created in the same folder as the Python script.

If we run this, data will keep being downloaded until we manually stop the script. To avoid this, we can add some variables to our code so that only a certain number of tweets are downloaded. Modify the code under the Twitter API credentials in the following manner:
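Again, this is one possible sketch under the same Tweepy 3.x assumption; the counter lives on the listener and closes the stream once the limit is reached:

# Key words to search for and number of tweets to download
tracklist = ["#UCL"]
n_tweets = 10

class StdOutListener(tweepy.StreamListener):
    """Listener that stops after printing n_tweets tweets."""
    def __init__(self):
        super().__init__()
        self.counter = 0

    def on_data(self, data):
        print(data)
        self.counter += 1
        # Returning False closes the stream once we have enough tweets
        return self.counter < n_tweets

    def on_error(self, status_code):
        print(status_code)
        return False

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = tweepy.Stream(auth, StdOutListener())
stream.filter(track=tracklist)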

In the code above, modify the value of the variable n_tweets to the number of tweets you want to download; in this case we would be downloading 10 tweets. We have also made a new variable called tracklist to keep better track of the key words that we are searching for.

Let’s take a closer look at the data that we can download. After running the script for a while and saving its output to a .txt file, open up a Jupyter Notebook and write the following code:

import json
import pandas as pd

tweets_data_path = "twitter_data.txt"

# Read the file line by line; each line contains one tweet as a JSON object
tweets_data = []
with open(tweets_data_path, "r") as tweets_file:
    for line in tweets_file:
        try:
            tweets_data.append(json.loads(line))
        except json.JSONDecodeError:
            # Skip empty or malformed lines
            continue

# Build a DataFrame with the fields we are interested in
tweets = pd.DataFrame()
tweets['text'] = list(map(lambda tweet: tweet['text'], tweets_data))
tweets['Username'] = list(map(lambda tweet: tweet['user']['screen_name'], tweets_data))
tweets['Timestamp'] = list(map(lambda tweet: tweet['created_at'], tweets_data))
tweets.head()

After doing this, the resulting DataFrame should look like:

As you can see, all the text fields have the structure “RT @user: text text text”. This is because of the way the Streaming API outputs tweets that have been retweeted by a user and were therefore not originally created by that user. The different formats of the tweets returned by the Streaming API will be discussed in another post.
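If you want to separate retweets from original tweets in the meantime, one common approach, assuming the standard payload format and continuing from the notebook code above, is to check for the retweeted_status field:

# Retweet payloads carry the original tweet under "retweeted_status"
def is_retweet(tweet):
    return "retweeted_status" in tweet

originals = [t for t in tweets_data if not is_retweet(t)]
retweets = [t for t in tweets_data if is_retweet(t)]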

All the documentation on how to use Tweepy and Twitter’s Streaming API can be found in the links at the end of this post.

Conclusion

We have looked at how to collect data that is being produced in real time using the Streaming API, and how this data can be efficiently displayed and processed. It can then be used for many purposes: from trend or fake news detection using complex Machine Learning algorithms, to Sentiment Analysis for inferring how positively a certain brand is perceived, graph building, information diffusion models, and much more.

For further research, or for clarification of the information found here, refer to the links left throughout this guide or to:

  • Twitter Developers page: https://developer.twitter.com/en/docs
  • Tweepy’s GitHub page: https://github.com/tweepy/tweepy
  • Tweepy’s official page: https://www.tweepy.org/
  • Twitter’s advanced search: https://twitter.com/search-advanced

The Python script and the Jupyter Notebook used for this post can be found in:

In the next post we will look at how to use the REST API to collect historical data, like previous tweets or the followers of a certain user. Take care until then, and thank you for reading :)
