Martin Beck

Summary

The provided content offers a comprehensive guide on how to scrape tweets from Twitter using the updated Twitter API v2 in conjunction with the Tweepy Python library, including setup instructions and use cases for data scraping.

Abstract

The article "How to Scrape Tweets From Twitter" is an updated guide that addresses the recent changes in Twitter's API and the discontinuation of free scrape APIs. It details the process of setting up a Twitter Developer account, obtaining necessary tokens, and using Tweepy to interact with Twitter's API for scraping tweets. The guide covers two common use cases: scraping tweets from a specific user and conducting keyword searches to scrape relevant tweets. It also provides code examples for both scenarios, emphasizing the importance of specifying tweet fields and expansions to access additional tweet information. The article concludes with a discussion on the legality of the scraping method, the limits on the number of tweets that can be scraped, and alternatives for scraping without coding.

Opinions

  • The author acknowledges the significance of social media data for insights that traditional methods may not provide, highlighting the value of Twitter data for research and analysis.
  • The author suggests that the process of obtaining a Twitter Developer account and getting approval for app projects may require patience and detailed information about the intended use case.
  • The use of Tweepy is recommended for its ease of interaction with Twitter's API, and the article provides a positive endorsement for its capabilities beyond just scraping tweets.
  • The author emphasizes the importance of understanding Twitter's API levels and versions, as well as the costs associated with scraping large volumes of tweets, indicating a consideration for users' budget constraints.
  • The article implies that while there are non-coding tools available for scraping tweets, the most effective and compliant method is through the official Twitter API using Tweepy, due to recent API changes affecting third-party tools.

How to Scrape Tweets From Twitter

An up-to-date guide on scraping tweets from Twitter using Twitter’s API

Photo by Ink Drop on Shutterstock using Editorial License

Overview

Originally I wrote an article back in 2020 that covered how to scrape tweets from Twitter. Since then a lot of things have changed, including major changes to Twitter’s public search API impacting open source scrapers, Twitter releasing Twitter API v2 back in November 2021, and more recently the removal of most of their free scrape APIs. This follow-up guide was written to provide updated ways of scraping tweets and to answer any potential question people may have about using Twitter API v2 to scrape data.

This guide is meant to be a quick, straightforward introduction to scraping tweets from Twitter using Twitter API v2. I'll cover setup and two common use cases for the API.

Why Should I Scrape Tweets

Social media offers valuable access to people's unfiltered opinions, providing insights that traditional methods such as surveys, census data, or studies normally cannot. This is due to the nature of how social media is used: you're able to get answers to questions, at scale, that normally wouldn't be so easily accessible.

Setup

Before we can get started, we’ll need to set up our tools first!

Setting up Tweepy

We’re using Python to interact with Twitter’s official API. Luckily there’s a Python library called Tweepy that makes this process as seamless as possible. However, to use the official API you’ll also need to set up a Twitter Developer account. We’ll go over that first then hop into setting up Tweepy.

Setup Twitter Developer

Before you can move forward it’s important to note that you will need to create a Twitter account or use your current one!

To set up your Twitter Developer account you'll need to head over to the Twitter Developer Portal Projects & Apps page, where you'll be prompted about the app you're setting up. It will bring you to a page where you must fill out information about the app you're hoping to build.

Eventually, you'll be asked to accept the terms and agreement, which will trigger an email verification.

After that, the application is sent off for review, and at this point, it’s just a waiting game. You may be requested to fill out more information regarding your app and use case.

Approval can take anywhere from a couple of days up to a week. It will take time, and developer support should reach out to you if there are any questions about the application you submitted.

Once approved, you'll need to grab your tokens in order to interact with the API. But first you'll need to set up a project and app to get those tokens: navigate within the Developer Portal to Projects & Apps and create a new project. This will lead you through a prompt detailing your use case.

You’ll then need to add an existing app you have or create a new app for your project.

After your app is created you should then be able to finally get your keys and tokens!

If you already have a project and app, go to Developer Portal > Projects & Apps > Overview > {App Name} > Keys and Tokens; you'll need to regenerate the keys and tokens if you no longer have access to them. For this article, you'll need to generate and use a Bearer Token.

Now that you’ve got your tokens ready we can move on to setting up Tweepy!

Scraping with Tweepy

Setup Tweepy

Tweepy is a Python library for accessing the Twitter API. There are several different levels of API access that Tweepy offers as shown here, but those are for very specific use cases. Tweepy is able to accomplish various tasks beyond just scraping tweets. However, this article will only focus on using Twitter’s API to scrape data.

After having grabbed your Bearer Token, working with Tweepy from this point forward is pretty straightforward.

Tweepy is available for Python 3.7 and later. This article won't cover the specifics of installing Python, as that has been covered extensively and is a Google search away.

As to setting up Tweepy, it’s a pretty basic Python command. You’ll just need to do a pip install for the Tweepy library.

pip install tweepy

It's also important to note that I'll be using the pandas library for storing and modifying tweet data.

Setting up Tweepy Credentials

import tweepy

bearer_token = "XXXXXXXXX"

client = tweepy.Client(bearer_token)
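As a side note, hardcoding the token works for a quick test, but it's safer to read it from an environment variable before passing it to tweepy.Client. A minimal sketch (the variable name TWITTER_BEARER_TOKEN is just my choice, not a Tweepy convention):

```python
import os

# Read the bearer token from an environment variable, falling back to a
# placeholder so the snippet still runs if the variable isn't set.
bearer_token = os.environ.get("TWITTER_BEARER_TOKEN", "XXXXXXXXX")

# client = tweepy.Client(bearer_token)  # same call as above
```

This keeps the secret out of your source code if you ever share or commit it.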

Scraping a specific Twitter user’s Tweets:

import tweepy
import pandas as pd

# Input the username to scrape tweets from and name the csv file
username = 'BillGates'
count = 10

try:
    # Grabbing the user id from the username
    user_id = client.get_user(username=username).data.id

    # Creation of query method using parameters
    tweets = tweepy.Paginator(
        client.get_users_tweets,
        user_id,
        tweet_fields=["author_id", "created_at", "lang", "public_metrics"],
        expansions=["author_id"],
        max_results=100,
    ).flatten(limit=count)

    # Pulling information from the tweets generator
    tweets_list = [
        [tweet.created_at, tweet.id, tweet.text,
         tweet.public_metrics["retweet_count"], tweet.public_metrics["like_count"]]
        for tweet in tweets
    ]

    # Creation of dataframe from the tweets list
    tweets_df = pd.DataFrame(tweets_list, columns=["Created At", "Tweet Id", "Text", "Retweet Count", "Like Count"])

    # Converting dataframe to CSV
    tweets_df.to_csv("{}-tweets.csv".format(username), sep=",", index=False)

    print("Completed Scrape!")

except Exception as e:
    print("Scrape failed:", str(e))
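To make the row-building step concrete without hitting the API, here's the same list-comprehension logic run against stand-in objects that mimic the attributes the scraper reads from each Tweepy tweet (the FakeTweet class and its sample values are purely illustrative):

```python
from dataclasses import dataclass

@dataclass
class FakeTweet:
    # Mirrors the attributes the scraper reads from each tweet object
    created_at: str
    id: int
    text: str
    public_metrics: dict

tweets = [
    FakeTweet("2023-08-10", 1, "Hello world", {"retweet_count": 3, "like_count": 10}),
    FakeTweet("2023-08-11", 2, "Dogs are great", {"retweet_count": 1, "like_count": 4}),
]

# Same flattening logic as the scraper: one row per tweet
tweets_list = [
    [t.created_at, t.id, t.text,
     t.public_metrics["retweet_count"], t.public_metrics["like_count"]]
    for t in tweets
]
print(tweets_list[0])  # ['2023-08-10', 1, 'Hello world', 3, 10]
```

Each row then maps one-to-one onto the DataFrame columns used above.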

Scraping Tweets Using Keyword Search:

import tweepy
import pandas as pd

# Input search query to scrape tweets and name csv file
keyword_search = 'Dogs'
count = 10

try:
    # Creation of query method using parameters
    tweets = tweepy.Paginator(
        client.search_recent_tweets,
        keyword_search,
        tweet_fields=["author_id", "created_at", "lang", "public_metrics"],
        user_fields=["username"],
    ).flatten(limit=count)

    # Pulling information from the tweets generator
    tweets_list = [
        [tweet.created_at, tweet.id, tweet.text,
         tweet.public_metrics["retweet_count"], tweet.public_metrics["like_count"]]
        for tweet in tweets
    ]

    # Creation of dataframe from the tweets list
    tweets_df = pd.DataFrame(tweets_list, columns=["Created At", "Tweet Id", "Text", "Retweet Count", "Like Count"])

    # Converting dataframe to CSV
    tweets_df.to_csv("{}-tweets.csv".format(keyword_search), sep=",", index=False)

    print("Completed Scrape!")

except Exception as e:
    print("Scrape failed:", str(e))

How Can I Access Other Tweet Information?

For the most part, the code samples above give you access to the Tweet information people most commonly use. However, you may want other data that's available on Tweets.

By default, the tweet object returned by the v2 API only provides the id and text fields. Everything else must be requested via either a field parameter or an expansion. If you'd like to pull other tweet information, as shown in the data dictionary here, you'll need to include it in tweet_fields when making the API call.

Tweet Fields

For example, if I wanted to pull the language of a Tweet, I can modify the tweet_fields to include that as shown below.

tweets = tweepy.Paginator(client.search_recent_tweets, "dogs", tweet_fields=["lang"], user_fields=["username"]).flatten(limit = count)
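Once lang is requested, each returned tweet carries that attribute, so you can, for example, keep only English-language results. A sketch on sample data (the dicts are stand-ins for tweet objects; no API call is made):

```python
# Sample results shaped like tweets returned with tweet_fields=["lang"]
sample_tweets = [
    {"id": 1, "text": "I love dogs", "lang": "en"},
    {"id": 2, "text": "J'adore les chiens", "lang": "fr"},
    {"id": 3, "text": "Dogs are the best", "lang": "en"},
]

# Keep only tweets Twitter detected as English
english_only = [t for t in sample_tweets if t["lang"] == "en"]
print(len(english_only))  # 2
```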

Expansions

Not all tweet data is available in the tweet fields. There is additional information available through tweet expansions.

Similar to tweet_fields, you can add expansions to the Paginator call in order to pull in more information.

tweets = tweepy.Paginator(client.search_recent_tweets, "dogs", expansions=["author_id"], max_results=100).flatten(limit = count)

This will then grab additional information to allow you to query through.
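One thing to keep in mind: expansion data comes back in a separate includes payload keyed by id rather than inline on each tweet, so you typically build a lookup table to join it back. A sketch using sample data shaped like a v2 JSON response (illustrative values, no network call):

```python
# Sample payload shaped like a v2 response with expansions=["author_id"]
data = [
    {"id": "111", "text": "Dogs!", "author_id": "42"},
    {"id": "112", "text": "More dogs", "author_id": "42"},
]
includes = {"users": [{"id": "42", "username": "dogfan"}]}

# Build an id -> user lookup, then attach the username to each tweet
users_by_id = {u["id"]: u for u in includes["users"]}
for tweet in data:
    tweet["username"] = users_by_id[tweet["author_id"]]["username"]

print(data[0]["username"])  # dogfan
```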

FAQs

Is This Method of Scraping Tweets Legal?

Yes. We're using Tweepy, which leverages Twitter's official API for searching tweets and pulling that data. This is supported by the following excerpt from Twitter's Terms of Service as of August 10th, 2023:

“… search or attempt to access or search the Services by any means (automated or otherwise) other than through our currently available, published interfaces that are provided by us (and only pursuant to the applicable terms and conditions) …”

How Many Tweets Can I Scrape?

With the Basic level, you can scrape up to 10,000 tweets a month for $100/month. If you need more than that, you can instead pay $5,000/month to scrape up to 1 million tweets. If neither of these is sufficient for your needs, you can request Enterprise-level access.

Screenshot of Twitter API Levels and Versions
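For budgeting, it can help to translate those tiers into a per-tweet cost. Quick arithmetic using the figures quoted above (tier names aside, only the dollar amounts and caps matter here):

```python
# Tier figures as quoted above: $100/month for 10k tweets,
# $5,000/month for 1 million tweets
lower_cost, lower_cap = 100, 10_000
higher_cost, higher_cap = 5_000, 1_000_000

print(lower_cost / lower_cap)    # 0.01  -> one cent per tweet
print(higher_cost / higher_cap)  # 0.005 -> half a cent per tweet
```

So the larger tier halves the per-tweet price, but only pays off if you actually need the volume.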

How Can I Scrape Tweets Without Coding?

There are a couple of solutions, such as Scrape Hero, Stevesie, or web scraping automation tools like Octoparse, though each requires learning the tool in the first place. However, with Twitter updating their API access, many of these tools have been impacted in how they can pull data and how much they can scrape.

References

GitHub containing this tutorial’s scraping files: https://github.com/MartinKBeck/TwitterScraper/tree/master/ScraperV4

Tweepy website: https://www.tweepy.org/

Twitter API v2 with Tweepy in Python Guide: https://dev.to/twitterdev/a-comprehensive-guide-for-using-the-twitter-api-v2-using-tweepy-in-python-15d9
