Python: Pull Twitter, Facebook User Data

Updated July 4, 2013: The Twitter portion of this post has been revised to reflect Twitter’s retirement of v1 of its API and its move to v1.1. Since writing the original, I also discovered the excellent python-twitter library, which handles the now-required API authentication and provides a wrapper around the API itself.

Original post (updated):

The APIs from the two social media giants, Twitter and Facebook, offer plenty of possibilities for data gathering and analysis. From tweets and status messages to numbers of followers and friends, photos, locations and more, there’s a lot of information waiting.

Given my nascent interest in Python, I decided to explore the APIs via some simple scripts, fetching Twitter profile and Facebook page data and writing the values to a SQLite database.

These examples are simple but offer a framework for you (and me) to build upon. SQLite support is built into Python, but for the Facebook script you’ll need the Requests library (pip install requests) if you don’t have it.

Facebook Page Data

This script (available on GitHub) pulls the number of “likes” and “talking about this” for each Facebook page specified in the list called names_list. It creates a SQLite database called social_data.db if none exists and also creates a table to hold the data.

# Fetch Facebook page metrics via the Graph API into a SQLite DB
# Grabs the number of likes and "talking about" numbers
 
import requests
import sqlite3
import os
from datetime import datetime
 
# These are the accounts for which you will fetch data
names_list = [
    'fallingskies',
    'usatoday'
]
 
# API base URL
base_url = 'https://graph.facebook.com/'
 
# Function to add row to accounts table
def insert_db(handle, likes, talking):
    conn = sqlite3.connect('social_data.db')
    cur = conn.cursor()
    cur.execute('''
        INSERT INTO fbaccounts VALUES (?,?,?,?);
        ''', (datetime.now(), handle, likes, talking))
    conn.commit()
    conn.close()
 
# Create the database if it doesn't exist
# (sqlite3.connect creates the file when it's missing)
if not os.path.exists('social_data.db'):
    conn = sqlite3.connect('social_data.db')
    conn.close()
 
# Create the table if it's not in the db
conn = sqlite3.connect('social_data.db')
cur = conn.cursor()
cur.execute('''CREATE TABLE IF NOT EXISTS fbaccounts 
    (FetchDate Date, Handle Text, Likes Integer, Talking Integer)
    ''')
conn.commit()
conn.close()
 
# Iterate over handles and hit the API with each
for user in names_list:
    url = base_url + user 
    print 'Fetching ' + user
    response = requests.get(url)
    profile = response.json()
    handle = profile['name']
    likes = profile['likes']
    talking = profile['talking_about_count']
    insert_db(handle, likes, talking)
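
One caveat: the Graph API doesn’t guarantee every field for every page, so the loop above can raise a KeyError if, say, 'talking_about_count' is missing. Here’s a more defensive version sketched with dict.get() defaults (the fallback values are my own choices, not anything the API specifies):

# Defensive variant: profile.get() supplies a default
# if the Graph API omits a field for a given page
for user in names_list:
    url = base_url + user
    print 'Fetching ' + user
    profile = requests.get(url).json()
    insert_db(profile.get('name', user),
              profile.get('likes', 0),
              profile.get('talking_about_count', 0))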

Twitter Profile Data

This script (also on GitHub) uses the python-twitter library to fetch some basic profile data — screen name, followers and description — into a SQLite DB. To get the keys and access tokens now required by version 1.1 of the Twitter API, you’ll need to register an application under your profile. Start at https://dev.twitter.com/
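
One tip before the code: rather than hardcoding your keys in the script, you can read them from environment variables. A minimal sketch, assuming you’ve exported the (hypothetical) variable names shown here in your shell:

import os
import twitter

# The TW_* names are hypothetical; export them in your shell first
api = twitter.Api(consumer_key=os.environ['TW_CONSUMER_KEY'],
                  consumer_secret=os.environ['TW_CONSUMER_SECRET'],
                  access_token_key=os.environ['TW_ACCESS_TOKEN'],
                  access_token_secret=os.environ['TW_ACCESS_SECRET'])

The script below uses placeholder strings instead, just to keep the example self-contained.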

# Fetch Twitter profile details from Twitter API into a SQLite DB
 
import twitter
import sqlite3
import os
from datetime import datetime
 
# Set the Twitter API authentication
api = twitter.Api(consumer_key='your-key',
                  consumer_secret='your-secret-key',
                  access_token_key='your-access-token',
                  access_token_secret='your-token-secret')
 
# These are the accounts for which you will fetch data
handles_list = [
    'chrisschnaars',
    'anthonydb',
    'usatoday'
]
 
# Function to add row to accounts table
def insert_db(handle, followers, description):
    conn = sqlite3.connect('social_data2.db')
    cur = conn.cursor()
    cur.execute('''
        INSERT INTO twaccounts VALUES (?,?,?,?);
        ''', (datetime.now(), handle, followers, description))
    conn.commit()
    conn.close()
 
# Create the database if it doesn't exist
# (sqlite3.connect creates the file when it's missing)
if not os.path.exists('social_data2.db'):
    conn = sqlite3.connect('social_data2.db')
    conn.close()
 
# Create the table if it's not in the db
conn = sqlite3.connect('social_data2.db')
cur = conn.cursor()
cur.execute('''CREATE TABLE IF NOT EXISTS twaccounts
    (FetchDate Date, Handle Text, Followers Integer, Description Text)
    ''')
conn.commit()
conn.close()
 
# Iterate over handles and hit the API with each
for handle in handles_list:
    print 'Fetching @' + handle
    try:
        user = api.GetUser(screen_name=handle)
        followers = user.GetFollowersCount()
        description = user.GetDescription()
        insert_db(handle, followers, description)
    except twitter.TwitterError:
        print '-- could not fetch ' + handle
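
To confirm the rows landed, you can query the database right back. A quick sketch against the table created above:

# Read back what was stored
conn = sqlite3.connect('social_data2.db')
cur = conn.cursor()
cur.execute('SELECT FetchDate, Handle, Followers FROM twaccounts')
for row in cur.fetchall():
    print row
conn.close()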

Fetch More Twitter User Data and Timeline Statuses

Here’s a simpler version of the script, which returns some user data plus timeline statuses.

import twitter
 
api = twitter.Api(consumer_key='your-consumer-key',
                  consumer_secret='your-consumer-secret',
                  access_token_key='your-access-token-key',
                  access_token_secret='your-access-token-secret')
 
# set a handle
handle = 'anthonydb'
 
# Get some info on a user
user = api.GetUser(screen_name=handle)
 
print user.GetName()
print user.GetDescription()
print user.GetFollowersCount()
print user.GetStatusesCount()
print user.GetUrl()
 
# get a user timeline
statuses = api.GetUserTimeline(screen_name='pokjournal', count=1)
print [s.text for s in statuses]
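
If you want more than one page of tweets, the standard Twitter pattern is to walk backward through the timeline with max_id. A sketch, assuming your version of python-twitter passes max_id through to GetUserTimeline:

# Page backward through a timeline via max_id
statuses = api.GetUserTimeline(screen_name=handle, count=200)
while statuses:
    older = api.GetUserTimeline(screen_name=handle, count=200,
                                max_id=statuses[-1].id - 1)
    if not older:
        break
    statuses.extend(older)
print len(statuses)

Mind the API’s rate limits if you page deep into a long timeline.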

Enjoy …

6 Responses to “Python: Pull Twitter, Facebook User Data”

  1. Alex Walsh says:

    Hey Anthony! Thanks for having this online!

    Couple questions for you on ways the code might be tweaked.

    1. If I had a list of 100 Twitter handles, and wanted to add a “category” designation for each to the SQLite database, what might be the easiest way to do that?

    2. What do you recommend for exporting to csv?

    Thanks so much!

  2. Anthony says:

    Hi, Alex,

    Glad you’re finding this helpful. To answer your questions:

    1) To add a category for each handle to your table, you could modify your handles list to make it a list of lists, like so:

    handles_list = [
        ['chrisschnaars', 'journalist'],
        ['anthonydb', 'journalist'],
        ['usatoday', 'newsroom']
    ]

    Then, add a category field to your table when you create it:

    cur.execute('''CREATE TABLE IF NOT EXISTS twaccounts
        (FetchDate Date, Handle Text, Category Text, Followers Integer, Description Text)
        ''')

    Finally, when you iterate over your list of lists, you can pass the handle to the API and the category to the function that inserts the data to the table:

    for handle in handles_list:
        print 'Fetching @' + handle[0]
        try:
            user = api.GetUser(screen_name=handle[0])
            followers = user.GetFollowersCount()
            description = user.GetDescription()
            insert_db(handle[0], handle[1], followers, description)
        except twitter.TwitterError:
            print '-- ' + handle[0] + ' not found'

    You’ll have to make a few other tweaks, but you get the idea.

    2) For CSV output instead of to SQLite, take a look at my CDC flu scraper: https://github.com/anthonydb/CDC-flu-scraper/blob/master/fluscrape.py
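
    If you just want the core pattern, a minimal sketch with Python’s built-in csv module looks like this (the row values here are hypothetical):

    import csv

    rows = [['anthonydb', 'journalist', 1200, 'Data editor']]  # hypothetical values

    with open('twaccounts.csv', 'wb') as f:  # 'wb' for csv in Python 2
        writer = csv.writer(f)
        writer.writerow(['Handle', 'Category', 'Followers', 'Description'])
        writer.writerows(rows)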

  3. Alex Walsh says:

    Awesome! Thanks!

  4. Betsy says:

    Hey! Thanks for posting, this is so helpful! I am new to Python so I hope this is not too trivial of a question, but I was wondering how I can just import my tweets from my profile page and store them in my SQL?

  5. Rishi Mody says:

    Hi Anthony,
    First of all, you have done a great job, and thank you for putting this up.
    I am trying to extract data from Facebook. Does this code work with API v2.0?
    Also, what is the use of the access tokens they have mentioned?
    Thanks.

  6. Anthony says:

    Hi, Rishi,

    I haven’t looked at the Facebook API since writing this post, so not sure. Apologies.
