
Data Science Lab By Leo Mutuku / January 23, 2014

How To: Using R to Capture and Analyze Tweets


By Chris Orwa,

Junior Data Scientist, iHub Research


The R statistical computing software provides various packages for capturing data from different sources. In this blog, I’ll describe how to use R to collect data from Twitter. But first, let me provide a brief background on how to access Twitter data.

There are three levels to accessing Twitter data:

  • Via the Search API, which returns the most popular tweets. This is the data you get when you do a manual search on the Twitter website, and in this mode only a short history (a few days) can be queried.
  • Via the Streaming API, which provides real-time access to Twitter data but only to a sample of all tweets. The Streaming API is suitable when you just want to get a feel for an unfolding event.
  • Via the Twitter Firehose, which returns all tweets, both historical and real-time, on given keywords. Unfortunately, only a few companies have this type of access.

In this post, we will focus on the Search and Streaming APIs.

To get started, first register an app on the Twitter developer website https://dev.twitter.com/ and fill in the details.  

[Image: Create an application]

After successful registration of the app, click on the app and you should see a table similar to the one below under ‘OAuth settings’.

Access level          Read-only
Consumer key          8mzRs9PySHKmTcvXBcy5w
Consumer secret       ZKNBKniG4ADfyk3tHCWQsj0wowapFpXhqoj8O4OnQQ
Request token URL     https://api.twitter.com/oauth/request_token
Authorize URL         https://api.twitter.com/oauth/authorize
Access token URL      https://api.twitter.com/oauth/access_token
Callback URL          None
Sign in with Twitter  No

These settings will be used as variables in R so it is best to copy and save them in a text file.
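One way to get the saved settings back into R is to store them as key=value lines and read them at the start of a session. The sketch below is only an illustration: the helper `read_settings()`, the key names and the file layout are assumptions made for this example, not something Twitter or the twitteR package prescribes, and the values are placeholders.

```r
# Hypothetical helper: read "key=value" lines from a text file into a named list.
read_settings <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  # Drop blank lines, comment lines and anything without an "=" sign.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  lines <- lines[grepl("=", lines, fixed = TRUE)]
  pos    <- regexpr("=", lines, fixed = TRUE)      # position of the first "="
  keys   <- trimws(substr(lines, 1, pos - 1))
  values <- trimws(substr(lines, pos + 1, nchar(lines)))
  setNames(as.list(values), keys)
}

# Demonstration with placeholder values written to a temporary file.
settings_file <- tempfile(fileext = ".txt")
writeLines(c(
  "# Twitter app settings (placeholders, not real credentials)",
  "consumer_key=YOUR_CONSUMER_KEY",
  "consumer_secret=YOUR_CONSUMER_SECRET",
  "request_url=https://api.twitter.com/oauth/request_token"
), settings_file)

settings <- read_settings(settings_file)
print(settings$consumer_key)
```

Keeping the key and secret in a separate file also means the script itself can be shared without exposing them.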

Now you can fire up R and install the twitteR package with the command

install.packages("twitteR")

and load it into the workspace with the command

require(twitteR)
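If you are not sure which of the packages used in this post are already installed, a quick check along the following lines may help. This is a sketch: `missing_packages()` is a hypothetical helper written for this example, and the install call is left commented out so that nothing is downloaded unless you choose to.

```r
# Packages this walkthrough relies on (twitteR for the API calls,
# ggplot2 and plyr for the chart at the end).
needed <- c("twitteR", "ggplot2", "plyr")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```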

Proceed with the code as in the diagram below by referring to the information you copied to the text file.

[Image: R code]
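In text form, setup code of this kind usually looks like the sketch below. It assumes the ROAuth package, which provides OAuthFactory, and uses placeholder strings instead of real credentials. Apart from twitCred, which the later steps refer to, the variable names are illustrative choices rather than a transcription of the diagram.

```r
# Sketch only: assumes the ROAuth package is installed and that the
# placeholder strings are replaced with the values from your own app.
consumerKey    <- "YOUR_CONSUMER_KEY"
consumerSecret <- "YOUR_CONSUMER_SECRET"

# OAuth endpoints as listed in the settings table above.
reqURL    <- "https://api.twitter.com/oauth/request_token"
authURL   <- "https://api.twitter.com/oauth/authorize"
accessURL <- "https://api.twitter.com/oauth/access_token"

if (requireNamespace("ROAuth", quietly = TRUE)) {
  # Build the credential object used in the handshake step that follows.
  twitCred <- ROAuth::OAuthFactory$new(
    consumerKey    = consumerKey,
    consumerSecret = consumerSecret,
    requestURL     = reqURL,
    accessURL      = accessURL,
    authURL        = authURL
  )
} else {
  message("ROAuth is not installed; skipping creation of twitCred.")
}
```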

That’s pretty much standard code! Note that the consumer key and consumer secret should be the ones provided for your app (for this blog, I’ve used mine). The next part is quite tricky and gave me some headaches.

After providing the credentials via the OAuthFactory$new() function, a handshake has to be initiated between your app and the Twitter server. A handshake in computing is a preliminary exchange between two systems that sets the rules for the communication that follows. In this case the exchange runs over an SSL-secured connection, so R also needs a certificate bundle it can use to verify the certificate sent by the Twitter server.

The most straightforward way around this is to first download a certificate bundle with the R code:

download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

Then proceed with the code below to initiate the handshake.

twitCred$handshake(cainfo="cacert.pem")

If this stage succeeds, R prints a text message that leads to a request for a PIN. It reads:

To enable the connection, please direct your web browser to: https://api.twitter.com/oauth/authorize?oauth_token=kxzyNUke8nBprcClN4BTipXqgWKKn27Xf7We1qPJZE

Copy and paste the URL into your browser. This is what I got.

[Image: Twitter app PIN]

Enter the PIN at the R prompt, which reads:

When complete, record the PIN given to you and provide it here.

We are almost there! The next step is to register the credentials using the line below.

registerTwitterOAuth(twitCred)

The function returns TRUE when all is well.

At this point I felt home was just a stone’s throw away, only to be hit with an error while trying to use the searchTwitter() function.

[1] "SSL certificate problem, verify that the CA cert is OK. Details:\nerror:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed"
Error in twInterfaceObj$doAPICall(cmd, params, "GET", ...) :
Error: SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

If this happened to you too, do not worry, I have the antidote. Set the SSL certificate options globally using the code below.

library(RCurl)
# Set SSL certs globally
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
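If the error persists, it may be worth checking that the certificate bundle was actually found. The sketch below relies on the fact that system.file() returns an empty string when the package or file is missing; the variable name `ca_bundle` and the messages are illustrative.

```r
# Look up the CA bundle that ships with RCurl; system.file() returns ""
# when the package or the file cannot be found.
ca_bundle <- system.file("CurlSSL", "cacert.pem", package = "RCurl")

if (nzchar(ca_bundle)) {
  # Same global setting as above, applied only when the file exists.
  options(RCurlOptions = list(cainfo = ca_bundle))
  message("Using CA bundle at: ", ca_bundle)
} else {
  message("No bundled cacert.pem found; is RCurl installed?")
}
```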

Hooray!!! We’ve done it! Now we have the power to search, collect and analyze tweets.

How about we do something interesting? Let’s download tweets from a trending topic and graph the highest contributors to the topic. To do this, download and install the ggplot2 and plyr packages. Using the searchTwitter() function in the twitteR package, I captured 1000 tweets from the trending topic #TvYa13Million and graphed them.

Here is the code that did all the magic:

[PS: I didn't exit my R session; this code continues from the steps above]

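library(ggplot2)
library(plyr)
# Fetch up to 1000 tweets that mention the hashtag
TvTweets <- searchTwitter("#TvYa13Million", n = 1000)
# Extract each tweet's screen name into a one-column data frame (column V1)
users <- ldply(TvTweets, function(x) x$screenName)
# Bar chart of tweets per handle; geom_bar() counts the rows for each discrete handle
ggplot(users, aes(x = V1)) + geom_bar() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + ylab("Count of tweets using #TvYa13Million") + xlab("Twitter handle")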
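With many handles the chart can get crowded, so it may help to narrow it down to the most active accounts. The sketch below uses base R only and a made-up vector of handles standing in for users$V1; the handles and the cut-off of three are illustrative and are not data from #TvYa13Million.

```r
# Made-up handles standing in for users$V1 from the code above.
handles <- c("user_a", "user_b", "user_a", "user_c", "user_a",
             "user_b", "user_d", "user_a", "user_c", "user_b")

# Count tweets per handle and sort from most to least active.
counts <- sort(table(handles), decreasing = TRUE)

# Keep the three most active handles.
top_n <- 3
top_contributors <- head(counts, top_n)
print(top_contributors)

# Base-graphics bar chart of the top contributors.
barplot(top_contributors, las = 2,
        xlab = "Twitter handle", ylab = "Count of tweets")
```

With real data, replacing `handles` with users$V1 should give the same kind of summary for the captured tweets.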

And there you go:

[Image: Twitter graph]

You now have the power to capture tweets and analyze them!


Author : Leo Mutuku

Leo is a research manager at iHub Research. She conducts research on open data, data science and visualization, design research methods, market and investment research.

