iHub Research By Angela Crandall / August 29, 2013
3Vs Crowdsourcing Framework for Elections launched
iHub Research is pleased to publish the results of our research on developing a Crowdsourcing Framework for Elections. Over the past 6 months, we have been looking at a commonly held assumption that crowdsourced information (collected from citizens through online platforms such as Twitter, Facebook, and text messaging) captures more information about the on-the-ground reality than traditional media outlets like television and newspapers. We used Kenya’s General Elections on March 4, 2013 as a case study event to compare information collected from the crowd with results collected by traditional media and other sources.
The three main goals of this study were to:
1) test the viability of passive crowdsourcing in the Kenyan context,
2) determine which information-gathering mechanism (passive crowdsourcing on Twitter, active crowdsourcing on Uchaguzi, or online publications of traditional media) produced the best real-time picture of the on-the-ground reality, and
3) develop a framework to help aspiring crowdsourcers determine whether crowdsourcing is viable in their context and, if so, which techniques will offer verifiable and valid information.
INSIGHTS FROM THE RESEARCH
By conducting a quantitative analysis of the data collected from Twitter during the Kenyan election period, we found that ‘passive crowdsourcing’ (data mining of already generated online information) is indeed viable in the Kenyan election context, but only using machine learning techniques. Mining Kenyan Twitter data during an election scenario looks to be a very valuable and worthwhile technique when looking for timely, local information. However, mining is only possible if the user has knowledge of machine learning techniques since, without such techniques, the manual process can take up to 270 working days.
The second objective of the study was to understand what information, if any, Twitter provided beyond traditional media sources and other crowdsourcing platforms, such as Uchaguzi. We found that Twitter reported incidents as fast as or faster than traditional media (as measured in days), though these reports had the disadvantage of not having been verified beforehand, unlike reports from traditional media or Uchaguzi. Twitter also contained localized information useful to particular interest groups that may not be broadcast by traditional media. Aggregating such content could string together newsworthy information on a grander scale.
Our third objective was to determine whether particular conditions need to be in place for crowdsourcing using online and mobile technologies to be a viable way to gather information during an election. By looking at the particular case of the 2013 Kenyan election, we found that there are indeed factors and considerations that are useful in assessing whether there will be an adequate online ‘crowd’ to source information from. These include, among others: 1) the availability of, and access to, Internet, 2) the adoption and penetration of mobile phone telephony, and 3) the extent and culture of social media network usage. We further found that it is crucial to consider what type of data the aspiring crowdsourcers require before deciding how to gather it. For our project, for instance, we wanted data from multiple sources for comparative analysis, so we used passive crowdsourcing on Twitter and compared it with an existing active crowdsourcing project, Uchaguzi.
Based on these findings, we designed a ‘3Vs Crowdsourcing Framework for Elections’ made for practitioners such as journalists or crowdmappers. The aim of the framework is to provide guidelines for any crowdsourcers, new or experienced, who are interested in seeing if crowdsourcing is a viable option in a particular location and if so, what type of crowdsourcing is appropriate. This framework helps to carry out an effective crowdsourcing activity by prompting the potential crowdsourcer to investigate the factors that facilitate the sharing of information by ‘ordinary citizens,’ who generate the bulk of crowdsourced information.
We hope that this 3Vs Framework will be tested and revised through practical application in different country contexts. As a first test case, we have retroactively applied the draft framework to the Kenyan 2013 General Election context. In future scenarios, we hope potential deployers will test the viability of their crowdsourcing initiative using the framework prior to implementation. Based on the work achieved thus far, we look forward to engaging the wider crowdsourcing community in testing the 3Vs Framework in other countries approaching elections.
We are grateful to Canada’s International Development Research Centre for their support in funding this research.
Highlights from the Highway Africa 2013 (#Highway13) Conference: Speaking Truth to Power? | Nanjira : Confessions Of Aquila at 00:16:48AM Monday, September 9, 2013
[…] Media & Alternative Media in Elections & Accountability, in which I shared findings from research conducted and projects deployed in Kenya during the 2013 General Elections. I didn’t get to attend […]
om at 03:42:00AM Thursday, September 12, 2013
Could you please indicate why the analysis in your study provides roughly 1/3 of one page discussing Uchaguzi? iHub members and both advisors were intimately aware of the project in Kenya as well as with their international partner SBTF.
The iHub blog post provides a serious disconnect between the introductory para and the following “Insights” section. Point 2 asks a reasonable question: which source provides the best ‘ground-truth’? That is of great interest but doesn’t actually get answered. The following “Insights” section restates the “second objective” as ‘what information, if any, does Twitter provide over traditional media and Uchaguzi?’ This is getting closer to the actual report, which genuinely appears to have a ‘goal’ of proving that Twitter is more useful than both other options. Whose interests were served by this study?
Twitter analysis is claimed to have required 90 human hours and seemingly amazing speed of computational resources (mere seconds to process two million plus Tweets!), showing the power that research funding can have. The conclusions section fails to mention Uchaguzi or SMS at all, but should have highlighted their ‘realtime’ capacity as a distinct advantage over Social Media and traditional news for ‘ground-truth’ during crises. 90 minutes is too long to wait when bad things are happening.
Conflict of interest is another issue. Dr. Maier’s work is, and has been, focused on the analysis of Tweet streams and he’s been responsible for significant resources spent to establish social media as a viable method of acquiring critical information with a shrinking role for human input. That your study’s conclusions echo his theses should then come under some scrutiny.
Can you claim your study to be unbiased? It reads like an advertisement for using realtime NLP on the Tweetstream to discover actionable information during critical events over SMS and traditional media. I’m glad to see the acknowledgement that traditional media outlets were significant contributors to the ‘relevant’ Tweets, substantiating their value as a channel of information – but disappointed to see the breakout of SMS and media as isolated information streams. Any real scenario will contain all three in combination, that when captured can provide vastly more useful realtime data than social media alone. I say this because building a social media “realtime analysis toolkit” will still only be relevant in the locations where citizens are connected to the internet. Whereas putting resources into the effective capture and analysis of SMS data will be applicable globally, now and in the future.
Important facts that should have been included in your analysis:
1. A decision was made on the eve of the elections to not make use of a crowdsourcing tool (microtasking) that could have significantly improved the processing capacity of the Uchaguzi effort. Even though trained personnel were available, they were not utilized in the design and implementation of the crowdsourcing tool, further preventing success.
2. The actual Uchaguzi SMS processing was ad-hoc and unrehearsed, therefore far less effective than could have been, making yours a compromised comparison (yet SMS was still extremely effective as a reporting mechanism).
3. Uchaguzi afforded a response feature that was used to reply to an SMS message sender for report verification and amplification of details in ‘realtime’, which cannot be achieved as easily with social media. This feature was responsible for verification of violence and mobilization of security forces ‘in the moment’. (Would security forces ever act on a Tweet in the same manner?)
Oddly, iHub members and your report advisors were involved with decisions that possibly hampered the efficacy of Uchaguzi, which still produced valuable reporting by your own criteria. Imagine what could have been done with some resources put to advertisement of an SMS freecode and infrastructure to manage countrywide SMS response to unfolding events in realtime?!
SMS, besides being more affordable and available than internet access worldwide, provides one to one accountability and if aggregated, a collective sentiment that Twitter usernames cannot. As mentioned in the report, it is impossible to verify the actual country of residence or origin of Tweets. This enables a coordinated misinformation campaign to be effective; negating any value of anonymity.
On the other hand, SMS may be coerced individually, but regionally it is difficult or impossible to produce misleading ‘first hand’ reports. With a small amount of code, bad actors can be isolated and investigated later because their existence is valuable in the post-event analysis. With cooperative cell service providers, SMS anonymity can be preserved while location would still be available to use.
Far from being an objective analysis, the upshot of your study and section headings is that Twitter (along with computational analytical services) provides useful passive information that a centrally located, small team could achieve fairly quickly without the potential hangups of engaging the ‘crowd’. Active crowdsourcing may be problematic in terms of managing and training volunteers, but there is another value in having opportunities for humans to interact across borders that requires deeper understanding to quantify.
More development energy put into SMS tools will provide lasting value regardless of whether a crowd or technology is used to filter the messages. Whereas Facebook and Twitter are simply not relevant to the vast majority of humans on this earth.
Lastly, SMS affords the capacity to broadcast messages of importance to a well defined geographic region. Twitter etc. will never have this capacity in a reliable form.
Angela Crandall at 10:49:14AM Sunday, September 15, 2013
Thanks for your comment and interest in the project. Please note that the crowdsourcing framework and Uchaguzi were two separate albeit similar projects. We utilized Uchaguzi as an instance of active crowdsourcing and not as a means of studying effectiveness of SMS in information dissemination. There is a separate report that discussed the findings of Uchaguzi that delves into the methodology deployed as well as the challenges.
The 3Vs Crowdsourcing Framework had a broader scope of studying the viability of passive crowdsourcing and did comparative analysis with other sources of information during an election, which included Uchaguzi (active crowdsourcing), mainstream media and the field findings. The comparative analysis section covers the type of information that can be captured by the different sources, along with their similarities and differences. Due to limited scope, the focus for this study was on use of online mechanisms for crowdsourcing, though we fully recognize the value of SMS (and voice calls for that matter).
• Note that we did not report that Twitter is more useful than other media sources. We did however find that Twitter does contain useful information that could be extracted using common data analysis techniques.
• We found that there was useful information contained in all the sources we studied – traditional media, Uchaguzi, Twitter, each had good features and bad.
• It is our opinion that all the media we studied are useful to study in the future. It is currently an open research question as to how to best combine different social media sources and this paper looked at that. It would seem odd to exclude any media such as Twitter, particularly since we found useful data.
• As a research topic, we found that passive crowdsourcing had been less studied than platforms such as Uchaguzi in the context of African elections. iHub Research is about researching technology and providing information to the local tech community not readily available, so we spent more time in the paper discussing the implementation details of mining data from Twitter than implementation details of Uchaguzi, as Uchaguzi was studied in-depth in a separate project and implementation details are available easily to the local community.
• Processing times for Twitter data are correct. To get these speeds we used the Python package scikit-learn with a Linear Support Vector Machine (LSVM) classifying algorithm. This was able to process tweets quickly as it can work with sparse matrix structures, which are particularly useful when representing the text in Tweets in a computer-friendly manner.
• We do however note that there is a cost associated with quantitative data mining in that you need the technical expertise and the time spent training the algorithms. These are the most time-consuming aspects of setting up a quantitative analyser, which is discussed in the paper.
• Perhaps it is your personal opinion that “Facebook and Twitter are simply not relevant to the vast majority of humans on this earth.” However, using our 3Vs Framework, we found that Twitter was indeed viable in the context of Kenya, where online presence and access to and varied use of social media is increasing exponentially. Social media may not yet be as ubiquitous as SMS (which is of course not as ubiquitous as voice calling), but we believe they are all methods for engaging audiences that require on-going study. We recommend checking out the Kenyan case study in the Crowdsourcing Framework for statistics on, for example, Internet penetration in the country.
• On the issue of the response feature that Uchaguzi had, we had hoped to assess near real-time verification of Twitter data, but could not do it at this time since we conducted a post-analysis (of all data sets). We do not believe it’s impossible to verify Twitter reports. We also did not indicate that it’s impossible to verify location, just that tweets were not geo-tagged. Our passive crowdsourcing means we did not make any calls for people to participate, which would have likely included asking people to geo-tag their tweets.
• If you are interested in exclusive SMS-based deployments, perhaps you should check out the Voix des Kivus project, which we also reference in the Crowdsourcing Framework. It should be emphasized that we have a section of the report that explains why it was viable to look to Twitter in the Kenyan context.
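For readers curious about the scikit-learn setup mentioned in the processing-times point above, a minimal sketch of a linear SVM trained on sparse text features might look like the following. The example tweets, their labels, the choice of TF-IDF features, and the helper name `classify` are all assumptions made purely for illustration; they are not the data, features, or code used in the study.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical, hand-written examples (NOT data from the study).
train_texts = [
    "violence reported near polling station in Kisumu",
    "long queues at polling station, voting delayed",
    "ballot boxes missing at tallying centre",
    "police disperse crowd outside polling centre",
    "enjoying lunch with friends today",
    "great football match last night",
    "traffic is terrible on the highway this morning",
    "new phone arrived, loving the camera",
]
# 1 = election-relevant, 0 = not relevant (labels assumed for illustration).
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Turn each text into a sparse row of TF-IDF weights. Most entries are zero
# because any one tweet uses only a few words from the whole vocabulary.
vectorizer = TfidfVectorizer(lowercase=True)
X_train = vectorizer.fit_transform(train_texts)

# LinearSVC accepts scipy sparse matrices directly, so the data never has
# to be expanded into a dense array.
classifier = LinearSVC(random_state=0)
classifier.fit(X_train, train_labels)


def classify(texts):
    """Return a list of 0/1 labels for new texts (hypothetical helper)."""
    return classifier.predict(vectorizer.transform(texts)).tolist()


predictions = classify([
    "voting delayed at polling station",
    "football match with friends",
])
print(issparse(X_train), X_train.shape, predictions)
```

With a realistic corpus the training set would of course be far larger and hand-labelled, which is where the time cost noted above comes from; once trained, classifying new tweets is a single sparse matrix operation.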
Hope that answers your queries. If you have further questions, we are happy to hear from you via email (research at ihub dot co dot ke).
How Useful is a Tweet? at 23:37:51PM Wednesday, September 25, 2013
[…] month, iHub Research published the results of our research looking at crowdsourcing data collected during the March 2013 …. The research has become even more relevant against the backdrop of this weekend’s horrific […]
Daudi Were at 21:15:49PM Sunday, September 29, 2013
Thank you iHub Research for the important and ground breaking work you are doing in this area.
I understand that this post is not about Uchaguzi; however, for the sake of clarity regarding the comments above, I would like to point out that:
Om was not involved in planning the strategy or execution of Uchaguzi.
Om was not involved in any Uchaguzi partnership negotiations.
Om was not involved in the decision-making around which tools the Uchaguzi partnership should use.
What Om does have is our email addresses and it is telling that he decided not to verify his claims about Uchaguzi with us.
Thanks again iHub Research for the work you are doing. Many of us working on real deployments on the ground find it invaluable.
Uchaguzi Kenya 2013 lead
dwere at Ushahidi dot com
How Useful Is A Tweet?: A review of the first tweets of the Westgate Attack at 14:20:33PM Thursday, October 3, 2013
[…] in days to come. This Westgate example gives further support for the findings derived from our 3Vs Crowdsourcing Framework Research, which will be officially launched at our cocktail event this evening, October 3, […]
Designing Kenya’s Anti Corruption Platform at 22:57:47PM Thursday, October 17, 2013
[…] Collecting information is the easy part! It is crucial that all reports of corruption are taken through a process to establish how credible they are. There are various methods that can be used for this. For Uchaguzi, our citizen centred election platform, we had partners on the ground, trained and led by the Constitution & Reform Education Consortium, who we could call to check up on reports for us. The President has similar options available to him, officers of the Ethics and Anti-Corruption Commission for example. Another option is working with the Data Lab at iHub Research that has developed a variety of data mining and machine-learning techniques for verifying crowd-sourced information. […]