

iHub Research By Angela Okune / December 1, 2011

Insights from Safaricom “trash”


By Guest blogger, Elvis Bando

A few weeks ago, @chrisorwa showed me some datasets that he had been working on. I got an adrenaline kick just from looking at the data, mostly because it was challenging, but the prospect of cracking it was even more motivating. In a previous blog post, Chris wrote about trash sourcing, basically extracting information from trash, which was the basis of his project #Saiclique. When he invited me to join him on the project, I realized that some of his data was in a format I could not manipulate (we use different core analysis software: I use Rapid Miner, he uses Weka), so we had to start the data entry process again. We started here:

We ended up getting the following from the cards (we did slightly over 1000 cards):

From this, I generated a beautiful dataset:

I thought the serial must be a concatenation of sets of 4 digits, so I split the data that way (4-4-4-x). Running a DBSCAN clustering algorithm in Rapid Miner gave me the following:
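The splitting step can be sketched in a few lines of Python. This is only an illustration, not the Rapid Miner workflow used here: it swaps DBSCAN for a plain frequency count per chunk, which is enough to spot a value that dominates one position. The function names and the sample serials are made up for the example and are not real card data.

```python
from collections import Counter

def split_serial(serial, widths=(4, 4, 4)):
    """Split a serial into fixed-width chunks; whatever is left is the last chunk."""
    chunks, pos = [], 0
    for width in widths:
        chunks.append(serial[pos:pos + width])
        pos += width
    chunks.append(serial[pos:])
    return chunks

def chunk_frequencies(serials, widths=(4, 4, 4)):
    """Return one Counter per chunk position, counting how often each value occurs."""
    counters = [Counter() for _ in range(len(widths) + 1)]
    for serial in serials:
        for counter, chunk in zip(counters, split_serial(serial, widths)):
            counter[chunk] += 1
    return counters

# Hypothetical 17-digit serials, invented for illustration only.
sample = ["01110526010000123", "02110526010000124", "17110527020004567"]
freqs = chunk_frequencies(sample)
print(freqs[1].most_common(1))  # most frequent value in the second chunk
```

Passing a different `widths` tuple, for example `(2, 6, 3)`, tries another split of the same serials.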

The tall column is 0526, which was the second set in my 4-4-4-x data split. No other configuration showed such a strong cluster. What does 0526 even mean? To confirm that 0526 meant something, I ran a frequency analysis of each digit across the entire serial and, using Benford's Law (in many naturally occurring datasets, the leading digit is 1 about 30% of the time), I narrowed the clustering down to a 2-6-3-x configuration.
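For reference, Benford's Law gives the expected share of leading digit d as log10(1 + 1/d), which is about 30.1% for d = 1. The sketch below computes those expected shares and the observed digit shares at a chosen position in a list of serials. It is a minimal illustration with hypothetical function names, not the analysis actually run. Digits at a fixed position of a structured serial need not follow Benford's Law at all, so a large deviation is better read as a hint of structure than as an error.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def digit_frequencies(serials, position):
    """Observed share of each digit character at a zero-based position."""
    counts = Counter(s[position] for s in serials if len(s) > position)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {digit: n / total for digit, n in sorted(counts.items())}

for digit, share in benford_expected().items():
    print(digit, round(share, 3))
```

Comparing `digit_frequencies(serials, 0)` against `benford_expected()` position by position shows which digits behave like free-running counters and which are constrained, as date or time fields would be.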

Not to bore you with my train of thought: after much more modelling and analysis, the data finally spoke. Here is the transcript:

Safaricom's serializing system seems to be similar to, or based on, descriptions of a patented system found at

If true, then the card serial number contains information about the card: the date and time it was produced and a unique identifier. The rest of the information is looked up by the system using this code (the looked-up info could be the amount of talk time, the expiry date, etc.). The serial is therefore the only unique identifier of a particular card and shows whether or not it has been used.

An analysis of cards produced in 2010 and earlier indicates that the serials were sequential for the most part. The initial two digits were 10 throughout the year, probably indicating the year of production. The remaining digits were sequential. This system was probably changed because it would have run out of state space. At that time, the serial was a 13-digit number, as opposed to the current 17 digits.

The Safaricom prepaid card serial number is organized into:


The batch number is a two-digit number running from 01 to 99. It splits the cards produced each hour into batches of approximately 10,000 cards. This ensures that they are easily identifiable in case of theft or some other problem.

ManDate is the date of production of the cards, written in the format yy-mm-dd. It falls exactly 2 years before the expiry date.

Time is the approximate hour in which the cards were produced. It runs from 000 to 220 in increments of 10 (so we have 010, 020, ... 100, 110, ...). In each hour, 99 batches of cards are produced (see Batch#).

Finally, there is the serial part, which is a sequential number. The data I have may be inconclusive, but it shows that about 1 million cards are serialized each day. All cards are serialized the same way, so there is no telling the value of a card from its serial (damn!).
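Putting the four fields together, a serial could be decoded roughly as follows. This is a sketch under stated assumptions, not Safaricom's actual scheme: it assumes the fields appear in the order batch, ManDate, Time, sequence (matching the 2-6-3-x split), that the trailing sequence is 6 digits (17 minus 11), and that years are 20yy. The class name, function name and sample serial are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CardSerial:
    batch: int          # 01-99
    manufactured: date  # ManDate, assumed yymmdd with years 20yy
    time_code: str      # three-digit time code, e.g. "010"
    sequence: str       # remaining digits, assumed to be 6

    @property
    def expiry(self):
        """Expiry date, taken to be exactly 2 years after production."""
        m = self.manufactured
        try:
            return m.replace(year=m.year + 2)
        except ValueError:
            # 29 February has no counterpart two years later; fall back to the 28th.
            return m.replace(year=m.year + 2, day=28)

def parse_serial(serial):
    """Split a 17-digit serial into its assumed 2-6-3-6 fields."""
    if len(serial) != 17 or not (serial.isascii() and serial.isdigit()):
        raise ValueError("expected a 17-digit serial")
    yy, mm, dd = int(serial[2:4]), int(serial[4:6]), int(serial[6:8])
    return CardSerial(
        batch=int(serial[0:2]),
        manufactured=date(2000 + yy, mm, dd),  # raises ValueError on impossible dates
        time_code=serial[8:11],
        sequence=serial[11:],
    )

card = parse_serial("01110526010000123")  # hypothetical serial, not a real card
print(card.manufactured, card.expiry)
```

Under this reading, the 0526 cluster from the 4-4-4-x split would simply be the month and day of production (26 May) sitting in digits 5 to 8.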

What Next?

The dataset could possibly hold more information. The current analysis is limited by missing variables such as the location and the date of collection of the cards. With these, the data could possibly give a good picture of economic indicators, customer spending and possibly regional spending zones.

The writer is the team leader at Doban Africa Ltd. For more information or access to the raw data, contact Chris Orwa @chrisorwa.

Author : Angela Okune

Angela is Research Lead at iHub. She is keen on growing knowledge on the uptake and utility of ICTs in East Africa. She is also co-lead of Waza Experience, an iHub community initiative aimed at prompting under-privileged children to explore innovation and entrepreneurship concepts grounded in real-world experience.

  • Fred at 11:42:15AM Thursday, December 1, 2011

    Quite an insight….

  • Nick Hargreaves at 12:07:31PM Thursday, December 1, 2011

    Wow, cool work Chris.

  • Chris Orwa at 13:22:05PM Thursday, December 1, 2011

    @Fred : Thanks,
    @Nick: The research is still ongoing & more insights are coming your way [STAY TUNED]

  • Anon at 02:50:32AM Sunday, December 4, 2011

    It’s a crypt-analyst’s treasure trove you’ve got there.. Good work.

  • Samuel Ngoda at 00:01:23AM Monday, December 5, 2011

    This is soooo cool.

  • Elvis (@levisdoban) at 14:34:30PM Tuesday, December 6, 2011

    just a small correction: shows that each hour (Time), about 1 million cards are serialized. it should be each day. I hope to publish more insights in a comprehensive report later this month.

  • Angela Crandall at 15:07:57PM Tuesday, December 6, 2011

    Have made the change @levisdoban. Thanks.

  • Chris at 17:41:38PM Thursday, March 29, 2012

    And why were you doing this, and for whose benefit? Might sound dumb but I am not into analytics and you might shed light on this mysterious art.

  • kamal twaha at 19:26:50PM Sunday, April 1, 2012

    if you use mathematics equations in permutations and combinations, you will get shocking results

