The challenge of using telecom big data to report poverty in developing countries : the case of Uganda
Files
Helleputte_13411000_2015.pdf
Closed access - Adobe PDF
- 8.19 MB
Helleputte_13411000_2015_Appendices.pdf
Closed access - Adobe PDF
- 4.54 MB
Details
- Abstract
- It is widely agreed that good governance requires measurement in order to develop, implement and monitor policies. Measurement, however, demands reliable data, and while data abound in developed societies, developing countries are desperately short of them. In parallel, big data have recently become a buzzword: because they are seen as plentiful, cheap, real-time and neutral, people have started to wonder whether they could fill current gaps in developing-world data. A new field was born: big data for development. Our work falls within this context and, more particularly, looks into the use of one type of big data – telecom data – to tackle one developmental goal – poverty alleviation. It relies on airtime credit purchases (allowing economic analysis) and calling records (allowing social, spatial and technological analyses) from an operator in Uganda, provided by the private company Real Impact Analytics (RIA). Simply put, it aims to find indicators in telecom data that can replicate a regional poverty ranking derived from ground-truth data, while encouraging reflection on the underlying methodological challenges. We structure our approach in four parts. First, we review the state of the art of big data for development. We explain its core concepts (definition, types and features of big data), the opportunities it offers (completing missing data, offering real-time monitoring and feedback, etc.) and the challenges it raises (privacy, access, misuse, analytics issues, multidisciplinarity, contextualisation, actionability). Then, we delve into the application of interest and go through the scientific literature on using telecom data to derive socio-economic indicators. We present numerous findings that we could subsequently confirm or refute in our analysis. Second, we provide the context of the analysis. We briefly describe Uganda and its position in the mobile revolution (penetration rate: ~50 subscriptions per 100 inhabitants).
We also describe the datasets we used: (i) the telecom data, consisting of daily top-ups and daily Call Detail Records (CDRs) from 1/02/2015 to 10/05/2015, and (ii) the ground-truth data we compared them against, consisting of poverty indicators at the level of 10 regions, derived from the last Uganda National Household Survey (UNHS) conducted in 2012/13. Third, we open up methodological issues. We detail the steps of our analysis: data pre-processing, indicator computation and validation of results. More importantly, we discuss the limitations of such analyses. We explain that the sample of one operator’s mobile phone users is biased, that the data are not free from errors, that computations are sensitive and that the assumptions that need to be made cannot always be formally verified. Fourth, we present our mixed results. Probably due to the overall low penetration rate in Uganda, the operator’s penetration rate, the level of activity and the average number of contacts end up being the only indicators able to satisfactorily replicate the regional poverty ranking. The sum of top-ups and some traditional economic indicators computed from it can replicate the ranking, but only if the sub-region with the lowest penetration (which is also the poorest) is removed. Top-up spending cannot properly reflect income distribution and inequality, suggesting that we cannot assume a linear relationship between an individual’s sum of top-ups and income, and/or that our sample is not representative enough of the whole population. The mobility-based and technology-based indicators computed from the CDRs are not conclusive. Generally speaking, the richest areas are correctly identified more often than the poorest ones.
After suggesting several future perspectives (both research- and operation-oriented), we conclude by stressing three key points that one should always keep in mind when conducting big data research: the importance of contextualisation, the sensitivity of pre-processing and the limitations of the data.
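As an illustration of what "replicating a regional poverty ranking" can mean in practice, the sketch below compares a telecom-derived indicator with a ground-truth poverty measure using Spearman's rank correlation. This is only one plausible way to score such a comparison; the abstract does not state which statistic the thesis uses. The function names (`average_ranks`, `spearman`) and all numbers are hypothetical and are not taken from the thesis data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical values for five made-up regions (NOT thesis data).
poverty_rate = [10, 20, 30, 40, 50]        # ground truth, % of poor households
penetration = [0.9, 0.7, 0.8, 0.4, 0.1]    # telecom indicator, subscribers per inhabitant

# A strongly negative rho means the indicator orders regions almost exactly
# in reverse of poverty, i.e. it replicates the ranking well.
rho = spearman(poverty_rate, penetration)
print(f"Spearman rho: {rho:.2f}")
```

Re-running `spearman` after dropping one region from both lists gives a quick sensitivity check of the kind the abstract alludes to, where an indicator only replicates the ranking once a single sub-region is excluded.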