Kathi Kitner, Thea de Wet
In the early part of this millennium, the term big data was beginning to be bandied about among those in technology-related disciplines. The term was spoken with a bit of awe, a bit of cockiness, and always a breathlessness about what incredible potential big data held for making our world more understandable, more fixable, and overall, just better. The ability to account for and measure everyday human behavior in real time at a fine granular level had—and still has—most of the business and academic worlds agog at the possibilities.
But just like new parents who celebrate the knowledge that they will soon have a baby, full of hope for a bright and shiny future yet blind to the reality of having an infant, many big data advocates and proselytizers have little idea of all the work necessary to bring the promise of big data to fruition. For parents, there are sleepless nights, doctor's bills, unexpected costs, and a never-ending unfolding of challenges and problems until that child is old enough to leave home and be independent. So it goes with big data too: sleepless nights of data cleaning and transfer, reiterating the protocols, seeking experts to help with crashed datasets, unexpected costs, and confusing blips in the analysis. All of this and still we have not arrived at the key purpose of using big data: finding a way to present that data in a usable form to solve the problem we originally wanted to address. In short, big data is still not ready to leave the nest and fly.
More recently, and yet closely related to big data, is the Internet of Things (IoT), which has folks dancing in the aisles again, for it is the next big thing, and when IoT pairs up with big data at the dance, the two partners should make data transformation at least appear to be seamless. However, the problems and challenges of making the IoT useful in everyday life are much the same as those faced by big data. In fact, if one thinks about it, the IoT is the infrastructure that gives us the big data; thus, the two things are tightly intertwined. This interconnectedness does not make either one more easily approachable by the layperson, nor is the pairing really any more user-friendly to those with more technical training. While there is research that addresses the challenges big data and the IoT pose to large institutions such as hospitals, or to those coding for IoT deployments, the needs of the consumer/end user are not as well explored or critically understood. Outside of some of the quirky "connected" toasters or a few home-energy products like the Nest thermostat, big data and the Internet of Things have not yet penetrated the common person's mindset. And yet, many technologists believe that great things will come if only we could hook up the sensors and get the data flowing. The rest of the usefulness will easily follow.
For all the talk about how potentially transformative the IoT and big data are for the world, there is little acknowledgement that the majority of imaginings and works in progress are focused on a particular demographic: the global elite and the almost-elite. Until recently, when researchers did talk about the rest of the world, they did so in terms of what could be learned about, for example, Kenyans and the mining of their entire set of mobile phone records to search for novel behavior patterns. Searching such a database could aid development efforts, policy makers suggested, with a tone of certainty and gravitas. But in that tone there was little acknowledgement of personal data agency (the right of all people to determine how their data will be used), and even less concern that even now, in 2015, less than half the world's population has ever accessed the Internet, and therefore is unlikely (for now) to participate in the promise of a better world improved by the IoT and big data. (See SciDevNet's Spotlight series [1] or ICTWorks' recent post [2] for more on this particular issue.)
Not understanding how everyday people would make use of big data, along with the lack of interest in big data by those same people, poses an enormous problem. To illustrate some of the dilemmas we see in the often-unexamined promise of big data, we will look at a project our team did that examined the adoption of smartphones by participants in the informal economy in Johannesburg, South Africa.
Partly in order to challenge the parochialism of the hubbub around big data, we began with asking fundamental questions about urban life in a large African metropolis. In our particular case, we started with wanting to understand the everyday lives of people commonly known as hawkers, or street traders, in South Africa. The traders are a part of the informal economy that makes up some 80 percent of Sub-Saharan Africa's labor force and generates up to 55 percent of a country's GDP. We wondered about technology use among traders, and if there were new ways of organizing information and goods that would benefit their economic activities. We had two broad research questions:
- Among those unfamiliar with accessing the Internet or using smart mobile devices, what are the key challenges in learning these technologies?
- Could enough data be generated to create ground-up insights from the field that would give both a qualitative and a quantitative understanding of the street traders' behaviors and that could inform the design of a better system of procuring and selling foodstuffs in Johannesburg?
In our perhaps naive hope of understanding the urban social dynamics of traders as key operators in ensuring food security for working class residents of Johannesburg, we devised a research strategy that would allow us to track the traders, food vendors, and university students (a contrasting group) through smartphones (supplied by us). We hoped that the massive dataset generated by everyday use by 25 participants over 12 weeks would give us key insights that could be contextualized with our ethnographic work with the traders.
These lofty goals were, however, affected by the following three significant problems: lack of familiarity with Internet-enabled technology, infrastructure issues, and the analysis of resulting data.
Lack of familiarity with Internet-enabled technology. While the study overall could also illustrate many of the methodological challenges of fieldwork in the 21st century (how to conduct research in a corporation, how to work across time zones and great distances, problems of cross-disciplinarity, funding, gaining field access, language barriers, field assistants, and many more), here we will address the unexpected experience of making the technology usable and useful for the participants.
Most of the people who received Android smartphones were not familiar with using any sort of Internet-enabled technology. While most had mobile phones, their phones were not new and shiny, and no one had ever used one to access the Internet. Participants used phones to make voice calls and send text messages. Only a few had used Facebook, WhatsApp, or any other social media service.
Knowing this, we organized a workshop that served to welcome the participants to the study, explain fully what we were doing, get signatures on confidentiality forms, and give one-on-one how-to sessions on the phone's set-up and operation. Looking back, we should have had an insight from that day that the research would not smoothly unfold. Most participants (except for the students) did not have an email address, and nobody knew how to use email on a phone. It was a considerable effort to try and explain what email was and why everybody needed an email address to be in the study. Many participants were simply daunted by the phone's complexity, and those who were not were far too eager to customize it. In the coming weeks, researchers were continually going out to reset phones that had been "wiped" by accident, which also meant that our tracking program had been corrupted or deleted. This took countless hours, but showed us just how complicated a simple smartphone is to those who have never owned one—or anything similar—before. The primary activity during the first month after deploying the phones was to reset and restart the data-collection process enabled through the phones.
Infrastructure issues. Working from the milieu of the middle to upper classes of urban centers worldwide, we as researchers are almost embarrassed to admit that we discounted communications infrastructure as a key element when designing the study. We knew cellphone service (mostly 3G and some new 4G coverage) to be ubiquitous and mostly reliable in Johannesburg. However, the research design called for the data on each phone to be uploaded via Wi-Fi connections in order to save on data costs, which are still high in South Africa. While Wi-Fi hotspots are still not pervasive in South Africa (or many other places in the majority of the world), we believed there were enough available to participants to make this a feasible option for inexpensively uploading the data. What we did not count on was that the participants did not know about Wi-Fi, nor did they know how to find it, and if they found it, they did not know how to connect. Even the university students who participated had to struggle to get university Wi-Fi settings. At the time of study, smartphones were only just becoming common, and the university IT department was working feverishly to make access available securely to all on campus. Security concerns, spotty and intermittent coverage, and lack of understanding about how Wi-Fi works were all barriers.
Our solution was threefold: 1) collect some non-student phones on a staggered schedule, carry them to one of the researchers' houses that was equipped with Wi-Fi, upload the data, and then return the phones; 2) go to the participants in the field equipped with Wi-Fi hotspots purchased from the telecom companies; and 3) meet with students on campus to help them connect to Wi-Fi and upload their data. All solutions had disadvantages, the biggest being the time it took to carry out the data-upload tasks.
Data losses, due to the previous problems of accidental data erasure or data lost in transmission, were somewhat overcome by having a surfeit of data being collected and, critically, by having qualitative/ethnographic data to supplement the digital outputs. This was little consolation in practice, and points to how the seemingly small problems of deploying big data research can lead to major headaches and an inability to achieve the research goals.
Analysis of results. This was the most egregious problem of all. We were unable to construct a data program adequate to analyze the results, make the data usable for the participants, and answer questions about food supply; transportation systems; communications between, for example, buyers and sellers; and, most basically, how people learn a new technology over time. While some of these questions were answered in the simple, one-dimensional sense that one might get from survey data, the richness promised by big data analysis never came to fruition. Why?
The key problem was the availability of data analysts. We searched for someone with the right skillset, but no one in South Africa was available to assist with analyzing the data: there were skilled analysts, but either they were swamped with work or our budget was not sufficient to hire them. The university IT department was extremely helpful in pulling the datasets from the servers and putting them into large comma-delimited files, but even that was a difficult job due to the qualities of the text data collected.
Once I was back in the U.S., I envisioned having a software coder help me to arrange the datasets and construct data algorithms to pull out more complex and nuanced insights into the everyday behaviors and practices of the study's participants. But due to the vagaries of the industry and rapidly shifting priorities, those hopes never materialized. As a result, in a certain sense the project was a failure, but it still speaks to pitfalls to be encountered when trying to construct a big data project that is outside of the usual parameters of, for example, a large social media company or a well-funded nonprofit or government organization that has data scientists to spare.
Our big data experience has taught us that, not surprisingly, some things are easier than others, but all are difficult to a degree. Here is a list of things to keep in mind for anyone wanting to run a big data project on any scale:
- Data is just data without the context, and there is power in context. In our project, the context would be gained through ethnographic research, which must be carried out before you design a system and way before you think of deploying it. Understanding the people and the broader, more particular culture, community, and infrastructure on the ground in advance will help immensely when it comes time for analysis or if parts of your dataset turn up missing. And here we cannot emphasize enough that "knowing the community" does not come from a few focus groups or careful screening of secondary sources like market or demographic studies. It comes from being in the community, meeting with different types of people, and paying attention to and learning from histories and stories, tensions and gaps. Christopher Le Dantec and Sarah Fox describe facing the difficulties of gaining community trust even after they thought the rapport had been built. If what you are building is intended to benefit the community, then how those benefits will play out, or won't, should be well understood in advance. This can come only from direct engagement with various community members.
- From knowing the community, it follows to double check what you think you know and then make corrections. Check again in a bit. Is it working? Do you need to adjust again? Staying true to a research design is great where possible, but the design should also allow for frequent iterations when necessary. For example, had we known about the difficulties with Wi-Fi access, we could have planned for more time in the field, or chosen a different path for data upload.
- Budget for the data work, in both time and money. This is a critical part of the overall big data work, but unfortunately is too often overlooked. Many projects fail due to neglect of invisible data work. In our case, failing to truly appreciate how hard and time-consuming it is to conduct long-distance research of this type was one aspect; not fully appreciating what entering an entirely new domain, in this case big data analytics, would require was another.
There are other questions tightly wound with big data that should not be forgotten when undertaking a new project: the privacy and security of the data and its creators need to be well understood; the value of data (both in terms of personal value and the data's circulatory value) should be calculated; and the simple cost of data streams and access to the same must also be adequately accounted for in future work.
In closing, it is worth mentioning that the participants learned as much as we, the researchers, did. They got a new glimpse of what is possible and a sense of belonging to, and making, their own future. The road to that future is not without difficulties and glitches to overcome, both for them and for us as researchers. It does little good to be glib about new technologies, new methods, and the potential of big data or a super-connected future when in truth the road is full of bumps and holes. Yet there is no turning back: Data is costly but still desirable; the network coverage is spotty but rapidly improving; and the magic of WhatsApp, at-your-fingertips football scores, and online global business information will be forever captivating.
1. Piotrowski, J. Big obstacles ahead for big data for development. SciDev.net. Apr. 14, 2015; http://www.scidev.net/global/data/feature/obstacles-big-datadevelopment.html
2. ICTWorks. How can we all profit from development data? Mar. 18, 2015; http://bit.ly/1MOsbrd
Kathi R. Kitner, a cultural anthropologist and senior research scientist with Intel Labs since 2006, is interested in how social histories and cultural constructs (e.g., class and gender) act as a conduit for different types of emerging technology usage and adoption.
Thea de Wet is a professor of anthropology and development studies and director of the Centre for Anthropological Research at the University of Johannesburg, South Africa. She is currently focusing on urban food security, weather and local knowledge, and street traders and technology use.
©2015 ACM 1072-5220/15/07 $15.00