Some grand challenges—if not the grand challenges—of the next decade will be in reframing the notion of "personal" in personal data and in managing such data ethically.
What is "personal" data?
First, there is data we share intentionally when we use, for example, social media platforms, chat applications, dating sites, location-sharing sites, and cloud services. Much shared content, whether an image, a statement, or a document, is personal. And then there are the personal data and metadata traces we share unintentionally: data collected actively and passively when we interact with, encounter, or are encountered by digital technologies at large. Such "technologies at large" are increasingly becoming part of the fabric of our lives. This emerging world is made up of "smart cities" with sensors and computation embedded in the urban landscape, with satellites tracking the movement of cars for individualized toll collection, with vehicles that track not just our movements but also our velocity, and with homes ever more filled with "listening machines." This is a transition to living within a vast interconnected sensing apparatus. This is the world of the so-called Internet of Things, where digital data capture is now designed into the mundane flows of everyday life. In this world, we as "ordinary" citizens unintentionally and unknowingly leave data breadcrumbs at multiple levels of granularity wherever we go. Unplanned, unknown, and ad hoc data streams increase every day; data capture and collection practices are constantly expanding.
The cautious call these surveillance practices; the optimistic call them a fantastic opportunity for effective service provision, and, of course, for scientific research. Whichever group you identify with most, the truth is we are intrigued by the potential for "insights," by the promise of "smart" services, and by the excitement of the power of data. Sometimes we are made giddy by what we can learn about ourselves and others.
But there are several consequences of this giddiness about the potential for data capture.
First, much data collection is done without a purpose or a known set of intents. It is data collected because it is possible, lacking an actively designed capture, management, and analysis strategy. This is data collection "just because," a hoarding of data that may become useful down the road but for which neither a use case nor a business case has yet been defined. Without a strategy and a purpose, we often fail to carefully assess how well the captured data addresses critical questions; we fail to engage in considered sampling. In this regard and many others, we see that data science, the set of "scientific practices" that sits at the center of all the promised and hoped-for research insights, is less developed, organized, and purposeful than most of the optimistic views assume. Data science is in its infancy. We are learning as we go. We are building models on the fly. To date, too little work has been done to identify and address incorrect inferences and the fact that the categorization of people and activities is often elastic and uncertain, due to ineffective intent modeling. Despite belief in the truthiness of data, in the certainty of "hard" numbers, this is research without clear scientific standards for rigor, and without the transparency and the ethical checks and balances that formal research processes require. In economics terms, the farming and manufacturing of data is a primary market, full of promise.
This is where we as HCI researchers can leap in. We need to take a proactive, critical stance when we are asked to design, develop, or evaluate devices and services that incorporate data capture, storage, and analysis. We need to speak up when it comes to service design and actively investigate the business models and incentives that drive developers to intentionally avoid secure data practices, and that drive businesses to sell data (what is called floating data) and to treat personal data with a cavalier attitude.
There are also some prosaic and practical things we can do, beyond advocating for a more abstract, ethical stance on data collection:
- Informed consent is an obvious place to start. For consent to be informed, there needs to be information. This area is riddled with poor information design and poor interaction design.
- Beyond information for consent, we need to think carefully about fair use and quality assurance, requiring that only data needed for the task at hand be collected and that it be used only for the purposes stated.
- We need to design for effective opt-out. Good examples of opting out are people's practices around the use of "incognito" modes and throwaway accounts. All services should be designed with incognito and guest options, where data is not stored in any way that can be linked back to the individual.
- We need to design for legibility, scrutability, and "inverse privacy." As Yuri Gurevich, Efim Hudis, and Jeannette M. Wing argue in the July 2016 issue of Communications of the ACM, "the inverse privacy entitlement principle" puts the burden of justification for data collection and hoarding on the data holder and calls for giving individuals access to the data collected about them. They call this the share back policy.
- We need to design for deletion. If floating data is a business model, then data return on discontinuation of service should also be possible. Of course, this is technically and legally difficult owing to the requirements of aggregate statistical modeling and data privacy laws. How can we really remove all traces of someone from an aggregate? What about the connective tissue of associative metadata? Difficulty is not a reasonable excuse for inaction: We need to figure out what data deletion means for people and for data modeling.
- Finally, we need to build better risk and trust models. We need to focus on mutual value and benefit in these models, and on creating lasting partnerships that engender trust all the way up and down the chain—from technical infrastructure to customer service to the design of and adherence to policy.
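To make a few of these suggestions concrete, here is a minimal Python sketch of how a service might combine consent, purpose-limited collection, a guest mode, share back, and deletion. It is an illustration under stated assumptions, not a description of any existing system: the class `PersonalDataStore`, its methods, and the example purposes and field names are all hypothetical. It also covers only per-person records; removing someone's traces from aggregate models, the hard problem raised above, is not addressed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class PersonalDataStore:
    """Hypothetical store that keeps only fields needed for a consented purpose."""

    # Each declared purpose maps to the only fields that purpose needs.
    purpose_fields: Dict[str, Set[str]]
    # user_id -> purposes that user has consented to.
    consents: Dict[str, Set[str]] = field(default_factory=dict)
    # user_id -> purpose -> stored field values.
    records: Dict[str, Dict[str, Dict[str, str]]] = field(default_factory=dict)

    def grant_consent(self, user_id: str, purpose: str) -> None:
        """Record consent for one declared purpose."""
        if purpose not in self.purpose_fields:
            raise ValueError(f"undeclared purpose: {purpose}")
        self.consents.setdefault(user_id, set()).add(purpose)

    def collect(self, user_id: Optional[str], purpose: str,
                data: Dict[str, str]) -> Dict[str, str]:
        """Store only the fields the purpose needs and return what was kept.

        A user_id of None models a guest or incognito session: nothing is stored.
        """
        if user_id is None:
            return {}
        if purpose not in self.consents.get(user_id, set()):
            raise PermissionError(f"no consent for purpose: {purpose}")
        allowed = self.purpose_fields[purpose]
        kept = {k: v for k, v in data.items() if k in allowed}
        self.records.setdefault(user_id, {}).setdefault(purpose, {}).update(kept)
        return kept

    def share_back(self, user_id: str) -> Dict[str, Dict[str, str]]:
        """Return a copy of everything held about this user."""
        return {p: dict(f) for p, f in self.records.get(user_id, {}).items()}

    def delete_user(self, user_id: str) -> None:
        """Remove the user's stored records and consents."""
        self.records.pop(user_id, None)
        self.consents.pop(user_id, None)


if __name__ == "__main__":
    store = PersonalDataStore(purpose_fields={"delivery": {"name", "address"}})
    store.grant_consent("u1", "delivery")
    store.collect("u1", "delivery",
                  {"name": "Ada", "address": "1 Main St", "location_history": "x"})
    print(store.share_back("u1"))  # location_history was never stored
    store.delete_user("u1")
    print(store.share_back("u1"))
```

The design choice worth noting is that the list of fields is attached to a declared purpose rather than to the data source, so anything not needed for the stated purpose is dropped at collection time instead of being hoarded for later.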
The rate of change in the volume, variety, and ubiquity of data is arguably impossible to measure, let alone fully understand. Finding the right path starts with asking the right questions but also with being clear as to what we are setting up for the future. It is time for us all to become more inquisitive as consumers and researchers. It is time for us as HCI practitioners and researchers to become more informed about personal data collection—we need to ask: Why collect? Why store? For what purpose and with what rationale are we collecting the data?
2. In Singapore, for example: http://mashable.com/2016/02/26/singapore-satellite-tracking-cars/#g6.dk2u5iiqF
5. I have written about "scrutable data models" in an earlier column: http://interactions.acm.org/archive/view/september-october-2014/scrupulous-scrutable-and-sumptuous-personal-data-futures
Originally from the U.K., Elizabeth Churchill has been leading corporate research at top U.S. companies for the past 18 years. Her research interests include social media, distributed collaboration, mediated communication, and ubiquitous and embedded computing applications. email@example.com
Copyright held by author
The Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.