XXI.5 September + October 2014
Page: 38

In big data we trust?

Juha Lehikoinen, Ville Koistinen

Suppose you need to make a decision about your Web-store entry-page design. The data tells you that the more you leverage search and filtering, the more users will view your products, so you create a design in which users can easily see all of the different ways of looking for products. Search and filtering are the most prominent features on your entry page.

As an alternative, you could have a design that focuses on communicating the experience attributes, is genuinely yours, and makes your company stand out from the competitors. But that design cannot incorporate the search and filtering upfront.



Which design will you choose?

A saying that has gone viral on the Internet describes big data like this: “Big data is like teenage sex: Everyone talks about it, nobody really knows how to do it, and everyone thinks everyone else is doing it, so everyone claims they are doing it.” (Original source unknown)

Indeed, as is typical of heavily hyped terms, a sort of mysticism surrounds the whole phenomenon. Here, we look at big data from the human perspective, demystify some of the hype, and propose a human way forward with the big data approach. Along the way we emphasize the human perception of reality, trust, learning, and visualizing meaning.


Big Data for the Rest of Us

Big data is taking over planet Earth. The raw data gathered and stored continuously by sensors and information systems across the globe, not to mention the social applications that track and store the trails of our human interactions in the digital realm, is beyond human comprehension. In 2013 the amount of stored data was estimated to be around 1,200 exabytes (1,200 billion gigabytes). This is a lot, folks.

There is no single scientific breakthrough behind big data. On the contrary, the methods used have been well known and established for quite some time. The reason the big data phenomenon became visible only recently is the ubiquitous availability of sufficient storage capacity and processing power—for a significantly lower fee than a decade ago. Not to mention the availability of data itself.

Thousands of applications have been identified, ranging from detecting diseases to predicting stock market behavior and consumer trends. In an increasing number of businesses, to make a decision you may just “ask the data.” Or, to be more precise, you develop a hypothesis stating a condition and then run a big data analysis that indicates whether the hypothesis is true (with a certain probability).
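As a minimal sketch of what “asking the data” could look like, the example below runs a permutation test on the Web-store question: Do sessions that use search view more products? The session numbers, the variable names, and the function `permutation_p_value` are all hypothetical and made up for illustration. Note that the result is a p-value, which estimates how often chance alone would produce a gap this large; it is not the probability that the hypothesis is true.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone yields a gap in means at least as
    large as the observed one (one-sided: group_b higher than group_a)."""
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_shuffles):
        # Randomly reassign sessions to the two groups.
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        if mean(shuffled_b) - mean(shuffled_a) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical products viewed per session (made-up numbers).
views_without_search = [2, 3, 1, 2, 4, 3, 2, 1]
views_with_search = [7, 8, 6, 9, 7, 8, 10, 9]

p_value = permutation_p_value(views_without_search, views_with_search)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value says the gap is unlikely to be a fluke of this particular sample. It says nothing about why the gap exists.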

On the surface, big data is a very simple phenomenon. It is all about:

  • gathering data by utilizing multiple data sources
  • analyzing data to find correlations between myriad factors
  • making predictions based on correlations.
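The three steps above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the two “data sources” are made-up daily counts, the names `searches_per_day`, `views_per_day`, `pearson`, and `fit_line` are hypothetical, and a straight-line fit stands in for whatever learning algorithm a real system would use.

```python
from math import sqrt

# Step 1: gather. Two hypothetical sources keyed by day (made-up numbers).
searches_per_day = {"mon": 120, "tue": 150, "wed": 90, "thu": 200, "fri": 170}
views_per_day = {"mon": 640, "tue": 760, "wed": 500, "thu": 980, "fri": 870}

# Join the sources on the days they have in common.
days = sorted(searches_per_day.keys() & views_per_day.keys())
xs = [searches_per_day[d] for d in days]
ys = [views_per_day[d] for d in days]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Step 2: analyze. How strongly do searches and product views move together?
r = pearson(xs, ys)

# Step 3: predict. Extrapolate views for a hypothetical day with 250 searches.
slope, intercept = fit_line(xs, ys)
predicted_views = slope * 250 + intercept

print(f"correlation: {r:.3f}, predicted views: {predicted_views:.0f}")
```

The prediction is only as good as the assumption that the correlation keeps holding outside the observed range, which the data alone cannot confirm.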

The philosophy behind big data is to use all of the data available. This is in strong contrast to the small data approach, where only samples of the whole datasets are analyzed, and statistical methods are used to calculate probabilities.

Obviously, some very serious hard science lies behind the scenes—learning algorithms, artificial intelligence, and the like. These are beyond the scope of this article.

Big data is more than just the methods, though. We think that more than anything else, big data is an approach to finding answers (or solving problems, if you like).

Big Data is an Approach

One might easily think that big data is just a large amount of data with some complex mathematical and computational operations thrown in. However, the very nature of big data (multiple incoherent data sources, hundreds of factors, untraceable correlations, complex math that only a handful of data scientists on Earth really understand) implies that while big data machinery may give us a very precise answer on where the flu will hit next, we do not know why.

During the process, the big data approach loses causality. It can answer what, but not why. The individual factors that lead to the conclusion are there, but the correlations among them do not establish causality; some may be pure coincidence. Thus, the underlying human motivations can’t be extracted, even when the approach is applied to explain human behavior.
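To see how a correlation can arise purely by chance, consider the following sketch. It generates a target series and 1,000 unrelated factors from random numbers, then reports the strongest correlation found. The sizes, the seed, and all names are arbitrary choices for illustration; nothing here comes from a real dataset.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_observations = 10      # few observations per factor
n_factors = 1000         # many candidate factors

# The target and every factor are independent noise: no factor has any
# causal relationship to the target.
target = [rng.random() for _ in range(n_observations)]
factors = [[rng.random() for _ in range(n_observations)]
           for _ in range(n_factors)]

# Search all factors for the one that happens to track the target best.
strongest = max(abs(pearson(factor, target)) for factor in factors)
print(f"strongest correlation among {n_factors} random factors: {strongest:.2f}")
```

With many factors and few observations, some factor will almost always appear strongly correlated with the target even though none is related to it. The more factors a system searches, the more such coincidences it will find.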

Then the obvious question from a human point of view is: Is it acceptable to act without knowing why? Is it okay to perform a surgery or to arrest someone if the big data says so (with a given accuracy)?

There are three severe pitfalls related to losing causality:

  • Responsibility. Who is responsible for the decision if it is made based solely on a big data prediction? The big data machinery?
  • Learning. When you do not know why, you do not learn. There is no way to develop your actions based on your earlier experiences if every time you need a decision, you just ask the data.
  • Trust. It is much more difficult to trust the results if you can’t see the reasons. What you need to do is trust the algorithm. Can you really consider an algorithm an authority? Even worse, if the prediction turns out to be incorrect (a false positive), it may be next to impossible to take corrective action (for instance, if for some reason Facebook determines that you like to eat fish even though you are lethally allergic to it, you will get fish-related ads till the sun expands, no matter what you do). You can’t teach a predictive big data engine by changing just one tiny bit of data. Or, if you can, then how trustworthy is the whole system?

Big data may lead us to “data dictatorship” sooner than we may realize. We strongly oppose such a narrow mindset.

Another characteristic related to big data is imprecision. The big data approach implies using multiple heterogeneous datasets, or several pieces of “small data.” In many cases, the data was actually originally collected for some completely different purpose (data reusability), and the datasets are differently formatted and stored with varying accuracy. Inevitably, these different datasets bring imprecision. And then the question is: Do we accept improved quantity with deteriorated accuracy? The answer may be yes or no, depending heavily on what the data is used for.

Yet another aspect is change. A data analysis can extract an approximation of the truth only for the current setting. Should the circumstances change, the analysis needs to be run again. This is especially the case when human behavior is involved: Everything may change when a new feature or artifact is introduced to people (data providers, in this case).

We claim that big data is not about the amount of data or the machines or algorithms. Rather, it is about how we use the data. As an approach, big data is very promising, and it may change in many ways our core understanding of the world. Yet big data should not be exercised in isolation. Human involvement is important.

Human Perspective

What is the place for us humans in the big data prediction-making machinery? Are we just statisticians, or should we play an active role in every phase of data crunching?

In fact, we humans have a number of roles within the big data machinery, whether we want them or not:

  • Developers. Some of us develop the machinery itself.
  • Decision makers. Quite a few of us use big data systems daily.
  • Subjects. All of us are affected by big-data-based decisions, in one way or another.
  • Data providers. All of us provide huge amounts of personal data for the big data machine to process.

Next, we will look at the big data developers and decision makers. The latter two groups, subjects and data providers, deserve an entire article of their own. (Rest assured, we will get back to them in a future article.)

Learning algorithms are able to find correlations between thousands of factors, something far beyond what is humanly possible. However, it can be argued that big data is never smarter than the human operators mining it. There are two aspects that essentially define the usefulness of big data:

  • What can ultimately be asked?
  • How do I figure out what is important?

It takes a lot of human creativity to innovate on what datasets to use, what the important correlations are (the big data machinery may identify dozens of correlations that at the end of the day will have nothing to do with your original quest), and how to test them. Building a model for finding out the desired answers indeed requires human intervention. Big data is not magic. The big data approach does not provide answers if you do not know how to ask.

In addition to an accurate model, another important part of the process is presenting the results. Information visualization, especially for huge datasets, is a creative task of its own that requires not only understanding the problem domain, but also human cognitive capabilities and restrictions.

Good visualizations communicate the information quickly and clearly, and make complex systems, statuses, and trends understandable at a glance. Badly designed visualizations can leave the user more confused than before, tell nothing, or, even worse, lead the user to draw incorrect conclusions. There are plenty of bad examples available, produced even by those who specialize in visualizations. There are even websites dedicated to bad examples of data visualization.

This is why we always emphasize understanding the phenomena—“mining the essential”—before starting the visualization design. The goal is to convey meaning, not just to visualize discrete data points, no matter how well laid out. We think that the importance of visualizing meaning will increase significantly in the years to come.

But how, then, should we best utilize big data in decision making and prediction? Blindly trusting the data itself may be a perilous route, regardless of how well it is presented. We may well expect that big data answers our “what” questions, and then need to work out ways to also answer why. This is where we propose putting people back in the driver’s seat.

How to Get Both What and Why

We as humans love stories. We are very good at understanding stories and creating them from distinct pieces of events, people, and places. Our lives can best be described as stories. We are also very good at telling stories. In fact, storytelling is an efficient means of exchanging information between human beings—especially information related to business decisions or scientific discoveries. We learn from each other’s experiences and distribute those learnings as stories in one form or another.

Storytelling, or human perception of reality, is something that big data is not good at. Depending on the context, you may also call it hunch, experience, or instinct. Nevertheless, it is a matter of interpreting reality through our external and internal contexts, past and present.

This is why we propose a twofold approach. We propose utilizing data where it is at its best, and a human approach where it’s better suited. We will get both what and why. Or, better yet, we can combine the two. Results from a big data analysis may be considered part of your current context—one additional source for making informed decisions.

We strongly recommend fusing big data decision making with qualitative research methods that rely on human perception of reality. With these two methods combined, our understanding of the world will take a significant leap. We get the world full of data along with human perspective, trust, and learning.

In some cases, we may start with the big data approach to identify possible solutions and then work on the results with qualitative methods to figure out why. Or, alternatively, we may start with qualitative research to identify future trends or human behavioral patterns, and then use big data to validate them. Either way, the two different methods support each other significantly.

Obviously, when there is not yet any data, you can’t use big data for any prediction. A typical data-sparse case is product development, where the features of a new product are planned. You may utilize big data to a certain degree, for example to figure out the past behaviors of the future product users, to analyze competitor products, and the like. This data may greatly help to reveal otherwise undetectable patterns, and is invaluable in the hands of an experienced researcher. However, it takes a human to actually talk with people face-to-face to find out what their expectations and future wishes and needs are. Further, it takes a human to translate the vague human expressions, coupled with results from big data analysis, into concrete product features and functionalities, and a human designer to realize a new product or feature that matches both the data and the human aspects. There is no big data machinery that can make such predictions, interpretations, and designs.

Let’s now get back to the original question of the Web-store entry-page design we raised at the outset. This scenario highlights the importance of combining both what and why. The data clearly says that search and filtering are essential, while human perception and experience point to the importance of appealing design, brand identity, and uniqueness. Therefore, while redesigning the site, you may draw on both worlds. You can make search and filtering as easy to access and use as possible, but you should not sacrifice the other equally essential aspects while doing so. You may even invent completely new ways of providing easy access to search and filtering along the way (this is a typical example of creating something new: The data tells you where the problem is and gives you a reason to innovate). Making decisions solely based on past data, without the human perspective, creative input, and some risk-taking, will not make you a leader.

The above example is as simple as they come, and not much research is needed to extract the human perception in this case. However, you may one day have to face an arbitrarily complex matter under otherwise similar conditions. Then we highly recommend analyzing the data, validating the findings, and identifying new directions with qualitative research. That way, you avoid taking a leap of faith by trusting only the data.

This is the future of decision making we would like to see: Combine big data analysis with qualitative research. This is our “what and why” approach.


Juha Lehikoinen is co-founder and chairman at Leadin Oy, one of the fastest-growing user experience service agencies in Europe. His research interests include understanding human-data interactions and interrelations, and augmented reality interactions. He is the lead author of Personal Content Experience (John Wiley & Sons, 2007). juha.lehikoinen@leadin.fi

Ville Koistinen is principal designer at Leadin, working on design assignments from the industrial Internet to automotive. His interests include design for challenging environments and contexts, and special user groups. ville.koistinen@leadin.fi

©2014 ACM  1072-5220/14/09  $15.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2014 ACM, Inc.
