Behavior-driven testing of big data exploration tools

Authors:
Leilani Battle

Companies, governments, and institutions all over the world use massive datasets to make decisions that affect our daily lives, such as how climate change is addressed, who is protected from Covid-19, or which investments to prioritize to maximize a company's growth. A major focus of visualization research is to develop innovative tools for data exploration, where an analyst visually and interactively queries their data to understand how they can leverage it to help their organization make informed decisions [1]. Across all sectors, data exploration is central to business development and decision-making processes. For example, before an analyst can effectively train a machine-learning model on a new dataset, they must first understand what data attributes are available, which attributes may be of interest to their organization, and what relationships may exist between these attributes that could be modeled. Since analysts may not know the answers to these questions a priori, they discover them via exploration.

However, big data is valuable only when it can provide useful insights, which analysts seek to extract from their exploration of data visualizations. Much like mastering a new recipe, it takes effort and skill to process data and design effective visualizations, and analysts are increasingly turning to computational tools to support their efforts to visualize massive data and process it into a form suitable for consumption. Evaluating these tools, however, is challenging. For example, work from my group shows how data exploration tools can slow down, bias, confuse, or even mislead analysts as they explore [1,2,3].

This challenge arises in part because data exploration tools are used for a wide variety of problems and by a wide variety of people, and standard tool benchmarks are unequipped to handle these variations [3]. For example, tool performance issues are traditionally detected using benchmarking software (e.g., as used by database system builders [4]) or user studies (e.g., as used by visualization tool builders [5]). Database benchmarks simulate how well a database system supports a range of routine business intelligence operations, a core use case for analytics-oriented database systems [4]; however, analysts often query their data opportunistically based on statistical patterns or anomalies seen as they explore, resulting in spontaneous queries that these benchmarks fail to capture [3]. User studies help researchers investigate how analysts interact with a data exploration tool [5,6], but unlike benchmarks, they capture only a fraction of the scenarios that the tool may encounter in the real world; thus, they fail to generalize across many tools, datasets, and data exploration contexts. Consider the following example, which illustrates a few of these challenges. Figure 1 shows a (stylized) snippet of a data exploration session, which drives the following two motivating scenarios.

Figure 1. How exploration intent drives an analyst's interactions with a visualization interface, and how these interactions correspond to database queries executed by a back-end database system.

Building data exploration interfaces. A junior analyst named James has just joined a project to investigate how a new healthcare app can help users navigate health disparities in the U.S. James wants to assemble a data-driven list of observed disparities for further analysis. To start, he decides to explore the National Health and Nutrition Examination Survey (NHANES; https://www.cdc.gov/nchs/nhanes/index.htm) dataset released by the U.S. government. To make it easier to query the many files that make up the NHANES dataset, James loads them into a database system. To make his database easy to explore, he queries it through Tableau Desktop, a data exploration tool. Acting on his intent to identify health disparities, James creates an initial box plot showing the distribution of body mass index (BMI) values per race. He then adds a checkbox filter, which updates a histogram of BMI values for the races that he clicks on. The tool issues queries to the database system when a checkbox is clicked. James, however, notices an annoying delay between each click and the moment the visualizations update. He searches online for a solution, but given his limited experience with database systems and Tableau, he is unable to fix the problem.
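To make the link between a checkbox click and a database query concrete, here is a minimal sketch in Python using an in-memory SQLite database. Everything in it is an assumption made for illustration: the respondents table, its race and bmi columns, the placeholder rows, and the function names are hypothetical, and the SQL is not what Tableau actually generates. It only shows how one interaction can translate into one query whose latency the analyst then experiences.

```python
import sqlite3
import time

# Hypothetical schema and placeholder rows; not the real NHANES layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE respondents (race TEXT, bmi REAL)")
conn.executemany(
    "INSERT INTO respondents VALUES (?, ?)",
    [("A", 22.5), ("A", 31.0), ("B", 27.4), ("B", 18.9), ("C", 35.2)],
)


def histogram_query(selected_races, bin_width=5):
    """Build the SQL a checkbox change might trigger: BMI counts per bin."""
    width = int(bin_width)
    placeholders = ", ".join("?" for _ in selected_races)
    sql = (
        f"SELECT CAST(bmi / {width} AS INTEGER) * {width} AS bin_start, "
        "COUNT(*) AS n FROM respondents "
        f"WHERE race IN ({placeholders}) "
        "GROUP BY bin_start ORDER BY bin_start"
    )
    return sql, list(selected_races)


def on_checkbox_change(selected_races):
    """Run the query for the current selection; return rows and latency in seconds."""
    if not selected_races:
        return [], 0.0
    sql, params = histogram_query(selected_races)
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    latency = time.perf_counter() - start
    return rows, latency


# Simulate James ticking two checkboxes.
rows, latency = on_checkbox_change(["A", "B"])
print(rows, f"{latency * 1000:.2f} ms")
```

On a table this small the latency is negligible; the delay James notices would come from running the same kind of query over a far larger dataset on every click.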

Evaluating optimizations for exploration tools. Laura is a developer building a new back-end optimizer for data exploration tools. Laura's optimizer caches and reuses query results from database systems in an interaction-aware way, inspired by prior studies of user exploration behavior (e.g., [1,2,6]). Her optimizer should, in theory, support exploration using a range of visualization types, including interactive bar charts, box plots, geographic maps, and parallel coordinates plots. Unfortunately, Laura cannot fully test the versatility of her optimizer because existing benchmarks either simulate inapplicable use cases or support only bar charts [3].
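As a rough illustration of what caching and reusing query results can mean, the sketch below memoizes results by the visualization being updated and the set of selected filter values. It is not Laura's optimizer, and the class, the fake back end, and the key design are all hypothetical. The one interaction-aware choice it makes is to treat the same checkbox selection as the same query regardless of click order.

```python
class QueryResultCache:
    """Memoize query results keyed by (template, sorted filter values)."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that actually hits the database
        self._store = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, template, filter_values):
        # Sorting makes the key independent of the order boxes were clicked in.
        key = (template, tuple(sorted(filter_values)))
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._run_query(template, key[1])
        self._store[key] = result
        return result


def fake_backend(template, filter_values):
    # Stand-in for a slow database call; returns a deterministic placeholder.
    return f"{template}:{','.join(filter_values)}"


cache = QueryResultCache(fake_backend)
cache.fetch("bmi_histogram", ["B", "A"])
cache.fetch("bmi_histogram", ["A", "B"])  # same selection, different click order
print(cache.hits, cache.misses)
```

A benchmark for such an optimizer would need to replay realistic interaction sequences across many chart types, which is exactly what existing benchmarks do not provide.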

A possible solution: Behavior-driven testing of big data exploration tools. I was recently awarded an early career grant from the National Science Foundation (https://www.nsf.gov/awardsearch/showAward?AWD_ID=2141506) to tackle the performance challenges faced by analysts like James and developers like Laura. My vision for this project is to develop better ways to evaluate tools as they are used in the wild; if we can automate the way we evaluate data exploration tools, then we can automatically test new tools as soon as they are created, tune them to real workloads, and help analysts be more efficient and effective at generating insights. The key idea behind our evaluation strategy is to simulate how analysts explore their data using new and existing models of user interaction behavior.
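One very simple way to picture a simulated analyst is a first-order Markov chain over interaction types, sketched below. This is only an assumption-laden toy, not the model used in the project: the interaction names and transition probabilities are invented for illustration, and a realistic model would be fitted to logs of real exploration sessions and would account for analyst intent.

```python
import random

# Hypothetical transition probabilities between interaction types.
TRANSITIONS = {
    "filter": {"filter": 0.5, "zoom": 0.3, "change_chart": 0.2},
    "zoom": {"filter": 0.4, "zoom": 0.4, "change_chart": 0.2},
    "change_chart": {"filter": 0.6, "zoom": 0.2, "change_chart": 0.2},
}


def simulate_session(length, start="filter", seed=None):
    """Generate a sequence of `length` interactions from the Markov chain."""
    if length <= 0:
        return []
    rng = random.Random(seed)  # local generator so runs are reproducible
    session = [start]
    while len(session) < length:
        options = TRANSITIONS[session[-1]]
        next_action = rng.choices(
            list(options), weights=list(options.values())
        )[0]
        session.append(next_action)
    return session


session = simulate_session(10, seed=0)
print(session)
```

Each simulated interaction could then be translated into the queries a tool would issue, giving a workload against which the tool can be tested automatically.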

There are two major hurdles, however, to designing simulation-based evaluations. First, we lack appropriate methods for specifying an analyst's intent (in other words, their analysis goals) in a way that data exploration tools can fully grasp. For example, James is interested in analyzing potential healthcare disparities, but without a deeper understanding of the context behind the NHANES dataset, such as what a disparity is, our automated software will struggle to simulate James's use case. Second, how do we measure how realistic a given simulation is, and what does it mean for a simulation to be realistic? We have a few ideas, such as asking experts to watch videos of real and simulated exploration sessions and guess which are real and which are fake, but these questions remain unanswered in the academic literature.
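If the expert-guessing idea were pursued, one possible way to score it is to compare the experts' accuracy against chance: if guesses are right about half the time, the simulated sessions are hard to tell apart from real ones. The sketch below is an assumption about how such scoring might work, not an established protocol, and the labels and guesses in it are made-up placeholders.

```python
from math import comb


def guess_accuracy(guesses, truths):
    """Fraction of sessions the expert labeled correctly."""
    if len(guesses) != len(truths) or not truths:
        raise ValueError("guesses and truths must be non-empty and equal in length")
    correct = sum(g == t for g, t in zip(guesses, truths))
    return correct / len(truths)


def p_value_vs_chance(correct, total):
    """One-sided binomial probability of getting at least `correct` of `total`
    right when every guess is a coin flip (p = 0.5)."""
    return sum(comb(total, k) for k in range(correct, total + 1)) / 2 ** total


# Placeholder data: eight videos, five labeled correctly.
truths = ["real", "sim", "real", "sim", "real", "sim", "real", "sim"]
guesses = ["real", "sim", "real", "sim", "real", "real", "sim", "real"]

accuracy = guess_accuracy(guesses, truths)
p_value = p_value_vs_chance(5, 8)
print(accuracy, p_value)
```

A large p-value here only means the experts did not beat chance on this sample; it does not by itself establish that a simulation is realistic, which is part of why the question remains open.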

As we embark on this project, we are excited about opportunities to learn from related areas of computer science to overcome these hurdles, such as artificial intelligence, robotics, and game design. For example, game designers and roboticists often design intelligent agents to reason about a given environment and respond to other agents' actions, which could help us simulate how analysts interact with data exploration tools to achieve specific objectives. By thinking about these problems from another perspective, we are also encouraged to reflect on how we perceive the interplay between humans and data exploration tools. For example, does it make more sense to treat human analysts as goal-based agents in pursuit of a known objective or as learning-based agents that aim to learn from their environment but not necessarily achieve a particular objective? In each case, a supportive data exploration tool should behave very differently, yet these questions remain underexplored in data science. We look forward to investigating these challenges further in our future research.

References

1. Battle, L. and Heer, J. Characterizing exploratory visual analysis: A literature review and evaluation of analytic provenance in Tableau. Computer Graphics Forum 38, 3 (2019), 145–159.

2. Battle, L., Crouser, R.J., Nakeshimana, A., Montoly, A., Chang, R., and Stonebraker, M. The role of latency and task complexity in predicting visual search behavior. IEEE Transactions on Visualization and Computer Graphics 26, 1 (2019), 1246–1255.

3. Battle, L. et al. Database benchmarking for supporting real-time interactive querying of large data. Proc. of the 2020 ACM SIGMOD International Conference on Management of Data. ACM, New York, 2020, 1571–1587.

4. Gray, J. Benchmark Handbook: For Database and Transaction Processing Systems. Morgan Kaufmann, 1992.

5. Lam, H., Bertini, E., Isenberg, P., Plaisant, C., and Carpendale, S. Empirical studies in information visualization: Seven scenarios. IEEE Transactions on Visualization and Computer Graphics 18, 9 (2011), 1520–1536.

6. Battle, L. and Ottley, A. Testing theories of task in visual analytics. ACM Interactions 29, 3 (2022), 22–23.

Author

Leilani Battle is an assistant professor in the Allen School at the University of Washington. Her research focus is on developing interactive data-intensive systems that aid analysts in performing complex data exploration and analysis. She holds an M.S. and a Ph.D. in computer science from MIT and a B.S. in computer engineering from UW.

ACM Interactions
