AI is incorporated into recommendation services, driverless vehicles, surveillance and security, social media, and many other types of systems. Yet people who interact with such systems often do not understand what the AI does or how it works. This lack of transparency can create confusion, frustration, and mistrust. And indeed, specific socially untoward consequences of algorithmic interactions have been widely identified.
→ Explanatory AI is an important approach to make it possible for humans to trust AI.
→ Explanations are diverse; logical explanations are both more and less than humans usually need.
→ Pragmatic explanation is a well-developed and widely engaged human practice.
→ We should start designing explanatory AI now.
A popular way to think about evoking or enhancing trust between humans and AI is "explanatory AI," the notion that AI should be able to explain itself, and thereby to help people understand what it is doing and how. This concept is similar to traditional conceptions of design rationale, but it is more dynamic and episodic: The rationale or explanation is generated in the course of system operation or interaction; it is intended to explain a specific (recent) episode of system operation; and sharing it with human participants is incorporated into the real-time human-computer interaction cycle.
In this article, I contrast logical and pragmatic notions of explanation and rationale. Explanation, understood logically, is both more and less than people want, need, and can make use of. An explanation of system events that is complete and accurate, but not useful or usable by the people affected in a real situation, cannot ameliorate the confusion, frustration, and mistrust entrained by a lack of transparency in AI interactions. Indeed, a purported explanation that fails to be useful or usable to affected people could be experienced by those people as disingenuous or deceptive; it could cause additional confusion, frustration, and mistrust. Pragmatic explanations are causal accounts that afford and effectively support improved understanding and interaction by humans.
Rather than asking merely how AI can provide logically correct explanations, we need to ask how AI can be engaging and transparent to people. Human-technology interactions that are engaging and transparent are the original and abiding objectives of human-centered informatics. These objectives apply to AI today, as they have always applied to all information technology.
Explanatory AI explains itself; it is capable of presenting a rationale for and an account of why it does what it does. However, the term explanation in this context is problematic in that the criteria for something being an explanation are both varied and complex. Worse than this, the criteria for something being an explanation are inescapably subjective and somewhat open ended, when the ultimate criterion is that humans recognize the explanation as relevant, sound, cooperative, and fair, and attribute responsibility and increased trust to the source of the explanation.
The criteria for an account being explanatory are varied and complex because many different underlying system events are causal antecedents for anything a system does. This is analogous to natural systems for which events and explanatory relationships can be identified at various scales. Information system descriptions have traditionally relied on modularity and levels of abstraction for system design, implementation, and maintenance to manage such complexity. However, in any specific case, the relevant description required to explain system behavior could require analysis of multiple modules and/or levels of abstraction.
The more challenging issue is that explanations in human-computer systems are necessarily subjective and open ended. A satisfactory explanation of system behavior for a given person in a given activity context might not be satisfactory for a different person, or even for that same person in a different context. As an example, consider recommendations; a system could recommend a course of action and explain the recommendation by describing how following the recommended course of action will satisfy a declared or inferred user goal. Such an explanation would describe a series of events that would occur if the user followed the recommended advice. Alternatively, the system could describe the reasoning process through which a recommended course of action was identified by the system. This second sort of explanation would describe a series of system events that led to a particular recommendation being offered. Both of these are explanatory, but in different ways. Human participants might disagree as to whether one or the other was actually explanatory in a given context.
The subjectivity of explanations is particularly critical when one compares the needs and expectations of technical experts with those of application domain experts, two important and often distinct categories of AI users. For a technical expert, a useful explanation of system behavior might often be articulated as a chain of system events at a key level of abstraction. But for an application domain expert, a useful explanation would need to be articulated with domain semantics; the system events alone would usually not be understood as a satisfactory explanation. For example, explaining why a social media "influencer" currently is having less influence is a very different context for a system developer, a technical expert concerned with refining and implementing platform policy to keep users engaged, and for the influencer, an application domain expert whose influence is fading. The influencer might not understand the algorithmic basis for influence or user engagement, but nonetheless could suspect the platform is doing something to undermine their influence. In a case like this, two different kinds of expertise are making two very different kinds of sense of the same events. When we talk about explanatory AI, we need always to ask, "Explanatory for whom and in what circumstances?"
The human-centeredness and context-centeredness of explanatory AI is analogous to challenges and distinctions that computing has faced in the past. Consider the history of technical documentation. In the early decades of computing, documentation was a key to developing effective software engineering practices. Documentation made it possible to manage software development so that the system initially described was the system ultimately produced, and subsequently maintained. But even at this early stage, different software engineering stages (requirements development, design, implementation, testing, and so forth) needed distinct documentation. There was never a time when "the documentation" actually referred to a single and comprehensive description of a system. Indeed, when computers started to become pervasive in society in the 1970s, the multifarious nature of documentation became stark: Humans and their organizations needed documentation that directly articulated systems with respect to the human activities supported by the systems. For such documentation, most system events are not even described.
A simple and direct approach toward making AI more effectively explanatory is to put focus on explaining AI to domain experts (human users of AI systems) in the interaction contexts that those experts actually encounter and actually value, and to use terms and concepts from the domain semantics in articulating system explanations. With respect to the state of the art, this amounts to a strategic redirection, but it should be seen more modestly as merely providing explanations that have some reasonable chance of being useful to the people using AI systems.
This approach is quite analogous to an early approach to usability engineering in which attention was focused on trying to explain to users how a system works through simplified "user models" and to provide guidance to domain experts for how they could achieve domain goals using the system. Early investigations of "intelligent" explanation experimented with this design direction, employing user models to explain how a current usage context occurred, and what the user could do to move their goals forward. This approach represents a somewhat minimal approach to explanatory AI, in the sense that it does not entail fundamental redesign of what systems do or how they do it, but rather focuses on better explaining what is already designed to humans who are already trying to use it.
A stronger approach to making AI more effectively explanatory—and a more significant challenge for the goal of explanatory AI—is to design AI systems that directly afford effective explanations. This approach requires designing AI systems to make explanation more integrated and implicit in user interactions, and more intuitive with respect to human inferences about explanation. Put another way, having to explicitly manage explanation dialogue through user models to clarify system behavior that is experienced by people as misleading, confusing, or otherwise not transparent is already a clear indicator that explanatory AI has failed.
This stronger approach to making AI more effectively explanatory constitutes a redirection of fundamental AI priorities. As Peter Hancock [3] recently put it, powerful AI systems are a kind of automation that marshals speed, efficiency, accuracy, and reliability to "alleviate the excessive cognitive loads on… human controllers." It is "paradoxical" (Hancock's word) to fundamentally undermine or compromise the strengths of such systems "for some form of 'explanatory' value." This may indeed be a fundamental conflict in values. However, we might still choose to pursue the path of explainable AI if we reckon that compelling and effective explanations for domain experts are key to establishing transparency, accountability, and trust in human-AI interactions, and avoiding socially untoward consequences of algorithmic interactions.
The arguments for pursuing stronger approaches to effectively explanatory AI may not hinge critically, or perhaps at all, on objective, performance-oriented metrics like speed, efficiency, accuracy, and reliability. Explanation, and the willingness and ability to provide effective explanation, is a distinctive and robust social practice among humans that provides a foundation for sensemaking and problem-solving, attributions of interpersonal trust and mistrust, giving and taking of responsibility, and even a sense that one's experience is transparent and coherent.
Explanations are integral to how humans encounter the world, and how they talk about the world. People readily create and attribute explanatory models to phenomena they encounter in everyday life. Indeed, psychology and AI are filled with research demonstrations in which people seek to attribute sophisticated motives and intelligence to almost any animated behavior. Spontaneous explanatory models focus selectively on key contrasts and salient causal relationships in an experience. They can be problematic in some regards; they are notoriously inductive, incomplete, and even inconsistent [1,4]. But these models can also provide significant guidance for HCI designers who design specifically to leverage this characteristic of human cognition. A particularly influential example in the development of HCI is the graphical user interface, which explicitly leveraged a wide range of easily appropriable and concrete explanatory models for human interaction with computers in the late 1970s and early 1980s.
Besides being fundamental to human cognition, explanation as humans practice it, including what they do with and expect from it, is much more diverse and nuanced than the classic sense of explanation in positivist science and philosophy. It is also more nuanced than the simple contrast illustrated earlier between explanations articulated on system events, and how the system produces and justifies its outputs, versus explanations articulated on a domain semantics of how humans could use the outputs of the system to better participate in real-world activities that mostly take place outside the system. For example, explanation for humans frequently involves informal and conversational argumentation: People interact about what the relevant claims are, the backing and warrants for these claims, and the qualifications and rebuttals for claims. This is a far more immersive and provisional sense of explanation. It frames explanation as a kind of ongoing social sensemaking activity that leverages and produces accountability and trust among participants. In this view, capacities for participating in rhetorical interactions are constitutive of being able to give effective explanatory accounts to human partners.
Viewing explanation as typically conversational and social has specific implications for our ambitions of explanatory AI. For example, adequate conversation partners participate cooperatively (in the sense of ) in a precisely choreographed turn-taking protocol that requires building, maintaining, and using a theory of the interlocutor and of the conversational situation. This requires continually tracking what has been said, what the interlocutor might have inferred from what has been said, how the argumentation has developed, what the interlocutor understands as common ground and contested ground as the interaction proceeds, and what the interlocutor needs and expects from moment to moment. Of course, even human interlocutors sometimes fail to manage all this, but the point is that when they do fail, the conversation fails (albeit temporarily), impairing the interlocutor's ability to make sense of what is going on, necessitating repair interactions and limiting the extent to which the interlocutor can confidently attribute competency, transparency, accountability, and trust to the AI.
The observations and arguments above translate readily into concrete research directions. For example, explainability scenarios, defined as narrative representations of human-AI encounters in which a human domain expert and an AI system interact and make sense of their interactions, can be constructed and investigated right now. Scenario-based studies of explainable AI have addressed a range of questions about the types of models and interactions that can be effective for people in various AI contexts. Importantly, scenario-based design investigations can be carried out before we ever build systems capable of such interactions.
Explainable AI is a significant approach to achieving better AI for humans. Although explaining what AI is doing, how it is doing it, and why it is doing it that way will be quite difficult to achieve, it is probably also both too much and too little for most humans. Humans operating as domain experts do not need or want more system information; they do not want technical design rationale and debugging explanations. At the same time, humans probably will not trust or be engaged by apparently intelligent systems that will not or cannot recognize the interaction domain context in which they are embedded or effectively participate in explanatory interactions.
I framed this discussion through a consideration of the basic-level question of why and how humans can trust AI. This question is fundamental and challenging, but it is the right sort of question to ask. My primary concern is that trying to answer the question entirely through approaches that have AI systems explain what they do and how they work only in a formal and technical sense is doomed to fail, and will not provide a basis for people to trust AI. I do not think this is setting the bar high or low; it's more a matter of focusing on the actual critical goals.
Viewing explanation as fundamentally pragmatic, conversational, and social entails that it is never just a matter of having an explicit technical account. Instead, it is always a matter of sensemaking, awareness, and negotiation in a very broad sense, and of the responsibility humans accept and demonstrate toward one another as they participate in everyday interactions. As an objective for explainable AI, this is ambitious, but if we do not recognize what the challenge is, we surely will never make it to the solution.
The field of human-computer interaction originally coalesced around the concept of usability in the early 1980s, but not because the concept was already defined clearly, or could be predictably achieved in system design. It was instead because the emerging concept of usability evoked a huge amount of productive inquiry into the nature and consequences of usability, fundamentally changing how technology developers, users, and others thought about what using a computer could and should be. Suppose that AI technologies were reconceptualized as properly including the capability to effectively explain what they are doing, how they are doing it, and what courses of action they are considering. Effective, in this context, would mean codifying and reporting on plans and actions in a way that is intelligible, relevant, and intriguing to humans. The standard would not be a superficial Turing-style simulacrum, but rather a depth-oriented investigation of human-computer interaction to fundamentally advance our understanding of accountability and transparency. We have already seen, in the history of HCI, how such a program of inquiry can transform computing.
This article is based on a closing keynote talk from InCITe 2022, the 2nd International Conference on Information Technology (Amity University, Uttar Pradesh, India, March 4, 2022).
3. Hancock, P.A. Avoiding adverse autonomous agent actions. Human-Computer Interaction 37, 3 (2022), 211–236; https://www.tandfonline.com/doi/full/10.1080/07370024.2021.1970556
John M. Carroll is a distinguished professor of information sciences and technology at Pennsylvania State University. His research is in human-centered design of technology, especially the transformative possibilities and risks entrained. Carroll is an ACM Fellow and received the Rigo Award and the CHI Lifetime Achievement Award from ACM.
©2022 ACM 1072-5520/22/07 $15.00