XXIII.6 November-December 2016
Page: 22

In the long tail: An animistic view on conversational interfaces

Sorin Pintilie


A bot is a robot without a body, intelligence divorced from physical form. And, Cartesian innuendos aside, the only things we know to fit that description are spirits. That's why it's so hard to talk about bots. The meaning of the word simply won't stay still.

A bot is an abstraction of the thing it evolved from, a human construct with non-human characteristics. This makes communicating with bots particularly challenging. How do you talk to these things? It's a hard question because it's new for us, but for others, talking to things is a way of life.

Nayaka are forest dwellers who live in the Wynaad area of the Nilgiris Hills, in South India.

Like similar Amazonian groups, they have this notion of devaru, which loosely translates to "superpersons" or "persons with extra powers," and it applies to all beings, human and non-human alike. Spirits, if you will.

Our bots and their devaru are fertile ground for exploring fictional alternatives to conversational interfaces. The more computed our environment becomes, the more it boomerangs us right back to this kind of primal belief system, an animistic worldview where everything has a spirit [1], rocks and lions and men.

The Problem

Jef Raskin, the father of the Macintosh, said that common language makes computers seem friendly. And indeed, people do seem more tolerant when the system says something like "I can't do that" rather than "command not recognized." That postulate has spread so religiously in the design community that there's now only one, seemingly obvious answer: Make bots talk like people.

Just one problem, though. The embodied nature of language is in stark contrast with the disembodied nature of bots. People leave out movements, memories, and various experiences when talking to other people. The structure and orderliness of a conversation derive from the way in which it is enacted by humans, with a body, in a physical world. That's a notion of context that is richer than the abstracted, computable notion used by algorithms.

Bots can't even stay on topic yet. Due to current limitations of vector-space models, in most series of question-answer exchanges, or pairwise utterances, the context switches from one pair to the next. In other words, the more you speak to it, the more likely the bot will have to say, "I don't know what you mean."
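To make the failure concrete, here is a minimal sketch in Python. It is a keyword-matching toy, not a vector-space model, and the forecast data, the `answer` function, and the `converse` function are all hypothetical names invented for illustration. It only shows what happens when each question-answer pair is treated in isolation versus when a topic is carried from one pair to the next.

```python
import re

# Hypothetical toy data: the only city this bot knows about.
FORECASTS = {"paris": {"today": "sunny", "tomorrow": "rainy"}}


def answer(utterance, context=None):
    """Answer one utterance; `context` is a city carried over from earlier turns."""
    words = re.findall(r"[a-z]+", utterance.lower())
    # Use a city named in this utterance, else fall back to the carried context.
    city = next((w for w in words if w in FORECASTS), context)
    day = "tomorrow" if "tomorrow" in words else "today"
    if city is None:
        return "I don't know what you mean.", None
    return f"{city.title()} {day}: {FORECASTS[city][day]}", city


def converse(utterances, carry_context):
    """Run a dialogue, optionally passing the last known city to the next turn."""
    context, replies = None, []
    for utterance in utterances:
        reply, city = answer(utterance, context if carry_context else None)
        context = city or context
        replies.append(reply)
    return replies


dialogue = ["What's the weather in Paris?", "And tomorrow?"]
print(converse(dialogue, carry_context=False))
print(converse(dialogue, carry_context=True))
```

With `carry_context=False` the second reply is "I don't know what you mean."; with `carry_context=True` the bot resolves "And tomorrow?" against the city from the first pair.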

Context awareness is a major, if not the biggest, problem with the current machine-learning-based agents. And the prevailing design workaround seems to be descriptive models, essentially humans explicitly telling the machine what the context is. The downside? If not done properly, it can make the bot seem condescending, uncanny, or downright dimwitted.

Even in best-case scenarios, it works for only very specific verticals like meeting schedulers or, in the case of general-purpose virtual assistants, just for the few things that users do most often—get directions, find a restaurant, communicate with friends.

In general terms, authoring guided dialogues is a mechanism for failing gracefully. It doesn't address discoverability, which is a bit problematic, since the Internet caters to the desires of an infinitely long tail of consumers with minority interests who trail behind (but ultimately exceed) the swollen head of the mainstream. As designers, then, our concern is not simply to support particular forms of conversation, but rather to support the evolution of conversation.

The Alternative

The plurality of shapes that these things can take is staggering. Peers, butlers, assistants, pets, aliens, all acting like a "magic well"—the more we learn about them, the deeper the mystery gets. So maybe it's time we stop trying to define the magic metamorph.

This is where the devaru parallel comes into play. Devaru is something made out of relationships [2], and it's used by the Nayaka to think about their environment. They see someone—or something—in terms of how it relates to someone—or something—else. For example, a common way to refer to someone is "Mathen who laughs a lot," Mathen being one of the few names the Nayaka use. We see bots much in the same way—bots that tweet, write, and predict anything from jokes to news and disasters; bots that search, shop, and recommend; and so much more. We, much like the Nayaka, describe them through socially defined relationships.

Bots are not about physical reality, but rather about availability for engagement. They don't have a personality; they adopt one, however, as soon as you interact with them.

Bots are relational properties of us interacting with machines. A bot doesn't have a stable form; whatever form it takes is relevant only to a particular setting. It can't be delineated in advance; its scope is defined dynamically.

That shifts the control of the interaction away from the designers onto the users, who themselves determine the meaning of the technologies they use through the ways in which they incorporate them into practice [3]. That's an open approach, in which users are active participants [4]. It means designing systems that adapt, rather than describe.

From Natural to Formal and Back

Relationships develop over time and so do conversations.

Human natural language is good up to a point, but there's a lot that exists out there for which human natural language doesn't yet have descriptions. Complex computational systems, for example, are essentially a non-human form of communication that turns out to be richer than traditional human communication.

When it comes to bots, much higher-bandwidth channels of communication are available. They need to be identified and made legible so that users can actually use them. A natural-language-only approach obfuscates that.

Unlike a traditional GUI (graphical user interface), in a CUI (conversational user interface) any text is interpretable as code (formal language) or a user command (functional language).

Talking to bots is a mixture of these types of language: natural, functional, and formal. Users learn this mixture by interacting, by talking. It's a natural, kinesthetic kind of thinking that can be used to move between these different types of language. The secret is in those moments of transition.
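One way to picture that mixture is a tiny router that decides which kind of language a message is written in. The sketch below is only an illustration under assumed conventions that are not from any real bot platform: text wrapped in backticks is treated as formal language (code), text starting with a slash as functional language (a user command), and everything else as natural language. The function name `classify` is hypothetical.

```python
def classify(text):
    """Label an input as 'formal', 'functional', or 'natural'.

    Assumed conventions (hypothetical): backticks wrap code,
    a leading "/" marks a user command.
    """
    stripped = text.strip()
    if len(stripped) >= 2 and stripped.startswith("`") and stripped.endswith("`"):
        return "formal"
    if stripped.startswith("/"):
        return "functional"
    return "natural"


for message in ["where can I get pizza?", "/find pizza", "`price < 20`"]:
    print(f"{classify(message):>10}  {message}")
```

The interesting design question is less the classifier itself than what the interface does at the boundaries, when a user drifts from one label to the next.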

The Multimodal World of Artificial Conversations

In traditional GUIs, modes are metaphors for accessing the same function in different ways. You can print something by using a menu item, a toolbar button, or a hotkey. A simple metaphor that accommodates and justifies the transition from beginner to expert.

Linguistic modals could potentially serve the same purpose. Redundancies built in between code, user commands, and their accompanying natural language can serve as mechanisms of navigation between different states. With any sentence, bots look for keywords, structured data, or any other actionable elements and strip away the rest. Why not make that visible to users and allow them to mix and match? Tweak the keywords, add filters or parameters—issue commands, basically.
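As a rough sketch of what "making that visible" could look like, the Python below strips an utterance down to keywords and `key=value` parameters and echoes the result as an editable command line. The stopword list, the slash prefix, the `key=value` syntax, and the function names `extract` and `as_command` are all assumptions made for this example, not a description of how any particular bot works.

```python
import re

# Hypothetical stopword list: words the bot discards as non-actionable.
STOPWORDS = {"a", "an", "the", "me", "i", "please", "can", "you", "for", "some", "to", "want"}


def extract(utterance):
    """Split an utterance into keywords and key=value parameters."""
    params = dict(re.findall(r"(\w+)=(\w+)", utterance))
    # Remove the parameters, then keep every remaining word that is not a stopword.
    remainder = re.sub(r"\w+=\w+", " ", utterance.lower())
    keywords = [w for w in re.findall(r"[a-z]+", remainder) if w not in STOPWORDS]
    return keywords, params


def as_command(keywords, params):
    """Render what the bot kept as a command line the user can edit."""
    parts = keywords + [f"{k}={v}" for k, v in sorted(params.items())]
    return "/" + " ".join(parts)


# The bot shows the user what it actually understood...
print(as_command(*extract("Please find me a cheap pizza place")))
# ...and the user can edit that line directly, here swapping a keyword for a filter.
print(as_command(*extract("/find pizza place price=low")))
```

Because the echoed command parses back into the same keywords and parameters, the user can tweak it and send it again without going through natural language at all.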

The beauty of a command-line type of interface is that it makes it easy for programs to dribble out little comments, warnings, and messages here and there. Modals, essentially. Bits of natural language that can recede—once enough trust is gained—into a supporting role, just to inform, create mental models, and give out helpful tips. This would switch the operative language from natural to functional. Once that's learned, the same principles can be used to turn it into formal language. Gradually unpacking the different layers of abstractions is a powerful way of accommodating the evolution of the conversation, switching between different modes and opening up the architecture to show the context. Also, Stephen Wolfram thinks that can work [5].
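The receding of natural language can be sketched very simply. In the hypothetical Python class below, every reply carries a natural-language tip until the user has completed a few interactions, after which only the terse result remains. Counting successful replies is a crude stand-in for trust, and the threshold, the class name `HintingReplier`, and the slash-command syntax are assumptions for illustration only.

```python
class HintingReplier:
    """Wrap results in explanatory prose until the user has succeeded often enough."""

    def __init__(self, trust_threshold=3):
        self.trust_threshold = trust_threshold  # hypothetical cut-off for "enough trust"
        self.successes = 0

    def reply(self, command, result):
        """Return the result, adding a tip while the user is still learning."""
        self.successes += 1
        if self.successes <= self.trust_threshold:
            return f"{result}\n(Tip: next time you can type `{command}` directly.)"
        return result


replier = HintingReplier(trust_threshold=2)
for _ in range(3):
    print(replier.reply("/find pizza", "3 places found"))
```

The first two replies include the tip; the third prints only "3 places found", the point at which the operative language has switched from natural to functional.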

It's important to have accurate and honest metaphors. Space, for example, is what allows for windows to have clear borders and content areas that afford interaction. In CUIs, time is the glue that holds everything together. We'll need frameworks and tools that will help us visualize, map, and manage linguistic interactions over time.


As we transition to more robust and mature conversational frameworks, maybe we can start talking more about the seamful [6] integration of both human and computer language, what affordances time creates, and how we can make those affordances legible. The big challenge here is that, as opposed to physical things, we don't have a form that describes how the thing works anymore.

Just words over time.

References

1. Jones, M. The Demon-Haunted World. Webstock, 2009.

2. Bird-David, N. 'Animism' revisited: Personhood, environment, and relational epistemology. Current Anthropology 40, S1. Special Issue: Culture—A Second Chance? (Feb. 1999), S67–S91.

3. Wenger, E. Communities of Practice: Learning, Meaning, and Identity. Cambridge Univ. Press, Cambridge, UK, 1998.

4. Slavin, K. Design as participation. MIT Media Lab Journal of Design, 2016.

5. Wolfram, S. Programming with natural language is actually going to work. Blog, 2010.

6. Arnall, T. No to NoUI. 2013.

Author

Sorin Pintilie is a designer with a cross-disciplinary curiosity for what the world has to offer. He is interested in words and the technology behind them. He works where the social Web meets the semantic Web.


Copyright held by author

The Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.
