Elizabeth F. Churchill
At the time of writing this column, the world of search and information seeking is in a furor about OpenAI's ChatGPT, and Microsoft's adoption of its capabilities to "disrupt" the paradigm of online search as we know it. Google's response was to announce the release of Bard, which has similar capabilities.
Generative AI is the talk of the town, and these tools in particular are at the center of the conversation, as well as being the conversational agents du jour.
Secretly, I'm hoping that the business "disruption" conversations will all be behind us by the time this column is published. I am more interested in constructive conversations about how to ensure we create positive and meaningful experiences for people.
ChatGPT and Bard use large language models (LLMs) to generate answers to questions, purportedly outperforming our (human) ability to do traditional search tasks with our favorite search engines. In theory, they can produce prose answers to questions that rival any A-grade student essay. To some, this capability is a shocking disruption in the field of traditional online search. I have certainly been having fun. Here was ChatGPT's answer to my question about the topic of this issue of Interactions, "undoing data practices":
Question: What are queer, decolonial, more-than/other-than-human readings of dominant epistemic forms in data science?
Response: Queer, decolonial, and more-than/other-than-human readings of dominant epistemic forms in data science are critical perspectives that challenge the dominant epistemologies and power structures in data science.
Queer readings of data science examine the ways in which dominant epistemic forms in data science are shaped by and reinforce heteronormative and cisnormative assumptions. This includes questioning the gender and sexuality biases in data collection and analysis, as well as the ways in which data science reproduces and amplifies existing power structures that marginalize queer individuals and communities.
Decolonial readings of data science examine how dominant epistemic forms in data science are shaped by and reinforce colonial structures and ways of thinking. This includes questioning the ways in which data science is used to surveil, control, and exploit marginalized communities, as well as the ways in which data is collected and analyzed in ways that perpetuate colonial power dynamics.
Overall, these critical perspectives offer important insights into the ways in which dominant epistemic forms in data science are shaped by and reinforce various forms of marginalization and oppression, and can help to identify ways in which data science can be used to challenge and transform these structures.
An interesting response; not bad at all. And yet, as I try more and more questions, I have noticed some things:
- Accuracy varies. Sometimes the answers are wrong, which leads me to question the veracity and consistency of all the responses to my queries.
- Concerningly, people take received summaries and answers to questions as "certain," in part because of the overblown belief in AI as all-knowing, as actually having high "intelligence" in the human sense of the word. Sadly, it has been reported that these tools project 100 percent certainty when they are actually correct 80 to 85 percent of the time—on a good day.
- Responses never say anything novel or surprising; they are not thought provoking or innovative. They offer an excellent regurgitation of what I could have looked up (aka searched for) myself.
- Using these tools is pedagogically problematic. In the act of researching a topic, of looking something up and summarizing what I learned, I would have cognitively engaged with relevant materials, reviewed the debates and the sides of any particular "argument," and had to have come up with an opinion. That would have been an engaged learning experience. Using these tools forgoes that deep engagement with a topic.
Though certainly useful and a lot of fun, these tools currently have, for me, issues similar to those of:
- Spelling autocorrect, which I find to be a useful tool, although one that consistently changes everything from my British English spelling to American English spelling, favourite to favorite being one good example.
- Email suggestion, which when trained on an undeclared dataset suggested that my email should start with "Hello, Darling" when writing to a senior vice president at my company.
- "Smart" product recommendations, which invite me to buy myriad things for which I have no need just because I once bought someone a gift.
The discussion around these tools renders visible a confusion between information, knowledge, domain literacy, and conversational exchange and debate. Unsurprisingly, after a lot of hype and overblown excitement, the inevitable happened: The conversational agents started getting things wrong. Asked "What new discoveries from the James Webb Space Telescope can I tell my 9-year-old about?" Google's demo of Bard offered three bullet points in return, including one stating that the telescope "took the very first pictures of a planet outside of our own solar system." Astronomers were swift to point out that this is factually inaccurate—the first such image was taken in 2004, 14 years before the James Webb Space Telescope was launched. The agents also started becoming rude and creepy. Microsoft's Bing chatbot very quickly became "unhinged," generating rude, angry statements and generally behaving like our very human Internet trolls. In one Verge article, journalists reported a number of incidents in which Bing insulted and lied to its human interlocutors, called them names, and generated emotionally manipulative discourse.
What surprises me most is why we keep being surprised.
Microsoft's Tay chatbot was trained on Twitter data. Only a few hours after it was launched, in 2016, it began posting inflammatory and offensive tweets through its Twitter account. Microsoft shut down the service, blaming trolls on Twitter for feeding it offensive content, which it learned and then regurgitated. Yes, another example of the training dataset problem. It appears that trolls and hackers are already training our new crop of LLM-backed conversational agents to spread socially incendiary messages and/or figure out how to hack into systems. In the hands of nefarious actors, these tools have already been identified as potential security threats [1].
What can we do as HCI and UX practitioners and scholars to have more influence, to be more involved? I hear from friends that HCI and UX skill sets are routinely missing from the exuberant techno-centric teams who produce these tools. It is clear that there is not enough testing in what has been dubbed the "AI arms race," and that companies are, as usual, more invested in getting their shiny new toys launched than in thinking deeply about how to serve people.
You may have noticed that the ChatGPT answer I posted earlier did not address the "more-than/other-than-human" part of the question. That was because I saved it for last. Here is ChatGPT's answer:
More-than/other-than-human readings of data science examine the ways in which dominant epistemic forms in data science are shaped by and reinforce anthropocentric assumptions, as well as the ways in which data science can be used to understand and support non-human entities and ecologies. This includes questioning the ways in which data science reduces complex systems to discrete variables and ignores the interdependence of human and non-human actors, as well as exploring the ways in which data science can be used to support more-than-human forms of knowledge and ways of being.
I highlight that one phrase intentionally. It is clear the data scientists and company leaders don't take the advice of the summaries their own models create. Or more likely, they don't ask the right questions to understand what their own models spit back at them.
More imagining of the consequences of these technologies is needed. Better understanding of the datasets that are the training substrate is needed. More transparency and reflectiveness are needed. This should not be news to anyone.
I want to emphasize that I am excited about these emerging capabilities. I'm certainly excited that we are actively reflecting on human information search and retrieval needs and activities, and on what can change with the help of powerful tools. And that the incumbents are getting an energizing shake-up. However, the negative and presumably unintended consequences of LLM-based "conversational" tools were entirely predictable.
When will technology leaders stop repeatedly throwing technological spaghetti against the wall just to see what sticks? When will they hire and/or take seriously the advice of professionals whose training and interests focus on the social and societal consequences of these tools rather than fetishizing technologies, focusing on business "me-first-ism," and releasing half-baked, poorly tested implementations?
Let's hire and take seriously the advice of people who know how to ask, What if?—people who care about the broader and longer term impact of tools like ChatGPT and Bard. Let's get the experienced cooks in the kitchen—cooks who know how to assess whether the spaghetti is cooked without flinging it at a wall.
1. Europol. The criminal use of ChatGPT — a cautionary tale about large language models. Mar. 27, 2023; https://www.europol.europa.eu/media-press/newsroom/news/criminal-use-of-chatgpt-cautionary-tale-about-large-language-models
Originally from the U.K., Elizabeth F. Churchill has been leading corporate research at top U.S. companies for more than 20 years. Her research interests include designer and developer experiences, distributed collaboration, and ubiquitous/embedded computing applications. [email protected]
Copyright held by author