Columns

XXII.3 May - June 2015
Page: 22

Patchwork living, rubber duck debugging, and the chaos monkey


Author:
Elizabeth Churchill

“Patchwork living” is a good description of my life with technology.

One of the meanings of the word patch is to repair a tear, a breach, in a material. By extension, “patching” software or firmware means to fix something that is damaged, torn, or no longer functioning as it should. A software security patch, for example, repairs a new (or newly discovered) breach in the security capabilities of a device, service, or application.

This kind of repair work defines my patchwork-living lifestyle. I spend a lot of time being interrupted by, and then managing, the bleeping, jumping, jiggling mess of alerts that inform me of patches in need of installing, software updates desperate to be indulged, and licenses needing to be revisited….

Yesterday is a good example: An update to the operating system on my computer was strongly recommended—alert, alert, alert. I checked that everything was saved, closed documents and applications, and clicked Install. Then I waited and waited, and finally restarted my machine. Forty minutes later, I had apparently emerged from the techno-grooming tunnel. Success.

But wait…

Why does the software that I used just last week, the one I need to accomplish this morning’s task, no longer work? Why does my music no longer stream to my speakers? Why can I no longer see the media server from my computer?

Taking slow, logical steps like a good systems administrator, I sought the root cause. I crawled around on the floor plugging and unplugging devices, rebooting, restarting, and resetting. I set permissions and reconnected devices that had just hours before been happily communicating. I engaged in “rubber duck debugging”—talking out loud, stepping through settings and interconnections, unpacking the ways in which I have patched my technologies together. The search for the eureka moment, when the “thing that broke” would reveal itself, consumed all my attention. Two hours evaporated.

Field research on everyday technology management reveals I am not the only one encountering such issues. This is both a relief (who likes being alone in their struggles?) and irritating. It is indicative of an industry that is, itself, in need of repair, of “patching.” I fear for the future. My collection of notionally connected devices and services is modest compared with the predicted Internet of Things world that will have embedded, connected, computational systems and subsystems woven into the entire fabric of our lives. If we continue on this trajectory—ever more aggressive marketing and the proliferation of intertwined, intermingled services, devices, and applications coupled with a standards process that can’t keep up with the pace of technology development or with the driving force of business—we will be in even more trouble. User experience so far belies the marketing rhetoric of a seamlessly and securely supported life with assemblies of harmoniously interconnected devices and services. Rather, many user experiences suggest this could be the dawn of the Internet of Partially Connected Frustration or the Internet of Insecure Curmudgeonly Connections [1].

What can we in HCI, UX, and design engineering do?

First, let’s stop being myopic, imagining bubbled and bounded designs. Little of what we design is stand-alone. For those of us who are HCI researchers and practitioners, everything we design is part of a bigger, sprawling networked system: in fact, an unwieldy, unruly, and unpredictable set of complex systems. Certainly anything connected to the Internet is part of a very complex system that includes myriad devices, software, routers, and services. Complex systems are open, not closed. Their boundaries are fuzzy; one can’t tell where they start and where they end. The edges of our devices, our services, and our applications are blurry, part of a dynamic, shape-shifting world. We can predict neither who our users will be nor what use they will make of what we create. Nor can we predict which computational agents and services will be interacted with, hosted, and/or destroyed as unplanned connections are made.

As we move deeper and deeper into the era of the Internet of Things, we are, to borrow Sidney Dekker’s phrase, “drifting into fragility.” Admittedly, Dekker’s work focuses on complex, safety-critical systems; consumer devices are not typically conceived, prototyped, and marketed as safety-critical systems. Rather, they are considered to be for communication and social media consumption and participation, for discretionary personal tasks and information management, and for entertainment. That conception is out of date. We know that personal devices are increasingly woven into the fabric of coordinated, collaborative, and collective work activities and patterns. They are becoming critical.


Nassim Taleb claims there are three kinds of complex system: fragile, robust, and anti-fragile [2]. Fragile systems are not built to withstand volatility. Robust systems require careful predictive models of likely failures and have redundancies built in. “Anti-fragile” systems like deviance and volatility; they are built to create, test, debug, and grow from disorder. Taleb writes, “It is far easier to figure out if something is fragile, not easy to predict the occurrence of an event that may harm it. Fragility can be measured; risk is not measurable.”

But risk can be understood and tested, and anti-fragile systems can be implemented. In the world of critical systems, connecting pieces of the system is not the breakthrough; keeping them running and avoiding failure at critical moments is. For that, one needs transparent, interrogable, reflective systems that allow the user to easily understand what is going on and to patch and repair as needed, before time runs out on a critical deadline and even when the connection to the Internet is sketchy. A good example is Netflix’s server infrastructure, which is designed for intentional breakage, a test-and-retest model [3]. Its Chaos Monkey “wreaks havoc like a wild and armed monkey set loose in a data center,” working “on the principle that the best way to avoid major failures is to fail constantly.” Unexpected failures always happen at the worst times, so Chaos Monkey induces failures deliberately, at times when they can be monitored and repaired. Chaos Monkey is the crash-test dummy of Web services, and its siblings in Netflix’s Simian Army check for abnormal conditions, configurations, and security issues.
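
To make the “fail constantly” principle concrete, here is a minimal Python sketch of the idea. It is not Netflix’s implementation, only a toy, in-memory model under stated assumptions: a hypothetical chaos_monkey function terminates one randomly chosen instance of each service and then reports which services can no longer answer requests. The Service class, its min_healthy threshold, and the service names are illustrative inventions.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Service:
    """A hypothetical service backed by a pool of redundant instances (toy model)."""
    name: str
    instances: list[str]
    min_healthy: int = 1  # assumed: instances needed for the service to keep answering

    def is_available(self) -> bool:
        return len(self.instances) >= self.min_healthy


def chaos_monkey(services: list[Service], rng: random.Random) -> list[str]:
    """Terminate one randomly chosen instance of each service, then return the
    names of services that can no longer answer requests."""
    failures = []
    for svc in services:
        if svc.instances:
            victim = rng.choice(svc.instances)
            svc.instances.remove(victim)  # the deliberate, monitored failure
            print(f"terminated {victim} in {svc.name}")
        if not svc.is_available():
            failures.append(svc.name)
    return failures


# Usage: a redundant service survives; a single-instance one is exposed.
home = [Service("streaming", ["s1", "s2", "s3"]), Service("media-server", ["m1"])]
print("unavailable:", chaos_monkey(home, random.Random(7)))
```

Whatever the random seed, the three-instance streaming service survives the loss of one instance, while the single-instance media server is flagged as unavailable. Surfacing that kind of missing redundancy on a schedule, before a real outage does it at the worst moment, is the point of the design.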

So first, let’s stop being self-servingly clumsy in assuming we are designing encapsulated services; let’s ask responsible questions about what lies beyond the borders of what we are implementing, what the possible sources of failure may be, and what our users will need to know to carry out effective repairs. Second, let’s think about designing software agents, like the Chaos Monkey, that exist to test our interconnections and reveal to us any patches that may be needed before things go wrong. And third, let’s build better debugging tools for people who have neither the time nor the interest to become versed in the vagaries of complex computational systems. Consumers are increasingly the system administrators of their own complex ecosystems; it would behoove us to learn from system administrators how they manage complex systems and what tools they have at their disposal.

The grand challenge for the Internet of Things era is not going to be how to get computation into everything. Rather, it is going to be how to build anti-fragile systems for the technological sedimentary layers of everyday life.

References

1. The last time I wrote about the Internet of Things was in 2009, in the March+April issue of Interactions. I see an uptick in hyperbole and excitement, and a huge amount of technical and design work still to be done before the dreams shown in the vision videos come to pass. Perhaps happily, the term Internet of Things has peaked in Gartner’s “hype cycle” and is now thankfully entering the “trough of disillusionment,” which will hopefully mean that deeper research gets done. Security issues are being discussed, as are interoperability standards: Gartner’s Hung LeHong wrote a while back, “Standardization (data standards, wireless protocols, technologies) is still a challenge to more-rapid adoption of the IoT.”

2. Taleb, N. Antifragile: Things That Gain from Disorder. Random House, New York, 2012.

3. Netflix. Chaos Monkey. Simian Army wiki. https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey

Author

Elizabeth Churchill has been working in the area of HCI for over 25 years. Originally from the U.K., she has been working in corporate research in the U.S. for 18 years, focusing on social media, mediated communication, and ubiquitous and embedded computing applications. churchill@acm.org

Copyright held by author

The Digital Library is published by the Association for Computing Machinery. Copyright © 2015 ACM, Inc.
