PeopleLens

Issue: XXVIII.3 May - June 2021
Page: 10

Authors:
Cecily Morrison, Ed Cutrell, Martin Grayson, Geert Roumen, Rita Marques, Anja Thieme, Alex Taylor, Abigail Sellen

The PeopleLens is an open-ended AI system that offers people who are blind or who have low vision further resources to make sense of and engage with their immediate social surroundings. It has been used most recently by children who are blind in school settings, supporting their skills in proactively interacting with classmates.

The PeopleLens is an exploration of how we can design human-AI interaction that moves beyond discrete task support to provide a continuous stream of dynamic information. It was inspired by ethnographic research with Paralympic athletes and spectators, which captured the diverse sense-making skills that people with low vision use to orientate to and interact with others. The PeopleLens was imagined as a way of amplifying the details in people's surroundings, enabling them to extend their existing skills and capabilities.

The Experience

Our current implementation of PeopleLens, being used by schoolchildren, has three key features:

Person-in-Front: This feature reads out the name of a person to whom a user orientates.
Orientation Guide: This feature provides additional sound cues, giving the user a better understanding of the system's detection of bodies or faces. These cues assist users in directing their body and head orientation, improving system accuracy. The orientation guide also conveys to those nearby where the user's attention is directed.
External Feedback: Finally, the head-mounted device has an external LED interface that indicates the system state to a communication partner.

TH, a young boy in school uniform, is pictured in a classroom. He is wearing the PeopleLens head-mounted device and looking to the right, across the classroom. An overlay with three illustrations has been added to the photo. One illustration is of a walking, female line-figure whose face is outlined by a line-drawn frame (suggesting face recognition). A second illustration is of a standing male figure looking towards the PeopleLens camera TH is wearing. Lines are drawn to emphasise that this figure is looking towards the camera (suggesting gaze recognition). The third and final illustration is of a sitting female figure, reading a book. Lines are drawn through the figure, emphasising her seated position (suggesting posture recognition).

Illustration of PeopleLens's primary recognition functionality. The system recognizes people, as well as their location, pose, and gaze direction. This is integrated into a continuously maintained 360-degree worldview regardless of who the user is facing. If the user looks at someone, a spatialized "bump" is played followed by a name if known, or a spatialized click if the person can't be identified. A spatialized sound is produced when someone looks at the user.
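The cue rules in this caption can be read as a small decision procedure. The sketch below is illustrative only and is not the PeopleLens implementation: the `TrackedPerson` record, its field names, and the string cue labels are all hypothetical. It also assumes one reading of the caption, in which an identified person produces a bump followed by the name, and an unidentified person produces a click instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackedPerson:
    """Hypothetical record for one person in the 360-degree world model."""
    azimuth_deg: float                 # direction relative to the user's head, 0 = straight ahead
    name: Optional[str] = None         # None when the person cannot be identified
    user_is_facing: bool = False       # the user is looking at this person
    is_looking_at_user: bool = False   # this person is looking at the user


def cues_for(person: TrackedPerson) -> List[str]:
    """Return the ordered audio cues for one person (assumed reading of the caption)."""
    where = f"{person.azimuth_deg:.0f}"
    cues: List[str] = []
    if person.user_is_facing:
        if person.name is not None:
            # Known person: spatialized bump, then the name is read out.
            cues.append(f"bump@{where}")
            cues.append(f"say:{person.name}")
        else:
            # Unidentified person: spatialized click (assumption: no bump).
            cues.append(f"click@{where}")
    if person.is_looking_at_user:
        # Someone looking at the user triggers a separate spatialized sound.
        cues.append(f"gaze@{where}")
    return cues
```

For example, `cues_for(TrackedPerson(azimuth_deg=-30.0, name="Jane", user_is_facing=True))` yields a bump placed 30 degrees to the left followed by the spoken name. A person who is neither faced by the user nor looking at the user yields no cue, although they remain in the world model.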

Supporting Reciprocal Interaction

The PeopleLens is not just a resource for the user but also a means of supporting reciprocal interaction with communication partners. In our trials, we found people adjusting their own body position in order to be identified. To supplement this, we developed external feedback on the device. A semicircular LED interface affixed to the top of the HoloLens provides communication partners with information about the state of the system. This assists the development of common ground and reflexive interpretation of behavior, giving users and communication partners additional cues to establish and maintain mutual attention.

TH, a young boy in school uniform and holding a white cane, is pictured in a classroom. He is facing towards the camera and wearing the head-mounted PeopleLens system.

TH using the PeopleLens system in a classroom. TH, who has used different versions of the device since 2018, has been integral in informing its current design.*

Key to this design was finding ways in which users and communication partners can establish shared understandings through different sensory modalities. In the final design, a moving white light tracks the location of the nearest detected person, flashing green when that person is identified to the user. However, a number of visual-tactile interactions to support the user's understanding of the system information were also explored.
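As a rough illustration of the external feedback behavior described here, the following sketch maps the nearest detected person's direction onto a semicircular strip of LEDs. It is not the PeopleLens firmware: the function name `led_state`, the LED count of 12, the assumption that the strip spans 90 degrees to either side of straight ahead, and the left-to-right indexing are all hypothetical. Flash timing is not modeled; the sketch returns only which LED to light and in which color.

```python
from typing import Tuple


def led_state(azimuth_deg: float, identified: bool, num_leds: int = 12) -> Tuple[int, str]:
    """Pick the LED index and color for the nearest detected person.

    azimuth_deg: direction of the person relative to the headset,
                 -90 = far left, 0 = straight ahead, +90 = far right (assumed span).
    identified:  True once the person has been identified to the user.
    num_leds:    number of LEDs on the strip (hypothetical value).
    """
    # Clamp to the assumed span of the semicircular strip.
    clamped = max(-90.0, min(90.0, azimuth_deg))
    # Map [-90, +90] onto [0, 1], then onto a valid LED index.
    fraction = (clamped + 90.0) / 180.0
    index = min(num_leds - 1, int(fraction * num_leds))
    # White while tracking, green once the person is identified.
    color = "green" if identified else "white"
    return index, color
```

Calling `led_state` on every tracking update would make the lit LED follow the person as they move, which corresponds to the moving white light described above.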

Extending Human Capabilities

The PeopleLens offers an example of human-AI interaction that seeks to expand people's capabilities. As a system, it is not designed to operate on behalf of its users, replacing what might be thought of as an absence of sight. Rather, interaction with the system is intended to extend abilities: helping a user to achieve existing goals, adding and enriching information already relied upon, and building new strategies on top of existing ones. Our perspective on this more complex and unfolding coupling of people and technology seeks to approach people's capacities in expansive terms, seeing them as always emerging through interwoven relations.

This concrete example demonstrates the potential opportunities that an AI experience can provide when it is designed to work in concert with human capabilities.

A screenshot of the PeopleLens system view, built for developers and to support researchers studying the use of the system. There are four regions to the system view: the top left displays the front-facing, wide-angle camera input; the top right the peripheral and central camera inputs (in grey-scale); the bottom left a 3D model of the computer-generated "360-degree world"; and the bottom right a simulated top-view tracking the user's view of the model world. In the views, known people are labelled green and unknown people red.

The PeopleLens prototype uses a modified HoloLens in combination with five state-of-the-art computer vision algorithms to continuously identify and track people in space and capture their gaze direction. Because the system also tracks the 6DOF motion of the head-mounted device, this information can be presented to users via spatial audio. This system view shows the front-facing, wide-angle camera input (top left); peripheral and central camera inputs (grayscale, top right); a model of the 360-degree world (bottom left); and a simulated top-view tracking the user's view onto the model world (bottom right). Known people are labeled green, and unknown red.
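To illustrate why head tracking matters for spatial audio, the sketch below converts a person's position in a fixed world model into a direction relative to the user's head, and then into the clock-face convention used in the sidebar scenario (for example, 10 o'clock for forward left). This is a simplification under stated assumptions, not the prototype's method: it works in a flat 2D plane, uses only the yaw component of the 6DOF pose, measures angles clockwise from the +y axis, and the function names are hypothetical.

```python
import math
from typing import Tuple


def relative_azimuth_deg(head_xy: Tuple[float, float],
                         head_yaw_deg: float,
                         person_xy: Tuple[float, float]) -> float:
    """Direction of a person relative to the head, in degrees.

    0 = straight ahead, negative = to the left, positive = to the right.
    head_yaw_deg is the head's heading, clockwise from the +y axis (assumed convention).
    """
    dx = person_xy[0] - head_xy[0]
    dy = person_xy[1] - head_xy[1]
    # World bearing of the person, clockwise from the +y axis.
    bearing = math.degrees(math.atan2(dx, dy))
    # Subtract the head's heading and wrap the result into [-180, 180).
    return (bearing - head_yaw_deg + 180.0) % 360.0 - 180.0


def clock_position(rel_azimuth_deg: float) -> int:
    """Convert a head-relative direction to a clock hour (12 = straight ahead)."""
    hour = round(rel_azimuth_deg / 30.0) % 12
    return 12 if hour == 0 else hour
```

Because the person's world position stays fixed while `head_yaw_deg` changes, the same person is rendered from a different direction as the user turns their head, which is what allows a sound to stay anchored to a location in the room.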

* We acknowledge TH's considerable contribution to PeopleLens and refer to him (in abbreviated form) in this work with permission from him and his immediate family.

Authors

Cecily Morrison is a principal researcher at Microsoft Research, working at the intersection of human-computer interaction and artificial intelligence. She is interested in how our tools, models, and interfaces enable people to extend their own capabilities through using AI-infused systems.

Ed Cutrell is a senior principal researcher at Microsoft Research, where he explores computing for disability, accessibility, and inclusive design in the MSR Ability group. Over the years, he has worked on a broad range of HCI topics, ranging from input tech to search interfaces to technology for global development.

Martin Grayson is a principal research software design engineer at Microsoft Research who led the design and engineering of the PeopleLens.

Geert Roumen is a maker and interaction designer who recently graduated from the Umeå Institute of Design. He focuses on bridging the digital and physical world with a hands-on design approach, creating early prototypes and doing research in a playful yet serious way, to bring the design and people together from the start.

Rita Faia Marques is an associate designer whose work focuses on designing responsible AI systems.

Anja Thieme is a senior researcher in the Healthcare Intelligence group at Microsoft Research, designing and studying mental health technologies.

Alex Taylor is a sociologist at the Centre for Human Centred Interaction Design, at City, University of London. With a fascination for the entanglements between social life and machines, his research ranges from empirical studies of technology in everyday life to speculative design interventions.

Abigail Sellen is deputy director at Microsoft Research Cambridge, and has published on many different topics in HCI that put human aspiration front and center in designing new technology. A recent focus is on the intelligibility of AI systems, viewed through the lens of both HCI and philosophy.

Sidebar

Consider this scenario where the user combines his own abilities with the system's features, and where others also work with the user and system: The user walks into a familiar classroom. He hears three bumps (percussive sounds representing body presence) at about 10 o'clock (forward left). He guesses that three people are standing at the interactive whiteboard working on a problem set with their backs to him. As he shifts his gaze to the right, he hears a quick succession of bumps, which he guesses are other children sitting at their desks reading with their heads down. As he moves to his right, he hears a bump followed by a name, Jane. The user clicks his tongue and listens for its echo. He guesses that Jane is standing next to the wall, perhaps at the classroom coat rack. However, the user really wants to tell his friend Oscar about the new Lego set he got over the weekend. He heads in the direction of Oscar's seat. As he gets closer, he hears woodblock sounds, which prompt him to look down. Looking at the external interface on the user's headset, Oscar can see that the system recognizes him, but he's to the right of center so he moves slightly to be properly detected. Oscar's name is then read out. The user surmises that Oscar must be sitting down. He grabs a chair and pulls out a Lego figure for Oscar to see.

Exhibit X
