Jean Scholtz, Oriana Love, William Pike, Joseph Bruce, Dee Kim, Arthur McBain
By now almost everyone is familiar with the use of user-centered design (UCD) techniques to develop software products, but less common is the idea of using UCD techniques to shape and guide the directions of an academic research project. The SuperIdentity (SID) project is aimed at helping workers engaged in identification tasks. Research of this type rarely includes UCD work, as no specific end product is anticipated and, therefore, no user interaction. However, SID researchers decided that an understanding of potential end users would be beneficial in focusing their research efforts. They agreed that an early activity should be to study some people whose work involved identifying individuals to determine how they did this, what information they started with, their end goal, the resources they used, and any problems they encountered. This knowledge would enable the project to focus the research more specifically to support the issues faced by end users currently engaged in identification tasks. Here, we describe how this UCD input is helping to shape the SID research work even beyond expectations.
More specifically, the SID research project is an identity attribution and enrichment project that uses attributes from the biographical, biometric, cyber, and psychological domains to produce a fuller, more accurate "identity" of an individual. For example, knowledge of an individual's online activities can provide insight into personality traits through analysis of online content, online behaviors, and even selection of avatars.
The project is developing a model of identity attributes in these domains to support individuals in finding paths among and between these domains to provide more robust identifications of individuals. UCD work is helping project researchers understand how law enforcement, cyber security investigators, and intelligence analysts currently work and how they view the utility of the SID research project.
The SID research project is a collaboration between six universities in the U.K. (Bath, Dundee, Kent, Leicester, Oxford, and Southampton) and the Pacific Northwest National Laboratory (PNNL). It originated in a workshop convened by the U.K.'s Engineering and Physical Sciences Research Council (EPSRC). The purpose of the project is to provide intelligence and law-enforcement services with a greatly enhanced ability to identify, and attribute information to, individuals and groups in both real and cyber domains. SID deviates from existing approaches in that the work incorporates contributions from an expansive spectrum of scientific domains, including biometric, psychological, behavioral, and online indicators of identity, enabling a broader set of identity measures to be considered than ever before.
The SID project offers an innovative and exciting new approach to the concept of identity. The assumption is that while there may be many dimensions to an identity—some more stable than others—all should ultimately refer back to a single core identity, or SuperIdentity. SID takes this approach further than any existing work by including static and behavioral measures from both the real world and the cyber world. The obvious consequence is that identification is improved by the combination of measures.
SID provides two unique capabilities. First, the project offers an identity framework through which associations can be derived between different identity measures. The value of these associations is that known pieces of information may then be used to predict previously unknown pieces of information. Second, the project offers the capacity to quantify the certainty associated with an identification decision. This enables the end user to have a level of confidence (or risk) in their decision, and to make a judgment as to whether additional information is required. The objectives of the project are:
- to combine identity measures across real and cyber domains to inform identification decisions in the face of partial and changing knowledge and uncertainty;
- to uncover hidden data and relationships between data that can contribute to informed decisions about identity; and
- to quantify the certainty of an identification by quantifying the reliability of each contributing measure.
The research to achieve these objectives is broken into three parts, each with several tasks. The first part is to create the factors of interest in identification and to create the SuperIdentity model. Tasks include reviewing identity aspects in the real world and cyber domains, both slow-changing and behavioral; creating a generic model of these aspects and their relationships; and determining the social, legal, and ethical acceptability of the use of these identity aspects in both the U.K. and the U.S.
The second part is to establish salience and certainty across contexts. The tasks here are to create baselines and variances for different aspects of identification and to determine the accuracy of human and computer identification of real-world and cyber factors of behavior.
The third part of the project is to explore patterns and templates within SID. To accomplish this, use cases have been collected and are currently being used in the development of a visualization tool for exploring the model. The visualizations, along with SID data, will be used later this year in evaluating the utility of the SID project. Eventually, customized tools may be developed for specific groups of end users who need to identify individuals for transactions, law enforcement, or other security-related uses.
Our work at PNNL is focused on the third part of the project, exploration of patterns and templates in SID. We started work early in the project to collect use cases from envisioned end users, primarily intelligence analysts and law enforcement officers. We conducted 19 unclassified interviews with participants to understand their work in identifying individuals.
We asked some basic questions such as:
- What is your job title? Your responsibilities?
- How long have you worked in this role?
- On a normal day, what types of tasks do you do?
- What percentage of these tasks involves some sort of identification of individuals?
- Who are the other people you work with? In what capacity?
- How many people work in a role similar to yours?
- What types of technology are used in your work?
- How is success measured? By you? By the organization?
- What is the output of your work?
Then we asked them to tell us in more depth about tasks that involve identifying someone. Things we probed for included:
- What types of data are available for use?
- What is the certainty needed in their identification?
- What features of an individual do you usually have to start with?
- What features are the most helpful in making identifications?
- What features do you always search for?
- What features are the easiest to obtain?
- What features are the most difficult to obtain?
- What is the timeframe in which you must identify someone?
- Have you ever been unsuccessful identifying an individual? If so, why?
Next we gave participants a questionnaire that listed a number of attributes and asked them to rank the categories that are most important to them for their identification needs. Here is a list of categories and some examples in each:
- demographics (age, gender, addresses, work history, family and friends' names)
- financial information (salary, credit reports)
- court records (arrests, gun registrations, outstanding warrants)
- physical attributes (tattoos, height, weight, fingerprints, birthmarks)
- cyber attributes (IP address, email addresses, social network usernames)
- official documents (driver's license, passport, official ID)
- other (publications, toll-road records, surveillance videos)
We also asked them to rate, on a seven-point Likert scale, the degree to which the following were important to them in identifying an individual:
- confidence/provenance of the accuracy of the intelligence gathered
- speed of access to datasets and analysis
- robustness of evidence over time
- completeness/richness of identity
We then created a description of each individual's identification task, using known attributes and unknown but desired attributes along with a certainty that needed to be achieved. In total, we created 20 descriptive use cases. Figure 2 shows the format we used for documenting these.
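A use case of this kind can be thought of as a small record of knowns, desired unknowns, and a required certainty. The sketch below is one hypothetical way to capture that in Python. The class name, field names, and example values are assumptions for illustration, not the format shown in Figure 2, and the five certainty levels are borrowed from the descriptors used in the visualization (very high to very low).

```python
from dataclasses import dataclass

# Ordered from weakest to strongest; borrowed from the visualization's descriptors.
CERTAINTY_LEVELS = ("very low", "low", "medium", "high", "very high")


@dataclass
class UseCase:
    """Hypothetical record for one descriptive identification use case."""
    title: str
    known: list                # attributes the participant starts with
    desired: list              # attributes the participant wants to find
    required_certainty: str    # one of CERTAINTY_LEVELS

    def __post_init__(self):
        if self.required_certainty not in CERTAINTY_LEVELS:
            raise ValueError(f"unknown certainty level: {self.required_certainty!r}")

    def is_satisfied_by(self, achieved: str) -> bool:
        """True if the achieved certainty meets or exceeds the required level."""
        return (CERTAINTY_LEVELS.index(achieved)
                >= CERTAINTY_LEVELS.index(self.required_certainty))


# Illustrative values only; the required certainty here is an assumption.
example = UseCase(
    title="Locate the origin of a widely shared image",
    known=["tweet image", "tweet avatar"],
    desired=["location of image"],
    required_certainty="high",
)
```

Keeping the required certainty on an ordered scale makes it straightforward to check later whether a candidate identification path is good enough for a given use case.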
A number of interviews were also conducted in the U.K. to determine any differences between the U.S. and U.K. in the type of identification tasks. We then created several generic use cases based on frequent knowns and desired unknowns that could be found using attributes from both the real world and the cyber world. Figure 3 contains a description of one generic use case.
By considering these use cases in the context of the SID model, some useful identity features were discovered, such as determining an individual's ideology as expressed in writing samples.
The visualization of the model is currently under development at PNNL. The visualization has a number of uses. First, it helps the SID researchers understand exactly what is already in the model and where connections between domains are scarce. Second, the visualization is a great way to demonstrate the SID work to the project stakeholders. The next steps are to take the visualization to the communities that contributed to the generation of the use cases and investigate how the capabilities demonstrated in the SID project can impact their work. Their feedback will be used to adjust the models and visualization and later to design customized tools for various communities.
Currently the interactive visualization takes as input the identity attribute(s) the end user knows and the identity attribute(s) desired. Given this information, the user is shown a number of paths where the dots represent attributes of a person and the lines between the dots represent transformations. Transformations are either inferences that can be drawn with some degree of certainty from one attribute to another or facts that lead to another attribute. For example, knowing an individual's driver's license number, a law enforcement officer can use an official resource to obtain the individual's address. An inference can be made from the length of the hand to gender, as men tend to have longer hands. The domains (biometric, biographical, cyber, and psychological) in the visualization are shown as color-coded dots, and the number of dots represents the length of the path. The certainty of the desired attributes is represented by descriptors (very high, high, medium, low, very low) at the top of the path. Expanding a path gives the attributes revealed at each step. The user can click to obtain an explanation of how a given attribute is obtained if desired.
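The path idea can be illustrated with a small graph search. The following is a minimal sketch, not the SID model or the PNNL tool: the transformations, their numeric certainties, the rule of multiplying certainties along a path, and the thresholds that map a score to a descriptor are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    source: str
    target: str
    certainty: float  # assumed score in [0, 1]
    kind: str         # "fact" or "inference"


# Hypothetical transformations; the certainty values are invented.
TRANSFORMATIONS = [
    Transformation("driver's license number", "address", 0.95, "fact"),
    Transformation("hand length", "gender", 0.60, "inference"),
    Transformation("tweet image", "geotag", 0.95, "fact"),
    Transformation("geotag", "city", 0.95, "fact"),
    Transformation("tweet avatar", "account profile", 0.60, "inference"),
    Transformation("account profile", "city", 0.50, "inference"),
]


def find_paths(known, desired, transformations, max_steps=4):
    """Enumerate loop-free paths from any known attribute to the desired one,
    sorted from most to least certain."""
    outgoing = {}
    for t in transformations:
        outgoing.setdefault(t.source, []).append(t)

    results = []

    def walk(attribute, path, visited):
        if attribute == desired and path:
            certainty = 1.0
            for step in path:
                certainty *= step.certainty  # assumption: steps are independent
            results.append((certainty, list(path)))
            return
        if len(path) >= max_steps:
            return
        for t in outgoing.get(attribute, []):
            if t.target not in visited:
                path.append(t)
                walk(t.target, path, visited | {t.target})
                path.pop()

    for start in known:
        walk(start, [], {start})
    results.sort(key=lambda item: item[0], reverse=True)
    return results


def describe(certainty):
    """Map a numeric score to a descriptor; the thresholds are assumptions."""
    for threshold, label in [(0.9, "very high"), (0.75, "high"),
                             (0.5, "medium"), (0.25, "low")]:
        if certainty >= threshold:
            return label
    return "very low"


paths = find_paths({"tweet image", "tweet avatar"}, "city", TRANSFORMATIONS)
for certainty, steps in paths:
    route = " -> ".join([steps[0].source] + [s.target for s in steps])
    print(f"{describe(certainty)}: {route}")
```

Each returned path corresponds to one row of dots in the visualization, and expanding a path amounts to listing the intermediate `target` attributes in order.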
As an example, suppose an image in a tweet has generated a huge number of followers. The knowns are the image and the tweet avatar, and the desired unknown is the location of the image. Figure 4 shows the initial input to the model and the resulting paths that would help in obtaining this information. The paths are arranged by certainty. Figure 5 shows the expansion of one of the paths with very high certainty to see the identity attributes that are determined along the way.
Figure 6 shows an example of an explanation. In this case the explanation is why Twitter images can be used to derive the location. The explanation given is that many images are tagged with a geolocation, and this provides the latitude/longitude that can be used to generate a city name.
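To make the last step of that explanation concrete, here is a minimal sketch of turning a latitude/longitude pair into a city name by picking the nearest entry in a tiny, hand-made gazetteer. The article does not say how the SID work performs this step; the `CITIES` table, the function names, and the nearest-neighbor approach are assumptions, and a production system would more likely use a reverse-geocoding service or a full gazetteer.

```python
import math

# Tiny illustrative gazetteer: city -> (latitude, longitude) in degrees.
CITIES = {
    "Seattle": (47.6062, -122.3321),
    "New York": (40.7128, -74.0060),
    "London": (51.5074, -0.1278),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_city(lat, lon, cities=CITIES):
    """Return (city name, distance in km) for the closest gazetteer entry."""
    name = min(cities, key=lambda c: haversine_km(lat, lon, *cities[c]))
    return name, haversine_km(lat, lon, *cities[name])


# A geotag a short distance from the Seattle entry above.
city, distance = nearest_city(47.62, -122.35)
print(city, round(distance, 1), "km")
```

The returned distance gives a rough sense of how much to trust the city label: a geotag far from every gazetteer entry deserves a lower certainty than one in a city center.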
At this point the use cases have been extremely helpful in determining what transformations between domains are useful. Of course, many of these transformations already exist and can be added to the model immediately (fingerprint recognition, face recognition, etc.). Some transformations exist in private databases, such as obtaining owner names and addresses from license plate information. Many new transformations are being explored by SID researchers, such as swipe gestures on mobile phones. Some early visualizations of the transformations showed that there were few links between other domains and the psychological domain. SID research has been looking at different personality traits and how these show up in cyber behavior. Certainty levels for identification were flagged as a necessity in the use cases. Early visualizations of the model helped researchers identify the number of transformations between domains and focus their research accordingly. The visualization of the different paths for identification has been useful for the researchers to identify critical nodes and missing nodes. Additionally, the combination of a use case and the visualization of paths for identification has been extremely valuable in showing the project stakeholders the progress of the work. The next step in PNNL's work is to conduct an evaluation with the communities that helped contribute to the use cases. Potential users will be shown an example use case and the path visualizations and asked to provide feedback.
To date, the UCD work has proven even more useful than anticipated in the SID research. Both the interviews and subsequent use case development have been helpful in focusing the research and the model development. The visualization work has been useful in illustrating the underlying model and therefore has also helped show deficiencies in the model, as well as the utility and progress of the research work. As the stakeholders have responded positively to the utility of the model, it is anticipated that the upcoming user evaluations will be successful as well.
We urge researchers to consider whether their projects would benefit from UCD work, primarily the development of use cases to guide the directions of the research and to aid in evaluation.
The work at PNNL is performed under funding from the Department of Homeland Security, Science and Technology (DHS S&T) Reference Number HSHQPM-11-X-00014. The U.K. universities of Southampton, Leicester, Oxford, Dundee, Bath, and Kent are funded under grants from the Engineering and Physical Sciences Research Council (EPSRC). We would like to thank our colleagues at Oxford for supplying the model used in the visualizations and for their constructive comments on visualization prototypes. We would also like to thank our colleagues at Bath and Oxford for their work on the use cases.
Jean Scholtz has been working in user-centered evaluation for 25 years. The technology domains she has applied user-centered evaluation to include video conferencing systems, robotics, information retrieval, massive data, and visual analytics environments. She has a Ph.D. in computer science from the University of Nebraska. firstname.lastname@example.org
Oriana Love is a user researcher and designer within the domains of identity, social media, multimedia analytics, and large graphs. She has filed 14 patents in the areas of collaboration, calendaring, identity management, UI, and RFID. She has a master's degree in computer science from the Georgia Institute of Technology. email@example.com
William Pike leads the Visual Analytics group at Pacific Northwest National Laboratory (http://vis.pnnl.gov) and PNNL's Analysis in Motion initiative (http://aim.pnnl.gov). His research interests include mixed initiative systems, visualization as an aid to knowledge capture, and active machine learning. He holds a Ph.D. in geography, with an emphasis on geographic information science, from Penn State. firstname.lastname@example.org
Joseph Bruce is a researcher in visual analytics and has been contributing to a variety of analytic tools for 10 years. He specializes in visualization in Web environments. Other areas of research include collaborative filtering and automated user assistants. He has a B.S. in computer science and in mathematics from George Fox University. email@example.com
Dee Kim is a multi-disciplinary designer with an emphasis in UX and visual design. She has worked in studio environments, think tanks, non-profit organizations, and the government sector. She has an M.F.A. in media design from Art Center College of Design. firstname.lastname@example.org
Arthur McBain is involved in visual analytics-centered development of applications and interactive displays to aid in user-driven analysis of large sets of data. He has a B.Sc. in computer science from the University of Wisconsin at Eau Claire. email@example.com
SuperIdentity Annual Reports and links to publications: http://www.southampton.ac.uk/superidentity/reports/index.page
Hodges, D., Creese, S., and Goldsmith, M. A model for identity in the cyber and natural universes. Proc. of Intelligence and Security Informatics Conference (EISIC), 2012 European. IEEE, 2012, 115–122; DOI: 10.1109/EISIC.2012.43 URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6298821&isnumber=6298809
Saxby, S. and Knight, A.M. SuperIdentity framework. Proc. of 8th International Conference on Legal Security and Privacy Issues in IT Law (LSPI). 2013.
©2014 ACM 1072-5220/14/07 $15.00