Samir Raiyani, Janaki Kumar
Does multimodality increase the efficiency of warehouse workers? This is the question that our team at SAP sought to answer. SAP is a software company specializing in providing enterprise solutions, including a warehouse management system used by customers worldwide. The goal of our project was to enhance an existing warehouse-management software module with multimodal inputs and outputs, and study its effects on worker productivity.
What Is Multimodality? According to Wikipedia, "Multimodal Interaction provides the user with multiple modes of interfacing with a system beyond the traditional keyboard and mouse input/output. The most common such interface combines a visual modality (e.g., a display, keyboard, and mouse) with a voice modality (speech recognition for input, speech synthesis and recorded audio for output)."
In this project, the system had the following modalities:
- Screen display (output)
- Touch screen (input)
- Voice (input and output)
- Bar-code scanner (input)
Why Warehouse Workers? Driven by the increase in offshoring of manufacturing and the demand for consumer goods, warehouses have grown in size and complexity. At the same time, the volume of orders and the diversity of goods stored in each warehouse have also increased dramatically. These factors have increased the workload of the warehouse worker, especially the picker, who is responsible for locating the various line items in each order and getting them ready for shipping.
Here is an overview of the typical business processes in a warehouse: Goods arrive at warehouses by truck. They are received at dock doors and taken away by forklift operators to the various storage containers or picking areas inside the warehouse. When a customer order is received, warehouse workers pick the line items of the order and place them into totes. These totes are taken to a staging area via a conveyer belt, packed and shipped to their final destination. An overview of this process is shown in Figure 1.
In this project, we focused primarily on the picking process. The picking process starts when the picker gets the order printed on a sheet of paper. The warehouse storage location (aisle/bin number) and the quantity ordered are noted next to each line item. Figure 2 shows a typical order sheet:
The picker walks to the bin, picks up the items and places them into a tote. She repeats this for all line items. Once she is done, she takes the tote to the staging area where it is packed and shipped.
In some warehouses, the picking process is entirely manual. In others, the barcode on each item is scanned using a barcode scanner. The barcode scanner, in turn, is connected to the warehouse-management software system over a wireless network.
Given the size of warehouses today and the complexity of each order, picking is the most labor-intensive operation in a warehouse and also the most error-prone. Companies are interested in reducing the error rate in picking, since the effort involved in processing returns and shipping a replacement increases the cost of order fulfillment dramatically. They are also interested in investing in automation and in finding ways to increase the efficiency of the picking process.
Design. The user interface consisted of a handheld device (Intermec 700 PDA running an IBM multimodal Web browser) that had a visual display and wireless capability. This device had a built-in bar-code scanner and a headset for voice input/output. We provided users with a holster for the mobile device to keep their hands free for performing typical warehouse operations such as lifting boxes.
The goal of the screen design was to keep it simple and not overload the warehouse worker. Refer to Figure 3 for sample screen designs.
Our assumption was that the user would complete their task with voice input/output and without holding the device. At those times, the device would remain in the holster. The user would need to hold the device to refer to the display for clarification, or to use the bar-code scanner. To enable this, an external handle was attached to the device to make it easier to use as a scanner. Ideally, we would also have liked to make the device holster retractable, so that it would cling to the user's waist when not in use. However, we were not able to incorporate this into the design over the test period.
To increase speed of operation, the voice dialog had to be extremely short and precise. Also, the words spoken by the user had to be easy for the voice recognition system to recognize. For example, the system recognized "Finished" better than "Done," and the dialog was changed accordingly.
Here is a sample voice dialog:
- System: Please go to Aisle 30, section 11.
- System: Level 04, bin 08.
- User: Ready.
- System: Pick 10 each.
- User: Finished.
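The dialog above follows a fixed prompt-and-confirm pattern, which can be sketched in a few lines of Python. This is only an illustration of the pattern, not the code used in the pilot: the `PickLine` record, its field names, and the `run_pick` function are hypothetical, and the sketch simply skips unrecognized replies, whereas a real voice system would more likely re-prompt.

```python
from dataclasses import dataclass


@dataclass
class PickLine:
    """One line item of an order (field names are illustrative assumptions)."""
    aisle: int
    section: int
    level: int
    bin_number: int
    quantity: int


def system_prompts(line: PickLine) -> list[str]:
    """Return the system's spoken prompts for one pick, in order."""
    return [
        f"Please go to Aisle {line.aisle}, section {line.section}.",
        f"Level {line.level:02d}, bin {line.bin_number:02d}.",
        f"Pick {line.quantity} each.",
    ]


def run_pick(line: PickLine, replies: list[str]) -> list[str]:
    """Replay one pick dialog against a scripted list of user replies.

    The system speaks the two location prompts, waits for "Ready",
    speaks the quantity, then waits for "Finished". Replies that match
    neither keyword are skipped in this sketch.
    """
    location_1, location_2, quantity = system_prompts(line)
    transcript = [f"System: {location_1}", f"System: {location_2}"]
    pending = iter(replies)
    for expected, follow_up in (("ready", quantity), ("finished", None)):
        for reply in pending:
            if reply.strip().lower() == expected:
                transcript.append(f"User: {reply.strip().capitalize()}.")
                break
        else:
            # The scripted replies ran out before the expected keyword.
            raise ValueError(f"dialog ended before user said {expected!r}")
        if follow_up is not None:
            transcript.append(f"System: {follow_up}")
    return transcript


if __name__ == "__main__":
    for utterance in run_pick(PickLine(30, 11, 4, 8, 10), ["Ready", "Finished"]):
        print(utterance)
```

Keeping the user's vocabulary to two keywords per pick reflects the design goal stated above: short utterances that are easy to recognize.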
The voice recognition system was chosen carefully after several trials. The system chosen had to be tolerant of noise from sources such as the conveyer belts in a warehouse.
User Research. We visited several warehouses and observed and interviewed multiple users before developing our prototype. We also tested noise levels and measured the time taken for everyday tasks in the current settings.
We conducted our tests in three phases at Ditan Inc's video-game distribution center with warehouse pickers fulfilling real orders for customers. Prior to the introduction of our system, picking was a purely paper-driven process in this warehouse, with workers picking items after reading instructions on the picking sheet associated with each order.
We monitored the pick time for each user and interviewed them at the end of the picking exercise. Over the course of the pilot, we made improvements to the speed of the application, the ergonomics, and the training we provided each of the users. We used a speaker-independent voice recognition system for this pilot; this did not require any customizing of the system for a user's speech patterns. We used a different set of users for each of the three phases.
Phase I: Two relatively experienced warehouse pickers, both male. A five-minute introduction was provided to the users before the tests.
Phase II: Five relatively inexperienced workers, two of them female and three male. A five-minute introduction was provided to the users before the tests.
Phase III: Three relatively experienced workers, all male. Ten to 20 minutes of instructions were provided, which included adjustment of headsets and testing the system with the user, before start of the tests.
- Instructions: Users benefited from the longer instructions provided in Phase III, which increased their comfort level and satisfaction with the system. The instructions helped ease the users into the multimodal experience.
- Noise: Warehouse noise did not affect the voice recognition system and was not an issue for the pilot.
- Voice recognition: Voice recognition was very good for Phases I and III, but many problems were observed during Phase II. These problems can be attributed to users' accented English, as well as to insufficient instructions provided during Phase II.
- Speed: Our system was too slow for warehouse workers. During Phase I, each pick took 30 to 40 seconds. The number came down to an average of 24 seconds by Phase III, but this was still not close to the ten-second average recorded for the paper process. Most of the delays were caused by the multimodal browser and the barcode scanner. These issues have since been resolved by the vendors of the multimodal browser technology.
- Ergonomics: We initially experienced several ergonomic issues, primarily related to the use of the barcode scanner. Once we attached a "trigger" handle to the bottom of the user device, the ergonomic problems were somewhat resolved.
- User Satisfaction: User satisfaction improved considerably during Phase III, when the users were observed to enjoy working with the system. Users commented on the "coolness" factor of the multimodal application.
- Error rates: Unfortunately, the sample size during our tests was not large enough for us to draw any conclusions about the reduction in error rates. Anecdotally, we observed frequent errors in the paper process, and none in the multimodal process.
- Modality preferences: Once the warehouse workers established a preference for a modality for a particular task, we observed that they used the same modality throughout the test. This preference was established early in their interaction with the system. For example, if users were able to use voice successfully for confirming the picks, they relied exclusively on this modality and ignored the screen for the confirmation task. This is consistent with research that shows that once a single modality is proven to be superior and sufficient for a task, users tend to use it exclusively, rather than switch between modalities.
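To put the pick times from the speed finding in perspective, the short Python sketch below converts seconds per pick into picks per hour. It is a back-of-the-envelope illustration only: it assumes picks are performed back to back at the reported averages, and the function names are our own.

```python
def picks_per_hour(seconds_per_pick: float) -> float:
    """Convert an average pick time into an hourly pick rate."""
    return 3600 / seconds_per_pick


def slowdown(system_seconds: float, baseline_seconds: float) -> float:
    """How many times longer a pick takes than the baseline."""
    return system_seconds / baseline_seconds


# Average pick times reported in the pilot, in seconds.
reported = {
    "Paper process": 10,
    "Phase I (slowest)": 40,
    "Phase I (fastest)": 30,
    "Phase III": 24,
}

if __name__ == "__main__":
    for label, seconds in reported.items():
        rate = picks_per_hour(seconds)
        factor = slowdown(seconds, reported["Paper process"])
        print(f"{label}: {rate:.0f} picks/hour, {factor:.1f}x paper time")
```

Under these assumptions, the paper process corresponds to 360 picks per hour and the Phase III system to 150, so a Phase III pick took 2.4 times as long as a paper pick.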
During the course of our tests, the multimodal browser technology was relatively new and slow. Therefore we could not observe any efficiency improvements over a paper-driven process. However, recent improvements in the performance of multimodal browser technology have enabled the system to outperform a paper-based process, with lower error rates.
Conclusion. Since our tests, the outstanding issues listed above (speed, ergonomics) have been addressed. Today, this multimodal warehouse application is being delivered by the SAP partner Topsystem SystemHaus GmbH and is used by customers in several countries, and we are starting to get usage reports from live installations. Here is a quote from the operations manager of HAZET-WERK, Hermann Zerver GmbH & Co. KG: "Since the implementation, we have increased our output volume for about 50 percent without additional staff or longer working hours."
So, does multimodality increase warehouse-worker efficiency? From our observations and the anecdotal evidence so far, we believe it does. One day in the future, warehouses will be fully automated and robots will be used to pick orders. Until that day, offering multimodal-interaction capability to warehouse pickers not only increases their efficiency, but also makes their experience more enjoyable.
4. Sharon Oviatt, Antonella DeAngeli, and Karen Kuhn, "Integration and Synchronization of Input Modes during Multimodal Human-Computer Interaction," www.cse.ogi.edu/CHCC/Publications/Integration_Synchronization_Input_Modes_oviatt.pdf
About the Authors
Samir Raiyani is a senior research scientist and director of the Digital Communities project at SAP Research in Palo Alto, California. Samir has worked on a variety of research projects that center around innovative RFID, sensor networking and mobile computing technologies. His areas of expertise are mobile middleware platforms and multimodal user interfaces. Prior to SAP, he was the founder and system architect at mobile healthcare startup iScribe Inc. Samir has a Masters degree in computer science from Stanford University.
Janaki Mythily Kumar is a manager in the user-experience team at SAP Labs in Palo Alto, California. She has worked in the field of human-computer interaction since 1994 on products in the CRM, financial, and procurement domains. She collaborates on a daily basis with her colleagues around the world to design and build applications that support the needs of the enterprise worker.
©2006 ACM 1072-5220/06/0700 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2006 ACM, Inc.