Research alerts

IX.5 September 2002
Page: 9
Digital Citation

A scalable method for deductive generalization in the spreadsheet paradigm


The following abstracts are from recent issues and the forthcoming issue of ACM's Transactions of Computer Human Interaction (ToCHI). They are included here to alert interactions' readers to what research is being done in the field of Computer Human Interaction. The complete papers, when published, can be found in ACM's Digital Library at

"Generalization" in spreadsheets occurs when a formula referring to a particular cell is copied to other cells. Once the formula is copied, the system then needs to determine the appropriate references in the formulas of those new cells. We have developed a new generalization method that supports extending the spreadsheet paradigm beyond today's commercial spreadsheets, which are restricted to a single grid of cells.

The usual solution to generalization in commercial spreadsheet systems is to apply a strictly spatial approach based on physical relationships. In this approach, when a user copies or "fills" a formula into other cells, the system generalizes any cell references in the copied formula according to the distance of the new cells' number of rows and columns from the original cell.

However, the spreadsheet paradigm includes not only commercial spreadsheet systems, but also a number of research systems that extend the paradigm with features such as supporting multiple copies of the same sheet, graphical types, and high-quality visualizations of complex data. With these extensions, the spatial approach falls short, because the extensions require logical relationships—in addition to or instead of spatial relationships—to define the generalized meaning of the spreadsheet.

For example, suppose a population analyst wants to define a visual representation of data using domain-specific visualization rules that make use of the built-in primitive Circle spreadsheet shown in Figure 1. Figure 2 shows such a visualization in Forms/3. The spreadsheet categorizes population data into cities, towns, and villages, and represents each with a differently sized black circle. If a referenced cell is on another spreadsheet, the notation displayed in the formula is to precede the cell name with the spreadsheet name and a colon, such as "282_primitiveCircle: newCircle."

To create the spreadsheet shown in Figure 2, the population analyst sketches a circle, which creates another instance of the system-provided primitiveCircle spreadsheet (e.g., 179_primitiveCircle), and then edits the instance's cell fillForeColor to Black and its cell radius to be 1 + (population:population[i@j] / 10000)) to refer generically to an element of the Population grid.

The analyst's task is finished, but the system still needs to generalize further. If it did not generalize, all the cells in the graph grid would be the same size, because they would all refer to newCircle on the same copy (such as 179_primitiveCircle). After the system generalizes, each reference in graph's formula will be to cell newCircle on an appropriate copy of primitiveCircle.

After generalization, overly specific instances in the formula, such as 179_primitiveCircle, are treated as samples. The fact that they are now just samples is communicated to users by the tiny key icon next to them. If the user places the mouse over or clicks the key icon, a legend is displayed, as in Figure 2(b), to show what the sample stands for. Figure 2(b) says that the sample reference in the formula for the Graph[i@j] cells is to newCircle on a copy of primitiveCircle, the fillForeColor cell of which is Black and the radius cell of which is calculated using population:population[i@j].

How did this come about? The generalization method consists of two steps: (1) incrementally tracking logical relationships and (2) lazily generalizing these relationships "just in time." Step 1 consists of incrementally building a graph of the relationships among the cells. This step is triggered whenever a user establishes or changes a relationship by editing a formula. Step 2 uses this graph to generalize. It is triggered whenever the system determines that generalization cannot be delayed any longer. Sometimes the relationships are fairly simple to figure out, as in Figure 2, but sometimes they involve complex paths through a number of related cells. In Step 2, the system completes its work in a bottom-up fashion. Starting with an overly concrete cell reference, the system deduces a generalized description from the graph of relationships it built during Step 1 and substitutes, via algebraic back-substitution, the generalized description into all relevant formulas that use the concrete reference. Generalization is complete when all the concrete instances involved in the computation being generalized have finally been eliminated through these substitutions.


This research was supported in part by the National Science Foundation under awards CCR-9806821 and ITR-0082265 and by National Aeronautics and Space Administration grant NGT540022.


Margaret Burnett
Department of Computer Science
Oregon State University
Corvallis, OR 97331
[email protected]

Sherry Yang
Computer Systems Engineering Technology
Oregon Institute of Technology
3201 Campus Dr.
Klamath Falls, OR 97601
[email protected]

Jay Summet
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280
[email protected]

F1Figure 1. A portion of a Forms/3 spreadsheet that defines a circle. The attributes of the circle in cell newCircle are specified by the other cells, some of which look like buttons and menus.

F2Figure 2. (a) The population spreadsheet in Forms/3. Generalization has occurred, as is clear from the different circles in Graph's different cells, even though they all share the same formula. (b) The legend showing what 179_primitiveCircle has been generalized to mean.

©2002 ACM  1072-5220/02/0900  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Post Comment

No Comments Found