Marco Padula, Giuliana Rubbia Rinaldi
We are passing from a world in which information management lies in the hands of a few devotees to one of widespread and diffuse consumption of information. Raw data is of little practical use; what counts is the knowledge that can be extracted from plain information to support decisional processes and scientific analysis and to synthesize documentation. Multimedia information is rich and easy to use for humans, who draw on it regularly to carry out their activities and communicate the results obtained. But the mass of information now available is so great that it must be organized to improve its usability. And the automatic structuring of the large amount of raw data at our disposal is quite a difficult job: approaches coming from the database community, keen to generate well-structured multidatabases and suitable query languages, are intermingled with contributions from the uncontrolled, unexplored, and unpredictable interaction of a multitude of people who require assistance to navigate hypermedia space. The automatic organization of multimedia information raises the important problem of content emergence, which can be succinctly expressed in two questions: What is required to find and focus on a particular set of information? And what path do we take, instead of the traditional geographical one, through those information centers with the greater probability of satisfying the seeker’s need?
This situation, in which many resources are available but can be exploited only with great difficulty, has motivated researchers and practitioners of automatic data management to look more closely at data mining and knowledge discovery in databases and at collaborative approaches to both scientific discovery and information exploration.
We believe that successful and outstanding contributions require collaboration among researchers in different disciplines, each of whom has a different part to play in preparing a synergic mixture of
- Large databases organized and offered online for remote accessing, integrated through meaningful links among data;
- Freshly defined, working metaphors to map traditional methods into the new information technology environments; and
- Interaction through an interface that (1) is user-friendly, offering easy interaction to general users, who very often remain permanent novices with respect to the data accessed and the technologies used; (2) is user-centered, referring to the user’s traditional paradigms and daily work metaphors when presenting data, offering tools for the organization, description, and presentation of documents, and composing these metaphors into leitmotifs that mimic the user’s workplace to facilitate browsing for the successful search of online information; and (3) provides navigation aids tailored to the application domain.
Collaboration nurtures scientific work. Facilitating collaboration throughout a geographically distributed community is a complex task requiring the support of user-centered systems.
In this paper we present work that aims to put into effect the potential offered today by information technologies, realizing a working environment that effectively addresses the communication needs of a scientific community involved in seismological research. Currently, seismologists, engineers, and historians cooperating in seismic hazard assessment often need to browse all available information on seismic events, including not only parametric earthquake records but also the original texts, intensity maps, and so on from which those records were drawn. Collecting all this information is only the starting point, not the result, of researchers’ analysis; therefore, researchers must not only be able to rely on the quality of and easy access to the stored data, but also be empowered to communicate the results obtained and, in turn, refine and extend these data and propose new results.
In line with the scenario we have outlined, we have considered the concepts of data mining, content emergence, and collaboration as landmarks for designing a working environment according to the trends displayed by the Internet’s broad user population. Information problems are seen in a new perspective, which calls for a global openness to the multifaceted network configuration and to the interrelationships among protagonists. The role of mass storage as electronic memory has been modified: it is being replaced by the information network, with a new organization of all available information and computing resources, both distributed and concentrated. It is a network in which everyone intervenes with his or her own contribution, altering, updating, and offering his or her own results. From an operational point of view, the opportunity to change perspective comes from the power of the client-server paradigm used to design digital network architecture; from the hypermedia paradigm, which offers more complex and expressive user interaction; and from tools and programming languages that allow integration of data and applications.
The design of user interaction has been based on the actual working procedures and data description languages employed by the domain experts addressed; unfortunately, these are neither fixed nor always formally describable. Therefore, representatives of the scientific community have been involved in a cooperative and progressive design process, shifting their role toward one of greater participation in both the development and exploitation phases. The environment has been designed as a Web application according to the architectural scheme presented by Bianchi et al. The scheme stems from the experience of different application domains and is aimed at creating anthropocentric systems and at enhancing the skills and abilities of users as an important means for improving system performance. Consequently, the Web application has three main components: the navigation structure, the surface, and the data space.
The design of the navigation structure of the application (translating users’ needs, as previously identified, into functional specifications for data organization, content structuring, and paths) has benefited from the use of a data modeling methodology focusing on the design, development, and construction stages [11, 15].
Surface design, that is, how information is presented to users and how users interact with it, has implied decisions about the adoption of a user-centered style of interaction versus a user-friendly one, always taking into account the guidelines for the usability of new Web applications. Users interact with Web pages to collect data (formatted text, tables, maps, graphics) presented in meaningful arrangements that focus their attention on specific tasks. Their need to consult multiple levels of information while performing a single task has been met: task-oriented partitions were designed for the pages in task control panels of different complexities. Control panels are tailored for querying, presenting, and browsing multimedia macroseismic information through easy interaction involving no unfamiliar vocabulary. The arrangement of the data offers users a more comfortable working page at the expense of greater computing resources for managing its implementation in the current Web infrastructure. The control panel becomes a multimedia earthquake browsing metaphor.
The data space serves as the basis of an information system that is flexible and integrated and can be used by workstations distributed over a telematic network. The information system is constantly updated as studies and research proceed. It also serves as a reference for researchers in the field, without excluding earth scientists, engineers, government agencies, and the general public. Well-known systems already supplied with an interface to the Web have been chosen for data management.
The application has been developed by refining prototypes through several test cycles. The semantics of the application for both domain experts and Web developers has been progressively developed by using diagrams illustrating user navigation in the application, mock-ups (paper-based prototypes) of Web pages outlining user interaction, and the application itself.
Because of the social relevance of their activities, seismological operators must be able to access new results and information in real time. From a strictly computational point of view, the same data item, or data set, regardless of its mono- or multimedia nature, is often exploited by different application tasks. Consequently, similar functionalities of different tools should be realized using the same piece of code. The highly modular architecture of the application and the convenience of the tools adopted for its implementation allow the rapid re-use of data and software, enhancing the application’s usability and reducing the time required to define new working metaphors, update and extend published materials according to the agreed metaphors, and develop the application.
User Requirements for Knowledge Discovery in the Context of Historical Seismology
Web miners often navigate through unfamiliar resources, turning data into knowledge by applying procedures that still rely on manual analysis and human interpretation. Information technologists are challenged to conceive new concepts and methods to aid knowledge seeking. The formal description and implementation of data models for automatic seeking is an arduous and complex task that still produces unsatisfactory results. Thematic Web sites are a channel offering a mass of relevant documentation that is still collected and semantically structured with the interactive cooperation of end-user specialists. Knowledge emergence would be improved by domain-specific online databases concentrating on related information and by data viewing authored and customized to support data selection, presentation, and interpretation. The more structured the information on the Web, the easier and more precise the data mining.
Macroseismic research relies on a knowledge discovery activity that yields a large amount of data providing the details for further analyses and searches. Both seismic hazard analysis (used to make engineered constructions safe) and civil defense planning need to extend the knowledge of seismicity backward in time. The history of the seismicity of a country is synthesized in a Parametric Earthquake Catalogue (PEC): an ordered set of parametric strings, each containing the main parameters of a seismic event, such as the date, epicenter coordinates, intensity, and magnitude of the earthquake. The historical seismological research used to produce a parametric earthquake catalogue is an information-intensive activity performed in four phases, corresponding to different levels of abstraction:
- Mining of historical sources to extract news of the event;
- Selection and interpretation of news;
- Earthquake definition, through parametrization of the news in terms of macroseismic intensity; and
- Earthquake parametrization.
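As a rough illustration, the parametric strings of a PEC can be modeled as records of a few numeric fields. The field names, layout, and the sample string below are assumptions for the sketch, not NT4.1's actual record format:

```python
from dataclasses import dataclass

@dataclass
class ParametricRecord:
    # Illustrative fields; not the real NT4.1 record layout.
    year: int
    month: int
    day: int
    lat: float   # epicenter latitude (degrees)
    lon: float   # epicenter longitude (degrees)
    io: float    # epicentral intensity
    ms: float    # surface-wave magnitude

def parse_record(line: str) -> ParametricRecord:
    """Parse one whitespace-separated parametric string."""
    y, m, d, lat, lon, io, ms = line.split()
    return ParametricRecord(int(y), int(m), int(d),
                            float(lat), float(lon), float(io), float(ms))

rec = parse_record("1915 01 13 42.00 13.53 11.0 6.99")
print(rec.year, rec.ms)
```

A catalogue is then simply an ordered list of such records, sortable by date.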
The long-term objective of the application is to render the information produced at each level available to allow backtracking of the compiling procedure of the catalogue: if the data are available, the assumptions in the compiling procedure can be validated, and the reliability of the catalogue can be assessed.
The development of the application has moved in a direction (Figure 1) opposite to that of research activity. First a PEC was delivered (see Level 1 in Figure 1), followed by the Intensity Datapoints (ID) used for parametrization (see Level 2 in Figure 1); the integration of data of the other levels is still under study.
The study examines products by the Gruppo Nazionale per la Difesa dai Terremoti (National Group for Protection against Earthquakes): NT4.1, a Parametric Earthquake Catalogue of damaging earthquakes in the Italian area, and the related Database of Macroseismic Observations, DOM4.1. NT4.1 lists 2,421 damaging earthquakes in the Italian area (surface-wave magnitude Ms >= 4.0; epicentral intensity Io >= 5/6) registered between the years 1000 and 1980 in 80 seismogenic zones and the neighboring areas. DOM4.1 contains 36,000 Intensity Datapoints, which refer to 950 earthquakes and nearly 10,000 localities.
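A sketch of how such selection thresholds might be applied in code; the tuple layout, the sample events, and the assumption that the magnitude and intensity criteria combine as "or" (either one qualifies an event as damaging) are all ours, not NT4.1's documented compilation rule:

```python
# Hedged sketch of the NT4.1-style selection thresholds:
# Ms >= 4.0 or epicentral intensity Io >= 5.5 (i.e., degree 5/6).
def is_damaging(io: float, ms: float) -> bool:
    return ms >= 4.0 or io >= 5.5

# Invented (year, Io, Ms) tuples for illustration only.
events = [(1117, 9.0, 6.5), (1873, 5.0, 3.8), (1976, 9.5, 6.5)]
selected = [e for e in events if is_damaging(e[1], e[2])]
print(len(selected))  # the 1873 event falls below both thresholds
```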
The multimedia data managed by the application are formatted data (PEC, ID); images, such as epicenter maps (EM) and intensity maps (IM); diagrams (seismic histories); and texts.
The application we present is a compromise between the current limits of Web browsers and data representations that closely resemble the notations and methodologies employed by experts in their traditional working environment to explore and communicate data: users are supplied with all the information requested by means of a task-oriented partition of the Web pages. Thus, when referring to their working use, the Web pages are called control panels.
Various paradigms have been defined and developed to access and query the catalogue NT4.1: by seismogenic zone (Figure 2), by space-time windows (Figure 3), and by earthquake. Macroseismic data can be accessed in two ways, formulating the minimum requirement: by earthquake (Figure 4), to extract the intensity datapoints, and by locality (Figure 6), to obtain the seismic histories of the sites. The different panels are described in detail in the following sections.
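A query by space-time windows amounts to filtering the catalogue on a year range and a coordinate rectangle. A minimal sketch, with illustrative field names and inclusive bounds assumed:

```python
# Space-time window filter over catalogue records, in the spirit of
# the NT4.1 "query by space-time windows" panel. Field names and the
# inclusive-bounds convention are assumptions for this sketch.
def in_window(rec, y0, y1, lat0, lat1, lon0, lon1):
    return (y0 <= rec["year"] <= y1
            and lat0 <= rec["lat"] <= lat1
            and lon0 <= rec["lon"] <= lon1)

# Invented sample records.
catalogue = [
    {"year": 1349, "lat": 41.5, "lon": 13.9},
    {"year": 1693, "lat": 37.1, "lon": 15.0},
    {"year": 1915, "lat": 42.0, "lon": 13.5},
]
hits = [r for r in catalogue
        if in_window(r, 1300, 1950, 40.0, 43.0, 13.0, 14.0)]
print(len(hits))  # the 1693 record lies outside the latitude window
```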
The Surface: A Control Panel Interface to Data Space
We define the surface as an instrument for system observation and navigation through reliable paths [4, 17]. The surface is composed of Web pages [see 6]: a set of electronically accessible documents with hypertext characteristics. Each document may be passive, active, or dynamic and is defined according to the requirements of the task communication. Pages of all three types have been produced that propose arrangements (views on data) that on the one hand allow the user to focus on a specific task and on the other organize data collected from the data space in any form (formatted text, tables, maps, graphics).
- Passive pages of text, graphics, and hyperlinks are used for descriptive components, such as catalogue contents, table of contents, and information for help online.
- Active pages, which also include interactive graphic user interface components, such as forms, are used to allow users to input data. Examples are the Guest Book panel, where users can enter their comments to the editors (http://emidius.itim.mi.cnr.it/cgi-bin/php.cgi/commenti_home.html), and the NT4.1 - Query by space-time windows panel for catalogue NT4.1, which users employ to formulate their queries by filling in the form with data such as the date and coordinate ranges to select the area of interest in the catalogue (http://emidius.itim.mi.cnr.it/cgi-bin/php.cgi/NT/Spaziotempo).
- Dynamic pages change over time, modifying their contents and/or layout. They are mainly used when switching from query to retrieval operations. One example is the panel DOM4.1-Query by earthquake, which accesses the database of macroseismic information DOM4.1 (http://emidius.itim.mi.cnr.it/DOM/consultazione.html).
Control panels have a landscape format, designed for a monitor with a resolution of 832 x 624. They are partitioned into frames to reduce the distance in space and time between source and destination nodes during visualization. Frames are passive, active, or dynamic in turn. Most of the panels are vertically partitioned: the (passive) frame on the left is used to display information and hotwords that must remain in view throughout navigation, as they are linked to the different panels for querying the databases; the (active or dynamic) frame on the right is for user interaction with data. Figures 2a and 3a illustrate two types of queries addressing the NT4.1 catalogue. In Figure 2a, the (active) frame on the right is used to query the catalogue by seismogenic zone. The query is visual: a click on a seismogenic zone fires the selection of the records of earthquakes with epicenters within the zone. In Figure 3a, the (active) frame on the right is used to query the catalogue through space-time windows. Here the query is made by filling in a form. In both cases a new passive page visualizing the complete data records is dynamically created (Figures 2b and 3b).
All the panels for querying (by seismogenic zone, by space-time windows, by epicenter) and for browsing the whole catalogue refer to only one level of synthesis of macroseismic data, that is, the catalogue at Level 1 (Figure 1). Instead, the panel DOM4.1 - Query by earthquake, which is linked to the hotphrase catalogue+primary data, simultaneously presents and browses data at the different levels of synthesis that characterize the same earthquake: the catalogue level (1), the intensity data level (2), and the macroseismic study level (3). Figure 4 shows the result of a query. Clicking on the earthquake record in the upper right frame (in reduced format) visualizes, by default, the intensity data and their visual display on the map in the bottom frames. The bottom right frame is used to visualize the macroseismic study on request. Except for the upper left frames, which are passive frames constituting a help area and the legend, the frames are active and change their content at the user’s request.
Figure 5 shows how current hypermedia technology plays a role in creating the complex task-oriented layout designed for the panel DOM4.1 - Query by earthquake: the frame partitioning of the page "flattens" on the surface the data referring to the different levels of synthesis.
The Data Space
The data space is the set of all the permanent data necessary for the specialist’s analysis, for describing the layouts for data presentation, as well as both the processes involved and the input, output, and state variables for the correct execution of the processes.
The data space includes different types of data:
- Texts (the main part of the Macroseismic Studies);
- Formatted data (PEC NT4.1 and ID database DOM4.1);
- Diagrams (seismic histories) and images (cartographic representations such as epicenter maps and intensity maps), produced either offline (EM, IM) by a geographic information system or online, as in the case of seismic histories, which synthesize the result of computations on data extracted from the archives.
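The online computation of a seismic history can be pictured as reducing the intensity datapoints archive to the series of intensities observed at one locality, i.e., the data behind the seismic-history diagram. A minimal sketch with invented field names and sample values:

```python
# Hedged sketch: build the (year, intensity) series for one locality
# from a list of intensity datapoints. Field names are illustrative.
def seismic_history(datapoints, locality):
    series = [(dp["year"], dp["intensity"])
              for dp in datapoints if dp["locality"] == locality]
    return sorted(series)  # chronological order for the diagram

# Invented sample datapoints.
dps = [
    {"locality": "Camerino", "year": 1799, "intensity": 9.0},
    {"locality": "Camerino", "year": 1997, "intensity": 7.0},
    {"locality": "Norcia",   "year": 1703, "intensity": 10.0},
]
print(seismic_history(dps, "Camerino"))
```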
Textual historical sources and bibliographic catalogues will be stored and managed with the CDS/ISIS system, which was developed by UNESCO and taken as the kernel for developing new versions offering ISIS archives in the Web environment [2, 23].
Functionalities for selecting records and performing the logical operations and joins needed to integrate the databases (NT4.1 and DOM4.1) had to be available. We chose mSQL (mini Structured Query Language) [13, 14], a database management system (DBMS) designed to offer rapid access from remote workplaces to the data it stores and manages. mSQL implements a subset of SQL that allows neither the storage of views (character strings representing queries) nor nested queries. Notwithstanding this shortcoming, the database is accessed in ANSI SQL. mSQL is interfaced with HTML so that both remote and local databases can be handled.
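Because mSQL rules out nested queries, a selection that might otherwise use a subquery has to be phrased as a flat join. The sketch below uses Python's sqlite3 as a stand-in for mSQL, with invented table and column names:

```python
import sqlite3

# sqlite3 stands in for mSQL; table/column names are illustrative.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE nt  (quake_id INTEGER, year INTEGER, ms REAL);
CREATE TABLE dom (quake_id INTEGER, locality TEXT, intensity REAL);
INSERT INTO nt  VALUES (1, 1915, 6.9), (2, 1920, 6.5);
INSERT INTO dom VALUES (1, 'Avezzano', 11.0), (2, 'Fivizzano', 10.0);
""")

# Flat join, usable where "WHERE quake_id IN (SELECT ...)" would not be:
rows = db.execute("""
    SELECT dom.locality, dom.intensity
    FROM nt, dom
    WHERE nt.quake_id = dom.quake_id AND nt.year = 1915
""").fetchall()
print(rows)
```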
The agents that allow remote accessing of the archives in order to extract data and organize them for the designed queries have been implemented in the PHP/FI (Personal Home Page tools/Form Interpreter) language, version 2.0. PHP/FI makes it possible to manage SQL queries from HTML pages and to visualize retrieved data through instructions embedded in the HTML source. Programs fired by interaction with a Web page are usually realized in autonomous code and communicate with the Web server through a suitable common gateway interface (CGI). By using PHP/FI, a developer can embed both layout and agent descriptions in a page description, easing the burden on the agents managing client-server communication on the Web.
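PHP/FI embeds the query logic in the page itself; the equivalent flow for a stand-alone CGI program — parse form fields, run a query, emit an HTML fragment — can be sketched as follows. All names, the sample data, and the markup are illustrative, not the application's actual code:

```python
# Hedged sketch of the CGI request cycle that PHP/FI collapses into
# the page description itself.
def render_results(rows):
    cells = "".join(f"<tr><td>{y}</td><td>{i}</td></tr>" for y, i in rows)
    return f"<table>{cells}</table>"

def handle_request(form, query_fn):
    """Parse form fields, run the query, emit an HTML fragment."""
    y0, y1 = int(form["year_from"]), int(form["year_to"])
    return render_results(query_fn(y0, y1))

def fake_query(y0, y1):
    # Invented stand-in for the database access.
    data = [(1693, 10.0), (1915, 11.0)]
    return [(y, i) for (y, i) in data if y0 <= y <= y1]

html = handle_request({"year_from": "1900", "year_to": "1980"}, fake_query)
print(html)
```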
A Global View of the Application
The Web today is still too static to exploit all its potential interactivity . Based on a client-server architecture, it assigns almost all computation to the server and relegates the client’s powers to functionalities that allow the user local interaction for navigating. Direct manipulation is practically impossible: common operations such as drag and drop, resizing, graphic interaction, and sketching are executed through sequences of single HTTP request-response cycles, each requiring too much time given the current Internet bandwidth and the server’s workload. The CGI is currently the most widely used mechanism for distributed data processing on the Web.
Despite the limitations of the Web in supporting interface development, its use has recently grown exponentially, mainly because of
- Its public domain code;
- The availability of large, heterogeneous archives;
- Its friendliness;
- The availability of assorted browsers; and
- The availability of gateways that allow switching among many different transmission protocols.
Full exploitation of these characteristics in our design and implementation of the application (Figure 7) was possible thanks to the modularity of its functional scheme (presented in this section) and to its hierarchic architecture (exemplified in the next section). Furthermore, the software granularity achieved was ideal for reusing different application components, thereby accelerating the development of the whole application.
The interface is organized as a navigable web of specific views on the same data space. On the surface side, each view appears as a control panel allowing the user to focus on each information set as needed to carry on his or her investigation. On the system side, it communicates with and activates the agents that autonomously control user interaction and the underlying computation for changing views, processing data, and building the application surface.
The local agents are wired into the client (a Web browser, such as Netscape, Mosaic, or HotJava) to carry out its functions. There are two main functions:
- Builder B interprets the view description when received from the dialogue manager; and
- Dialogue manager DM sends user requests for navigation (new view) to the remote agents, receives the new view description, and dispatches it to the builder.
The remote agents are hosted by the server environment; they are developed ad hoc (extractor, processor, interpreter) or may be partially wired into the Web server (e.g., Apache, Roxen, NCSA). There are four main functions:
- Dialogue manager DM (wired) receives the requests for navigation from all the interested clients in the Web space, asks the navigator for a new view, and forwards to the interpreter the agent’s definition embedded in a view;
- Navigator N looks for new view definitions in the view definition library and sends them to the dialogue manager;
- Interpreter I is a CGI that can interpret and execute agents described in a suitable language;
- Specialized agents SA are activated by the interpreter I to detail the functioning of the views:
- Data extractors E access the archives to extract the data;
- Data organizers O integrate the data extracted and prepare them for inclusion in the view; and
- Data processors P describe specialized computations, for example, for graphic visualization or statistics.
The bus has appropriate specializations dedicated to the physical access to archives with either data (DBMS: mSQL) or studies (IRS: ISIS) and to libraries with images (GIFs and JPEGs) or view definitions (HTML). Furthermore, the bus dispatches the data extracted to the remote dialogue managers through a remote data transmission agent RDT (wired).
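The chain of remote agents described above can be caricatured in a few lines: the dialogue manager asks the navigator for a view definition, and the interpreter replaces the embedded agent call with data gathered by the extractor and organizer. Every name, the archive contents, and the view-definition format are assumptions made for this sketch:

```python
# Minimal sketch of the remote-agent chain (N, E, O, I, DMr).
VIEW_LIBRARY = {"by_earthquake": "layout + {extract:quakes}"}
ARCHIVE = {"quakes": ["1915 Avezzano", "1976 Friuli"]}

def navigator(view_name):
    return VIEW_LIBRARY[view_name]            # N: look up the definition

def extractor(key):
    return ARCHIVE[key]                       # E: access the archive

def organizer(records):
    return "; ".join(records)                 # O: prepare data for the view

def interpreter(definition):
    # I: replace the embedded agent call with its organized data
    key = definition.split("{extract:")[1].rstrip("}")
    return definition.split(" + ")[0] + " + " + organizer(extractor(key))

def dialogue_manager(view_name):
    return interpreter(navigator(view_name))  # DMr: orchestrate the chain

print(dialogue_manager("by_earthquake"))
```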
A Fish-Eye View of a DOM4.1 - Query by Earthquake
The surface shows only the physical component (that is, the control panel) of the view that allows user interaction for navigating. A logical component that has access to the whole data space and guarantees communication among the different processes is also needed. Bianchi et al. [4, 5] presented an approach to interface design that proposed a logical part composed of three layers of agents:
- A top layer of vision agents that produce the view appearance on the surface;
- A middle layer of observer agents that collect and organize the data to be visualized in the views, and control system variables and the communication among the executive agents and between the upper and lower layers; and
- A bottom layer of executive agents that perform the computations required by the views requested by the user.
This approach has been adapted to the client-server paradigm, which is the keystone of the explosive success of the World Wide Web. Figure 8 exemplifies this paradigm for the application presented in this paper with an illustration of the complete composition of the DOM4.1 - Query by earthquake view.
The surface of the view appears on the screen of the local client, which interprets (B) the HTML document describing the view in which layout and data are integrated. The client also manages local user interaction and sends (DMl) the user’s requests to the remote server.
The remote server hosts all the remaining computing activities. The DMr manages a view description in terms of layout and description of the agents that can produce the needed data. The agents are described in a language that can be interpreted by an ad hoc CGI (I), which creates and fires the executive (E) and observer (OIM, OMS, OID, OPEC) agents. Once those data have been collected and organized, they are sent to the DMr, which converts the view description and sends it on to the vision agents.
The Ultimate Application: The Transfer of Knowledge in Real Time during a Seismic Crisis
During the seismic crisis of September-October 1997, a Web site dedicated to the earthquakes occurring in the Umbria and Marche regions of central Italy was hurriedly set up to supply information and preliminary results of the activities and investigations carried out in the field by the GNDT (National Group for Protection against Earthquakes) and other institutions (Italian version: http://emidius.itim.mi.cnr.it/GNDT/T19970926/home.html; English version: http://emidius.itim.mi.cnr.it/GNDT/T19970926_eng/home.html). Long-term seismicity data of the affected area were already available online from NT4.1 and DOM4.1.
The rapid development and updating of both data and code were urgently required to deliver a subset of NT4.1 and DOM4.1 data focusing only on the affected area. The reuse of data and software enhanced both productivity (when developing the application) and application usability. The term reuse is defined in Garzotto et al. as "the use of existing information objects or software artifacts in different contexts and for different purposes." We often exploit the same data item or set, whether mono- or multimedia in nature, for different tasks. We also realize similar functionalities of different tools using the same piece of code. For example, NT4.1 data were reused to develop an ad hoc NT4.1 - Query by earthquake view, referring only to NT4.1 records of the seismogenic zones in which the affected area lies.
Software was reused in the realization of the specialized agents. Another example is shown in Figure 9: the paper dealing with the main historical earthquakes in the Umbria-Marche area cites the various earthquakes. Hyperlinks from these references dynamically produce the lower half of the DOM4.1 - Query by earthquake view, presenting Intensity Datapoints together with the related intensity map (Figure 9a) and the related macroseismic study (Figure 9b). In this case, data, layout, and agents were reused at no additional cost for new implementation.
During the seismic crisis, the social relevance of information technology was highlighted by the support supplied to the operators. From the onset of the crisis, they were able to access qualified data and news easily and as needed from their workplaces. The reuse of software and data allowed rapid production and delivery of views customized for the affected area, together with the related documentation.
The evident advantages in these circumstances led to the new perspective of the Internet space as an emergency Web.
This work was funded by a 3-year contract (1996-1998) with the National Civil Protection Department and GNDT/CNR, aimed at providing research, monitoring, and scientific and technical support for the Civil Protection Department in the field of seismic hazard.
The authors would like to thank all the colleagues who contributed to this work by qualifying the data and co-designing and testing the application.
Special thanks are due to Massimiliano Stucchi of IRRS/CNR (Institute of Research on Seismic Risk), who has promoted and encouraged the Internet research activities of GNDT.
24. Rubbia Rinaldi, G., Padula, M., and Albini, P. Designing Virtual Social Memory Organizers: Online and offline perspectives for historical earthquake data organization and dissemination. In Proc. INET'96, The Internet: Transforming Our Society Now. http://info.isoc.org/isoc/whatis/conferences/inet/96/proceedings/a2/a2_5.htm
25. Rubbia Rinaldi, G., Stucchi, M., Padula, M., and Zerga, A. NT4.1 online, a parametric catalogue of Italian earthquakes on the Web. Poster at the Eighth ACM Conference on Hypertext, Southampton, UK, April 6-11, 1997. http://journals.ecs.soton.ac.uk/~lac/ht97/posters.html
26. Rubbia Rinaldi, G., Padula, M., and Zerga, A. A user-centered WWW application for macroseismic data dissemination and rapid re-use. In Proc. AVI '98, International Conference on Advanced Visual Interfaces, L'Aquila, Italy, May 25-27, 1998.
28. Stucchi, M. and Albini, P. New developments in macroseismic investigation. In E. Faccioli and R. Meli (Eds.), Proc. International Workshop on Seismology and Earthquake Engineering, Mexico City, April 22-26, 1991, pp. 47-70, 1992.
Giuliana Rubbia Rinaldi
Institute for Multimedia Information Technology,
National Research Council
via Ampère 56, 20131 Milano, Italy
Mr. Padula is a researcher at the ITIM-CNR (Institute for Multimedia Information Technology of the National Research Council) in Milan, Italy. His scientific interest is the potential of technologies in developing distributed applications and circulating data through new networks. He studies problems of scientific communication and the communication of cultural artifacts through hypermedia systems.
Giuliana Rubbia Rinaldi
Ms. Rubbia Rinaldi is a researcher for the GNDT/CNR (National Group for Protection against Earthquakes). She is responsible for the design and development of tools to manage and disseminate earthquake data. Her main interest is capturing and organizing the structure of the domain and making it accessible to users through suitable visual interfaces in the context of using Web technology as a tool for scientific and collaborative work.
Figure 2. (a, top) The control panel for visual query by seismogenic zone against the Parametric Earthquake Catalogue NT4.1 database; hypermap of the Italian area with seismogenic zones superimposed. (b, bottom) Visualization of the retrieved earthquake records of the selected seismogenic zone.
Figure 3. (a, top) The control panel for query by space-time windows against the Parametric Earthquake Catalogue NT4.1 database. The form is used to suggest a query-by-example. (b, bottom) The retrieved data matching the query specified in (a).
Figure 4. The control panel for Query by earthquake against the DOM4.1 database. The query is performed by following the hyperlink of the earthquake date (upper right). Bottom left: retrieved Intensity Datapoints; bottom right: their display on the intensity map.
Figure 5. Composition of the control panel for Query by earthquake against the DOM4.1 database. (a) Contemporary display of data from Levels 1 and 2 (compare with Figure 4); (b) contemporary display of data from Levels 1, 2, and 3.
Figure 6. The control panel for query by locality against the DOM4.1 database. Upper left frame: selection of the initial of the locality name. Bottom left frame: selection of the locality of interest. Upper right frame: the seismic history of the locality visualized in the form of a table. Bottom right frame: visualization of the seismic history as a diagram.
Figure 9. Examples of data and software reuse in associating historical earthquake quotations with the related Intensity Datapoints and intensity map (a) and the related Macroseismic Study (b) (compare with Figures 4 and 5). The paper belongs to the Earthquakes of September and October 1997 in Umbria-Marche (Central Italy) web.
©1999 ACM 1072-5220/99/0700 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 1999 ACM, Inc.