Increasingly, information is being reused across applications, portals, devices, and users. Current publishing systems manage content by providing version and access control, a means of publishing pages, workflow management, and the separation of content from design. However, even with support for that separation, most systems today do not make it easy to reuse information efficiently. Their frameworks make reuse possible, but the organization must customize the environment with complicated scripts and custom code. These tools need to support not only the separation of content from design but also the reuse of content for multiple purposes, and they must account for the context in which each information fragment is used.
As we move into more dynamic environments, where information is customized to each user and task, content management systems must accommodate the fact that designers cannot be present every time information is delivered. Designers must instead create meta-designs, or representations of designs, to be used in the interactive and personalized delivery of information. This ability to provide "agile publishing" challenges standard practice in the design community. In addition, the content must have a "sense of itself" (a concept presented by N. Negroponte of the MIT Media Lab around 1995) so that it knows how to render itself in its current context.
At IBM's Internet Technology Group, we developed a prototype content management system, called Franklin [3, 4], to address some of these issues. Franklin is based on the notion that content can be organized into reusable fragments of information. These fragments support reuse through a simplified model for managing content and design, one that enforces integrity and consistency. Franklin also supported customizing content for individual users and delivering it to a variety of display devices. In Franklin, the separation of content from design was achieved through the industry standards XML and XSL stylesheets.
A strong guiding principle in Franklin's development was to hide the details of the underlying implementation and data representations from the user. As a result, editors entered content through automatically generated forms that provided a simplified interface. Each form is generated from the document type definition (DTD) and presents input fields only for the data the user needs to edit; other data are hidden or generated automatically by the system.
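To illustrate how a form can be derived from a DTD, the following Python sketch reads simplified `<!ELEMENT>` declarations, walks the content model from a root element, and lists the text-only (`#PCDATA`) elements an editor would fill in, skipping elements the system generates itself. It is not Franklin's form generator: the regex-based reader handles only element names in non-recursive content models, and the element names and the `SYSTEM_GENERATED` set are hypothetical.

```python
import re

# Hypothetical: elements the system fills in itself (IDs, timestamps).
# They are never shown to the editor.
SYSTEM_GENERATED = {"id", "lastModified"}

# Matches simplified declarations such as <!ELEMENT title (#PCDATA)>.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+(.+?)>")


def parse_dtd(dtd_text):
    """Map each declared element name to its content model string."""
    return {name: model.strip() for name, model in ELEMENT_DECL.findall(dtd_text)}


def form_fields(dtd_text, root):
    """List the editable text fields, in document order, under `root`."""
    models = parse_dtd(dtd_text)
    fields = []

    def walk(name, path):
        if name in SYSTEM_GENERATED:
            return  # hidden from the form; generated by the system
        # Simplification: undeclared elements are treated as plain text.
        model = models.get(name, "(#PCDATA)")
        if model == "(#PCDATA)":
            fields.append("/".join(path + [name]))  # a leaf the editor fills in
            return
        # Child element names, ignoring separators and ?, *, + markers.
        for child in re.findall(r"\w+", model):
            walk(child, path + [name])

    walk(root, [])
    return fields


DTD = """
<!ELEMENT article (id, title, summary?, section+, lastModified)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT summary (#PCDATA)>
<!ELEMENT section (heading, para)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT para (#PCDATA)>
<!ELEMENT lastModified (#PCDATA)>
"""

print(form_fields(DTD, "article"))
# ['article/title', 'article/summary', 'article/section/heading', 'article/section/para']
```

The editor sees four fields; `id` and `lastModified` stay out of the form even though the DTD requires them.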
The separation of content and design provides the basis of the document-based model. It simplifies the maintenance of both the content and the design. Each is controlled separately and can be authored by people with the appropriate skill sets. As the content evolves, authors of that content can modify the information documents. As the design evolves, designers in charge of the "look and feel" can modify the stylesheets. Design rules are encoded and stored centrally, which strengthens the integrity of the overall design strategy and helps ensure a strong, consistent brand across the published information. This separation of content and design also eases the delivery of content to new devices. Each new device requires a new set of stylesheets, but the content remains unaffected. Today, this model is a widely accepted practice in the industry.
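A minimal sketch of this idea in Python: one XML content document is published through two designs, one for a desktop browser and one for a small-screen device. Because the Python standard library has no XSLT processor, plain functions stand in for the XSL stylesheets here; the element names, the `DESIGNS` table, and the `publish` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Content: authored once, with no presentation markup (hypothetical sample).
CONTENT = """<product>
  <name>Example Notebook</name>
  <price currency="USD">1499</price>
  <blurb>Lightweight notebook for travelers.</blurb>
</product>"""


def browser_design(doc):
    """Stand-in for a desktop-browser XSL stylesheet."""
    price = doc.find("price")
    return (f"<h1>{escape(doc.findtext('name'))}</h1>"
            f"<p>{escape(doc.findtext('blurb'))}</p>"
            f"<p class=\"price\">{escape(price.get('currency'))} {escape(price.text)}</p>")


def phone_design(doc):
    """Stand-in for a small-screen stylesheet: terse text, no markup."""
    price = doc.find("price")
    return f"{doc.findtext('name')}: {price.get('currency')} {price.text}"


# Design lives here, separate from the content; one entry per device.
DESIGNS = {"browser": browser_design, "phone": phone_design}


def publish(content_xml, device):
    """Render the same content with the design registered for `device`."""
    return DESIGNS[device](ET.fromstring(content_xml))


print(publish(CONTENT, "phone"))  # Example Notebook: USD 1499
print(publish(CONTENT, "browser"))
```

Supporting a new device means adding one entry to `DESIGNS`; `CONTENT` itself never changes.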
A further improvement to the content model takes a fragment-based approach. In such a system, information can be reused far more extensively: when properly tagged, content can be reused and customized for different audiences and devices. But this reuse brings the need to update dependent content and design fragments effectively. A system that takes this approach must therefore maintain dependencies so that updates are automatic and efficient. Because object dependencies are tracked, modifying content in one place can automatically trigger updates to every document in which that content occurs. The same intelligent updating applies to designs: the dependencies on each stylesheet are maintained, and when that stylesheet changes, the dependent pages are automatically updated.
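The dependency tracking described above can be sketched as a directed graph in which an edge from A to B means "B includes A." A breadth-first walk from the changed asset then finds every fragment and page that must be regenerated, including indirect dependents. This is an illustrative Python sketch, not the Franklin or Trigger Monitor implementation; the class and asset names are hypothetical.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Records which assets include which, so changes can be propagated."""

    def __init__(self):
        # dependents[a] = assets (fragments or pages) that directly include a
        self.dependents = defaultdict(set)

    def add_dependency(self, used, user):
        """Record that `user` includes `used`."""
        self.dependents[used].add(user)

    def affected_by(self, changed):
        """Every asset that directly or indirectly includes `changed` (BFS)."""
        affected, queue = set(), deque([changed])
        while queue:
            node = queue.popleft()
            for user in self.dependents.get(node, ()):
                if user not in affected:  # also guards against cycles
                    affected.add(user)
                    queue.append(user)
        return affected


# Hypothetical site: a price fragment is embedded in a product summary,
# which appears on two pages; a legal notice appears on one page.
site = DependencyGraph()
site.add_dependency("price.xml", "summary.xml")
site.add_dependency("summary.xml", "home.html")
site.add_dependency("summary.xml", "product.html")
site.add_dependency("legal.xml", "product.html")

to_regenerate = sorted(a for a in site.affected_by("price.xml") if a.endswith(".html"))
print(to_regenerate)  # ['home.html', 'product.html']
```

Editing `price.xml` reaches both pages through `summary.xml`, while editing `legal.xml` touches only `product.html`.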
Sidebar. Industry Standards within Franklin
To do this, Franklin incorporated IBM research originally developed for the events infrastructure, including the Olympic Games Web sites. The technology, called Trigger Monitor, maintained an object-dependency graph of the information fragments within an HTML Web site. When a fragment of information changed, all the pages that depended on that fragment were automatically and efficiently updated. This began to address the issue of content reuse.
We extended Trigger Monitor to support XML and XSL stylesheets, adding dependencies that track both the XML content and the XSL stylesheets that embody the site's design. Then, when either the content or the design of the Web site changed, all pages that depended on those fragments were automatically updated. The ibm.com country portal sites now use Franklin, which contains approximately 29,000 fragments of information for 86 countries and is currently used by approximately 150 editors. For ibm.com, using XML and XSL fragments has made it easy to publish to different audiences and different devices, and the use of information fragments has helped keep the information consistent and relevant.
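When stylesheets are tracked alongside content, a change can also invalidate generated fragments that other pages include, such as a shared navigation bar, so those must be rebuilt first. The Python sketch below (Python 3.9+ for `graphlib`) finds the stale outputs for a set of changed XML or XSL files and orders them with a topological sort. The file names and the `BUILT_FROM` mapping are hypothetical, and this is one plausible way to order regeneration, not a description of how Trigger Monitor schedules updates.

```python
from graphlib import TopologicalSorter

# Hypothetical site: each generated item lists the inputs it is built from.
# Inputs can be XML content, XSL stylesheets, or other generated items.
BUILT_FROM = {
    "nav.html":     {"nav.xml", "nav.xsl"},
    "us/home.html": {"us/home.xml", "page.xsl", "nav.html"},
    "fr/home.html": {"fr/home.xml", "page.xsl", "nav.html"},
    "us/news.html": {"us/news.xml", "page.xsl"},
}


def rebuild_plan(built_from, changed):
    """Return generated items affected by the `changed` iterable of files,
    ordered so each item comes after the generated items it includes."""
    # Invert the mapping: for each input, which generated items use it.
    users = {}
    for item, inputs in built_from.items():
        for source in inputs:
            users.setdefault(source, set()).add(item)
    # Collect everything reachable from the changed files.
    stale, stack = set(), list(changed)
    while stack:
        for item in users.get(stack.pop(), ()):
            if item not in stale:
                stale.add(item)
                stack.append(item)
    # Only dependencies among stale items constrain the rebuild order.
    graph = {item: built_from[item] & stale for item in stale}
    return list(TopologicalSorter(graph).static_order())


# nav.html first, then the two home pages (in either order).
print(rebuild_plan(BUILT_FROM, {"nav.xsl"}))
```

With this mapping, changing `page.xsl` regenerates the three content pages but not `nav.html`, while changing `nav.xsl` regenerates `nav.html` before the two home pages that include it.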
However, a remaining problem with the fragment-based model is that it is difficult to extend to new contexts, and it is not possible to know all the possible contexts of use before deployment. Each new context requires adding elements to the document definition and modifying the stylesheets so that the fragment can be presented in it.
But if information fragments had a "sense" of themselves, adding a new context could determine how the information would be used in that situation. These contexts would be based on constraints made explicit for each fragment. By applying techniques from object-oriented and constraint-based programming, we could simplify the interface for the system user by abstracting and encapsulating that knowledge in the information fragments themselves. The end user of the information would then see a more task- and context-based presentation, while the content manager would see information that is easier to understand and maintain.
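The following Python sketch is a hypothetical illustration of a fragment with a "sense of itself": each fragment carries explicit constraints on the context (here, simple predicates over a context dictionary), each paired with a rendering, and it picks the first presentation whose constraint holds. The `Fragment` and `Variant` classes, the context keys, and the sample text are all assumptions, not part of Franklin.

```python
from dataclasses import dataclass, field
from typing import Callable

Context = dict  # e.g. {"device": "phone", "audience": "administrator"}


@dataclass
class Variant:
    """One presentation of a fragment and the context constraint it requires."""
    name: str
    applies: Callable[[Context], bool]  # explicit constraint on the context
    render: Callable[[str], str]


@dataclass
class Fragment:
    """A fragment that carries its own knowledge of how to present itself."""
    text: str
    variants: list[Variant] = field(default_factory=list)

    def render(self, context: Context) -> str:
        # Use the first variant whose constraint the context satisfies.
        for variant in self.variants:
            if variant.applies(context):
                return variant.render(self.text)
        return self.text  # no constraint matched: fall back to the plain text


notice = Fragment(
    "Scheduled maintenance on Saturday from 02:00 to 04:00 UTC.",
    [
        Variant("phone", lambda c: c.get("device") == "phone",
                lambda t: t.split(" from ")[0]),
        Variant("admin", lambda c: c.get("audience") == "administrator",
                lambda t: "ACTION REQUIRED: " + t),
    ],
)

print(notice.render({"device": "phone"}))  # Scheduled maintenance on Saturday
```

Supporting a new context then means adding a `Variant` to the fragment rather than extending the DTD and every stylesheet that presents it.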
We still have some way to go to reach this point. The first generation of content publishing systems on the Web focused on HTML. Now XML/XSL is an important standard, but because of the HTML legacy, many systems do not properly support this new separation of form and content. Fewer still support the model of reusable information fragments. And, unfortunately, no systems today support context-based information management with information that has a sense of itself. Future work needs to focus on information objects and their potential use in different contexts. Representing these contexts explicitly will allow the system to be smarter in how it applies that information to new situations.
3. Meliksetian, D., Weitzman, L., Elo-Dean, S., Milton, J., Zhou, N., Davis, P., and Wu, J. XML content management: Challenges and solutions. In Proceedings of XML Europe 2001 (Berlin, May 21-25, 2001).
4. Weitzman, L., Meliksetian, D., Elo-Dean, S., Wu, J., Zhou, N., and Gupta, K. Transforming the Content Management Process at ibm.com. In Proceedings of CHI2002-AIGA Experience Design Forum (Minneapolis, MN, April 21-22, 2002).
I would like to thank the Internet Technology Group for providing the support for this project and especially Dikran Meliksetian for his review and input.
Louis Weitzman is a senior software engineer in IBM's Internet Technology Group. For over 25 years he has worked at the intersection of design and computation. He received his doctorate from MIT's Media Lab where he investigated the use of visual languages as a formalism to represent and support the design process.
DTD: Document Type Definition (DTD) provides a means for defining the structure, content, and semantics of XML documents. This standard has evolved into XML Schemas. [World Wide Web Consortium, XML Schemas and Document Type Definitions, www.w3.org/XML/Schema]
HTTP: Hypertext Transfer Protocol (HTTP) is the protocol for communication on the Web. [World Wide Web Consortium, Hypertext Transfer Protocol, www.w3.org/Protocols/]
WebDAV: Web-based Distributed Authoring and Versioning (WebDAV) is a set of extensions to the HTTP protocol that allow users to collaboratively edit and manage files on remote Web servers. [Web-based Distributed Authoring and Versioning, www.webdav.org/]
XML: Extensible Markup Language (XML) is the universal format for structured documents and data on the Web. [World Wide Web Consortium, Extensible Markup Language, www.w3.org/XML/]
XSL: Extensible Stylesheet Language (XSL) is a language for expressing stylesheets. It consists of three parts: XSL Transformations (XSLT), a language for transforming XML documents; the XML Path Language (XPath), an expression language used by XSLT to access or refer to parts of an XML document; and XSL Formatting Objects (XSL-FO), an XML vocabulary for specifying formatting semantics. [World Wide Web Consortium, Extensible Style Sheet Language, www.w3.org/Style/XSL/]
©2004 ACM 1072-5220/04/0300 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.