XML Community of Practice

Meeting Notes

July 19, 2006


This meeting was hosted by Microsoft at its office in Washington, DC.


Attendees and teleconference participants introduced themselves. Owen Ambur thanked Microsoft for hosting the meeting on short notice when IBM was unable to do so. He noted that the Department of the Interior and many other agencies own enterprise licenses for the Microsoft Office Professional Suite, which includes InfoPath, and that he, Philo Janus, and Jon Barrett are exploring prospects for scheduling InfoPath training for interested agencies. He invited those interested in participating in such training to contact him.


He also mentioned that IBM has agreed to host the xmlCoP’s August 16 meeting for the purpose of providing a briefing on Extensible Forms Description Language (XFDL) and perhaps DB2 version 9, XForms and other XML-related standards initiatives in which IBM is playing leadership roles. Noting the presence of Ed Chase, he suggested that all government forms should be available in all of the major E-forms applications, including PDF and IBM/PureEdge Workplace forms, through the Business Gateway’s eForms service.


Jon Barrett announced that the National Capital Chapter of AIIM (NCC-AIIM) is looking for presentation topics on the theme of collaboration and how enterprise content management (ECM) enables information sharing. They are particularly interested in hearing about best practices, case studies, and lessons learned. Owen commented on the close relationship of those themes to metadata, XML file formats, and AIIM’s C30 committee’s revision of Technical Report 48 (TR-48), which includes a lengthy list of document, content, and records management metadata elements.


Owen clicked through the ET.gov-related links in the opening segment of the agenda – http://xml.gov/agenda/20060719.htm – noting that the most recent submission relates to 3M’s RFID file tracking software. He then introduced Dave Sayers on the teleconference, who provided a brief overview of the product. Owen noted that it would have been very useful to him for purposes of tracking draft testimony through the bureaucracy when he served as Congressional Liaison for the U.S. Fish and Wildlife Service and that even today documents are circulated and reviewed on paper in the Department of the Interior.


Owen displayed the XML-related submissions to ET.gov, using i411's advanced faceted search service. He also showed the listings of ET component CoPs that have reached Stages 2 and 3 of the process. With reference to Stage 2 CoPs, he turned to KC Morris on the teleconference to provide a status update on NIST’s Quality of Design (QOD) tool. In addition to the QOD, he highlighted the XBRL CoP and AIIM’s iECM standards committee as candidates for promotion to Stage 3. He then turned to Paul Fontaine, who is acting chair of the iECM steering committee, to say a few words about the initiative.


Philo Janus briefed the group on Microsoft’s Office Open XML file formats, noting that they are moving away from binary file formats. Doing so will enable users to leverage the power of the applications without being tied to a binary format. Previously, for example, there has been no way to generate a Word file without running Word but now it will be possible to do so without owning Word software. Another key feature of the new formats will be the ability to sanitize them by running third-party tools to remove comments, editing markup, and other metadata that may be inappropriate for inclusion in final copies for public release. Because the formats will be human-readable as well as machine-readable, it will be possible to view them to make sure that no inappropriate information is included.


Acknowledging that there will be difficulties associated with making the transition to the new formats, Philo argued that now is the time to bite the bullet and make the move, suggesting that a 10- to 20-year effort will be required to migrate the old binary formats. Owen noted that the National Archives and Records Administration (NARA) has a program underway, the Electronic Records Archive (ERA), aimed at that very purpose and that Congress has provided substantial funding for it. Philo indicated that Microsoft is working on conversion tools and expressed the belief that the new XML file formats will be used heavily to create PowerPoint files from other documents. Owen commented that he had long wanted to be able to suggest that those who wish to make presentations to the xmlCoP should prepare their presentations in XML format. Currently, he converts PPT files to HTML and posts both renditions on the xml.gov site for the benefit of those who do not have PowerPoint, but now that PowerPoint files will be in XML format, he looks forward to not having to convert them to HTML.


Philo noted that the new formats will be contained in plain .zip files, which will reduce file sizes and can be opened with common ZIP software tools. Microsoft will be providing a compatibility pack for Office 2000 so that users will not be required to upgrade in order to access and use the new formats, consistent with the capabilities of the Office 2000 applications. In addition, bulk conversion tools will be provided to convert the old files to the new formats. Frank Napoli asked whether it will be possible to apply external applications to find all of the old files and convert them to the new formats, and the answer was yes; there is no reason that could not be done.
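To illustrate the point that the new files are ordinary ZIP archives, the following minimal Python sketch builds a small stand-in package in memory and lists its parts using only the standard zipfile module. The part names follow the layout commonly found in Office Open XML word-processing packages, but the XML content here is placeholder text rather than a valid document, and the function names are hypothetical.

```python
import io
import zipfile
from typing import List


def build_sample_package() -> bytes:
    """Build a tiny ZIP archive that mimics the layout of an Open XML package.

    The part contents are placeholders, not valid Office markup.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<document>Hello</document>")
        package.writestr("docProps/core.xml", "<coreProperties/>")
    return buffer.getvalue()


def list_package_parts(data: bytes) -> List[str]:
    """Return the names of the parts stored in a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()


sample = build_sample_package()
parts = list_package_parts(sample)
for name in parts:
    print(name)
```

The same list_package_parts function could be pointed at the bytes of a real file saved in one of the new formats, which is one way an external tool might inspect files without running the Office applications.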


Another important feature of the new file formats is increased security, because macros will be excluded from the files themselves. Macros will be stored in a separate file, which can be stripped off or ignored for security purposes. For very large spreadsheet (.xls) files, the option will be provided to save them as binary files for performance purposes, but in most instances, that will not be necessary. Comments will be stored in a separate file and can be encrypted or removed as necessary to protect information from inappropriate access or release, and comments will be stored in conformance with a schema, thus making it very easy to program against them. PowerPoint files will generally be 74% smaller than in the past and substantial reductions in other file sizes will also be realized. Moreover, Philo reiterated that users will not be required to own any Microsoft software in order to create and use Microsoft Office Open XML files. Finally, he noted that vast amounts of additional information are available on Microsoft’s Web site.


In response to a question about who will control changes to the file format specifications, Philo said Ecma will own and control the standard. For those who wish to keep track of developments in that regard, he cited the OpenXMLDeveloper.org site. He noted that, ironically, the first posting on that site related to how to generate Open Office documents in Java. With respect to the proliferation of standards, he suggested that two or three are okay but not thousands. He also noted that as soon as the new Office suite is released, agencies with software assurance contracts will immediately own the right to implement it, without the need for any further procurement action. He also pointed out that compatibility with the new formatting standards can be enforced by policy. He briefly demonstrated the SmartArt graphics capabilities of Office 2007. The new graphics are saved as XML but can be shared with previous versions of the software as “flat” graphics files in compatibility mode, albeit without the smart tags that accompany the new files.


Philo noted that the new formats will accommodate three types of metadata, known as document properties: core properties, application properties, and custom, user-defined properties. Owen noted that this feature is closely related to well-known metadata sets, like the Dublin Core, as well as AIIM’s iECM initiative. Philo observed that it is pretty easy to write software programs to use the new properties (metadata). Paul asked if the properties use XML Schema and Philo indicated that, yes, they do and the schemas are publicly available. In response to a question about XSLT, Philo responded that it is pretty trivial to write code to apply stylesheet transformations and he suggested that software developers should look into the capabilities of Visual Studio to build applications around the Office applications. Quyen Nguyen asked if the XML specifies both the content and its presentation and Philo said, yes, it does, in separate files. Quyen also asked about digital signature of the files and Philo indicated that they can be digitally signed but that he would need to defer to others to explain the details.
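As a rough illustration of how software might read the core properties, and of their relationship to the Dublin Core, the Python sketch below parses a sample core-properties part with the standard library. The namespace URIs are those used in the published Open Packaging Conventions and Dublin Core; the drafts under discussion at the time of the meeting may have differed. The sample XML, the field selection, and the function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Namespace prefixes used in the lookups below.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample of a core-properties part (docProps/core.xml).
SAMPLE_CORE_XML = (
    '<cp:coreProperties'
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Meeting Notes</dc:title>"
    "<dc:creator>Example Author</dc:creator>"
    "<cp:keywords>xml; metadata</cp:keywords>"
    "</cp:coreProperties>"
)


def read_core_properties(xml_text: str) -> Dict[str, Optional[str]]:
    """Extract a few core properties; missing elements come back as None."""
    root = ET.fromstring(xml_text)
    fields = {
        "title": "dc:title",
        "creator": "dc:creator",
        "subject": "dc:subject",
        "keywords": "cp:keywords",
    }
    return {
        name: root.findtext(path, namespaces=NS)
        for name, path in fields.items()
    }


props = read_core_properties(SAMPLE_CORE_XML)
print(props)
```

Because the title and creator elements are borrowed directly from the Dublin Core namespace, tools that already understand that vocabulary need little adaptation to read them.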


Philo’s presentation on the Office Open XML file formats is available at http://xml.gov/presentations/ms3/officexml.htm


After the break, Philo briefed the group on XML Paper Specification (XPS). He indicated that the .NET 3.0 framework will have APIs for XPS and he noted that the issue of data versus the templates has been a huge problem with respect to eForms. Owen noted the legal doctrine of the “four corners of a document” and that the inability to control what falls within the four corners, when electronic documents can be presented in myriad different ways, was what prompted the U.S. Courts to take a leadership role in having PDF/A established as an ISO standard for archival records. In response to a question about digital rights management, Philo indicated that it can be applied but relies upon the digital rights management server and encryption architecture. In response to a question about the ability to render XPS files in HTML, Philo indicated he could not respond specifically on that point but that the goal is to be able to render files with complete fidelity within a browser. He also noted that workflow metadata can be embedded within the files for integration with document workflow processes.


Philo’s presentation on XPS is available at http://xml.gov/presentations/ms3/xps.htm


Michael Lazar briefed the group on GemStone Systems’ GemFire Enterprise, which he said enables processing speedups on the order of 10 to 100 times, thus addressing peak processing needs without additional hardware or software. Their product does not fall into the category of EAI or EII but is what Forrester and others have begun calling “data fabric.” Michael noted that traditional client/server applications are tremendously more efficient than the new SOA applications, but that products like GemFire enable realization of the benefits of SOA while reclaiming processing efficiencies. For example, user authentication can be performed only once rather than multiple times for each separate service in a multi-staged SOA application. The authentication token can be stored once in memory for a configurable length of time. Session and state management can be addressed, and Michael noted that some state objects may be as large as three or four megabytes. Data fabric applications can also manage data, such as enumeration/code lists, that are frequently referenced. With respect to data transformations, Michael noted that the Document Object Model (DOM) is highly flexible but expensive in terms of processing. GemFire compresses the DOM in memory, thus providing for accelerated processing. Finally, with respect to network bandwidth and survivability, he pointed out that GemFire can cache networked information on LANs and he cited GemFire Design Patterns addressing such requirements. In response to a question about security, Michael noted that GemFire can be enabled to only cache to memory and not from disk. GemFire does secure the network connections but currently defers to others to address user-level security requirements.
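The idea of authenticating once and holding the token in memory for a configurable length of time can be sketched generically in Python as follows. This is not GemFire's API; the TokenCache and FakeClock classes and their methods are hypothetical, and the sketch shows only a single-process, in-memory cache with time-based expiry.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TokenCache:
    """In-memory cache that keeps each token for a configurable time to live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps user name to (token, expiry time on the clock).
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, user: str, token: str) -> None:
        """Store a token after the user has authenticated once."""
        self._entries[user] = (token, self._clock() + self._ttl)

    def get(self, user: str) -> Optional[str]:
        """Return the cached token, or None if it is absent or expired."""
        entry = self._entries.get(user)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-authenticates.
            del self._entries[user]
            return None
        return token


class FakeClock:
    """Manually advanced clock, used so the demonstration runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
cache = TokenCache(ttl_seconds=300.0, clock=clock)
cache.put("alice", "token-123")
first_lookup = cache.get("alice")    # within the time to live
clock.now = 301.0
second_lookup = cache.get("alice")   # after the time to live has passed
print(first_lookup, second_lookup)
```

Each service in a multi-stage application would consult such a cache before calling the authentication service, so that the expensive check is performed only when no unexpired token is on hand.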


Michael’s presentation is available at http://xml.gov/presentations/gemstone/gemfire.htm


Those who registered their presence at this meeting included:


Owen Ambur, Co-Chair, xmlCoP

Ed Chase, Adobe

Paul Fontaine, FAA

Joab Jackson, GCN

Philo Janus, Microsoft

Michael Lazar, GemStone

Amy LeSueur, Microsoft

Frank Napoli, LMI

Quyen Nguyen, NARA

Rob Shore, Microsoft

Hal Pierson, FAA

Steve Rixse, GemStone

Yonatan Tesemmay, DTS


Those who identified themselves as participating via teleconference were:


Jon Barrett, Microsoft

Brian Delacey, Interactive Securities

Jason Larock, Corel

KC Morris, NIST

Dave Sayers, 3M

Dana Stone, Merck

Allyson Ugarte, XBRL US & Spain (dialing in from Spain)


Please convey any additions or corrections to Owen_Ambur@ios.doi.gov