SOFTWARE AND DATA REQUIREMENTS FOR THE XML REGISTRY FOR THE EPA-STATE NATIONAL ENVIRONMENTAL INFORMATION EXCHANGE NETWORK

 

 

 

CONTRACT NO. 68-W-99-002

TASK ORDER No. 021

 

 

 

Prepared for:

 

United States Environmental Protection Agency

Office of Environmental Information

1200 Pennsylvania Avenue, NW.

Washington, DC 20460

 

Task Order Project Officer:

 

Michael Pendleton

 

 

 

 

 

 

 

 

 

Prepared by:

 

Systems Development Center

Science Applications International Corporation

6565 Arlington Boulevard

Falls Church, VA 22042


 


 


CONTENTS

EXECUTIVE SUMMARY

1.0      INTRODUCTION
    1.1  Purpose
    1.2  Scope
    1.3  System Overview
    1.4  System Architecture

2.0      REFERENCES

3.0      APPLICABLE STANDARDS
    3.1  OASIS/ebXML
    3.2  International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 11179-3:2000 Information technology – Metadata Registry (MDR) - Part 3, Registry metamodel and basic attributes
    3.3  Universal Description, Discovery and Integration (UDDI)
    3.4  Assumptions about Applicable Standards

4.0      SOFTWARE REQUIREMENTS
    4.1  Roles and Role Management
        4.1.1  Registration Authority
        4.1.2  Registry Administrator
        4.1.3  Responsible Organization
        4.1.4  Submitting Organization
        4.1.5  Registry Clients
    4.2  Accessibility
    4.3  Lifecycle Management
        4.3.1  Registration
            4.3.1.1  Registered Objects
            4.3.1.2  XML tags
            4.3.1.3  XML datatypes
            4.3.1.4  XML schemas (DETs)
            4.3.1.5  XML namespaces
            4.3.1.6  XML Trading Partner Agreements
            4.3.1.7  XML document
            4.3.1.8  WSDL document
            4.3.1.9  Registration Process
        4.3.2  Development Forum
        4.3.3  Classification
        4.3.4  Administration
        4.3.5  Version Control
        4.3.6  Object Status Management
        4.3.7  Validation
        4.3.8  Modifying Content
        4.3.9  Approving Objects
        4.3.10  Retiring Objects
        4.3.11  Removing Objects
        4.3.12  Quality Control and Error Handling
        4.3.13  Audit Trail Maintenance
    4.4  Query Management
        4.4.1  Discovery/Query
        4.4.2  Retrieval

5.0      DATA REQUIREMENTS
    5.1  XML Objects and Metadata
    5.2  Data Requirements of the OASIS/ebXML RIM version 2.0
    5.3  Data Requirements of the UDDI Specification version 3.0
    5.4  Data Requirements of the ISO/IEC 11179 Metamodel
    5.5  Data Requirements Summary

6.0      INTEROPERABILITY REQUIREMENTS
    6.1  Security and Privacy
    6.2  Linkages

7.0      CONCEPT OF OPERATIONS

8.0      PRELIMINARY REGISTRY TOOL OPTIONS
    8.1  Background
    8.2  Existing Online Registries
    8.3  Available Registry Software
    8.4  Commercially Available Tools
    8.5  Related Software

9.0      ACCEPTANCE REQUIREMENTS

EXHIBITS

Exhibit 1.  Major Metadata Groupings
Exhibit 2.  Data Requirements Summary

 

APPENDIXES

 

Appendix A     Summary of XML Registry Software Requirements

Appendix B     Data Requirements for the OASIS/ebXML Registry Information Model v. 2.0

Appendix C     Data Requirements for the UDDI Specification v. 3.0

Appendix D    Data Requirements for the ISO/IEC 11179 Part 3 Metamodel

Appendix E     XML Registry Requirements Glossary


EXECUTIVE SUMMARY

 

“In the simplest sense, the benefits of XML will be achieved only if organizations of a significant number are using the same XML definitions.  Therefore, these XML definitions must be available for partners to discover and retrieve.  A registry/repository is a mechanism used to discover and retrieve documents, templates, and software (i.e., objects and resources) over the Internet.”  (http://xml.gov)

 

The Environmental Protection Agency (EPA) and its state and tribal information trading partners have initiated collaborative design and development of an Internet-based voluntary National Environmental Information Exchange Network (Network) for state, federal, and Native American Tribal environmental agencies.  An eXtensible Markup Language (XML) Registry is proposed as a component of the Network to serve as a clearinghouse of Network-related information, as well as to provide operational support for implementation of the State and EPA nodes of the Network.  In addition, the State-EPA Network XML registry may become part of a larger federation of federal XML registries.  The registry will support both human and automated interactions for XML object registration, object status tracking, and querying and retrieval for reuse.

 

The goal of the Network Steering Board is to provide a vehicle for standardizing information exchanges to improve the quality and consistency of the data, and to reduce the reporting burden on the states and tribes.  Therefore, the Network dataflows should be based on data standards that are stored in the Environmental Data Registry.  To ensure the greatest interoperability, the XML Registry should link data standard metadata to the XML schemas and related documents that are based upon the approved data standards.  To support harmonization of dataflows on the Network, it is important that approved XML schemas, standard XML tags, and other component parts be available for discovery, reuse, and reference in new XML schemas.

 

To achieve all of these goals, the proposed XML Registry will be developed based upon three standards: Organization for the Advancement of Structured Information Standards/Electronic Business using eXtensible Markup Language (OASIS/ebXML), International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 11179, and Universal Description, Discovery and Integration Initiative (UDDI).  The OASIS/ebXML standard will be used as a source of specifications for basic XML registry functionality and services.  ISO/IEC 11179 will be used as a source of specifications for the storage of XML tags that are related to corresponding, well-documented data elements, along with associated enumerated value lists, and their linkage to other XML objects (documents, trading partner agreements, datatypes).  The UDDI specification will guide the registration and discovery of Web services that are part of the Network.

 

This XML Registry Requirements Document will serve to inform the decision about whether to acquire or build an XML Registry to support the Network.  The document outlines applicable standards, surveys available tools, and describes functional and data requirements needed to support the Network.  Once initial decisions have been made on the requirements, an analysis of available implementation options will be developed.   


1.0       INTRODUCTION

 

“In the simplest sense, the benefits of XML will be achieved only if organizations of a significant number are using the same XML definitions.  Therefore, these XML definitions must be available for partners to discover and retrieve.  A registry/repository is a mechanism used to discover and retrieve documents, templates, and software (i.e., objects and resources) over the Internet.” (http://xml.gov)

 

EPA and its state and tribal information trading partners have initiated collaborative design and development of an Internet-based voluntary National Environmental Information Exchange Network (Network) for state, federal, and Native American Tribal environmental agencies.

 

According to the State/EPA Information Management Workgroup, “a Network based on standardized Internet language will allow individual agencies to invest in internal data storage systems of their choice at a pace they can afford, while also supporting easy exchange of environmental data between agencies.”  The Network will facilitate information exchanges between “nodes” maintained individually by participating partners that will use the Internet to exchange information via standardized eXtensible Markup Language (XML) Data Exchange Templates (DETs) or schemas.  [The term schema will be used in this report to refer to an XML document designed for data exchange.]  Schemas will be based upon the approved data standards to bring better consistency and quality to the data that trading partners exchange.  Exchange of data between nodes will be governed by Trading Partner Agreements (TPAs) between the partners.  TPAs document the agreed-upon data, exchange format, frequency of exchange, security, and related issues.

 

One of the critical nodes on the Network will be an XML Registry that will provide the capability to share information about XML schemas approved for use on the Network, as well as information about schemas under development.  An XML Registry contains registry entries that contain descriptive information, or metadata, about registered XML objects.  The objects may be stored in the registry or in a related repository.  The registry supports the submission, registration, and administration of objects, and makes the metadata available for discovery, understanding, and reuse.  This XML Registry will serve as a location for one-stop shopping of selected information related to the Network, including both a “clearing house” for information and “operational support” for Node implementation.  It should not duplicate functions provided on other Network Nodes.

 

As the information on the Network should be based on data standards approved by the Environmental Data Standards Council (EDSC), the XML Registry should be related to the Environmental Data Registry (EDR) that contains metadata about standard data elements, associated enumerated value domains, and data element groups.  Data standards are “documented agreements on formats and definitions of common data” that are established to bring better consistency and quality to the information that organizations maintain.  The EDR also registers application data elements.  Data trading partners may also develop XML schemas for data they want to share.  It should be possible to document the data elements (as specified by XML tags in an XML schema) in the EDR, even though the data may not be “standardized” through any formal process.

 

1.1       Purpose

 

This XML Registry Requirements document records the requirements for an XML Registry to support the EPA/State Network.  It is part of a series of documents designed to inform the decision about how to provide an XML Registry to support the Network.  The document describes applicable standards, surveys available tools, and sets out the functional and data requirements needed to support the Network.

 

1.2       Scope

 

This document identifies functional and data requirements of the XML Registry software, as well as necessary interconnections to related applications.  This document does not include design specifications for the XML Registry, as it may be used to inform a decision to purchase an available registry solution rather than to build a new one.  An options analysis will be addressed in a follow-on document.  The document also includes a high-level concept of operations based upon current understanding of the Network architecture.  A more detailed concept of operations may be developed after a registry solution is selected and the architecture of the Network is more fully defined.

 

1.3       System Overview

 

An XML Registry is planned as part of the Network to serve as a central location for XML objects and related resources.  The XML Registry will provide a lifecycle management interface for managing XML objects through their development and implementation lifecycle.  This interface will be accessible to a limited set of authorized users who will make use of the registration and update functions to manage the metadata about the XML objects, including their status, version, and organizational contacts.  It will provide a forum for exchange of information about XML objects under development to promote harmonization and reuse of schemas.  It will provide a means of tracking an XML object through its progress from development to review to approval.  Finally, it will provide a source of standardized formats for transmitting data.
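The progression of an XML object from development to review to approval can be pictured as a small state machine.  The following Python sketch is illustrative only: the status names, the allowed transitions (including the "retired" status), and the TrackedObject class are assumptions made for this example and are not taken from any registry standard or product.

```python
from dataclasses import dataclass, field

# Hypothetical statuses and the transitions allowed between them.
ALLOWED_TRANSITIONS = {
    "development": {"review"},
    "review": {"development", "approved"},
    "approved": {"retired"},
    "retired": set(),
}


@dataclass
class TrackedObject:
    """Metadata the lifecycle interface would manage for one XML object."""
    name: str
    version: str
    contact: str
    status: str = "development"
    history: list = field(default_factory=list)

    def move_to(self, new_status: str) -> None:
        """Change status only along an allowed transition and record it."""
        if new_status not in ALLOWED_TRANSITIONS.get(self.status, set()):
            raise ValueError(f"cannot move from {self.status} to {new_status}")
        self.history.append((self.status, new_status))
        self.status = new_status
```

An object created with the default status can be moved to "review" and then to "approved"; an attempt to move an approved object back to "development" raises an error, and the history list gives a simple record of each change.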

 

The XML Registry will include a query interface that will allow users such as system developers to access available resources (such as schemas and trading partner agreements) through a central registry, in order to promote reuse and discourage development of disparate exchange formats.  The query and retrieval functions will include both a Web site to support human interactions with the XML Registry and an Application Programming Interface (API) that will enable automatic query and retrieval of objects from the Registry.  As the XML objects in the Registry will be linked to the related data elements and definitions, users will be able to query the Registry based on semantic content, assuring more efficient searching and effective query results.
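To illustrate how linkage between schemas and data element definitions could support a query on semantic content, the following Python sketch searches the definitions of the data elements that each schema references.  The schema names, data element names, and definitions are invented for the example, and find_schemas is a hypothetical helper, not part of any registry API.

```python
# Hypothetical data element definitions, keyed by data element name.
DATA_ELEMENTS = {
    "FacilityName": "The public or commercial name of a facility site.",
    "WaterBodyName": "The name of the water body receiving a discharge.",
}

# Hypothetical links from each schema to the data elements its tags reference.
SCHEMA_ELEMENTS = {
    "FacilitySchema_v1": ["FacilityName"],
    "DischargeSchema_v1": ["FacilityName", "WaterBodyName"],
}


def find_schemas(keyword: str) -> list:
    """Return schemas linked to a data element whose name or definition
    contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    matches = []
    for schema, elements in SCHEMA_ELEMENTS.items():
        for element in elements:
            text = (element + " " + DATA_ELEMENTS[element]).lower()
            if keyword in text:
                matches.append(schema)
                break  # one matching element is enough for this schema
    return sorted(matches)
```

Under these assumptions, a search for "water" returns only the discharge schema, while a search for "facility" returns both schemas, because both reference the FacilityName data element.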

 

The EPA-State Network XML Registry will include both a registry and a repository function.  A registry is a facility that stores relevant descriptive information (metadata) about registered objects, and makes that information available for discovery, understanding, and reuse.  A repository is a storage and retrieval facility for the registered objects themselves.  Note that a registered object can be stored in the registry, in a repository connected to the XML Registry, or in a separate location, because an XML object may be accessed through a Unique Identifier (UID) that references the object’s location.

 

A registered object is something that an organization wants to publish for discovery and retrieval.  Registered objects may include: XML tags (elements), enumerated value lists, XML schemas, XML schema components, XML datatypes, XML namespaces, XML documents, trading partner agreements, and administrative documents (submittal and approval documentation). 
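The distinction between a registry entry (metadata) and the registered object itself (held in a repository or elsewhere) can be sketched as a simple data structure.  The Python below is a minimal illustration under stated assumptions: the RegistryEntry fields, the Registry class, and the example identifiers and locations are hypothetical and do not reflect any particular standard's information model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RegistryEntry:
    """Metadata about one registered object."""
    uid: str          # unique identifier for the registered object
    object_type: str  # for example "schema", "tag", or "namespace"
    description: str
    location: str     # where the object itself can be retrieved


class Registry:
    def __init__(self) -> None:
        self._entries: Dict[str, RegistryEntry] = {}
        self._repository: Dict[str, str] = {}  # objects held locally, by UID

    def register(self, entry: RegistryEntry, content: Optional[str] = None) -> None:
        """Record the metadata; store the object too if content is supplied."""
        if entry.uid in self._entries:
            raise ValueError(f"{entry.uid} is already registered")
        self._entries[entry.uid] = entry
        if content is not None:
            self._repository[entry.uid] = content

    def describe(self, uid: str) -> RegistryEntry:
        """Return the metadata for a registered object."""
        return self._entries[uid]

    def retrieve(self, uid: str) -> Optional[str]:
        """Return the object if it is held locally; otherwise return None,
        and the caller would follow the entry's location instead."""
        return self._repository.get(uid)
```

In this sketch an object registered with content can be retrieved directly, while an object registered with metadata only is still discoverable through describe, which reports the external location where it is kept.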

 

Section 3.0 of this document provides an introduction to the standards that are applicable to the XML Registry design and operation.  At this time, no single standard describes a comprehensive XML Registry to manage the full array of objects needed to support standards-based XML.  The data standards that support the dataflows on the Network should be fully documented in the XML Registry, and the Registry should provide Web services to support business-to-business transactions.  The registry will need to include documentation for the data elements referred to by the XML tags in schemas as well as the XML schemas themselves.  The registry will need to include data elements and their definitions to help manage the semantics (meaning) of data from the time of creation through all stages of processing, analysis, and use.  To meet the requirements of the Network, the XML Registry will need to be based on a combination of standards, including ISO/IEC 11179, OASIS/ebXML, and UDDI.

 

1.4       System Architecture

 

The XML Registry described in this requirements document would provide a single source of metadata for data elements, XML schemas, and Web services to support the development of harmonized, standards-based data exchanges.  An architecture is needed that will support the entire Network enterprise.  It is envisioned that state and EPA programs will develop schemas to define data exchanges on the Network, and will search for and use Network schemas to format the instance documents used in actual data exchanges.  The following issues should be considered in selecting an XML Registry architecture.

 

•   Availability and Reliability.  As it is envisioned that the XML Registry will support day-to-day Network operations by serving schemas for the validation of data in instance documents, the registry needs to be deployed on a robust platform.  The registry needs to be reliably available during business hours across the entire United States, which will require selection of an architecture that can provide the needed availability.

•   Currency.  As it is envisioned that the XML Registry will serve as the source of standardized XML components and the system of record for current schemas in use, it is important that the data be kept current.

•   Information Sharing.  There is a requirement that the XML Registry serve as a forum for collaborative development of schemas, which means that the architecture needs to support sharing information about standard XML components, and provide a forum for discussion about schemas under development.

•   Security.  Security of information in the XML Registry is required to ensure that the data not be altered due to intentional or unintentional actions.  Standard Internet security methods, such as secure sockets layer, will be required to protect both the data and the servers hosting the data.

 

Architectural options to be considered include: 

 

•   A single, centralized XML Registry.

•   A distributed network of XML Registries.

•   Multiple registries operating in a peer-to-peer network.

 

A single, centralized XML Registry could manage the information about all of the dataflows on the Network. 

 

The following describes the benefits of a single, centralized registry.  The single registry option can provide the greatest benefits for easing information sharing and maintaining current information.  A single registry allows the Network to reference one location for all standard XML components, thus improving ease of query and retrieval.  A single registry could provide a sole discussion forum about schema under development, thus engaging all potentially interested parties in harmonizing schemas.  The single registry provides the simplest solution for maintaining current information on schemas in use since it avoids the problem of duplicating or replicating data and maintaining data in different locations.  The registry is intended to provide data update services.  If data is updated in a variety of registries, extra effort is needed to copy updates to the various registries on the Network to maintain currency.  The single registry can also provide greater data security as it ensures that data and system integrity are overseen by a single operation. 

 

The drawbacks to a single registry include its possible failure during Network operations.  The single registry does represent a single point of failure, a situation that can be overcome by the architectural solution chosen for implementation.  A computer center can provide a backup, mirrored environment to ensure continuous operation.  A single, centralized registry may also be overloaded by Network operations requests, which can be overcome by providing adequate telecommunications and processing capacity to support demand.
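The mitigation described above, a mirrored backup environment that takes over when the primary registry is unavailable, can be sketched from the point of view of a client.  The Python below is a minimal illustration, assuming each registry source is represented by a callable that either returns the requested object or raises ConnectionError; the function name and this calling convention are hypothetical.

```python
def fetch_with_failover(uid, sources):
    """Try each registry source in order and return the first result.

    `sources` is a list of callables (for example, primary first, then
    mirror).  A source signals that it is unavailable by raising
    ConnectionError.
    """
    for fetch in sources:
        try:
            return fetch(uid)
        except ConnectionError:
            continue  # this source is down; try the next one
    raise ConnectionError(f"all {len(sources)} registry sources failed for {uid}")
```

With a primary that is down and a mirror that is up, the call returns the mirror's copy of the object; only when every source fails does the client see an error.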

 

A distributed network of XML registries could be managed separately by the various participating organizations.  One benefit of the distributed network is that it enables each participating organization to manage its own registry for its own XML components.  For example, a state environmental agency could have an XML registry on its node where it could manage XML components for use on the Network, as well as other state-specific XML components.  If each registry on the network maintained a copy of all of the Network XML components, this distributed architecture would provide an automatic backup system in the event that one registry fails to operate.  However, keeping multiple copies of the XML Registry current across multiple registries is a major endeavor that requires the resources needed to automatically propagate changes to all the registries to avoid a problem with data currency.  Also, the automatic propagation creates a potential for errors caused by collisions with other XML registry content.  The distributed registry will also make it more difficult to query and retrieve the standard XML components for reuse.  In addition, one of the goals of the Registry is to serve as a collaboration tool for coordinated development of harmonized schema.  At this time, harmonization is easier to facilitate through a single source of current information about XML components that are undergoing change with resulting changes in versions.  With multiple registries, sharing information across systems and tracking changes/versions becomes more difficult. 
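The currency and collision problems of a distributed network can be made concrete with a small sketch of change propagation between two registries.  The Python below assumes, purely for illustration, that each registry is a dictionary mapping a component name to a (version, content) pair with integer versions; the propagate function and this representation are hypothetical.

```python
def propagate(source: dict, target: dict) -> list:
    """Copy new or newer components from source into target.

    Returns the names of components that collide: both registries hold
    the same version number but different content, so the change cannot
    be applied automatically.
    """
    collisions = []
    for name, (version, content) in source.items():
        if name not in target:
            target[name] = (version, content)       # new component
            continue
        target_version, target_content = target[name]
        if version > target_version:
            target[name] = (version, content)       # newer version wins
        elif version == target_version and content != target_content:
            collisions.append(name)                 # needs manual resolution
    return collisions
```

Even in this simplified form, every registry pair must run such a step after each change, and each reported collision needs a person to decide which content is correct.  That is the extra effort the single-registry option avoids.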

 

The third option is a peer-to-peer architecture in which multiple registries are networked together and XML objects can link to data on other network servers.  In this model, data would be shared among the systems.  The intent of a peer-to-peer model is to allow registry participants to link to XML components on a number of servers, building on XML products provided by a number of Network participants, and maintained by those participants on their individual registries operating in a shared environment.  This model could distribute the responsibility and the cost of XML registry data maintenance among all participants.  Although peer-to-peer architecture does not require replication of data across participating servers, some data replication would be needed to avoid the availability issue presented by the potential downtime of a single, central registry.  The need for some data replication adds cost and raises the potential for error, just as with the distributed network of registries.  In addition, peer-to-peer architecture presents potential security problems to those participating in the Network.

                                                           

The Environmental Information Exchange XML Registry requirements include a registry service that provides the means for managing objects in a repository and a registry client that is used to access them.  To support lifecycle management and human querying, the registry services will be implemented using a public Web site with some functions restricted to registered users via authentication using Secure Sockets Layer (SSL).  A Web services API will support automated business-to-business transactions.  For example, a search command across multiple sites that are part of a UDDI network would enable an organization to find and retrieve schemas in the XML Registry using keywords (such as “water” or “waste”).  In addition, the registry will need to communicate with other registries that may contribute to or download from the central registry.

                                               

 

2.0       REFERENCES

 

Blueprint for a National Environmental Information Exchange Network, (Information Management Working Group) Network Blueprint team, October 30, 2000; document amended June 2001.

 

Cooperation between XML Registries and Related Registries, A Collaborative Effort between the XML Working Group and Federal and State Government Agencies, XML Working Group Task 2.2.3.2 Registry Standards Harmonization, http://xml.gov/documents/completed/lbnl/20020417status.htm

 

DISA Registry Initiative, http://www.disa.org/drive/Registry_resources.html

 

DoD XML Registry, http://diides.ncr.disa.mil/xmlreg/user/index.cfm

 

ebXMLSoft, Inc., http://www.ebxmlsoft.com/index.html