SOFTWARE AND DATA REQUIREMENTS FOR THE XML REGISTRY FOR THE EPA-STATE NATIONAL ENVIRONMENTAL INFORMATION EXCHANGE NETWORK

CONTRACT NO. 68-W-99-002
TASK ORDER No. 021

Prepared for:
United States Environmental Protection Agency
Office of Environmental Information
1200 Pennsylvania Avenue, NW
Washington, DC 20460

Task Order Project Officer:
Michael Pendleton

Prepared by:
Systems Development Center
Science Applications International Corporation
6565 Arlington Boulevard
Falls Church, VA 22042
CONTENTS
EXECUTIVE SUMMARY
1.0 INTRODUCTION
1.1 Purpose
1.2 Scope
1.3 System Overview
1.4 System Architecture
2.0 REFERENCES
3.0 APPLICABLE STANDARDS
3.1 OASIS/ebXML
3.2 International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 11179-3:2000 Information technology – Metadata Registry (MDR) - Part 3, Registry metamodel and basic attributes
3.3 Universal Description, Discovery and Integration (UDDI)
3.4 Assumptions about Applicable Standards
4.0 SOFTWARE REQUIREMENTS
4.1 Roles and Role Management
4.1.1 Registration Authority
4.1.2 Registry Administrator
4.1.3 Responsible Organization
4.1.4 Submitting Organization
4.1.5 Registry Clients
4.2 Accessibility
4.3 Lifecycle Management
4.3.1 Registration
4.3.1.1 Registered Objects
4.3.1.2 XML tags
4.3.1.3 XML datatypes
4.3.1.4 XML schemas (DETs)
4.3.1.5 XML namespaces
4.3.1.6 XML Trading Partner Agreements
4.3.1.7 XML document
4.3.1.8 WSDL document
4.3.1.9 Registration Process
4.3.2 Development Forum
4.3.3 Classification
4.3.4 Administration
4.3.5 Version Control
4.3.6 Object Status Management
4.3.7 Validation
4.3.8 Modifying Content
4.3.9 Approving Objects
4.3.10 Retiring Objects
4.3.11 Removing Objects
4.3.12 Quality Control and Error Handling
4.3.13 Audit Trail Maintenance
4.4 Query Management
4.4.1 Discovery/Query
4.4.2 Retrieval
5.0 DATA REQUIREMENTS
5.1 XML Objects and Metadata
5.2 Data Requirements of the OASIS/ebXML RIM version 2.0
5.3 Data Requirements of the UDDI Specification version 3.0
5.4 Data Requirements of the ISO/IEC 11179 Metamodel
5.5 Data Requirements Summary
6.0 INTEROPERABILITY REQUIREMENTS
6.1 Security and Privacy
6.2 Linkages
7.0 CONCEPT OF OPERATIONS
8.0 PRELIMINARY REGISTRY TOOL OPTIONS
8.1 Background
8.2 Existing Online Registries
8.3 Available Registry Software
8.4 Commercially Available Tools
8.5 Related Software
9.0 ACCEPTANCE REQUIREMENTS
EXHIBITS
Exhibit 1. Major Metadata Groupings
Exhibit 2. Data Requirements Summary
APPENDIXES
Appendix A. Summary of XML Registry Software Requirements
Appendix B. Data Requirements for the OASIS/ebXML Registry Information Model v. 2.0
Appendix C. Data Requirements for the UDDI Specification v. 3.0
Appendix D. Data Requirements for the ISO/IEC 11179 Part 3 Metamodel
Appendix E. XML Registry Requirements Glossary
EXECUTIVE SUMMARY
“In the simplest sense, the benefits of XML will
be achieved only if organizations of a significant number are using the same
XML definitions. Therefore, these XML
definitions must be available for partners to discover and retrieve. A registry/repository is a mechanism used to
discover and retrieve documents, templates, and software (i.e., objects and
resources) over the Internet.” (http://xml.gov)
The Environmental Protection Agency (EPA) and
its state and tribal information trading partners have initiated collaborative
design and development of an Internet-based voluntary National Environmental
Information Exchange Network (Network) for state, federal, and Native American
Tribal environmental agencies. An eXtensible Markup Language
(XML) Registry is proposed as a component of the Network to serve as a
clearinghouse of Network-related information and to provide operational
support for implementation of the State and EPA nodes of the Network. In addition, the State-EPA Network XML
Registry may become part of a larger federation of federal XML registries. The registry will support both human and
automated interactions for XML object registration, object status
tracking, and querying and retrieval for reuse.
The goal of the Network Steering Board is to
provide a vehicle for standardizing information exchanges to improve the
quality and consistency of the data, and to reduce the reporting burden on the
states and tribes. Therefore, the
Network dataflows should be based on data standards that are stored in the
Environmental Data Registry. To ensure
the greatest interoperability, the XML Registry should link data standard
metadata to the XML schemas and related documents that
are based upon the approved data standards.
To support harmonization of dataflows on the Network, it is important
that approved XML schemas and the standard XML tags and other component parts
be available for discovery, reuse, and reference in new XML schemas.
To achieve all of these goals, the proposed XML
Registry will be developed based upon three standards: Organization for the
Advancement of Structured Information Standards/Electronic Business using
eXtensible Markup Language (OASIS/ebXML), International Organization for
Standardization/International Electrotechnical Commission (ISO/IEC) 11179, and
Universal Description, Discovery and Integration Initiative (UDDI). The OASIS/ebXML standard will be used as a
source of specifications for basic XML registry functionality and services. ISO/IEC 11179 will be used as a source of
specifications for the storage of XML tags that are related to corresponding,
well-documented data elements, along
with associated enumerated value lists, and their linkage to other XML objects
(documents, trading partner agreements, datatypes). The UDDI specification will guide the registration and discovery
of Web services that are part of the Network.
This XML Registry Requirements Document will
serve to inform the decision about whether to acquire or build an XML Registry
to support the Network. The document
outlines applicable standards, surveys available tools, and describes
functional and data requirements needed to support the Network. Once initial decisions have been made on the
requirements, an analysis of available implementation options will be
developed.
1.0 INTRODUCTION
“In the simplest sense, the benefits of XML will
be achieved only if organizations of a significant number are using the same
XML definitions. Therefore, these XML
definitions must be available for partners to discover and retrieve. A registry/repository is a mechanism used to
discover and retrieve documents, templates, and software (i.e., objects and
resources) over the Internet.” (http://xml.gov)
EPA and its state and tribal information trading
partners have initiated collaborative design and development of an
Internet-based voluntary National Environmental Information Exchange Network
(Network) for state, federal, and Native American Tribal environmental
agencies.
According to the State/EPA Information
Management Workgroup, “a Network based on standardized Internet language will
allow individual agencies to invest in internal data storage systems of their
choice at a pace they can afford, while also supporting easy exchange of
environmental data between agencies.”
The Network will facilitate information exchanges between “nodes”
maintained individually by participating partners, which will use the Internet to exchange information via standardized
eXtensible Markup Language (XML) Data Exchange Templates (DETs), or schemas.
(In this report, the term schema refers to an XML document
designed for data exchange.) Schemas
will be based upon the approved data standards to bring better consistency and
quality to the data that trading partners exchange. Exchange of data between nodes will be governed by Trading
Partner Agreements (TPAs) between the partners. TPAs document the agreed-upon data, exchange format, frequency of
exchange, security, and related issues.
One of the critical nodes on the Network will be
an XML Registry that will provide the capability to share information about XML
schemas approved for use on the Network, as well as information about schemas
under development. An XML Registry
holds registry entries that contain descriptive information, or metadata,
about registered XML objects. The
objects may be stored in the registry or in a related repository. The registry supports the submission,
registration, and administration of objects, and it makes the metadata
available for discovery, understanding, and reuse. This XML Registry will serve as a one-stop source
of selected information related to the Network, acting both as a “clearinghouse”
for information and as “operational support” for Node implementation. It should not duplicate functions provided
on other Network Nodes.
As the information on the Network should be
based on data standards approved by the Environmental Data Standards Council
(EDSC), the XML Registry should be related to the Environmental Data Registry
(EDR) that contains metadata about standard data elements, associated
enumerated value domains, and data element groups. Data standards are “documented agreements on formats and
definitions of common data” that are established to bring better consistency
and quality to the information that organizations maintain. The EDR also registers application data
elements. Data trading partners may
also develop XML schemas for data they want to share. It should be possible to document the data elements (as specified
by XML tags in an XML schema) in the EDR, even though the data may not be
“standardized” through any formal process.
1.1 Purpose
This XML Registry Requirements document serves
to document the requirements for an XML Registry to support the EPA/State
Network. It is part of a series of
documents designed to inform the decision about how to provide an XML Registry to
support the Network. The document outlines
applicable standards, surveys available tools, and describes functional and
data requirements needed to support the Network.
1.2 Scope
This document identifies functional and data
requirements of the XML Registry software, as well as necessary
interconnections to related applications.
This document does not include design specifications for the XML
Registry, as it may be used to inform a decision to purchase an available
registry solution rather than to build a new one. An options analysis will be addressed in a follow-on
document. The document also includes a
high-level concept of operations based upon current understanding of the
Network architecture. A more detailed
concept of operations may be developed after a registry solution is selected
and the architecture of the Network is more fully defined.
1.3 System Overview
An XML Registry is planned as part of the
Network to serve as a central location for XML objects and related
resources. The XML Registry will
provide a lifecycle management interface that will be a tool to manage XML
objects through their development and implementation lifecycle. This interface will be accessible to a
limited set of authorized users who will make use of the registration and
update functions to manage the metadata about the XML objects, including their
status, version, and organizational contacts.
It will provide a forum for exchange of information about XML objects under
development to promote harmonization and reuse of schemas. It will provide a means of tracking an XML
object through its progress from development to review to approval. Finally, it will provide a source of standardized
formats for transmitting data.
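The progression described above (development, then review, then approval) can be illustrated as a small state machine. The Python sketch below is hypothetical: the three state names come from this paragraph, but the transition rules (including an assumed path from review back to development), the class and method names, and the example object name are illustrative assumptions, not requirements stated in this document.

```python
from enum import Enum


class Status(Enum):
    """Lifecycle states named in the text: development, review, approval."""
    DEVELOPMENT = "development"
    REVIEW = "review"
    APPROVED = "approved"


# Assumed transition rules: objects move forward one step at a time, and a
# review may (hypothetically) send an object back to development.
ALLOWED = {
    Status.DEVELOPMENT: {Status.REVIEW},
    Status.REVIEW: {Status.APPROVED, Status.DEVELOPMENT},
    Status.APPROVED: set(),
}


class TrackedObject:
    """A registered XML object whose status changes are recorded in order."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status = Status.DEVELOPMENT
        self.history = [Status.DEVELOPMENT]

    def advance(self, new_status: Status) -> None:
        """Move to new_status if the transition is allowed; otherwise raise."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.name}: cannot move from {self.status.value} "
                f"to {new_status.value}"
            )
        self.status = new_status
        self.history.append(new_status)


schema = TrackedObject("ExampleSchema")  # hypothetical object name
schema.advance(Status.REVIEW)
schema.advance(Status.APPROVED)
print([s.value for s in schema.history])  # ['development', 'review', 'approved']
```

Keeping the ordered `history` list is one simple way to support status tracking; an operational registry would likely also record who made each change and when.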
The XML Registry will include a query interface
that will allow users such as system developers to access available resources
(such as schemas and trading partner agreements)
through a central registry, in order to promote
reuse and discourage development of disparate exchange formats. The query and retrieval functions will
include both a Web site to support human interactions with the XML Registry and
an Application Programming Interface (API) that will enable automatic query and
retrieval of objects from the Registry.
Because the XML objects in the Registry will be linked to the related data
elements and definitions, users will be able to query the Registry based on
semantic content, enabling more efficient searches and more relevant query
results.
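A minimal Python sketch of such a semantic query, assuming each XML tag is linked to a documented data element: the query searches data element definitions and returns every schema that uses a tag bound to a matching element, even when different schemas give that tag different names. The element identifiers, definitions, tag names, and schema names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """A documented data element with its definition."""
    element_id: str
    name: str
    definition: str


@dataclass(frozen=True)
class XmlTag:
    """An XML tag linked to the data element that gives it meaning."""
    tag: str
    element_id: str


def semantic_query(term, elements, tags, schemas):
    """Return schemas that use a tag bound to any data element whose
    definition mentions `term` (case-insensitive substring match)."""
    ids = {e.element_id for e in elements if term.lower() in e.definition.lower()}
    bound = {t.tag for t in tags if t.element_id in ids}
    return sorted(name for name, used in schemas.items() if bound & set(used))


# Hypothetical registry content for illustration only.
elements = [
    DataElement("DE-001", "Latitude",
                "Angular distance north or south of the equator."),
    DataElement("DE-002", "Facility Name",
                "The public or commercial name of a facility."),
]
tags = [
    XmlTag("LatitudeMeasure", "DE-001"),
    XmlTag("Lat", "DE-001"),  # different tag name, same meaning
    XmlTag("FacilityName", "DE-002"),
]
schemas = {
    "FacilitySchema": ["FacilityName", "LatitudeMeasure"],
    "MonitoringSiteSchema": ["SiteID", "Lat"],
    "PermitSchema": ["PermitNumber"],
}

print(semantic_query("equator", elements, tags, schemas))
# ['FacilitySchema', 'MonitoringSiteSchema']
```

A plain keyword search on tag names would miss `MonitoringSiteSchema` here, because its tag is named `Lat`; matching through the linked data element definition finds it.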
The EPA-State Network XML Registry will include
both a registry and a repository function.
A registry is a facility that stores relevant descriptive information
(metadata) about registered objects, and makes that information available for
discovery, understanding, and reuse. A
repository is a facility for storing and retrieving registered objects. Note that a registered
object can be stored in the registry, in a repository connected to the XML
Registry, or in another separate location, because an XML object may be accessed
through a Unique Identifier (UID) that references the object’s location.
A registered object is something that an
organization wants to publish for discovery and retrieval. Registered objects may include: XML tags
(elements), enumerated value lists, XML schemas, XML schema components, XML
datatypes, XML namespaces, XML documents, trading partner agreements, and
administrative documents (submittal and approval documentation).
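The registry/repository split described above can be sketched as a registry entry that holds metadata, a UID, and a pointer to wherever the object itself is stored. This Python sketch is illustrative only: the object types come from the list above, but the field names, the UID format, and the example.org location are assumptions, not a data model prescribed by OASIS/ebXML, UDDI, or ISO/IEC 11179.

```python
from dataclasses import dataclass
from enum import Enum


class ObjectType(Enum):
    """Registered object types listed in the text."""
    XML_TAG = "XML tag"
    ENUMERATED_VALUE_LIST = "enumerated value list"
    XML_SCHEMA = "XML schema"
    XML_SCHEMA_COMPONENT = "XML schema component"
    XML_DATATYPE = "XML datatype"
    XML_NAMESPACE = "XML namespace"
    XML_DOCUMENT = "XML document"
    TRADING_PARTNER_AGREEMENT = "trading partner agreement"
    ADMINISTRATIVE_DOCUMENT = "administrative document"


@dataclass(frozen=True)
class RegistryEntry:
    """Metadata about a registered object; the object itself may be stored
    in the registry, in a connected repository, or elsewhere."""
    uid: str                  # Unique Identifier (UID) for the object
    name: str
    object_type: ObjectType
    location: str             # where the object's content can be retrieved


class Registry:
    """Keeps registry entries keyed by UID."""

    def __init__(self) -> None:
        self._entries: dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        """Add an entry; each UID may be registered only once."""
        if entry.uid in self._entries:
            raise ValueError(f"UID already registered: {entry.uid}")
        self._entries[entry.uid] = entry

    def resolve(self, uid: str) -> str:
        """Return the location that a UID references."""
        return self._entries[uid].location


registry = Registry()
registry.register(RegistryEntry(
    uid="urn:example:schema:facility:1.0",  # hypothetical UID format
    name="Facility Schema",
    object_type=ObjectType.XML_SCHEMA,
    location="https://repository.example.org/schemas/facility-1.0.xsd",
))
print(registry.resolve("urn:example:schema:facility:1.0"))
```

Because the entry stores only a location, the same registry code works whether the schema file sits in an attached repository or on a partner's server.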
Section 3.0 of this document provides an
introduction to the standards that are applicable to the XML Registry design
and operation. At this time, no single
standard describes a comprehensive XML Registry to manage the full array of
objects needed to support standards-based XML.
The data standards that support the dataflows on the Network should be
fully documented in the XML Registry, and the Registry should provide Web
services to support business-to-business transactions. The registry will need to include
documentation for the data elements referred to by the XML tags in schemas, as
well as the XML schemas themselves. The
data elements and their definitions will help the registry
manage the semantics (meaning) of data from the time of creation through all
stages of processing, analysis, and use.
To meet the requirements of the Network, the XML Registry will need to
be based on a combination of standards, including ISO/IEC 11179, OASIS/ebXML,
and UDDI.
1.4 System Architecture
The XML Registry described in this requirements
document would provide a single source of metadata for data elements, XML
schemas, and Web services to support the development of harmonized,
standards-based data exchanges. An
architecture is needed that will support the entire Network enterprise. It is envisioned that state and EPA programs
will develop schemas to define data exchanges on the Network and will
search for and use Network schemas to format instance documents used in
actual data exchanges. The following issues should be considered in selecting an
XML Registry architecture.
• Availability and Reliability. Because the XML Registry is envisioned to
support day-to-day Network operations by serving schemas for the
validation of data in instance documents, the registry needs to be deployed on
a robust platform. The registry needs
to be reliably available during business hours across the entire United States,
which will require selection of an
architecture that can provide the needed availability.
• Currency. Because the XML Registry is envisioned to serve as the
source of standardized XML components and the system of record for current
schemas in use, it is important that the data be kept current.
• Information Sharing. The XML Registry is required to
serve as a forum for collaborative development of schemas, which means that the
architecture needs to support sharing information about standard XML
components and provide a forum for discussion about schemas under development.
• Security. Security of information in the XML Registry is required to ensure
that the data are not altered through intentional or unintentional actions. Standard Internet security methods, such as
Secure Sockets Layer, will be required to protect both the data and the servers
hosting the data.
Architectural options to be considered
include:
• A single, centralized XML Registry.
• A distributed network of XML Registries.
• Multiple registries operating in a peer-to-peer network.
A single, centralized XML Registry could manage
the information about all of the dataflows on the Network.
The following describes the benefits of a
single, centralized registry. The
single registry option can provide the greatest benefits for easing information
sharing and maintaining current information.
A single registry allows the Network to reference one location for all
standard XML components, thus making query and retrieval easier. A single registry could host one shared
discussion forum about schemas under development, thus engaging all potentially
interested parties in harmonizing schemas.
The single registry provides the simplest solution for maintaining
current information on schemas in use since it avoids the problem of
duplicating or replicating data and maintaining data in different
locations. The registry is intended to
provide data update services. If data
is updated in a variety of registries, extra effort is needed to copy updates
to the various registries on the Network to maintain currency. The single registry can also provide greater
data security as it ensures that data and system integrity are overseen by a
single operation.
The drawbacks of a single registry include its
possible failure during Network operations.
A single registry represents a single point of failure, a
risk that can be mitigated through the architectural solution chosen for
implementation. For example, a computer center can
provide a backup, mirrored environment to ensure continuous operation. A single, centralized registry may also be
overloaded by Network operations requests, which can be overcome by providing
adequate telecommunications and processing capacity to support demand.
A distributed network of XML registries could be
managed separately by the various participating organizations. One benefit of the distributed network is
that it enables each participating organization to manage its own registry for
its own XML components. For example, a
state environmental agency could have an XML registry on its node where it
could manage XML components for use on the Network, as well as other
state-specific XML components. If each
registry on the network maintained a copy of all of the Network XML components,
this distributed architecture would provide an automatic backup system in the
event that one registry fails to operate.
However, keeping copies of the XML Registry content current across
multiple registries is a major endeavor; it requires resources to
propagate changes automatically to all the registries so that no copy falls out of
date. Also, automatic
propagation creates the potential for errors caused by collisions with other XML
registry content. A distributed
architecture will also make it more difficult to query and retrieve the standard
XML components for reuse. In addition,
one of the goals of the Registry is to serve as a collaboration tool for
coordinated development of harmonized schemas.
At this time, harmonization is easier to facilitate through a single
source of current information about XML components that are undergoing change,
with resulting changes in versions.
With multiple registries, sharing information across systems and
tracking changes and versions becomes more difficult.
The third option is a peer-to-peer architecture
in which multiple registries are networked together and XML objects can link to
data on other network servers. In this
model, data would be shared among the systems.
The intent of a peer-to-peer model is to allow registry participants to
link to XML components on a number of servers, building on XML products
provided by a number of Network participants, and maintained by those
participants on their individual registries operating in a shared
environment. This model could
distribute the responsibility and the cost of XML registry data maintenance
among all participants. Although
peer-to-peer architecture does not require replication of data across
participating servers, some data replication would be needed to avoid the same
availability issue presented by the potential downtime of a single, central
registry. The need for some data
replication adds costs and raises the potential for error, just as with the distributed
network of registries. In addition,
peer-to-peer architecture presents potential security problems for those
participating in the Network.
The Environmental Information Exchange XML
Registry requirements include a registry service that provides the means for
managing objects in a repository and a registry client that is used to access
them. To support lifecycle management
and human querying, the registry services will be implemented using a public
Web site with some functions restricted to registered users via authentication
using Secure Sockets Layer (SSL). A Web
services API will support automated business-to-business transactions. For example, a search command across
multiple sites that are part of a UDDI network would enable an organization to
find and retrieve schemas in the XML Registry using search keywords such as
“water” or “waste.” In addition, the registry will need to communicate with other
registries that may contribute content to, or download content from, the central registry.
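The keyword search example above can be sketched as the same query run against several registries, with each hit tagged by the site that returned it. This is a simplified in-memory illustration in Python; it does not use the UDDI inquiry API, and the site names, schema titles, and keywords are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaEntry:
    """Simplified, hypothetical registry entry describing one schema."""
    name: str
    keywords: set[str] = field(default_factory=set)


def keyword_search(entries: list[SchemaEntry], *terms: str) -> list[str]:
    """Names of entries tagged with every search term (case-insensitive)."""
    wanted = {t.lower() for t in terms}
    return [e.name for e in entries if wanted <= {k.lower() for k in e.keywords}]


def federated_search(sites: dict[str, list[SchemaEntry]],
                     *terms: str) -> list[tuple[str, str]]:
    """Run the same keyword search against each site; tag hits with the site."""
    return [
        (site, name)
        for site, entries in sites.items()
        for name in keyword_search(entries, *terms)
    ]


# Illustrative data only; site names and schema titles are hypothetical.
sites = {
    "central-registry": [
        SchemaEntry("Drinking Water Sample Results", {"water", "monitoring"}),
        SchemaEntry("Hazardous Waste Handler", {"waste", "facility"}),
    ],
    "state-node": [
        SchemaEntry("Water Discharge Permit", {"Water", "permit"}),
    ],
}

print(federated_search(sites, "water"))
# [('central-registry', 'Drinking Water Sample Results'), ('state-node', 'Water Discharge Permit')]
```

Tagging each result with its source site lets a client retrieve the schema from the registry that holds it, which matters when several registries contribute content.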
2.0 REFERENCES
Blueprint for a National Environmental Information Exchange Network, Information Management Working Group, Network Blueprint Team, October 30, 2000; amended June 2001.
Cooperation between XML Registries and Related Registries: A Collaborative Effort between the XML Working Group and Federal and State Government Agencies, XML Working Group Task 2.2.3.2, Registry Standards Harmonization, http://xml.gov/documents/completed/lbnl/20020417status.htm
DISA Registry Initiative, http://www.disa.org/drive/Registry_resources.html
DoD XML Registry, http://diides.ncr.disa.mil/xmlreg/user/index.cfm
ebXMLSoft, Inc., http://www.ebxmlsoft.com/index.html