Federal XML Working Group

 

Wednesday, September 10, 2003 Meeting Minutes

 

State Plaza Hotel

2117 E Street, N.W.

Washington DC 20037

 

Please send all comments or corrections to these minutes to Glenn Little at glittle@lmi.org.

 

Mr. Owen Ambur:  I think most everyone here today knows each other, but for the benefit of Norm Walsh, who’s on the telecon, let’s go ahead and introduce ourselves. I’m Owen Ambur, chairman of the Working Group, and I have some brief announcements:

 

1.      First, Lee Ellis has become the GSA co-chair of this group. We’re looking into becoming a Community of Practice, as opposed to a Working Group as in the past.  I think most of you know that Karen Evans has been named to the position at OMB [U.S. Office of Management and Budget, http://www.whitehouse.gov/omb/] that Mark Forman vacated [Administrator for IT and E-Government]. That’s of particular interest to us because when she was named vice chair of the CIO Council last year, she issued a statement of vision for the Council that expressly referenced XML.  So if any of you have not seen it, I have a copy you can look at.  [Editor’s note:  It is available online at http://www.cio.gov/documents/karen_memo_12_17_02.pdf ]

2.      I think most of you know about the XML Authoring and Editing Tool Forum on September 29, which we are cosponsoring in cooperation with the DC XML Users Group and Booz Allen Hamilton. Though the program has not been posted yet, as of yesterday 120 folks had already registered, and the facility holds only 150.  So if you have any colleagues who may be interested, I encourage you to have them register as soon as possible via the link in the “What’s New” section of the XML.gov home page [http://xml.gov/index.asp#new].  [Editor’s note: The forum has been moved to the Key Bridge Marriott.]

3.      Last, on the 22nd of September, I’m working with Martin Smith at the Department of Homeland Security to schedule an expert panel discussion on XML metadata high-level design issues.  A group of experts is being invited to discuss high-level issues that agencies should have in mind as they begin to implement XML.  It’s scheduled to be held at GSA.  I don’t know the size of the room, but when I agreed to have the XML Working Group cosponsor it, I insisted that a large enough facility be secured to accommodate some of our stakeholders.  [Editor’s note:  This event was moved to FEMA.]

 

[Introductions]

 

Mr. Ambur:  Alright, Norm [Walsh], I know you’re on the line. Is there anyone else?

 

[Brand Niemann and Marc Le Maitre introduced themselves.]

 

Mr. Ambur:  Marc, do you have anything new to tell us about XRI [OASIS Extensible Resource Identifier TC, http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xri] and identity theft?

 

Mr. Le Maitre:  Not at this time.

 

Mr. Ambur:  Well, if that’s all the introductions and announcements…

 

Mr. John Kane:  What will Marion’s [Marion Royal of GSA] position be with this committee, if he’s no longer co-chair?

 

Mr. Ambur:  Lee Ellis is here. Lee, would you like to take a moment to shed some light on that?

 

Mr. Lee Ellis:  My name is Lee Ellis. I’m the new co-chair. I was appointed by GSA ‘Electronic Government’ to replace Marion in this position. We’ll be getting a new website up soon, in a week or two. It’ll be set up to be a mirror image of the current one, but will reflect that we’re a user group not associated with the CIO Council. We’re now linked to the [GSA] Office of Electronic Government and Technology. They’ll do the support we’re accustomed to. I’ve accepted the appointment to replace Marion, but he’ll still work and collaborate with us.  The reason for this is that we view it as a new chapter.

 

Two contract RFPs [Requests for Proposal] were just let this past week:

1.      For a consolidated component registry, intended also to introduce a collaborative environment to develop and register components, schemas, documents, and other online endeavors. The registry is intended to grow and evolve as the technology progresses, and to be established as a platform for federal government work.

2.      The second one was, we let a contract for an E-initiatives platform called BGeForms [Business Gateway e-Forms Project], co-sponsored with the Small Business Administration, to be a gateway for communication with small business. It will include a platform for registration and for XML-enabled electronic forms, all using a Tamino native XML database. It’s viewed as an opportunity for agencies to fulfill GPEA requirements. If any agencies have a problem, they need to speak up now. We’ll institute a way to get forms in, to hold them in a registry to be filled out, and in the future, to have them auto-processed to the agency where they belong. In the future, under other contracts, as we grow, it will evolve into a full E-Form XML business gateway; a collaborative environment for XML, for sharing components, for records management. It should fit into everything.

 

Mr. Ambur: That’s exciting stuff -- because one of the reasons this group was formed in 2000 was to pursue the potential to render all .gov forms in XML and gather the data from them in XML.  So it’s very exciting for me, personally, to see that objective coming to fruition.

 

Mr. Ellis: Ken Sall is working with GSA and SiloSmashers to do schemas for e-travel—getting e-authentication online. There are six now?

 

Mr. Ken Sall:  When I last tracked them. Some are on the Fenestra website [http://www.fenestra.com/eforms/], as well as WebServices.gov [http://web-services.gov/]. 

 

Mr. Ambur:  There’s a link to that site from the ‘What’s Important’ section on the xml.gov home page. [http://xml.gov/index.asp#important]

 

Mr. Ellis:  We also have an electronic tools form we’ll integrate into it. This is “proprietary agnostic,” so it will accommodate all the technologies that are there, so we’ve had Sun [Microsystems, http://www.sun.com/index.xml] and various vendors work with us to try to refine these architectural component problems and achieve resolution. We’ve tried to mitigate as many problems as possible, and technical situations that preclude any one vendor or portion of the citizenry. We also want to make it 508 [Section 508 of the Rehabilitation Act] compliant.

 

Mr. Kane:  Is Marion still the Contracting Officer for projects he’s been involved in until now?

 

Mr. Ellis:  I work with Marion. He’s usually the Program Manager, and I’m the Operations Manager.

 

Mr. Kane:  How does this consolidated registry contract fit in with the Yellow Dragon NIST contract?

 

Mr. Ellis:  We have a memo of understanding to coordinate with them. This is the second level of that. We had to open the bid to everyone. Instead of a pilot, this is the due portion of what we’ve been meeting for over the last two years—putting something in place. It’s not a pilot; it’s the operation of the E-Gov Consolidation Registry.

 

Mr. Kane:  Is Yellow Dragon still a pilot?

 

Mr. Ellis:  Still a pilot. As we evolve this, we can still do as much as possible with the pilot registry.

 

Mr. Kane:  If we have a schema on the pilot, should we move it somewhere else?

 

Mr. Ellis:  Yes.

 

Mr. Houser:  The definition of “component”…are we using the FEA [Federal Enterprise Architecture, http://www.feapmo.gov/] definition of ‘component’?

 

Mr. Ellis:  Yes. I also work with the Architectural Component subcommittee, in which Marion is still active. That would be the link. Until recently, we hadn’t knuckled down to “What is the component?” We’ve broken them down into two arenas: technical and service components. For instance, Pay.gov would be considered a service component. A technical component would be more schema-based, or something you can drop into your code. We’ll try to have a registry where you can break these down into categories of usage.

 

Mr. Houser:  Maybe service components that are complex, like designation of benefits based upon income verification. Is that in your scope?

 

Mr. Ellis:  It’s probably not refined at this point down to that level. The first portion sets the platform as a collaboration zone. It’ll allow us to communicate as the entire federal government, and to share…let you log into your particular group or someone else’s group as a visitor or general user. We’re going incrementally at first to set the environment, set the registries, using the pilots to build a platform and build a working business model that will encompass the whole federal government, but be usable for your particular section.

 

One of the problems we tried to overcome with the collaboration zone is roles and responsibilities, because some key management positions don’t have the technical expertise that developers would utilize. So we have users from the super-technical to the managerial side, and as we evolve, the platform will evolve as well. One of the reasons E-Forms was chosen was that, on one contract, they have 170 business cases at OMB for E-Forms for the federal government. If we have only 28 executive agencies and 80 small bureaus, that means that some agencies are putting in two and three business cases for forms, so we’re combining them into one consolidated effort.

 

Mr. Houser:  Could you send this out specifically to the list [XML Working Group listserv] so I can forward it to my forms people?

 

Mr. Ellis:  Sure.

 

Mr. Kane:  What’s the timeline for implementation?

 

Mr. Ellis:  We hope to have the collaboration zone operating by the end of the calendar year.

 

Mr. Houser:  In the forms solutions, what vendors are participating?

 

Mr. Ellis:  All the major forms players are involved in some fashion. Only five vendors actually came in on the original platform. The first portion was to set up the gateway—I think it was Software AG, SETA, SAIC…I’m not sure whether Mitretek was one.

 

Mr. Houser:  What you’re talking about are integrators and developers. I’m talking about forms vendors like Adobe and Microsoft.

 

Mr. Ellis:  I don’t think any of them were participating solely.

 

Mr. Ambur:  The intent is to enable any eForms software vendor to interact with the XML registry to obtain the necessary XML schemas and implement them in their products, so that folks are free to use any software tool they wish to provide the valid XML instance documents required to do business with any government agency.

 

Mr. Houser:  Until you corral them, lots of random e-forms initiatives will persist.

 

Mr. Ellis:  And that’s great. I applaud that effort. That’s why we try to start off vendor-agnostic. To mitigate those problems, we set up a collaborative environment around the component registry. On the business gateway, we’re setting up a portal effort, because there are so many legacy systems. You can’t replace them all at once. You can’t tell people to use only one contractor. So the way to mitigate is to set up a business gateway, business routing, places to start, so people don’t have to use efforts aimed at their particular agencies, but can piggy-back on others. Keep your costs down, open it up to citizenry, business, and government, and make it open.

 

Mr. Houser:  Owen, you can join me on this, because you were at the Semantic Web seminar on Monday. The big obstacle in data administration is getting everyone to agree on terms and metadata. One of the benefits of semantic technologies is that they let people use their own ontologies. We can use the semantic technologies to concord the separate ontologies. I wonder if GSA is looking at that?

 

Mr. Ellis:  It’s a huge backbone of the effort. Agencies can have their own forms. We had to have a platform to let agencies control the configuration management of their arenas. We also wanted to set up a platform where they could find out more. We also wanted it Web-based, because not everyone is doing their thing from within the federal agency itself. People are in remote locations, universities, etc. Eventually we want to get into e-authentication, because not all people use electronic signatures. We wanted it Web-based, because of the disparity of time zones, military people…all of those were huge factors. So we’re setting the base to accomplish those goals. That’s why the original portions were portal efforts and collaboration zones—because agencies don’t want to give up control.

 

Mr. Ambur:  But collaboration services in conjunction with the XML registry will facilitate the creation of Communities of Practice who need to work together to share data.  So this is good; almost too good to be true.

 

Mr. Ellis:  That’s where this room will come into play: be facilitators to agencies, say “There’s stuff out there to explore,” make it win-win. The project is very big. OMB is 100% behind us. There’s no problem with funding at this juncture. We need to take the best logical steps, one-by-one, to build on the infrastructure. We’ll start out with the FEA model, and the XML Working Group, and build from there.

 

Mr. Morgan:  Do you imagine this group focusing on particular agencies that are eager to start, and taking on their work as pilots, and focusing on early implementation? As the system comes online in phases, we can be there.

 

Mr. Ellis:  Exactly. We decided this was to be a 4-phase project. We’ll exercise those based on how fast we can force-feed it to everyone. We have OMB’s backing. We looked at the Clinger-Cohen Act…cease-and-desist things that are old technology. If you have to be GPEA- and 508-compliant, and you have to get things organized in an enterprise architecture environment throughout the federal government, pretty soon you’ll not have a lot of options, because OMB is already looking at turning off some of the older systems that are not compliant. We already know we’ll miss GPEA compliance this year. We need to make the best effort for next year.

 

Mr. Sall:  Could you see this group—similar to what Roy is saying—could you see this group bringing use cases to the deployment? Not just specific forms and needs, but use cases in agencies?

 

Mr. Ellis:  Absolutely.

 

Mr. Ambur:  I’d say we should focus on stuff that works.

 

Mr. Ellis:  We’re looking at stuff that works. We’ll measure our efforts on what does work. We’ve done a lot of planning for several years. Now it’s time to do it in the “do world,” not the bureaucrat world.

 

Mr. Ambur:  That’s exciting stuff.  I appreciate your sharing it with us.  I didn’t know whether you were ready to discuss it.

 

Mr. Ellis:  It’s the tip of the iceberg.

 

Mr. Ambur:  I appreciate it.

 

Mr. Houser:  Please send out an email to round up the troops on this.

 

Mr. Ambur:  The sooner people know about it, the better.

 

Mr. Sall:  Is the Statement of Work on FedBizOpps [Federal Business Opportunities government procurement website, http://www.fedbizopps.gov/]?

 

Mr. Ellis:  Not yet.

 

Mr. Ambur:  We’re a little behind schedule, but the information that Lee had to share with us is very good, and probably more important than the other topics on our agenda anyway.

 

[Additional introductions]

 

Mr. Ambur:  With that, Ken and Norm, I’ll turn it over to you.

 

Mr. Sall:  We can’t get the Internet portion to work, but we do have your slides up here.

 

Mr. Walsh:  That’s not a problem.

 


Mr. Norm Walsh
Sun Microsystems

XML, RDF, RSS, and XSLT: A Mixture of Technologies

 

Mr. Walsh:  I’m sorry I can’t be there in person. I appreciate Owen’s invitation to speak about RSS and XML technologies. If I could get to D.C., I would have been there in person. Thanks, Ken, for being my fingers.

 

Slide 2  [About Norm]:  I’m an XML Standards Engineer at Sun. That’s a highfalutin way to say I work on a lot of standards groups in a lot of organizations to make sure that core XML technologies come together in the most useful way in the world at large. I also work on Java standards and tools.

 

Slide 3  [About This Presentation]:  One of the things I point to is the W3C [World Wide Web Consortium, http://www.w3.org/] Technical Architecture Group and the Electronic group. They’re trying to put together the architecture of World Wide Web documents that describe how the pieces of the Web technologies fit together. At the moment, they’re supposed to have three things on how to identify things:

1.      How do you represent information in RDF [Resource Description Framework, http://www.w3.org/RDF/], etc.? Then, how do you reach them when you’ve found them? We hope to get the first draft out this year. The others are

2.      Semantic Web issues, and

3.      Web Services.

 

That’s not obvious in the Web document.

 

Slide 4  [Goals]:  We need to look at lots of interesting questions in working with the 11 very intelligent people doing this. That’s why I built the website “nwalsh.com,” which attracted Owen’s or Ken’s interest, or perhaps some number of you thought it might be interesting to have me talk about it.

 

Mr. Ambur:  Norm, the topic of our discussion later in this meeting will be the emerging technology lifecycle management process.  I have suggested we should use XML and related open standards technology to support the process on a widely distributed basis, so that we can more effectively collaborate with vendors as well as our .gov colleagues to come more quickly, efficiently, and effectively to an understanding of the merits of proposed emerging technology components.  So that’s the background on why we were interested in scheduling your talk—because you’re already doing it, by using technologies like XML, RDF, XSLT, and RSS.

 

Mr. Walsh:  I’m not sure how much exposure members of your Working Group have with emerging technology or XML technologies, so I’ll give a high-level overview. I’ll go over it on the next slide, and how they complement each other, but not go into it in so much detail that you’ll be bored out of your skull by my ramblings. So I’ll give an overview, and then open up the floor. In case you want me to describe how these relate to problems you have, I’d be happy to do that.

 

Slide 5  [Web Site as Information System Microcosm]:  So the high level is building an information system. For you, we’re talking about electronic government. I try something much smaller—the playground—but we have many of the same problems. First, you have content. Then you have “when it was created, who created it, who has rights, when do they expire, what are the related tools?” Any of these attributes you can add to say things about it. We need navigation tools, etc. Once people have found the content, you have to deliver it to them. They may want it in other formats; they may want Braille, other colors, etc. If you manage a large repository, how do you let people subscribe to your site and get things that they want to look at? Then, how do you update, change, or remove content?

Mr. Sall:  Exactly. What you’re talking about with content delivery, metadata, notification—these are all aspects of the repository Lee Ellis is talking about.

 

Mr. Walsh:  Terrific. So I talk about some of your issues. I’m not going to cover all the topics today—only three.

 

Slide 6  [Content = XML]:  The content, I’m assuming, is XML. I’m not going to spend a lot of time talking about it. I’m sure it’s familiar to you. Just two points, because it’s different for RDF, and I want to highlight the distinctions:

1.      First, XML is a tree structure. For any XML document, you get one tree structure, and for any tree structure, you get one XML document. There are variations, but the documents on the trees are more or less similar to each other.

2.      The other point is that XML documents stand by themselves. If I give you two and say, “Please merge them,” it doesn’t have any particular meaning. There’s no mechanical way of saying “Every third element has a certain attribute.” It means they’re basically islands. If you want to index or address them, or write a process, you have to go over each in turn.
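[Editor’s note: The two properties above can be illustrated with Python’s standard xml.etree.ElementTree module. This is a minimal sketch, not part of Mr. Walsh’s slides: the documents are hypothetical, and wrap_merge is just one arbitrary policy an application might choose, since XML itself assigns no meaning to merging two documents.]

```python
import xml.etree.ElementTree as ET

# Two small, hypothetical documents used only for illustration.
DOC_A = "<report><title>Forms</title><author>IRS</author></report>"
DOC_B = "<report><title>Records</title><author>EPA</author></report>"


def element_paths(xml_text: str) -> list[str]:
    """Parse one document into its single tree and list element paths in document order."""
    paths: list[str] = []

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        paths.append(path)
        for child in elem:  # children are visited in the order they appear in the text
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def wrap_merge(xml_a: str, xml_b: str) -> str:
    """One arbitrary merge policy: put both roots under a new parent element.
    XML does not define this; the application has to decide what a merge means."""
    wrapper = ET.Element("merged")
    wrapper.append(ET.fromstring(xml_a))
    wrapper.append(ET.fromstring(xml_b))
    return ET.tostring(wrapper, encoding="unicode")


print(element_paths(DOC_A))  # ['/report', '/report/title', '/report/author']
print(wrap_merge(DOC_A, DOC_B))
```

Another application could just as reasonably merge the two by concatenating their title elements or by rejecting the merge; nothing in the documents themselves picks one answer.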

 

Slide 7  [Metadata = RDF]:  The metadata format that a lot of people are using (and the one that it seems people are going to use) is RDF. It’s unlike XML. It’s a graph, not a tree.  A node can have multiple parents and multiple siblings.  If you want to write it as XML on disk, there are any number of ways to do it. There’s no one-to-one correspondence between stuff on disk and RDF.  Also, if you want to merge two graphs, there’s no confusion about what it means. You just have a bigger graph, with all the nodes and edges. If you want to write a process for RDF, you can write one for that one graph.
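[Editor’s note: A minimal sketch of the merge point, assuming a graph is modeled as a plain Python set of (subject, predicate, object) triples. The “ex:” names are hypothetical placeholders. A full RDF merge must also keep blank nodes from different graphs apart; this sketch uses only named nodes, so merging reduces to set union.]

```python
# An RDF graph modeled as a set of (subject, predicate, object) triples.
Triple = tuple[str, str, str]

graph_a: set[Triple] = {
    ("ex:doc1", "ex:creator", "ex:Norm"),
    ("ex:Norm", "ex:worksFor", "ex:Sun"),
}
graph_b: set[Triple] = {
    ("ex:doc2", "ex:creator", "ex:Norm"),
    ("ex:Norm", "ex:worksFor", "ex:Sun"),  # the same statement appears in both graphs
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Merging is set union: every node and edge is kept, duplicate statements collapse."""
    result: set[Triple] = set()
    for g in graphs:
        result |= g
    return result


def parents(graph: set[Triple], node: str) -> set[str]:
    """Subjects with an edge pointing at `node`; unlike an XML element, a node may have several."""
    return {s for (s, _, o) in graph if o == node}


merged = merge(graph_a, graph_b)
print(len(merged))                          # 3
print(sorted(parents(merged, "ex:Norm")))   # ['ex:doc1', 'ex:doc2']
```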

 

Mr. Sall:  With the tree-versus-graph distinction, are you saying that in the XML case, you have explicit document order of how you traverse elements in documents, but with RDF it’s more like the traditional computer sense of trees that have multiple children, so there are multiple traverse orders, like pre and post? Is that what you’re talking about?

 

Mr. Walsh:  Not quite. It’s closer to what you said first. You have obvious document order in a tree. You know where you are by what’s seen and not seen. RDF is not like that. It’s just a bag of stuff. I have a slide later where I talk more about each of the technologies.

 

Mr. Houser:  More like the difference between a book and a map?

 

Mr. Walsh:  Yes. A book is like an XML document; there’s a beginning, middle, and end. RDF is more like a map; you can go anywhere from where you start.
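[Editor’s note: The book-versus-map comparison can be sketched in a few lines. This is an illustration under stated assumptions, not anything presented at the meeting: the XML “book” and the “ex:” place names are hypothetical. Iterating over the XML always yields the same sequence, while the set of triples has no first or last statement; you simply start at a node and follow edges in either direction.]

```python
import xml.etree.ElementTree as ET

# A "book": document order is part of the data, so iteration always yields one sequence.
BOOK = "<book><ch>Beginning</ch><ch>Middle</ch><ch>End</ch></book>"
chapters = [ch.text for ch in ET.fromstring(BOOK).iter("ch")]

# A "map": an unordered set of hypothetical triples with no inherent start or end.
MAP = {
    ("ex:DC", "ex:roadTo", "ex:Baltimore"),
    ("ex:Baltimore", "ex:roadTo", "ex:Philadelphia"),
    ("ex:DC", "ex:roadTo", "ex:Richmond"),
}


def neighbors(triples: set[tuple[str, str, str]], node: str) -> set[str]:
    """Nodes one edge away from `node`, following edges outward or inward."""
    outgoing = {o for (s, _, o) in triples if s == node}
    incoming = {s for (s, _, o) in triples if o == node}
    return outgoing | incoming


print(chapters)                                # ['Beginning', 'Middle', 'End']
print(sorted(neighbors(MAP, "ex:Baltimore")))  # ['ex:DC', 'ex:Philadelphia']
```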

 

Slide 8  [Notification = RSS]:  It’s an XML vocabulary, a format designed to address the idea of notification. The main point is that, unlike RDF and XML, there’s no standards organization working on RSS. It’s a grass-roots development process, and it has its own community of people working on it, but the point is, use it anyway. It’s simple enough that whatever comes from the grass-roots process is easy to adapt to. There’s no need to get concerned about the lack of a standards organization on it. If you want a good example of a success story, the Simple API for XML [SAX, http://www.saxproject.org/] was grass-roots development. It’s almost universally used.

 

Mr. Ambur:  Why hasn’t RSS been taken to a voluntary consensus standards organization?  Commonly, software vendors have no incentive to collaborate quickly to develop standards because we, their customers, aren’t smart enough to insist upon it.  If we keep buying proprietary stovepipe systems, vendors will be more than happy to sell them to us, in the hope of locking us into doing business with them, and no one else.

 

Mr. Walsh:  When the thing started, they needed a syndication format. They came out with a version with a vocabulary of 10 or 15 elements, and they didn’t need a standards organization. Now they’re working on the next generation. It was never big enough for the critical mass to take it to a standards organization. It’s possible that the work now will bring all these flavors together as one. It might go to the W3C or somewhere else, but the process is now working well enough that collaboration hasn’t seemed necessary.

 

Mr. Ambur:  I will be very interested to see if we can capitalize on RSS in the emerging technology process, because at least in the early stages of considering emerging technology components you don’t need to manage all of the information about all of them in one place. Instead, we should syndicate and access relevant information wherever it exists on the Web.

 

Mr. Walsh:  Because for people with business models that need aggregation, it’s a natural choice for them.

 

Mr. Sall:  Can you comment on how the meaning of the acronym changes?

 

Mr. Walsh:  It was originally a Rich Site Summary language; then people decided they could use it to publish summaries. There was a business model to be made to publish those and aggregate, so they started using it as a syndication format. After a while, they thought of…

 

Mr. Ambur:  Really Simple Syndication.

 

Mr. Walsh:  Thank you. No one frets too much about what it means. The work to define the next generation is struggling to come up with a name. I’m a fan of using the technology, and I’m not worrying about the details of what it’s called.

 

Mr. Sall:  Their own website describes RDF as “summary” now. It’s curious that they have that acronym.

 

Mr. Walsh:  Actually, RDF is one of the RSS flavors. I’ll talk about why in a bit.

 

The central point is that it’s a simple way to publish notifications, and let people subscribe and get things when they need them. It has value. Rather than invent your own, you might use the flavor of RSS you like best.

 

Slide 9  [Access/Delivery/Update]:  Access and delivery and update are interesting problems. I’d be happy to discuss it if you want to some other time, or you can send me an email, but I couldn’t put it all in today’s talk, so that’s the last on those.

 

[Slide bullets were the following:

Access =    Metadata as content

Delivery = Sending what the user wants (XML, HTML, PDF, etc.). (There are some interesting architectural and technical issues here.)

Update =    Adding content, providing for feedback]

 

Slide 10  [RDF Concepts]:  I’ll talk a little about what RDF is, because everyone knows what XML and XML documents look like and stand for. It was developed by the W3C as part of the Semantic Web activity, which it actually predates. It’s the same product base. It’s really a framework for representing metadata statements. By that, I mean statements such as “this piece of content was authored by this person,” “has these access rights,” or “is related to this organization.” RDF is simple. Each statement has just three parts: subject, predicate, and object (the value for the predicate). The object is either a simple value, like a name or a date, or another subject also in the graph. Whatever else you take away from this, whatever you’ve heard about how complex it is, and whatever issues there are about which XML format to use to represent things, the important thing is that it’s just a collection of simple tables or statements called “triples.”
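[Editor’s note: A minimal sketch of the triple model described above, using Python dataclasses. The class names and the example.org identifiers are hypothetical; real RDF uses URIs and typed literals defined by the W3C specifications. It also shows the point made on the next slide: the same node can be the subject of one statement and the object of another.]

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A node in the graph; it can be the subject of other statements."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A simple value such as a name or a date."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: Resource
    predicate: Resource
    obj: Union[Resource, Literal]  # either a simple value or another node


# Hypothetical identifiers, not a real vocabulary.
EX = "http://example.org/"
doc, norm = Resource(EX + "doc"), Resource(EX + "norm")
creator, name, created = (Resource(EX + "terms/" + t) for t in ("creator", "name", "created"))

statements = {
    Triple(norm, name, Literal("Norman Walsh")),  # norm is the subject here...
    Triple(doc, creator, norm),                   # ...and the object here
    Triple(doc, created, Literal("2003-09-10")),
}

about_norm = [t for t in statements if norm in (t.subject, t.obj)]
print(len(about_norm))  # 2
```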

 

Slide 11  [RDF Statements]:  Here are some informal examples of RDF statements. Each has a subject, a verb, and an object on the other end. If you download the presentation, the appendix at the end has the list in proper RDF. This is informal, not proper. All the RDF statements look like this. The only other thing is that the first statement is true. The third statement says I’ve created this document. Note that I appear as the subject in the first sentence and the object in the third. Any questions on what RDF is?

 

Mr. Ambur:  The way you have explained it is pretty clear.

 

Slide 12  [Why RDF?]:  So why use RDF? Two things. First, it extends easily. It’s easy to add new subjects and predicates. Now you have a distributed Working Group. If you adopt the RDF framework as a way to do metadata, then an agency such as the IRS can have its own subjects and predicates for its purposes, and the EPA can have the same for its own; each domain can define them for itself, and you can establish common tools that work with all the vocabularies. Second, it’s very easy to combine their data later on. If, later, the IRS and EPA need to bring their metadata together, because they’re graphs, it’s very easy to throw that metadata together and apply the same set of tools. It’s a very big win. And further down the road, which is what the Semantic Web is working on, because statements are so simple, it’s possible to write tools that draw logical inferences from statements.

 

You can correlate separate metadata statements together. The nice thing about writing your own vocabulary is this: suppose the IRS and EPA define different author terms. When you combine the material later and you know they mean the same concept, you can add metadata that says they mean the same thing, and your tool will know they mean the same thing. So that’s where the promise of RDF is. It’s only in the last 18 months that tools have been freely available that allow you to solve these problems. I’ll be honest: I didn’t use it for many years. Now, in the last 18 months, the tools are there. I’m not a convert to the Semantic Web vision, but RDF allows you to use the tools easily. That’s all it takes to impress me.
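[Editor’s note: A sketch of the IRS/EPA example under stated assumptions. The predicate names irs:author and epa:preparedBy and the subjects are hypothetical. owl:equivalentProperty is a real term from the W3C Web Ontology Language, but the inference loop here is a hand-rolled illustration, not an OWL reasoner: it copies each statement across predicates declared equivalent, in both directions, until nothing new appears.]

```python
# Hypothetical predicates: each agency names its "author" concept differently.
irs = {("irs:form1040", "irs:author", "Tax Division")}
epa = {("epa:permit7", "epa:preparedBy", "Water Office")}
# One added statement says the two predicates mean the same thing.
mapping = {("irs:author", "owl:equivalentProperty", "epa:preparedBy")}


def apply_equivalences(graph: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Copy statements across equivalent predicates (both directions) until nothing new appears."""
    g = set(graph)
    pairs = {(s, o) for (s, p, o) in g if p == "owl:equivalentProperty"}
    pairs |= {(o, s) for (s, o) in pairs}  # equivalence is symmetric
    while True:
        new = {(s, q, o) for (s, p, o) in g for (p2, q) in pairs if p == p2} - g
        if not new:
            return g
        g |= new


combined = apply_equivalences(irs | epa | mapping)
print(sorted(o for (s, p, o) in combined if p == "irs:author"))  # ['Tax Division', 'Water Office']
```

A query written only for irs:author now also finds the EPA record, which is the behavior described above.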

 

Mr. Houser:  Would you compare it to Dublin Core [Dublin Core Metadata Initiative, http://dublincore.org/] or Topic Maps [XTM (XML Topic Maps), http://www.topicmaps.org/]?

 

Mr. Walsh:  Dublin Core is an example of an RDF vocabulary, that is, of predicates you’d use. It’s very widely used. If you’re talking about document authorship and description, it makes sense to look at the Dublin Core vocabulary. There’s still a benefit in reusing the same names where you can. Topic Maps are a different thing. RDF is all about representing metadata. It does it with these triples. Topic Maps do too, but they’re a competing standard, an ISO [International Organization for Standardization, http://www.iso.ch/iso/en/ISOOnline.openerpage] standard. It has a slightly different vision of what metadata is like, but both communities agree that they’re talking about the same problem underneath. I’ve been to several conferences about building a framework to unify them. I started out using RDF. If I had been in a different community, I might have been using Topic Maps.
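[Editor’s note: A sketch of using Dublin Core terms as RDF predicates, serialized as RDF/XML with Python’s standard library. The RDF and Dublin Core element set namespaces are the published ones; the described resource URI and its values are hypothetical. This is only one of several valid RDF/XML layouts for the same statements.]

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(about: str, **dc_fields: str) -> str:
    """Serialize Dublin Core statements about one resource as RDF/XML.
    Each keyword (e.g. title, date, language) becomes one dc: predicate."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about})
    for element, value in dc_fields.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


record = describe("http://example.org/minutes/2003-09-10",  # hypothetical resource
                  title="Federal XML Working Group Meeting Minutes",
                  date="2003-09-10", language="en")
print(record)
```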

 

Mr. Houser:  We’re considering a metadata standard for our community. Which should we choose as the basis for that policy?

 

Mr. Walsh:  Do you mean RDF or Topic Maps?

 

Mr. Houser:  Or Dublin Core.

 

Mr. Walsh:  Dublin Core is an example of RDF, so if you’re going to RDF, yes, you should do Dublin Core. I was trying to say at the end that I’m using RDF, but I’ve not given Topic Maps a fair shake. It makes sense to look at RDF and Topic Maps as well, if you’re thinking of going in that direction. Choosing Dublin Core or another predicate library that you know about is also a good idea.

 

Mr. Houser:  So there’s no straight transformation between the two?

 

Mr. Walsh:  I’ve seen papers on how to map one to the other, but there’s some disagreement. I think in the next few years there will be tools to map, but now they’re slightly different ways to look at metadata.

 

Mr. Ambur:  Walt, when you say “we’re looking at policy,” are you talking about the Veterans Administration?

 

Mr. Houser:  Yes, the Veterans Administration.

 

Mr. Ambur:  It’s my understanding that NARA had proposed a policy for the management of Web records. Does it address the metadata requirements?

 

Mr. Houser:  It doesn’t address that.  Norm, can you put OWL [Web Ontology Language, http://www.w3.org/TR/owl-ref/] in the context of this presentation?

 

Mr. Walsh:  OWL is Web Ontology Language. RDF is a collection of statements, but a collection isn’t a language. There’s not an obvious relationship between statements. They might have the same subject and predicate, but they’re just in a bag. OWL is an effort to define standard semantic statements for the languages—the range of subjects and predicates—so OWL is a natural extension of RDF, designed to help advance the concepts of the Semantic Web.

 

Mr. Ambur:  Another thing I’ll mention in this context is that the FirstGov folks at GSA are working on a content model for FirstGov. I talked to Dana Hallman about briefing this group on their content model when she’s ready to talk about it.

 

Mr. Sall:  The conference we went to on the Semantic Web—their website has a nice paper. It talks about the pros and cons…

 

Ms. Elizabeth Fong:  Can you give us that [website URL]?

 

Mr. Sall:  Not offhand, but I can send it to the list [TopQuadrant, http://www.topquadrant.com/].

 

Mr. Brand Niemann:  It’s on the web-services.gov [http://web-services.gov/] site for September 8. The link’s there.

 

Slide 13  [RSS]:  RSS is an XML vocabulary, schema, or DTD that defines elements for summarizing content, saying, “We wrote this, having this link, and having an abstract.” Some flavors let you have RSS in metadata content. That’s very popular for publishing what’s new and available about websites. There are end-user tools for aggregation, and some business models for aggregation. There are websites, like syndic8.com and userland.com, that are aggregating RSS flavors together. There are dozens, so you can go to one place and get all these RSS feeds.
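[Editor’s note: A minimal sketch of a feed that says “we wrote this, having this link, and having an abstract,” using the RSS 2.0 element names (rss, channel, item, title, link, description). RSS 2.0 is only one of the flavors mentioned in the talk; the function name and all feed content below are hypothetical.]

```python
import xml.etree.ElementTree as ET


def build_feed(title: str, link: str, description: str,
               items: list[tuple[str, str, str]]) -> str:
    """Build a minimal RSS 2.0 document; each item is (title, link, abstract)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item_title, item_link, abstract in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title      # "We wrote this..."
        ET.SubElement(item, "link").text = item_link        # "...having this link..."
        ET.SubElement(item, "description").text = abstract  # "...and having an abstract."
    return ET.tostring(rss, encoding="unicode")


feed = build_feed("Example Agency: What's New", "http://example.org/",
                  "Recently published schemas and forms",
                  [("Draft travel schema", "http://example.org/schemas/travel",
                    "A first draft for comment.")])
print(feed)
```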

 

Slide 14  [RSS Viewer]:  Here’s an example of the RSS viewer I use. This is what I mean by aggregation. On the left panel are RSS feeds. The highlighted one is about Norman Walsh. On the right side, you see a list of my most recent articles. You get the abstract of that article, a subcategory to “Daily Quotation.” There are a bunch of things that are hidden. You could add a reference to any website you want, so you can imagine a future where all agencies are getting their XML online. They could use RSS to publish what’s new, and individual people in agencies who need to keep track could have a tool like this RSS viewer to go out and grab those documents; as new things occur, they’re highlighted. It’s an easy way to subscribe to a “what’s new” website, for example.
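
The aggregation behavior described here (pull several feeds into one list and highlight what is new) can be sketched in a few lines. This is a hypothetical illustration, not the viewer Mr. Walsh uses: the feed contents are inline placeholders, and “new” simply means a link the reader has not seen before.

```python
import xml.etree.ElementTree as ET

def read_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

def aggregate(feeds, seen_links):
    """Merge items from several feeds, flagging links not seen before."""
    merged = []
    for name, feed_xml in feeds.items():
        for title, link in read_items(feed_xml):
            merged.append((name, title, link, link not in seen_links))
    return merged

# Two hypothetical agency feeds, as an aggregator might fetch them.
feeds = {
    "Agency A": "<rss version='2.0'><channel><title>A</title>"
                "<item><title>Old notice</title><link>http://a.example/1</link></item>"
                "<item><title>New schema</title><link>http://a.example/2</link></item>"
                "</channel></rss>",
    "Agency B": "<rss version='2.0'><channel><title>B</title>"
                "<item><title>Meeting minutes</title><link>http://b.example/9</link></item>"
                "</channel></rss>",
}
seen = {"http://a.example/1"}  # links the reader has already viewed

for name, title, link, is_new in aggregate(feeds, seen):
    print(("* " if is_new else "  ") + f"{name}: {title}")  # "*" marks new items
```

A real viewer would fetch the feeds over HTTP on a schedule and persist the set of seen links between polls.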

 

Slide 15  [Other RSS Applications]:  So RSS is used for website updates. I recently saw someone propose that magazines and journals publish their indexes this way. A perfect example of how far you can take this: I publish my daily schedule. Every day my computer publishes an RSS feed of things I need to do in the next seven days, and I get the new notification in my RSS viewer. I’m using it for a different sort of notification, but it gives the flavor of using it.
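
The schedule feed boils down to a daily query: keep the tasks due within the next seven days, soonest first, and publish each as an item. A minimal sketch of that selection step, assuming tasks are stored as (date, description) pairs; apart from the two event dates taken from these minutes, the task list is hypothetical.

```python
from datetime import date, timedelta

def upcoming(tasks, today, days=7):
    """Keep tasks due between today and `days` days from now, soonest first."""
    horizon = today + timedelta(days=days)
    due = [(d, what) for d, what in tasks if today <= d <= horizon]
    return sorted(due)

tasks = [  # hypothetical to-do entries
    (date(2003, 9, 12), "Send corrections to the minutes"),
    (date(2003, 9, 29), "XML Authoring and Editing Tool Forum"),
    (date(2003, 9, 10), "XML Working Group meeting"),
    (date(2003, 9, 15), "Review draft schema"),
]

for d, what in upcoming(tasks, today=date(2003, 9, 10)):
    print(d.isoformat(), what)
```

Each selected entry could then be written out as an RSS item, so any RSS viewer doubles as a reminder tool.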

 

Mr. Ambur:  Norm, are you familiar with the RDF calendar initiative?

 

Mr. Walsh:  I have a slightly different set of representations for content in my Palm Pilot. They’re not yet harmonized, but putting them in there makes a lot of sense.

 

Mr. Ambur:  I first heard about it a year or two ago. I’m looking for a better understanding of how and when we might be able to use it on the XML.gov site.

 

Mr. Walsh:  I could imagine that happening once it’s available in RDF format and people pick it up. The other thing is, these tools require a critical mass of people using them before they’ll take off. Initially it didn’t look like it was going anywhere. Now there are 400,000 users, so it looks like it’s taking off. As people take advantage of it, it encourages more people to use it.

 

Mr. Ambur:  I know of a specific example of a use case, where a publisher was co-sponsoring an XML event and wanted me to establish a link to it.  They had the event listed on their site but it was buried in a lengthy list along with other events of no particular interest to my stakeholders.  I wasn’t willing to link to their site from the XML.gov site and force my stakeholders to scroll to look for it.  I wanted to be able to point directly to it.

 

Mr. Walsh:  One thing: I’m doing Extreme Markup Languages, or www.extreme.org. They had work I didn’t have. Between RDF and my Palm, it was actually the first work I did with RDF.

 

Slide 16  [XSLT]:  XSLT is a language to transform from one language to another. Some people think it’s odd. I’m not inclined to, since I’m on the group that developed it. Why I used it now was to transform documents into XML. I’d be surprised if some of you weren’t using it. It’s used for producing formatting objects, which are then rendered into PDF for printing. It’s used for summaries, and it’s also used for metadata extraction. Someone was talking about an initiative for metadata content on the website. Here is one of the ways I use it: I want content on my site in RDF for navigation, but I don’t want to maintain it in two places, so lots of it comes out of updates I publish there. I auto-extract the RDF.
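
Mr. Walsh does the metadata extraction with XSLT. The sketch below is a Python stand-in, not XSLT, that shows the same idea: derive (subject, predicate, object) statements from the essay’s own markup, so the title is maintained only once. The essay element names and the URI are hypothetical; the predicate URIs follow the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

def extract_metadata(essay_xml, essay_uri):
    """Derive (subject, predicate, object) triples from an essay's own markup."""
    essay = ET.fromstring(essay_xml)
    triples = []
    # Map hypothetical essay elements onto Dublin Core terms.
    for tag, term in (("title", "title"), ("author", "creator"), ("date", "date")):
        value = essay.findtext(tag)
        if value is not None:  # skip fields the essay omits
            triples.append((essay_uri, DC + term, value.strip()))
    return triples

essay = """<essay>
  <title>Why XML?</title>
  <author>A. Writer</author>
  <date>2003-09-01</date>
  <body><p>Essay text.</p></body>
</essay>"""

for triple in extract_metadata(essay, "http://example.org/essays/why-xml"):
    print(triple)
```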

 

If you’re working with XML and have to transform from one flavor to another, I recommend you look at XSLT. Michael Kay has the definitive book; see also Jeni Tennison. At the end of my presentation, I have a recommendation page.

 

Slide 17  [XSLT and RDF: Oil and Water?]:  One complication of XSLT you run into if you’re using RDF: I can’t really explain it, but I’ll try to describe it. Remember, RDF is a graph. XSLT is a tree translation language. Graphs are not trees, so if you apply XSLT to RDF, you bump into the problem of getting a tree view of the RDF to run XSLT over. I’ve written RDFTwig [http://rdftwig.sourceforge.net/] as an extension to the XSLT processor I use. I’ll talk about that later on, but you’re going to need to plan for some pain to get XSLT to process RDF. But it’s worth the pain, because it’s a phenomenal way to represent metadata.

 

Mr. Houser:  Would you have to pick “Point A” and “Point B,” and proceed across the graph to come up with the serialization?

 

Mr. Walsh:  That’s one of the things you have to do. One way is to always use the same tools for serialization. The RDFTwig technique is one way to build the tree: do Point A, then come back and do Point B. It allows you to dynamically do it.
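
To make the “pick Point A” issue concrete, here is a hypothetical sketch (not RDFTwig’s actual interface) that unfolds a small graph of triples into a tree starting from a chosen node. Each node is expanded the first time it is reached; later encounters, including cycles, become back-references. Starting from a different node yields a different tree over the same graph.

```python
from collections import defaultdict

def to_tree(triples, root):
    """Unfold a graph of (subject, predicate, object) triples into a tree.

    Nodes with outgoing arcs are expanded the first time they are reached
    from `root`; later encounters become {"ref": node} back-pointers.
    """
    edges = defaultdict(list)
    for s, p, o in triples:
        edges[s].append((p, o))
    expanded = set()

    def walk(node):
        if node not in edges:   # a literal, or a resource with no outgoing arcs
            return node
        if node in expanded:    # already unfolded elsewhere: point back to it
            return {"ref": node}
        expanded.add(node)
        return {"id": node,
                "children": [(p, walk(o)) for p, o in edges[node]]}

    return walk(root)

# Hypothetical graph with a cycle: the essay points to its topic and back.
triples = [
    ("essay:a", "title", "Why XML?"),
    ("essay:a", "topic", "topic:xml"),
    ("topic:xml", "label", "XML"),
    ("topic:xml", "hasEssay", "essay:a"),
]

print(to_tree(triples, "essay:a"))    # "Point A": start at the essay
print(to_tree(triples, "topic:xml"))  # "Point B": start at the topic
```

Once the graph is a tree, ordinary tree-walking transformations can run over it.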

 

Mr. Houser:  Why do XSLT on RDF?

 

Mr. Walsh:  The next slide will make that clear. I have… I don’t know what your projector is like. Can you read the picture?

 

Slide 18  [Putting the Pieces Together]:  I have XML on the far left side. Those are actual essays I write for my website…I want to produce RSS, PDF, and an HTML version to publish on the website. Because I’m extracting the metadata, when it comes time to produce, for example, the HTML version, I need access to the document itself and the metadata as well. That’s why I have to apply XSLT to the RDF: I’m already using XSLT for the document transformation, a problem I already know how to solve. So I’ll talk a little about what the diagram says, to give you a flavor of how the pieces fit together.

 

We start with the XML essay. We make an XML document with the title, author, and content. We want to add its metadata to the metadata for my site, so the first step is to process the essay with XSLT to produce the metadata. That’s new. I also have additional metadata from other essays, and some I track by hand. I put that through cwm [http://www.w3.org/2000/10/swap/doc/cwm.html]. In the middle, I have RDF. That’s the meat for my entire site…every essay, and the hierarchies, all the navigational information. I use RDFTwig to extract the meat I need, and using that along with the original XML, I can build the HTML and PDF versions and the updated RSS feed. The feed has all the articles I wrote in the last 30 days, in reverse chronological order, built only from the RDF.
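
The two steps in the middle of the diagram, merging metadata from several sources and then querying it for the feed, can be sketched as follows. This is an assumption-laden simplification: cwm does much more than take a union of statements (it can apply rules), and the short predicate names here are hypothetical.

```python
from datetime import date, timedelta

def merge(*sources):
    """Union several triple sets; identical statements collapse into one."""
    graph = set()
    for source in sources:
        graph |= set(source)
    return graph

def recent_essays(graph, today, days=30):
    """Titles of essays dated within `days` of `today`, newest first."""
    dates = {s: date.fromisoformat(o) for s, p, o in graph if p == "date"}
    titles = {s: o for s, p, o in graph if p == "title"}
    cutoff = today - timedelta(days=days)
    recent = [(d, s) for s, d in dates.items() if cutoff <= d <= today]
    return [titles.get(s, s) for d, s in sorted(recent, reverse=True)]

# Metadata extracted from the essays themselves (hypothetical).
extracted = [
    ("essay:a", "title", "Why XML?"),   ("essay:a", "date", "2003-09-01"),
    ("essay:b", "title", "RDF Notes"),  ("essay:b", "date", "2003-08-20"),
    ("essay:c", "title", "Old Essay"),  ("essay:c", "date", "2003-06-15"),
]
# Metadata maintained by hand; the repeated title statement merges away.
by_hand = [
    ("essay:a", "topic", "topic:xml"),
    ("essay:b", "topic", "topic:rdf"),
    ("essay:a", "title", "Why XML?"),
]

graph = merge(extracted, by_hand)
print(recent_essays(graph, today=date(2003, 9, 10)))
```

Because every source is just a set of statements, adding another source does not require changing the query that builds the feed.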

 

Why have metadata built by hand? One of the things we were keenly aware of when we built the site was that I did not know what I was going to write about. I wanted the information in a topic hierarchy. I didn’t want information about what topic each essay was on in the essay itself, because later on I would have had to edit every essay that had topic information in it in order to add a branch. So I maintained it separately. Because it’s in RDF, it merges easily with the RDF coming out of the essays for a unified view of the data sources.
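
Keeping the topic hierarchy in its own hand-maintained statements means a new branch is a one-line change that touches no essay. A minimal sketch, assuming the hierarchy is stored as hypothetical (child, parent) pairs and forms a tree with no cycles.

```python
def essays_under(topic, hierarchy, essay_topics):
    """All essays filed under `topic` or any of its descendant topics.

    `hierarchy` holds (child, parent) pairs and is assumed to be a tree.
    """
    children = {}
    for child, parent in hierarchy:
        children.setdefault(parent, []).append(child)
    stack, found = [topic], []
    while stack:
        t = stack.pop()
        found += [e for e, et in essay_topics if et == t]
        stack.extend(children.get(t, []))
    return sorted(found)

hierarchy = [("topic:rdf", "topic:xml"), ("topic:xslt", "topic:xml")]
essay_topics = [("essay:a", "topic:xml"),
                ("essay:b", "topic:rdf"),
                ("essay:c", "topic:xslt")]

print(essays_under("topic:xml", hierarchy, essay_topics))

# Adding a new top-level branch edits only the hierarchy, never the essays.
hierarchy_v2 = hierarchy + [("topic:xml", "topic:markup")]
print(essays_under("topic:markup", hierarchy_v2, essay_topics))
```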

 

Mr. Ambur:  I like this picture a lot, because it depicts what I think some people have posed as a false choice—namely, that you have to have either internal or external metadata.  I think the real answer is that we need to have both internal and external metadata.  It is important to have at least some metadata embedded directly within each record, but it will never be possible or feasible to anticipate and embed in any record all the metadata that may ever be appropriate to associate with it.  Thus, the architecture you are depicting is clearly the right answer.  The question I have is whether there are scalability issues associated with it.  For example, would it make sense for governmentwide use, as opposed to on a single, relatively small site like yours?

 

Mr. Walsh:  I’m absolutely concerned that it has scalability issues. The culprit is the cwm tool. It’s actually doing a bunch of work. If I gave it a million essays, it would fail. That doesn’t mean it’s a bad design or information flow; the evil to avoid is duplicate information, so if you have metadata, you want to use it directly, not maintain, for example, the title of the article in two places. You have topic hierarchies that are not represented in one place, but you don’t want to maintain the article in two places. The issue of scalability is not one I’ve addressed yet. It is doable. You would want to ask the people who are developing the tools to consider scalability. It’s possible to augment the RDF without rebuilding the whole graph.

 

Slide 19  [Successes]:  So what have we learned? What works well? RDF lets you aggregate metadata. It has been phenomenally valuable. It lets me ignore the fact that the metadata comes from different sources. It allows me to modify one file with the hierarchy in it. It’s a success story for deriving content from metadata.

 

One of the questions that came up was the way some people are doing markup in RSS.  I was asked to write an opinion piece for XML.com. I did, and it was published. I realized that I wanted people to know that I wrote that, but out of respect for XML.com, I couldn’t republish it on my site. It wouldn’t be fair to copy it to my site. I was thinking about it for five seconds when I realized that all the navigation on my site is metadata. All I had to do to make it appear to be on my site was construct five or six lines of metadata and put it in, and now there’s a summary in my RSS feed. It fits neatly into my site, and the fact that it’s published on a different site is more or less irrelevant. I was delighted when I realized I get that win out of the situation.

 

Slide 20  [Failures]:  You saw that picture. It was fairly complicated. There were lots of arrows, circles…it means there are lots of pieces fitting together. My guess is that, for the size and scale of your systems, it’s not complicated. Relative to a site like mine, it’s more complicated.

 

Mr. Ambur:  I think the issue relates to your second bullet. If each component can scale massively, then your depiction is not that complicated. We are supposed to think in terms of a component-based architecture, so if we have logically separated functions, then it makes perfect sense.  The question is whether the components, individually and collectively, can scale massively to meet the needs of government as a whole, or whether it will be necessary to have smaller, stovepiped applications that duplicate components, despite the additional maintenance overhead and costs.

 

Mr. Walsh:  If you asked me to make it for EGov, then I’d have to think hard about scalability problems. For myself, if I wanted to do attachments, I would find a replacement for cwm. I spoke to Tim Berners-Lee about it. He admits it’s not the fast way. It’s the easy way. There are other engines that do the same thing, that are non-quadratic, so just replacing that single component might make all my problems go away for a while. I’m sure with your systems, you need to look at scalability for all components.

 

Slide 21  [Conclusions]:  So what conclusions can we draw?  XML is a big win. If there were any other conclusion, you’d have to be surprised. Storing metadata separately has its advantages. RDF is very useful. I’m happy I’m using it. It satisfies my most important test, which is that it allows me to solve problems easily. RSS keeps people up to speed; I check my sites several times every day. I’m trying to keep RSS behind what’s published. XSLT is the obvious choice of tool for transformations, and with a little work it can handle RDF as well.

 

That’s the end of my presentation. The next slide has some references. The appendix is three slides I mentioned before. It’s a more technical look at the example of RDF from earlier. It’s too technical for here, but it provides the whole story.

 

Mr. Kane:  The Adobe XMP [Extensible Metadata Platform, http://www.adobe.com/products/xmp/main.html] is RDF compliant. They’re talking about moving it to XML. Is that true—are they moving to XML, or will it always be RDF?

 

Mr. Walsh:  I can’t speak for Adobe, but the XSD [XML Schema Definition] stuff they’re vetting looks like XML.

 

Mr. Kane:  It can be, but I don’t think it has to be yet. I was wondering, if Adobe had plans to make it such, would it indicate that they’re embracing XML more than they have?

 

Mr. Walsh:  I have no information from which to comment. It clearly can be XML. I would feel bad if I had a group of people who said, “It must be.” All the metadata for images in the date tag [on Mr. Walsh’s material] you can embed in XMP and JPEG. I happen to use a different method.

 

Mr. Ambur:  Adobe plans to be at the September 29 XML tool forum. They registered a little late, so I’m not sure whether they made it onto the agenda or not but, hopefully, they or one of their partners can answer the question. We’re a little over our time, so let’s go right into the break.  When we come back, I’ll give a brief presentation on what we’re going to do next.

 

Break

 

 

Owen Ambur
U.S. Fish & Wildlife Service
XML Working Group Co-chair

Emerging Technology:

Managing the IT Innovation LifeCycle: XSD for Stage 1 - Identification

 

Mr. Ambur:  Shall we get started again? I’m going to whip through my presentation pretty quickly, since we have a small group and I think people here have a pretty good understanding of what we’re up to. We’re well behind schedule, but for good reason, because we got good information this morning, including some serendipitous information that I didn’t know whether Lee could share with us.  I think you all know what I’m up to, so I’ll whip through it.

 

Mr. Houser:  I’m not as sure…

 

Mr. Ambur:

 

Slide 2  [Context]:  The Emerging Technology subcommittee of the CIO Council has been asked to develop a process whereby the emerging technology [ET] lifecycle can be better managed. The driving force is the inability of .gov decision makers to deal with all the information coming at them, particularly from proponents of new and emerging technologies, which by definition are not well understood yet.  The ET Subcommittee has been tasked to develop this process in order to provide for more efficient and effective communication than we currently have, not only for the benefit of .gov folks but for vendors as well.  The suggestion I made early on is to structure this communication in a way that takes advantage of open standards like XML, while recognizing the driving and shaping force of the FEA and the electronic government initiatives. First and foremost, we’re not doing this just as an academic exercise; we’re doing it to acquire ET components that work and make sense for use by government agencies.

 

Slide 3  [Principles and Assumptions (Stephen Covey, et al.)]:  The objective ultimately is to acquire and use technology more effectively than we currently are.  I suggested the target of this process is a fully completed OMB Circular A-11, Exhibit 300 for components to be acquired and used, if not government-wide, then at least by more than one government agency.  We can’t bite off the whole process and deal with it all at once; we have to deal with a component at a time, a manageable chunk at a time. We should take advantage of whatever information already exists, wherever it exists, but we should be clear about how we can understand what folks have to tell us about their proposed ET components.

 

Slide 4  [Principles and Assumptions (W3C, OASIS, et al.)]:  So XML has an important role to play—including something that Norm briefed us on—structuring meaning for more efficient and effective communication. We should practice what we preach with respect to component-based architecture, by taking one step at a time.  Indeed, this is not a new concept.  We can think back several years ago to Raines’ Rules.  Former director Raines of OMB instructed us to pursue investments in small chunks, each of which adds value in its own right. We should adhere to that principle in constructing our own process.  [Editor’s note:  Raines’ Rules are summarized at http://users.erols.com/ambur/itmra.htm  See especially Rule 7.]

 

Slide 5  [Today’s Objective]:  Today’s exercise is to agree substantially on the elements of the first stage of the process.  I suggest that the first data component of the ET process should be an XML schema to help us identify proposed components. The aim is to deliver this draft schema to the ET subcommittee at its meeting next week.  I’m hoping the schema will contain good semantics for the elements and that they’ll be in fairly good form.  I’m not aiming for perfection.  Also, this group is not a decision-making body.  Our role is to propose, draft, and advise.

 

Slide 6  [Key Issues]:  Some key issues I hope to address are:

 

1.      Best names for elements, so they clearly convey what we mean.  The semantics depend on deciding about the context.

2.      I propose eight elements for stage one. There might be one or two of those elements that we don’t need in the first stage.  Or we may need more.  However, I do believe we should stick to a small and manageable number for the first stage.

3.      A point that Ken [Sall] made was helpful in thinking this through. I had been thinking the name of each element itself should be fully descriptive; I’ll say more about that in a minute.

4.      With respect to proper form, we need to make sure we’re not violating the XML Developer’s Guide, and that we’re adhering to any other relevant practices.

 

Slide 8  [Semantics - Just FEA?]:  With respect to semantics, should we focus just on the context of the FEA, or should we bear in mind a broader context, like the World Wide Web?  For example, under the first sub-bullet, should we use the FEA TRM as a controlled vocabulary? Or should we think more broadly in terms of technical standards that may not yet be recognized in the TRM?  With respect to the concept of emerging technology, [at this point, Mr. Ambur displayed slide 7 - ‘Semantics - Just ET?’], should we craft the schema broadly enough to encompass all types of technology, or should we focus it more narrowly on information technology?  I think it is a given that emerging technology is only emerging for part of its lifecycle and that at some point in its maturity ET becomes simply IT.  And assumptions like this affect how the elements of the process should be named.  I’ll have more specifics on the elements later. With respect to context, should we just focus on the FEA, which is all OMB and CIOs care about, even though proposed ET components may not fit neatly into the FEA?  It may be unrealistic for us to expect proponents of proposed ET components to plug into the controlled vocabulary in the FEA right away. We should think in the larger context of how to enable such components to be brought into and matured in the process.

 

Mr. Houser:  Is there a controlled vocabulary in the TRM [Technical Reference Model, http://feapmo.gov/featrm2.asp] or FEA?

 

Mr. Ambur:  They’ve identified specific terms for standards recognized in the TRM.  It’s a subset of all the standards in the world. With respect to ET, should we force proponents to limit their proposals to those standards?  I would prefer that we reference an external, more comprehensive listing of standards than just the FEA TRM.  I know, for example, that NIST has expressed concern about the TRM with respect to where we start and stop such a listing of standards officially recognized by the federal government.

 

Mr. Houser:  They also have considerable experience in where to start and stop.

 

Mr. Ambur:  I don’t want to prematurely enforce structure on folks who want to bring good ideas. The process should move into more structure, but at least in the first stage, we should ask whether it is appropriate to try to force proposed ET components into the TRM or whether a larger, less well controlled vocabulary might be more applicable.  So those are the kinds of issues we’re looking for advice on.

 

Slide 9  [Other Semantics]:  In terms of context—with the end in mind—when I first thought about it, I thought of taking a subset of elements from Exhibit 300 and just using them in the ET process.  However, the schema for Exhibit 300 is not in conformance with the XML Developer’s Guide and it is also focused on projects -- which may involve many different components -- so it’s not appropriate to use ‘out-of-the-box’ if you will.  However, it is important that we map the elements of the ET process into Exhibit 300, while not being hung up on using the specific element names, which may be inapplicable in the early stages of the ET process.

 

Slide 10  [Semantics - Broader Context]:  On this slide, I tried to pick up the point Ken made about parent and child elements; so if we make “Technology” the root element and “Information” a child element, then we can use “ComponentName,” “ComponentDescription,” and “ComponentType,” and it’s clear what they mean in the context. Anything else on that, Ken?
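
Ken’s point is that a short name such as ComponentName becomes unambiguous once its ancestors supply the context. The sketch below builds a hypothetical instance using the nesting just described (Technology, then Information, then the component elements) and prints each leaf with its full context path. The element values are placeholders; this is not the draft schema itself.

```python
import xml.etree.ElementTree as ET

def context_paths(element, prefix=""):
    """List every leaf element with its full ancestor path."""
    path = f"{prefix}/{element.tag}" if prefix else element.tag
    if len(element) == 0:       # no child elements: this is a leaf
        return [path]
    paths = []
    for child in element:
        paths += context_paths(child, path)
    return paths

doc = ET.fromstring(
    "<Technology><Information>"
    "<ComponentName>Example Registry</ComponentName>"
    "<ComponentDescription>A hypothetical component.</ComponentDescription>"
    "<ComponentType>software</ComponentType>"
    "</Information></Technology>")

for p in context_paths(doc):
    print(p)  # e.g. Technology/Information/ComponentName
```

The full path carries the meaning that a fully expanded element name would otherwise have to spell out.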

 

Mr. Sall:  It comes down to intent. I’m for using this. Some would say you need full, expansive names to make them totally reusable by themselves, but my comment was that in this context, you gain a lot by doing it this way. It’s not totally inconsistent with what UBL [OASIS Universal Business Language, http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl] does. UBL does data modeling with ISO 11179 [Specification and Standardization of Data Elements], and if I recall, most end up being a string anyway, so it’s when you have a constrained data type that we gain the most usefulness out of the reuse.

 

Slide 11  [Data Types & Constraints]:  If you look at the XSD for the OMB 300, most elements that map to these are OMB rich text data types. I don’t understand the benefit of rich text, and I don’t think, in the initial stages of the ET process, that we should constrain people to use OMB-specified terms if we’re reaching out to the world to come to us.  I do think that for “ComponentName” and “ComponentDescription” we should impose length constraints, because I propose that those two elements be indexed on the ET.gov site, so people can briefly scan through the component name and description to see if it may match their interest.  I propose that the other elements of the schema be used to provide different, selective views of the listings of the components -- so people could look, for example, only at those components that incorporate standards of interest to them.  Initially, I proposed that we use an element named DataTypeOrModel, because the schema for Exhibit 300 includes an element called dataTypesUtilized.  However, they don’t use that term in the same sense that database designers and XML developers do.  The examples OMB provides include natural resources and health data, so there may be a better term to use in naming that element.  The only one of the eight elements I am proposing that doesn’t have a direct analog in the schema for Exhibit 300 is one I call ‘ComponentType,’ for which I am proposing the controlled vocabulary should be “hardware, software, or data.”  I don’t think the Component Subcommittee or GSA is thinking that way. They seem to be thinking only of Web Services components.  That may make sense for E-Gov in general but it does not make sense in the early stages of the ET process.  For example, if you’re with Conformative Systems [http://www.conformative.com/] and you have a hardware device, do we say, “We don’t want to talk to you?”  Particularly in the early stages of the ET process, we need to be more flexible.
When our draft schema gets to the ET Subcommittee or to the Components Subcommittee, they may say “No, that’s not the way we want to do it” but, from my perspective, it makes sense.
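
The proposed constraints (length limits on ComponentName and ComponentDescription, and a “hardware, software, or data” vocabulary for ComponentType) can be checked with a short sketch. In the draft XSD these would more naturally be expressed as xs:maxLength and xs:enumeration facets; the Python below only illustrates the rules. The minutes give no actual lengths, so both limits are hypothetical, as are the sample listings.

```python
# Hypothetical limits: the proposal calls for length constraints but sets no numbers.
MAX_NAME_LENGTH = 80
MAX_DESCRIPTION_LENGTH = 500
# Controlled vocabulary proposed for ComponentType.
COMPONENT_TYPES = {"hardware", "software", "data"}

def check_listing(listing):
    """Return a list of problems with a stage-one component listing (empty if none)."""
    problems = []
    name = listing.get("ComponentName", "")
    desc = listing.get("ComponentDescription", "")
    if not name or len(name) > MAX_NAME_LENGTH:
        problems.append(f"ComponentName must be 1-{MAX_NAME_LENGTH} characters")
    if not desc or len(desc) > MAX_DESCRIPTION_LENGTH:
        problems.append(f"ComponentDescription must be 1-{MAX_DESCRIPTION_LENGTH} characters")
    if listing.get("ComponentType") not in COMPONENT_TYPES:
        problems.append("ComponentType must be one of: " + ", ".join(sorted(COMPONENT_TYPES)))
    return problems

# A hardware device is accepted, mirroring the point about hardware proponents.
appliance = {"ComponentName": "XML acceleration appliance",
             "ComponentDescription": "A hypothetical hardware device that processes XML.",
             "ComponentType": "hardware"}
bad = {"ComponentName": "Lookup",
       "ComponentDescription": "x" * 600,   # too long for the index
       "ComponentType": "web service"}      # outside the proposed vocabulary

print(check_listing(appliance))
print(check_listing(bad))
```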

 

Mr. Sall:  Would you consider adding elements, like ‘Service,’ or ‘BusinessProcess?’

 

Mr. Ambur:  I think that’s more in line with the Service Component Reference Model [SRM, http://www.feapmo.gov/feaSrm2.asp].

 

Mr. Sall:  I was thinking of the Component subcommittee. When they talk about components, they include those things. They include a number of things—more than we normally think of when we talk about components. They specifically talk about service, business process, software, data models…

 

Mr. Ambur:  To me, it’s two different things, and should be reflected in two different elements.  The reason I put a question mark next to “ServiceType” is because I’m not sure it’s really relevant to the first stage, the identification stage of emerging technology.  I’m not averse to making it an optional element in the first stage, but I wouldn’t want to make it mandatory, because some components may not fit neatly into the categories of the SRM.  Likewise with respect to the TRM [Technical Reference Model, http://www.feapmo.gov/feaTrm2.asp] there may not yet be any standard established that is applicable to an emerging technology component.  And we should not preclude proponents of other models from advancing them.

 

Mr. Bruce Cox:  It may be useful to say, “In the context of the FEA, this is where it fits, but in a broader sense, here’s where it fits.” And maybe also for ‘ServiceType,’ so maybe we need two elements, and show in a larger context that “This is their position.”

 

Mr. Ambur:  I think that might happen in the second or third stage of the process.  Initially, it seems to me proponents should be free to identify any service or technical classification system they choose, but in the next stage and particularly in the third stage, where the ET subcommittee accepts stewardship, then they have to fit more specifically into the FEA.

 

Slide 12  [Other Overriding Issues?]:  We want to keep the scope narrow, to deliver something to the ET subcommittee, because we’re not the decision-makers anyway. We just put a draft on the table and try to stimulate action.  In that regard, it is noteworthy that the first draft was put together back in March by Jonathan Smith of Booz Allen, who was then under contract to provide support to the ET Subcommittee.  Since then, nothing further has really been done.  Part of the reason Norm Lorentz and John Gilligan decided to eliminate the AIC’s working groups was that they were viewed as a distraction from the deliverables, but in fact what we are trying to do is deliver the schemas required to support the ET process, which no one else is planning to do.  It’s a “Catch 22” but we should work through it and deliver a draft schema to the ET Subcommittee next week, so at least if they say, “That’s wrong,” there’ll be some burden on them to say what’s right, and not just continue to engage in high-level theoretical discussions that don’t lead to any actionable steps.

 

That’s my quick spiel.  Ken, you have XML Spy to display the draft XML schema for review and editing…

 

Mr. Sall:  Sure.

 

Mr. Ambur:  Ken took the proposed elements and put them into XSD form. I understand that XML Spy is not the best for live editing, but maybe we can show what it looks like and go from there.  In the handout I gave you, the tabulation of the elements, if you see where it says “Derived from” at the top, the link in the online version points to the March draft that Jonathan Smith prepared.  Jonathan initially identified the elements of Exhibit 300 he thought were applicable to the ET process.  Then Kevin Phelps, who staffs the ET Subcommittee for Mark Day, added some additional value.  Then I took those elements and mapped them to elements of the XSD that Susie Adams of Microsoft compiled under contract to OMB. Those elements are in the first column of the table. That column also includes the data types OMB has specified for those elements.

 

Mr. Sall:  Are you looking at the first stage?

 

Mr. Ambur:  I think you can go right to the draft of the XSD. Matthew [McKennirey], you’re not on the line are you? [No response.] Matthew made one suggestion regarding the WebAddress element -- that we use the Data Type “anyURI”.

 

In the middle column are my first cuts at the elements I believe should be contained in the schema for the first stage of the process.  Three of them are ComponentName, ComponentDescription, and WebAddress.  ComponentName and Compo