Z39.50 Introduction, XML, RDF Ideas
From NISO and ZIG sessions at
San Antonio, ALA Meeting, January, 2000
John Robert Gardner
Resources
A raft of resources supplies many of this report's specifics, and spares me from repeating them here. I started with Duane Harbin's excellent summary from Diktuon (November, 1999), "An Overview of Z39.50" (see links from http://purl.org/CERTR/ under "articles"). From there, a slightly more detailed treatment is available at
http://www.ariadne.ac.uk/issue21/z3950/#20
(reading time: 30-45 minutes)
From there, you'll have a healthy grasp not only of basic conceptual terminology and framework, but also the "issues" with Z39.50. No standard is ever perfect: implicit in the notion of standard is accommodation of the range of differences it was designed to mediate--no standard is a solution, but rather, a good standard is an intersection for solutions.
Next you'll want to look at a bit more detail, such as implications for various library services, relevant software, etc. Go to:
http://www.biblio-tech.com/html/z39_50.html
(reading time: 30 minutes)
Then it's time to get a peek under the hood. A nice detail-by-detail set of summaries (broken into palatable but still palpable sub-sections) is the continuation of the biblio-tech.com article:
(reading time: 60-80 minutes)
That will provide you with the basics to be conversant with the issues related to Z39.50 and enable you to evaluate solutions and proposals. For more links to resources, try William Moen's: http://www.unt.edu/wmoen/Z3950/BasicZReferences.htm
Generalities
Z39.50: think of it as a sort of database query language-meets-"http" protocol-meets-search engine-meets-Esperanto. Z39.50 has a higher pedigree than http in that Z39.50 was born to serve information interchange specifically. By contrast, http is a "one size fits all" gateway which allows anyone and anything (almost) to pass through. Z39.50 requires information packets to have specific pedigrees, and so these packets are "smarter" than the average net traffic. In the mid-'80s, the OCLC and other such noble library resources wanted a way to standardize comparisons of library holdings records. After some work, the first Z39.50 standard came out in 1988, followed by a "fix" in 1992 (Version 2, which many U.S. institutions support), finally coming of information age in 1995 (Version 3, which most of Europe uses). It's not really a query language, but an attempt at a translation brokerage between query languages which depends wholly upon adoption for its success--hence the Esperanto analogy.
Think of a summit meeting between world leaders of different language cultures. The leaders talk to each other, but the exchange is "brokered" by the interpreters. Much as the interpreters must recognize that one-to-one correspondence between vocabularies is often not possible, Z39.50 recognizes that one-to-one equivalence between the resources it negotiates is frequently impossible (see below under issues).
Facilities
"Facilities" is the technical word for the 11 things Z39.50 covers or "does." Some of them are obvious, some less so, and not all are supported by all Z39.50 systems (see issues below). Here are the basic summaries of the Z39.50 facilities, and key feature potentials (based upon the information provided at http://www.biblio-tech.com/html/z39_50_part_2.html):
Facility | Notes |
1. Initialization: setting up the Z-Association, negotiating levels of service | Analogous to the handshake you hear when your modem hooks up to a dialup connection. This lets non-authorized users know we're out here, and--see #5 below--confirm a resource discovery without giving away the record, or while giving only a part of it (see #9 too). |
2. Search: sending a search string at a database and getting back a result set and the first few records | The basic act of searching performed from any OPAC. It enables searches to hit on/have discoveries on other Z39.50 resources (e.g., CORC, etc.), and vice-versa (cf. #5, #9). |
3. Retrieval: retrieval of records from the result set as specified by the Z-client | Z39.50 enables a search to "broadcast" to the online sources of the world, and net only those resources which conform to information sciences specifications--in other words, Olivia Newton-John songs won't turn up in your search for Coleridge's "Xanadu," but you will get resources from as far as Timbuktu. Non-authorized users might receive the title of an article, or simply a notice that there are resources there, as a prompt/marketing option to garner subscribers. |
4. Result-set-delete: deleting a set of search results held on the Z-server | Just what it says, and an added security bonus so searches are not left "open." |
5. Access Control: allowing the Z-server to ask for passwords etc. | If someone finds a resource, Z39.50 can communicate that the MARC record--and/or online article--is there, but intercepts automatically with a password call. This can enable varied levels of access, for instance, if desired. |
6. Accounting / Resource Control: allowing accounting, credit control etc. | Yes, you can even do per-use billing if you want to, as well as "one-time free introductory accesses"--but you have to program this into your server. |
7. Sort: sorting a result set in a defined order on the Z-server | What order you want the results in: "sort by author, ascending alphabetical order," or, "sort with Whitman project holdings first." |
8. Browse: scanning an index on the Z-server | This provides a starter-hint, of sorts, as a reference to what kinds of keywords/subject field words are available for searching--a way to maximize the careful work with 6xx fields in MARC records. |
9. Extended Services: allowing the Z-client to start "task packages" (e.g. ILL) on the Z-server | Think of this as sort of setting up little applications, or Applets, which can be triggered for various user levels, perhaps even a "subscription" interface for online acquisition of new subscribers. |
10. Explain: allowing the Z-client to query a database of implementation details on the Z-server | Newly-arising feature, previously largely unsupported, but available on free Zebra Z39.50 software, Z'mbol from the same group, and SIM/Structure Information Manager. Sort of like the way two translators introduce themselves to each other before brokering a state leaders' meeting. |
11. Termination: closes down a Z-Association | (self-evident) |
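To make the life cycle of facilities 1-4 and 11 a little more concrete, here is a minimal sketch in Python. It is a toy, in-memory stand-in for a Z-server, not a real Z39.50 implementation: the class name ToyTarget, its method names, and the substring matching are all my own assumptions for illustration.

```python
class ToyTarget:
    """In-memory stand-in for a Z-server (hypothetical, illustration only)."""

    def __init__(self, records):
        self.records = list(records)
        self.associated = False      # True between init() and close()
        self.result_sets = {}        # result sets held on the "server"

    def _require_association(self):
        if not self.associated:
            raise RuntimeError("no Z-Association: call init() first")

    def init(self):
        """Facility 1: set up the Z-Association."""
        self.associated = True

    def search(self, set_name, term):
        """Facility 2: run a search and keep the result set server-side."""
        self._require_association()
        hits = [r for r in self.records if term.lower() in r.lower()]
        self.result_sets[set_name] = hits
        return len(hits)

    def retrieve(self, set_name, start, count):
        """Facility 3: fetch records from a held result set (1-based start)."""
        self._require_association()
        hits = self.result_sets[set_name]
        return hits[start - 1:start - 1 + count]

    def delete_result_set(self, set_name):
        """Facility 4: drop a result set so the search is not left open."""
        self._require_association()
        del self.result_sets[set_name]

    def close(self):
        """Facility 11: close down the Z-Association."""
        self.result_sets.clear()
        self.associated = False
```

A client would call init(), then search() and retrieve() against a named result set, then delete_result_set() and close(). Searching before init() raises an error, which mirrors the point that nothing happens outside a Z-Association.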
Z39.50: "under the hood"
Before we get into all the implementation issues, it's important to know just what sort of message Z39.50 is transmitting, and how it enables a communication/translation to take place. In the following section, I am indebted to the clearly-spoken NISO tutorial given by William E. Moen, University of North Texas, in San Antonio, January 18, 2000.
A series of "attributes" qualify a given Z39.50 query. Each query is an information "packet" with its own Object Identifier or "OID." These attributes are all numerical, which enhances speed but makes human readability a bit forbidding. Z39.50 has its own "namespace" or data-type identifying prefix. If you think of a Z39.50 target sitting with its ear to the virtual ground, when it hears the "hoofbeats" of this namespace, it picks this signal out from the other data noise of the web.
The prefix looks like this: 1.2.840.10003. That's the prefix on every Z39.50 OID. What comes after that specifies the kind of query and the actual human-readable content of the query. Next comes the indicator saying that this query packet has a Z39.50 attribute set (which has its own OID of 3) and that this will be a Bib-1 attribute set (the main attribute set, the de facto default attribute set, which has six sub-types), so a 1 is added. Now our OID is: 1.2.840.10003.3.1
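The way the OID is assembled can be sketched in a few lines of Python. The numbers are the ones quoted above; the constant and function names are my own, chosen for illustration. The prefix check compares arc by arc rather than character by character, so that something like 1.2.840.100031 is not mistaken for a Z39.50 OID.

```python
# Arcs quoted in the text: the shared Z39.50 prefix, then 3 for
# "attribute set", then 1 for Bib-1. Names are illustrative only.
Z3950_PREFIX = (1, 2, 840, 10003)
ATTRIBUTE_SET = 3
BIB_1 = 1


def make_oid(*arcs):
    """Join numeric arcs into dotted OID notation."""
    return ".".join(str(arc) for arc in arcs)


def is_z3950_oid(oid):
    """True when the OID starts with the Z39.50 prefix, compared arc by arc."""
    arcs = tuple(int(part) for part in oid.split("."))
    return arcs[:len(Z3950_PREFIX)] == Z3950_PREFIX


BIB_1_OID = make_oid(*Z3950_PREFIX, ATTRIBUTE_SET, BIB_1)
print(BIB_1_OID)  # 1.2.840.10003.3.1
```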
All this just says what the packet is, where it comes from, and what language it's going to speak. Now, let's let it do some of that speaking. Everything that comes next explains the nature of the human-readable search query which comes at the end of the packet. There are six types of attributes in Bib-1 (see http://www.biblio-tech.com/html/z39_50_bib-1.html for more info, or ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt for a complete set with MARC equivalents):
1. Use
2. Relation
3. Position
4. Structure
5. Truncation
6. Completeness
Numbers 1-4 are pretty clear. #2, Relation, allows you to say, for example, "everything prior to 1975" by specifying "less than." A note on #5, Truncation: this says whether "run" is just the beginning of a word (right truncate) or the whole word (do not truncate). As to #6, Completeness, this says whether "run" is the entire title, or whether other words are allowed to also be in the title.
So, what about a search for a match (relation/2, equals/3) on Shakespeare as an author's (use/1, author/1003) last name (structure/4, name un-normalized/102), anywhere in the field (position/3, any/3), and just those letters (well, that's his whole last name, isn't it? But what about getting rid of a hack naming themselves "Shakespearea"? So truncation/5, do not truncate/100), and allowing "William" to be included (completeness/6, incomplete/1)? It looks like this:
1.2.840.10003.3.1 (1,1003) (2,3) (3,3) (4,102) (5,100) (6,1) Shakespeare
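Here is a small Python sketch that renders a query in the parenthesized notation used above. That notation is this report's shorthand for illustration, not what actually travels over the wire, and the function names are my own. The truncation value is left as a parameter because it is the easy one to get backwards: in the Bib-1 definitions, 1 is right truncation (the term is only the beginning of the word) and 100 is do not truncate.

```python
BIB_1_OID = "1.2.840.10003.3.1"

# The six Bib-1 attribute types, by type number.
ATTRIBUTE_TYPES = {
    1: "use",
    2: "relation",
    3: "position",
    4: "structure",
    5: "truncation",
    6: "completeness",
}


def render_query(term, attributes):
    """Render {type: value} attributes in the report's (type,value) notation."""
    for attr_type in attributes:
        if attr_type not in ATTRIBUTE_TYPES:
            raise ValueError(f"unknown Bib-1 attribute type: {attr_type}")
    pairs = " ".join(f"({t},{v})" for t, v in sorted(attributes.items()))
    return f"{BIB_1_OID} {pairs} {term}"


def author_surname_query(surname, truncation):
    """Author (1003), equals (3), any position (3), un-normalized name (102),
    caller-chosen truncation, incomplete subfield (1)."""
    return render_query(
        surname,
        {1: 1003, 2: 3, 3: 3, 4: 102, 5: truncation, 6: 1},
    )


# 100 = do not truncate: "Shakespearea" should not match.
print(author_surname_query("Shakespeare", truncation=100))
# 1 = right truncation: anything beginning with "Shakespeare" may match.
print(author_surname_query("Shakespeare", truncation=1))
```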
So, that all seems pretty clear when peeked at under the hood, right? Well, no one's giving a test on this right now, but you get the idea. The problem arises when some institutions do some of Bib-1, but not other parts. What if I'm asking for a name un-normalized, as above (4,102), but your institution's Z39.50 OPAC only has phrase (4,1); word (4,2); and key word (4,3)--or worse, no structure attribute at all? What does it do with my (4,102) request? What if I don't specify a structure field, but your Z39.50 OPAC has this data?
This is why the same query to five Z39.50 OPACs often yields five different result sets. Some systems will "guess" at what you meant to include, or what they think the unsupported data is. Current best practice recommends that an error be returned instead, saying 'such and such is not supported here, please refine your query.'
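The "return an error rather than guess" behavior can be sketched as follows. This is a toy check under my own assumptions: the exception name, the function name, and the way a target's supported attributes are stored (a dictionary from attribute type to the set of values it handles) are all hypothetical, and the message text is just the wording suggested above.

```python
class UnsupportedAttribute(Exception):
    """Raised when a query uses an attribute the target does not support."""


def check_query(attributes, supported):
    """Reject the query outright instead of guessing at unsupported attributes.

    attributes: iterable of (type, value) pairs from the incoming query.
    supported:  dict mapping attribute type -> set of supported values.
    """
    for attr_type, value in attributes:
        if value not in supported.get(attr_type, set()):
            raise UnsupportedAttribute(
                f"attribute ({attr_type},{value}) is not supported here, "
                "please refine your query"
            )


# A target that only knows phrase (4,1), word (4,2) and key word (4,3)
# for structure, plus author as a use attribute.
opac = {1: {1003}, 4: {1, 2, 3}}

try:
    check_query([(1, 1003), (4, 102)], opac)
except UnsupportedAttribute as err:
    print(err)
```

A target with no structure attribute at all is handled by the same check, since an absent type is treated as supporting no values.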
More on this below. The agreed thresholds--called "profiles"--are usually focused on a discipline and are not unlike DTDs (Document Type Definitions in SGML and XML), and they suffer from many of the same problems as DTDs (getting everyone to agree, finding DTDs/attribute sets that fit everyone's system or needs, etc.). Ironically, the same solution to the DTD problem that is surfacing in the XML world with XSchemas and RDF (Resource Description Framework) may ultimately resolve the implementation granularity/profiles issue (cf. Poul Henrik Jorgensen, phj@dbc.dk).
The trick that folks have worked out is that a bunch of OPACs get together and say, "okay, we're doing these attributes, and this many variables in each subset, so let's all agree to do it that way and keep it that way." This way, if you are in--say--Texas' library system, you know what everyone's using. If you are an OPAC in Europe's network, you know what they are using. This certainly widens the circle of interoperability, but does not solve the fundamental issue that Explain was designed to address in the specification itself. As long as you can get other folks to join your profile group, interoperability improves. And certain obvious commonalities are always there: almost anyone wants to search on an author name or title, but the levels of detail below that can vary (e.g., a normalized lastname-firstname order, or an un-normalized one?).
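One way to picture a profile is as an agreed minimum set of attribute values, against which any one OPAC can be checked. The Python sketch below does just that with set differences. The profile contents and the function name are invented for illustration and do not reproduce any real profile.

```python
def missing_from_profile(profile, target):
    """Return {attribute type: values the profile requires but the target lacks}.

    Both arguments map an attribute type to a set of supported values.
    An empty result means the target conforms to the profile.
    """
    missing = {}
    for attr_type, required in profile.items():
        gap = required - target.get(attr_type, set())
        if gap:
            missing[attr_type] = gap
    return missing


# Hypothetical profile: author and title searches (use 1003 and 4),
# with word, phrase and un-normalized name structures.
example_profile = {1: {4, 1003}, 4: {1, 2, 102}}

# Hypothetical OPAC that does everything except un-normalized names.
example_opac = {1: {4, 21, 1003}, 4: {1, 2, 3}}

print(missing_from_profile(example_profile, example_opac))  # {4: {102}}
```

Extra capabilities on the target side (use 21, structure 3 here) do no harm. Only what the profile requires and the target lacks shows up in the result.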
So profiles reflect the fledgling beginnings of consensus about how to end-run the absence of the standards-based solution of Explain. Profiles are not ISO, but Explain is. Now that Explain is supported in software, the question of profiles becomes more a set of implementation guidelines than actual prescriptions for ensuring interoperability.
There are several such profiles, including those listed below, as well as museum-specific sets (CIMI) and geo-spatial (GEO):
In library implementations, XML is seen most frequently in the Dublin Core set of 15 basic elements (all of which are included in the Bath Profile, and are supported in Bib-1). It is interesting to note that Dublin Core and the XML/RDF set of solutions arose from the Z39.50 community as part of the Warwick Framework of April, 1996.
As mentioned above under "Explain" in this section, there is a great deal of potential for XML to resolve many of these issues. For systems not supporting Explain, the necessary information could be routed through RDF structures (a series of empty tags, with namespaces for different institutions, identifying which attributes and facilities their Z39.50 OPAC supports). This could serve as a bridge to a future in which Explain and the other rich facilities of Z39.50 are more widely adopted and deployed.
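As a rough illustration of what such a structure might look like, the Python sketch below writes a capability description as a series of empty tags and reads it back. Only the RDF namespace URI is real. The "cap" namespace, the element and attribute names, and the target address are all made up for this example, and the output is RDF-flavored XML rather than anything I would claim is valid against the RDF specification.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # the RDF namespace
CAP_NS = "http://example.org/z3950-capabilities#"       # hypothetical

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("cap", CAP_NS)


def describe_target(target_uri, facilities, attributes):
    """Serialize a target's facilities and (type, value) attributes as empty tags."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": target_uri}
    )
    for name in facilities:
        ET.SubElement(desc, f"{{{CAP_NS}}}facility", {"name": name})
    for attr_type, value in attributes:
        ET.SubElement(
            desc,
            f"{{{CAP_NS}}}attribute",
            {"type": str(attr_type), "value": str(value)},
        )
    return ET.tostring(root, encoding="unicode")


def supports(xml_text, attr_type, value):
    """Check a serialized description for one (type, value) attribute."""
    root = ET.fromstring(xml_text)
    for element in root.iter(f"{{{CAP_NS}}}attribute"):
        if element.get("type") == str(attr_type) and element.get("value") == str(value):
            return True
    return False


description = describe_target(
    "http://example.org/opac",        # hypothetical target address
    ["Search", "Retrieval", "Sort"],
    [(1, 1003), (4, 1), (4, 2)],
)
print(supports(description, 4, 102))  # False
```

A client that cannot use Explain against a target could fetch a description like this first and drop or change the unsupported attributes before sending its query.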