A Pilot for Evaluating a Possible Procedure for Acceptance of Extensible Markup Language (XML) Electronic Theses and Dissertations (ETDs) at the University of Iowa
(original pilot model for 1997-1998)


As this is a pilot and no final policy has been set, there is still a great deal to be discussed and developed. However, the folks who have helped up to this point deserve credit for their efforts and contributions.

A multitude of people deserve thanks for helping us get this far. The University's ITS and library staff have provided unparalleled help, most notably Webmaster Chris Pruess, Library ITS director Paul Soderdahl, and Phil Potter of ITS, who assisted with training; Graduate College computer specialist Patty Strabala has also helped. In addition, we received technical advice from Peter Flynn, Neill Kipp, John Simpson, and James Tauber of the XML-List e-mail group.

Administratively, thanks are owed to the Graduate Deans Leslie B. Sims, John Keller, William Welburn, and Sandra Barkan. Procedural assistance has been provided by Graduate Examiner Caren Cox and her staff. Technical help and quick solutions have come from Graduate College Program Assistant Patty Strabala.


Traditionally, theses and dissertations have consisted of written texts produced and bound in book or document form, archived in and available through the library. Recognizing the limitations of written text in capturing and representing artistic accomplishment and creation, the University of Iowa in the 1940s became the first university to accept the artistic product itself (painting, musical or choreographic composition, poetry, fiction, playwriting, etc.), accompanied by a brief textual description of the creative project and how it was achieved, in lieu of a traditional text-based thesis for students in the creative and performing arts. This ultimately led to the development of the Master of Fine Arts (MFA) as the preferred degree in these areas of study.

Electronic Theses and Dissertations (ETD's)

Increasingly, scholars and students find the traditional text-based thesis or dissertation insufficient to fully document and represent their work. Standard computer software now makes it possible to produce documents that include, for example, enhanced graphics, sound, and animation. Such documents also offer features that benefit the reader, such as search procedures that are far more detailed than traditional indices and that allow searches for combinations of words, phrases, or symbols within a document.

The Electronic Thesis and Dissertation (ETD) movement began less than a decade ago and has been led by two organizations: Virginia Tech (under a FIPSE grant awarded to its Graduate College and Library) and University Microfilms International (UMI). A few other universities now allow students to provide electronic materials either in lieu of or, more commonly, as a supplement to text-based theses and dissertations.

University of Iowa students and faculty are among those at the forefront of using new technologies for teaching, research, and creative endeavors. The University has received targeted funding to promote these efforts through programs such as n-TITLE and the CIC Learning Technology Initiative; it also chose a focused NCA accreditation in the area of technology-based learning. As a result of these initiatives, the University is conducting a pilot to give graduate students the option of submitting electronic or digital materials in lieu of, or as a supplement to, text-based theses and dissertations.

Why XML? - Format, Archiving, and Institutional Memory

XML promises a great deal, but it is still very young, only a handful of sites even use it, and among major manufacturers only Microsoft, with Internet Explorer 5.0, has released a full-blown browser that accesses its various capabilities.

This pilot has been designed to take advantage of the way eXtensible Markup Language (XML) addresses several issues confronting academic institutions considering electronic thesis/dissertation (ETD) projects. One of the foremost issues is preservation and archiving in the inevitably changing arena of digital technology. Flat data (e.g., ASCII text, plain old DOS typing, or "just the regular letters of the keyboard without special characters") is the easiest to preserve and to "migrate," or convert, to later formats and technologies without significant risk of data loss. Because an XML document is itself plain text, with its markup written in ordinary keyboard characters, it shares these advantages.

Unfortunately, and sometimes infuriatingly, XML's current fledgling status has left options for display and usability either unavailable or severely limited. There are some higher-end programs for generating and managing XML documents, but these cost from $500 to thousands of dollars, placing them out of reach of most higher education institutions and certainly of most graduate students' budgets. Further, it is not practical to ask graduate students and their professors, in the crucial stages of thesis or dissertation writing, to install layers of software and trial versions on their computers just so the work can be reviewed for the defense. Most current implementations of XML require a plethora of post-processing procedures and as many as three to five additional files, all of which simply produce HTML! This is, of course, not an issue for corporations and firms with a full technical staff to implement these advanced kinds of documents, but it does not fit the budgets or limited personnel of most public universities.

Fortunately, a fundamental design principle for the authors of the XML standard was that it should be easy for individual programmers to write software compatible with it. This has produced a host of affordable, or even free, XML editors for almost every computer platform. In addition, companies such as Corel (e.g., WordPerfect) and Infrastructures-4-Information are building XML software with the familiar look and feel of mainstream word processors. The remaining challenge was how to display the XML ETDs.

A major consideration is that these documents must both stand the test of time and permit reasonable print output for those occasions when scholars, publishers, and job search committees still require hard copy. Our test approach is to use a base of HTML tags for formatting, add Text Encoding Initiative (TEI) tags for content specificity, and augment these with a handful of University of Iowa-specific tags for the front matter (e.g., the certificate of approval). A sample DTD from the Iowa project is available online.
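To illustrate the layered approach just described, here is a minimal Python sketch of a hypothetical ETD fragment that mixes HTML-style formatting tags (h1, p, i), TEI-style content tags (persName, date), and a stand-in for an Iowa-specific front-matter tag (certificate). These element names are illustrative assumptions, not taken from the actual tdm.dtd. The code uses only the standard library's xml.etree.ElementTree and checks well-formedness, which is not the same as validating against a DTD; the point is that the whole document remains plain text that ordinary tools can read.

```python
import xml.etree.ElementTree as ET

# A hypothetical ETD fragment. Element names are illustrative only:
# h1/p/i are HTML-style formatting tags, persName/date are TEI-style
# content tags, and <certificate> stands in for an Iowa-specific
# front-matter tag. None of these are copied from the real tdm.dtd.
SAMPLE = """<thesis>
  <certificate>
    <p>This is to certify that the thesis of <persName>A. Student</persName>
    has been approved.</p>
  </certificate>
  <h1>CHAPTER 1. INTRODUCTION</h1>
  <p>Work began in <date>1998</date> on <i>flat data</i> archiving.</p>
</thesis>"""


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (well-formedness only, no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def tag_counts(text: str) -> dict:
    """Count how often each element name occurs, root element included."""
    counts = {}
    for elem in ET.fromstring(text).iter():
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
    return counts


if __name__ == "__main__":
    print(is_well_formed(SAMPLE))         # True
    print(is_well_formed("<p>unclosed"))  # False
    print(tag_counts(SAMPLE))
```

Checking conformity to the document type itself requires a validating tool; the standard library parser above does not perform DTD validation.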

The students are working with software for Windows, Mac, or Unix that is either free or costs less than $100. In addition, we have made simple changes to our server to allow non-XML browsers to read our XML demos (online files of a mock-up XML ETD are available, accompanied by an illustrated instruction manual with starting information on how to create one). A more detailed description of the software and procedures is included below.

XML is Flat Data for ETD Longevity and Access

Flat data is essential to the integrity of library, information, and archival resources, and key organizations in the information management world have canonized it as the fundamental reliable format. It meets multiple criteria:

Based on these considerations, our pilot builds on an existing standard, the Text Encoding Initiative (TEI, in its "Lite" version), which was designed for the particular academic needs of preserving and accessing electronic data. TEI is an application of the international standard called Standard Generalized Markup Language (SGML), ISO 8879, of which HTML (Hypertext Markup Language) is also an application. All three systems can be reviewed at the World Wide Web Consortium site. For more detailed annotated information on XML, SGML, and TEI, see the OASIS site and the XML.com site. For software resources, see James Tauber's site.

Institutional Solutions: Pilot Program for XML ETDs at the University of Iowa

The Graduate College is currently conducting a pilot to test and assess a possible procedure for the electronic submission of theses and dissertations in the Spring or Summer 1999 sessions.

Among the guiding principles of this pilot is facilitating the smoothest possible transition from the era of hard copy and HTML to the era of digital dissertations and XML. At the same time, we are trying to accommodate two groups of faculty members: those who have embraced the web and internet uses in higher education, and those who have been waiting for more standardization and user-friendliness. Accordingly, we designed the eXtensible Markup Language (XML) Document Type Definition (DTD) for Electronic Theses and Dissertations (ETDs), called "Thesis and Dissertation Markup" ("tdm.dtd"; an annotated version is available online, and the phonetic pun on "tedium" was suggested by our Library ITS director, Paul Soderdahl), so that the materials of an ETD can be readily displayed in browsers as old as Netscape and Internet Explorer 2.0, without any additional plug-ins or software. For all intents and purposes, it works like HTML, which is widely supported, so a student or professor is currently much more likely to find help with it than with pure XML devoid of more familiar tag sets.

In this pilot evaluation phase, submitting an ETD is a voluntary option for graduating students. To participate, however, students must first obtain written permission from each member of their thesis or dissertation committee. Part of that permission entails that the committee, in signing off on an approved ETD, has determined that the rhetorical and disciplinary academic value of the ETD would not be compromised if links to outside web sites someday became invalid (because sites moved or closed). Any material from other sites deemed indispensable must be included with the submitted ETD, along with appropriate permission from the site's creator.
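Because the committee must judge whether the ETD would survive the loss of outside links, it may help to list every external link up front. The following is a minimal Python sketch, not part of the pilot's documented procedure, that collects the external links in one XML file. It assumes links are stored in href or src attributes; that is a hypothetical convention, since the text does not say how tdm.dtd marks links.

```python
import xml.etree.ElementTree as ET

# Hypothetical: attribute names assumed to hold links in an ETD file.
LINK_ATTRIBUTES = ("href", "src")


def external_links(xml_text: str) -> list[str]:
    """Collect links pointing outside the ETD (URLs containing '://'),
    in document order, without duplicates."""
    links = []
    for elem in ET.fromstring(xml_text).iter():
        for attr in LINK_ATTRIBUTES:
            target = elem.get(attr)
            if target and "://" in target and target not in links:
                links.append(target)
    return links


if __name__ == "__main__":
    doc = ('<thesis><a href="http://example.org/a"/><a href="fig1.gif"/>'
           '<img src="ftp://example.net/x.gif"/></thesis>')
    print(external_links(doc))  # ['http://example.org/a', 'ftp://example.net/x.gif']
```

Local references such as fig1.gif are left out, because those files are expected to ship with the ETD itself.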

A range of issues remains to be determined, including the costs to University ITS resources, the interface for accessing finished ETDs, options for students to test-post their ETDs online for their professors, and the various permissions for posting an ETD (e.g., if the ETD contains patent-sensitive data and so must be withheld for a period of time before public posting). These and other issues are under policy review by a committee of ITS, Library, and other key administrators from the Graduate College and the campus.

A Summary of the Current Process Being Tested, with Links to Online XML (no special software required)

A mock ETD, written in XML according to the University of Iowa's Thesis and Dissertation Markup document type (tdm.dtd), is available online; it doubles as a manual on how to use the software and convert existing work into XML. It includes instructions for the Thesis Examiner procedure, use of XML editing software, and several demonstration code examples.

In short, students are given a way to convert their existing work from Rich Text Format to XML, using a program called Majix, freely available from TetraSix. A variety of software packages can then be used for creating and editing XML. For the Macintosh, a program from Media Design In-progress called Emile takes a document type definition (such as our Thesis and Dissertation Markup, tdm.dtd) and generates a console of buttons that write the tags for the students, prompting them for required input values. For Windows, we have used XED and IBM's Xeena, and we have also beta-tested Corel's WordPerfect 9.0. Both Emile and WordPerfect will retail to educational buyers for under $100, while XED and Xeena are free. For UNIX, there is a version of Xeena as well, in addition to an Emacs major mode for XML.

For both first and final deposits, the deposit mechanism is a simple transfer from the student's LAN account to a read-only, restricted-access directory that only Graduate College Examiner staff can access. Timeliness is ensured simply by closing access to the deposit directory at 5:00 p.m. on deposit day.
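The cutoff rule can be modeled in a few lines. This is a minimal Python sketch, assuming each transfer has a known completion timestamp; the text describes the cutoff as a closing of directory access, so the code only models the rule, and the function names are hypothetical.

```python
from datetime import date, datetime, time

# The 5:00 pm cutoff comes from the deposit procedure; names are hypothetical.
CUTOFF = time(17, 0)


def deposit_closes_at(deposit_day: date) -> datetime:
    """Moment at which access to the deposit directory closes."""
    return datetime.combine(deposit_day, CUTOFF)


def deposit_is_timely(submitted: datetime, deposit_day: date) -> bool:
    """A deposit counts only if the transfer finished before the directory closed."""
    return submitted < deposit_closes_at(deposit_day)


if __name__ == "__main__":
    day = date(1999, 5, 7)  # example date only, not an actual deposit day
    print(deposit_is_timely(datetime(1999, 5, 7, 16, 59), day))  # True
    print(deposit_is_timely(datetime(1999, 5, 7, 17, 0), day))   # False
    print(deposit_is_timely(datetime(1999, 5, 6, 23, 30), day))  # True
```

Deposits made on earlier days are accepted because they also fall before the deposit-day closing time.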

The Graduate Examiner would then perform four checks on the ETD, which are outlined in the online mock-up. First, a combination of freeware and a batch file written by Graduate College LAN administrator Patty Strabala checks that only the correct formats are submitted and that the tdm.dtd file is included, so that the ETD is complete and "stand-alone." Second, the integrity of all links is verified and the presence of all necessary media and graphics files is confirmed; various software packages can do this. Third, the ETD is tested for strict conformity to the Thesis and Dissertation Markup document type, using the very user-friendly validation tool in Corel's WordPerfect 9.0. Finally, using an ordinary web browser, the examiners check basic formatting much as in traditional hard-copy review: that chapter headings are correctly capitalized, that titles in the table of contents match those in the text, and so on. Failure of any one of these tests, for any file, fails the ETD as a whole. This strictness is intended to ensure the integrity, completeness, and longevity of the ETD and its links and components.
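The first two checks lend themselves to automation. The following minimal Python sketch is not the pilot's actual batch file; it covers only a format whitelist, the presence of tdm.dtd, and local link and media integrity. The allowed-format list and the assumption that links live in href or src attributes are hypothetical, since the text specifies neither. DTD validation (the third check) and the visual formatting review (the fourth) are not attempted here.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical whitelist: the text does not say which formats are accepted.
ALLOWED_SUFFIXES = {".xml", ".dtd", ".gif", ".jpg", ".wav", ".mov"}
REQUIRED_DTD = "tdm.dtd"
# Hypothetical: attributes assumed to hold links or media references.
LINK_ATTRIBUTES = ("href", "src")


def check_formats(etd_dir: Path) -> list[str]:
    """Check 1: only allowed formats, and tdm.dtd included so the ETD stands alone."""
    files = [p for p in etd_dir.rglob("*") if p.is_file()]
    problems = [f"disallowed format: {p.name}"
                for p in files if p.suffix.lower() not in ALLOWED_SUFFIXES]
    if not any(p.name == REQUIRED_DTD for p in files):
        problems.append(f"missing {REQUIRED_DTD}")
    return problems


def check_local_links(etd_dir: Path) -> list[str]:
    """Check 2: every local link or media reference points at an included file."""
    problems = []
    for xml_file in etd_dir.rglob("*.xml"):
        try:
            root = ET.parse(xml_file).getroot()
        except ET.ParseError as err:
            problems.append(f"{xml_file.name}: not well-formed ({err})")
            continue
        for elem in root.iter():
            for attr in LINK_ATTRIBUTES:
                target = elem.get(attr)
                # Skip empty values, external URLs, same-page anchors, mail links.
                if not target or "://" in target or target.startswith(("#", "mailto:")):
                    continue
                local_path = target.split("#", 1)[0]
                if not (xml_file.parent / local_path).exists():
                    problems.append(f"{xml_file.name}: missing {target}")
    return problems


def check_etd(etd_dir: Path) -> list[str]:
    """Any single problem fails the ETD as a whole."""
    return check_formats(etd_dir) + check_local_links(etd_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        etd = Path(d)
        (etd / "tdm.dtd").write_text("<!ELEMENT thesis ANY>")
        (etd / "fig1.gif").write_bytes(b"GIF89a")
        (etd / "ch1.xml").write_text(
            '<thesis><a href="fig1.gif"/><a href="http://example.org/"/></thesis>')
        (etd / "ch2.xml").write_text('<thesis><img src="missing.jpg"/></thesis>')
        print(check_etd(etd))  # ['ch2.xml: missing missing.jpg']
```

Returning a list of problems, rather than stopping at the first one, lets the examiner report every defect to the student at once while still treating any non-empty list as a failure.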

After the Graduate Examiner completes all tests of a given semester's theses and dissertations, ETDs and hard copies alike, the University of Iowa Main Library ITS staff would install the ETDs that have passed on the library server. If there are restrictions on release (e.g., a pending patent related to the dissertation), these of course determine how much access to the ETD, if any, is allowed to the general public or the University community.

