SGML: Standard Generalized Mark-up Language

SGML is a set of rules for how to make a Document Type Definition (DTD). HTML is defined by one such DTD, and anyone can write their own. A DTD is a set of instructions that says "when I want to indicate that Xanadu is a place talked about by Coleridge--instead of an obnoxious song, for instance--I will write this: '<poemgeography>Xanadu</poemgeography>'". Right now, WordPerfect 8 for Windows creates SGML, and there are various gizmos for other software. Internet Explorer 4+ reads it, as does the Panorama browser. Future Microsoft software text outputs will be in SGML.

The value of this is that it makes searching more precise for research (e.g., "Where is the Xanadu of Kubla Khan discussed?"). There are a multitude of other uses. SGML has other advantages too: it uses the simplest computer coding, plain text (so it's hard to include a virus); it is not proprietary (no one owns it); and it preserves content for even the simplest of machines (it's also Year 2000 compliant).
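To make the searching point concrete, here is a minimal sketch in Python. It is only an illustration: Python's standard library has no SGML parser, so the sample is written to also be well-formed XML and is read with xml.etree.ElementTree. The <poemgeography> tag comes from the example above; <notes>, <p>, <songtitle> and the two function names are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text. Only <poemgeography> appears in the discussion
# above; the other tags are invented for illustration.
SAMPLE = """
<notes>
  <p>Coleridge sets the pleasure-dome in <poemgeography>Xanadu</poemgeography>.</p>
  <p>The radio kept playing <songtitle>Xanadu</songtitle> all summer.</p>
</notes>
"""

def find_tagged(document, tag, word):
    """Return the text of every <tag> element that contains `word`."""
    root = ET.fromstring(document.strip())
    return [el.text for el in root.iter(tag) if el.text and word in el.text]

def count_plain(document, word):
    """Count every occurrence of `word`, ignoring the mark-up entirely."""
    root = ET.fromstring(document.strip())
    return "".join(root.itertext()).count(word)

if __name__ == "__main__":
    # A plain-text search cannot tell the poem's Xanadu from the song.
    print(count_plain(SAMPLE, "Xanadu"))
    # A tag-aware search returns only the place in the poem.
    print(find_tagged(SAMPLE, "poemgeography", "Xanadu"))
```

The plain search finds both mentions of Xanadu, while the tag-aware search finds only the one marked as a place in the poem. That is the kind of precision the mark-up buys.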

The fact that anyone can write a DTD has been a problem. Scholars set up TEI-ML (Text Encoding Initiative Mark-up Language) some years ago, but it's too big--it tries to do everything. VT (Virginia Tech) has made ETD-ML, which is similar to TEI but omits some things and adds others (and this means everyone must agree to use it for dissertations). There are lots of other "MLs" out there too. It is impossible to write software to cover all the variations. Hence the value of XML.

It's also very hard to print SGML, because it only identifies the kind of information in a document, not how it looks. Printing requires a very complicated "style sheet" (in computerese) written in DSSSL (Document Style Semantics and Specification Language), which says--in effect--"any time I start a new chapter, center the title, put it in bold, and set it in font size such and such."
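The idea of a style sheet can be sketched in a few lines of Python. This is not the real style sheet language, just a toy showing the same separation: the document says what each piece is, and a separate table says how each kind of piece should look. The tag names (<chapter>, <chaptertitle>, <para>), the STYLE table and the render function are all assumptions made up for this sketch, and upper case stands in for bold because the output is plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the mark-up names the kind of content, nothing more.
DOC = """
<chapter>
  <chaptertitle>Kubla Khan</chaptertitle>
  <para>In Xanadu did Kubla Khan</para>
</chapter>
"""

# Hypothetical "style sheet": one formatting rule per element type.
STYLE = {
    "chaptertitle": {"align": "center", "bold": True},
    "para": {"align": "left", "bold": False},
}

def render(document, style, width=40):
    """Apply the rules in `style` to each child element; return text lines."""
    lines = []
    for el in ET.fromstring(document.strip()):
        rule = style.get(el.tag, {})
        text = " ".join((el.text or "").split())
        if rule.get("bold"):
            text = text.upper()  # stand-in for bold in plain-text output
        if rule.get("align") == "center":
            text = text.center(width).rstrip()
        lines.append(text)
    return lines

if __name__ == "__main__":
    for line in render(DOC, STYLE):
        print(line)
```

Changing how every chapter title looks means editing one entry in STYLE, not the document. A real style sheet has to cover every element type in the DTD, plus pages, fonts and spacing, which is where the complication comes from.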

Learn More from the UVA Experts