A Document Type Definition for Electronic Texts
XML DTD for Academic Manuscripts and Electronic Texts
# Author: John Robert Gardner, Ph.D.
# Version: 1.0
# Organization: Vedavid
# Date: November, 1998
<!-- This Document Type Definition (DTD) is designed for academic use
in the tagging and preparation for research of electronic editions of
primary sources. It was made with a wonderful tool by Duncan Chen called
ezDTD for Windows95. eXtensible Markup Language (XML) offers great
flexibility, impending ubiquity, and the power to render precise
categorization while maintaining a "flat data" file format (which means
you can even read it in DOS if you don't have special software or a
browser), so it is logical to design a set of tags for high-end
research. Thus the term I have coined, "e-textnology," is duly applicable,
as XML is sufficiently user-friendly and robust for serious implementation
throughout the academic humanities. This DTD began as a schema for ancient
South Asian texts, the Vedas. You are welcome to incorporate it in your
own e-texts via parameter entities because it will remain in this public
directory http://vedavid.org/xml/ until further notice. If you adapt your
own from it, and you are of course welcome to, please observe netiquette
and credit at least the URI from which you got this file. -->
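<!-- EXAMPLE (a sketch under stated assumptions, not part of the DTD
proper): one way to incorporate this DTD through a parameter entity, as
invited above, is to declare it in the internal subset of your own
e-text. The entity name "etextdtd" and the file name "etext.dtd" are
hypothetical; this file gives only the directory http://vedavid.org/xml/.

<!DOCTYPE etext [
  <!ENTITY % etextdtd SYSTEM "http://vedavid.org/xml/etext.dtd">
  %etextdtd;
]>
-->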
**********************************************************************
ELEMENT: etext
COMMENT: The three main divisions are the heritage or "pedigree"
of the etext and the project under which it is encoded and
tagged as well as the pedigree of MSS from which the e-text is
derived, the "content" itself, and any ancillary materials, or
"postcontent."
**********************************************************************
<!ELEMENT etext ( pedigree, content, postcontent ) >
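<!-- EXAMPLE (illustrative sketch): the outline of a document instance,
with the three divisions in the one order the content model permits.
Each "..." stands for the content described under that element.

<etext>
  <pedigree> ... </pedigree>
  <content> ... </content>
  <postcontent> ... </postcontent>
</etext>
-->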
**********************************************************************
ELEMENT: pedigree
COMMENT: In this part of the front or preliminary matter, due
credits are provided for those who have contributed to the
production of the etext. This is the pedigree of the etext, not
of the MSS from which the etext is derived. The MSS pedigree is
addressed in the "msspedigree" tag, the last child of this element.
**********************************************************************
<!ELEMENT pedigree ( title, editor+, copyright, grant*,
acknowledgments?, msspedigree ) >
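<!-- EXAMPLE (illustrative sketch): a pedigree with its children in the
required order. All names and values are invented sample data. The
grant and acknowledgments elements could be left out; the others could
not.

<pedigree>
  <title>A Sample E-text</title>
  <editor><given>Jane</given><surname>Scholar</surname></editor>
  <copyright><date>1998</date><city>Sample City</city></copyright>
  <grant>Sample Foundation</grant>
  <acknowledgments>Thanks to all who helped.</acknowledgments>
  <msspedigree>Sample manuscript (A)</msspedigree>
</pedigree>
-->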
**********************************************************************
ELEMENT: title
COMMENT: Well, the title goes here!
**********************************************************************
<!ELEMENT title
( #PCDATA | br )* >
**********************************************************************
ELEMENT: editor
COMMENT: I divide the name up so that searching can be more
precise: e.g., "James" can be a first or last name, so this
simply makes the data encoded more precise for scholars doing
searches. In other words, be sure to give your name precisely
if you're going to all this trouble!
**********************************************************************
<!ELEMENT editor ( given+, surname, suffix? ) >
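<!-- EXAMPLE (illustrative sketch): a name split into its parts, using
the author line from the head of this file purely as sample data. A
middle name is entered as a second "given" element, which the "+"
allows.

<editor>
  <given>John</given>
  <given>Robert</given>
  <surname>Gardner</surname>
  <suffix>Ph.D.</suffix>
</editor>
-->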
**********************************************************************
ELEMENT: given
COMMENT: The first name, and middle if desired.
**********************************************************************
<!ELEMENT given ( #PCDATA )* >
**********************************************************************
ELEMENT: surname
COMMENT: The last name of the person.
**********************************************************************
<!ELEMENT surname ( #PCDATA )* >
**********************************************************************
ELEMENT: suffix
COMMENT: Titles of honor or of birth, such as "Jr.", etc.
**********************************************************************
<!ELEMENT suffix ( #PCDATA )* >
**********************************************************************
ELEMENT: copyright
COMMENT: This is some of the most important pedigree matter and,
since the country may not have "states," I have made that element
optional with the "?" occurrence indicator. Any element followed
simply by a comma, such as date and city here, is not only
required, but required in that order.
**********************************************************************
<!ELEMENT copyright ( date, city, state?, country? ) >
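<!-- EXAMPLE (illustrative sketch, sample values only): the fullest and
the shortest forms the content model accepts. Putting city before date,
or omitting either one, would not validate.

<copyright>
  <date>November, 1998</date>
  <city>Geneva</city>
  <state>Geneva</state>
  <country>Switzerland</country>
</copyright>

<copyright><date>1998</date><city>Sample City</city></copyright>
-->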
**********************************************************************
ELEMENT: country
COMMENT: This is optional for citations, but should be included
for other pedigree matter.
**********************************************************************
<!ELEMENT country ( #PCDATA )* >
**********************************************************************
ELEMENT: state
COMMENT: Optional, but could also refer to region, canton (as in
Switzerland), etc.
**********************************************************************
<!ELEMENT state ( #PCDATA )* >
**********************************************************************
ELEMENT: city
COMMENT: The city of publication, usually required for citations.
**********************************************************************
<!ELEMENT city ( #PCDATA | br )* >
**********************************************************************
ELEMENT: date
COMMENT: Any format is suitable, though it is possible to set an
attribute specifying different date styles (e.g., europe vs.
u.s.a.)
**********************************************************************
<!ELEMENT date ( #PCDATA )* >
**********************************************************************
ELEMENT: grant
COMMENT: Hey, somebody's gotta pay the bills and they have a
habit of liking credit for it. Give the granting foundation's
name here and, in the attribute section, you can give the AMOUNT
of money if you wish.
**********************************************************************
<!ELEMENT grant
( #PCDATA | br )* >
<!ATTLIST grant
AMOUNT CDATA #IMPLIED >
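<!-- EXAMPLE (illustrative sketch): the foundation name and figure are
invented. AMOUNT is plain character data, so any currency notation may
be used, and the attribute may be omitted altogether.

<grant AMOUNT="5000 USD">Sample Foundation</grant>
<grant>Another Sample Foundation</grant>
-->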
**********************************************************************
ELEMENT: acknowledgments
COMMENT: Sort of like those who pay the bills, those who help,
support, or otherwise put up with the odd headspaces of
cyber-demics deserve notation for their sacrifices.
**********************************************************************
<!ELEMENT acknowledgments ( #PCDATA | br )* >
**********************************************************************
ELEMENT: msspedigree
COMMENT: Herein be sure to list any and all data regarding the
MSS from which the etext is derived. When listing several MSS,
it is worth adding a parenthetical abbreviation because this DTD
will allow you to include/embed variant readings of passages
according to different MSS for a greater degree of critical
inquiry into the e-text; for example, something simple like 'A',
'B', etc. This is also vital for identifying insertions of
alternate readings with the "mssvar" tag.
**********************************************************************
<!ELEMENT msspedigree (#PCDATA | editor | copyright | br )* >
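<!-- EXAMPLE (illustrative sketch): two manuscripts listed with the
parenthetical abbreviations recommended above. The descriptions are
placeholders.

<msspedigree>
  First manuscript, with its library and shelf mark (A)<br/>
  Second manuscript, with its library and shelf mark (B)
</msspedigree>
-->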
**********************************************************************
ELEMENT: content
COMMENT: Since there can be hymns, poems, prose, and so forth
according to different traditions, I have given the main division
a generic tag name (technically an ELEMENT), simply "segment."
Since some texts embed metric and prose together (such as the
Black Yajur Veda texts of ancient India), each segment and
subsegment must be numbered, thus the id attribute is REQUIRED.
In effect, if one were doing the Bible, the books of the Bible
would be the "segments" in this schema and the "passages" would
be the different subsections of those books. Some MSS have
multi-level numbering, such as 1.1.1.1, so I am allowing for
these refined structures and requiring a number at each level so
that precision of text arrangement is preserved while also
encoded for searching, etc. Since some texts are used in others
in a sort of evolving self-commentary (such as the Vedas), I've
also allowed for a "cited" element which allows for a sort of
"cross-reference" linking.
**********************************************************************
<!ELEMENT content ( segment )+ >
**********************************************************************
ELEMENT: segment
COMMENT: Different texts have different levels of subsections
and subnumbering, so to allow for the variables, I have left the
option of going straight to content entry (whether metric or
prose), or to a subsection, or to a passage. In addition, at
every level the optional TYPE and COMMENT attributes are allowed.
COMMENT lets issues of historical sequence, subperiods of
composition, regional variation, etc. be annotated. TYPE lets
one specify, for instance in Vedic, "mandala" or whatever name is
given to a section or subsection, which in many traditions is a
technical designation.
**********************************************************************
<!ELEMENT segment ( subsegment | passage | metric | prose )+ >
<!ATTLIST segment
id ID #REQUIRED
TYPE CDATA #IMPLIED
COMMENT CDATA #IMPLIED
>
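<!-- EXAMPLE (illustrative sketch): one segment carried down through
every level to a line. An attribute of type ID must hold an XML name,
which cannot begin with a digit, so a bare number such as 1.1.1 would
not validate; the "RV" prefix and the "m" and "v" endings are
assumptions made here to keep each id legal and unique.

<segment id="RV1" TYPE="mandala">
  <subsegment id="RV1.1">
    <passage id="RV1.1.1">
      <metric id="RV1.1.1m" TYPE="jagatii">
        <verse id="RV1.1.1v">
          <line>text of the line</line>
        </verse>
      </metric>
    </passage>
  </subsegment>
</segment>
-->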
**********************************************************************
ELEMENT: subsegment
COMMENT: This segment is employed only if the MSS divides the
text to this level of detail.
**********************************************************************
<!ELEMENT subsegment ( passage | metric | prose )+ >
<!ATTLIST subsegment
id ID #REQUIRED
TYPE CDATA #IMPLIED
COMMENT CDATA #IMPLIED
>
**********************************************************************
ELEMENT: passage
COMMENT: Now you choose what type of text goes below this level,
metric or prose.
**********************************************************************
<!ELEMENT passage ( metric | prose )+ >
<!ATTLIST passage
id ID #REQUIRED
TYPE CDATA #IMPLIED
COMMENT CDATA #IMPLIED
>
**********************************************************************
ELEMENT: metric
COMMENT: TYPE might be the particular meter (e.g., iambic, or
jagatii, etc.). I'm also allowing insertions of alternate MSS
variants here if need be so that they can be included more
easily. These variants will act like traditional hyperlinks.
**********************************************************************
<!ELEMENT metric ( verse | mssvar )+ >
<!ATTLIST metric
id ID #REQUIRED
TYPE CDATA #IMPLIED
COMMENT CDATA #IMPLIED >
**********************************************************************
ELEMENT: prose
COMMENT: I'm also allowing insertions of alternate MSS variants
here if need be so that they can be included more easily. These
variants will act like traditional hyperlinks.
**********************************************************************
<!ELEMENT prose ( line | mssvar | p )+ >
<!ATTLIST prose
id ID #REQUIRED
TYPE CDATA #IMPLIED
COMMENT CDATA #IMPLIED
>
**********************************************************************
ELEMENT: verse
COMMENT: It is true that verses might be only one line, or might
be many. For this reason, the id number is required here (but is
optional at the line level) because if the verse were only one
line, they will both have the same number and therefore need
only be entered once, at this level. Actual content can only be
entered at the line level, however, so that the hierarchy of
subelements can be preserved and so that the only tag/element in
which live data entry occurs is at the line level, to reduce
errors.
**********************************************************************
<!ELEMENT verse ( line )+ >
<!ATTLIST verse
id ID #REQUIRED
TYPE CDATA #IMPLIED
COMMENT CDATA #IMPLIED
>
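<!-- EXAMPLE (illustrative sketch, invented ids and text): a one-line
verse numbered only at the verse level, followed by a two-line verse
whose lines carry their own ids.

<metric id="m1">
  <verse id="v1">
    <line>the only line of verse 1</line>
  </verse>
  <verse id="v2">
    <line id="v2.a">first line of verse 2</line>
    <line id="v2.b">second line of verse 2</line>
  </verse>
</metric>
-->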
**********************************************************************
ELEMENT: line
COMMENT: At this level the line number might be optional if the
verse has only one line, or if the prose text does not
sub-number the lines (though if it does not already, I highly
recommend doing so anyway). In addition, we can embed media here
such as images, audio recitations, etc. Links to alternate MSS
readings can also be embedded here. Often there is a particular
variation of speakers in a MSS much as in a drama, so I allow
for those kinds of identifications to be made here if they are
not part of the actual authored MSS. In other words, to say
"Hamlet" at the beginning of one of his lines is part of the
content proper because Shakespeare wrote him as the speaker,
thus it is not correct to use the SPEAKER attribute. If, as with
an ancient myth, there are different speakers which later
scholars or commentators have identified, that is not part of
the text proper and so you would use this interpretive SPEAKER
attribute. The other reason for setting all this MSS and
historical metadata as attributes is simply for the cosmetics of
the primary source such that the actual display of the MSS is
not marred by line upon line of metacategorical data.
**********************************************************************
<!ELEMENT line ( #PCDATA | link | br | media | mssvar | cited )* >
<!ATTLIST line
id ID #IMPLIED
TYPE CDATA #IMPLIED
COMMENT CDATA #IMPLIED
SPEAKER CDATA #IMPLIED
>
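<!-- EXAMPLE (illustrative sketch): the two cases described above. In
the first line the speaker's name was written by the author, so it
stays in the content. In the second the speaker was supplied by a later
commentator, so it goes in the SPEAKER attribute; the id, the speaker,
and the wording of that line are invented.

<line>Hamlet: To be, or not to be, that is the question</line>
<line id="h1.2" SPEAKER="sage">text of the line as it stands in the MSS</line>
-->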
**********************************************************************
ELEMENT: p
COMMENT: Some prose sections of MSS work best if put in paragraph tags,
and many are very familiar with this tag. Paragraphs don't work with
poetry verses, so they are allowed only here in prose, and below in the
appendix.
**********************************************************************
<!ELEMENT p ( #PCDATA | link | br | media | mssvar | cited | line )* >
<!ATTLIST p
id ID #IMPLIED
TYPE CDATA #IMPLIED
COMMENT CDATA #IMPLIED
SPEAKER CDATA #IMPLIED
>
**********************************************************************
ELEMENT: author
COMMENT: Use this for actual authored works; for editors of a
MSS or critical edition, use the "editor" tag instead.
**********************************************************************
<!ELEMENT author ( given, surname, suffix? )+ >
**********************************************************************
ELEMENT: br
COMMENT: This is a "hard return" or line break.
**********************************************************************
<!ELEMENT br EMPTY >
**********************************************************************
ELEMENT: media
COMMENT: This functions a lot like a traditional link, only it
is meant to specifically refer to multimedia types.
**********************************************************************
<!ELEMENT media (#PCDATA) >
<!ATTLIST media
LINK CDATA #FIXED "simple"
SHOW ( new | embed | replace ) "embed"
ACTUATE ( user | auto ) "user"
>
**********************************************************************
ELEMENT: mssvar
COMMENT: This is a variant reading and so should be identified
according to the abbreviation set forth in the "msspedigree"
section. It functions like a link which can embed the alternate
reading (tagged separately as "mssvarsrc").
**********************************************************************
<!ELEMENT mssvar ( #PCDATA )* >
<!ATTLIST mssvar
CDATA IDREF #REQUIRED >
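<!-- EXAMPLE (illustrative sketch): a variant from MS "A" marked inside
a line. As declared above, the attribute is literally named CDATA and
its type is IDREF, so its value must match the id of some element in
the same document, normally an "mssvarsrc" holding the full variant
reading. The ids and text are invented.

<line id="L1.2.3">reading of the base text
  <mssvar CDATA="A1.2.3">reading of MS A</mssvar>
</line>
-->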
**********************************************************************
ELEMENT: link
COMMENT: This is different from the insertion of a MSS variant.
Typically, this will be like a traditional "a
href='http://www.vitalsomething.org/'" such as you see in HTML
code all the time. It operates roughly the same way, but with
SHOW you can embed the linked content (bring it into your
document), replace your content (sort of like embedding, but
whatever was there before the link is activated is not visible
once it's activated, as with traditional web links, so I've
made it the default), or open a new browser window to display
the linked content. The IDREF is the actual location of what
you're linking to. ACTUATE is really cool: it lets the link
activate automatically, or requires the user to click upon it.
I've set the default to user-activated.
**********************************************************************
<!ELEMENT link ( #PCDATA | emph | br | edition | mssvar )* >
<!ATTLIST link
id IDREF #REQUIRED
LINK CDATA #FIXED "simple"
ACTUATE ( user | auto ) "user"
SHOW ( new | replace | embed ) "replace"
>
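<!-- EXAMPLE (illustrative sketch, invented ids): a link that embeds its
target as soon as the document is displayed. Because "id" is declared
here as an IDREF, its value names another element's id within the same
document; the declaration above provides no attribute for a URL.
Leaving out SHOW and ACTUATE gives the defaults, "replace" and "user".

<line id="RV1.1.2">compare
  <link id="RV1.1.1" SHOW="embed" ACTUATE="auto">the preceding line</link>
</line>
-->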
**********************************************************************
ELEMENT: emph
COMMENT: This is simply anything you need or want to emphasize,
to later be rendered as bold, underline, whatever.
**********************************************************************
<!ELEMENT emph ( #PCDATA )* >
**********************************************************************
ELEMENT: head
COMMENT: Give a title here.
**********************************************************************
<!ELEMENT head (#PCDATA) >
**********************************************************************
ELEMENT: cited
COMMENT: Since some texts are used in others in a sort of
evolving self-commentary (such as the Vedas), I've also allowed
for a "cited" element which allows for a sort of
"cross-reference" linking. Using the "link" element, you can
have the cf.'d text be embedded, opened in a new window, or
simply replace the current segment.
**********************************************************************
<!ELEMENT cited (#PCDATA | link )* >
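<!-- EXAMPLE (illustrative sketch, invented ids): a line that reuses
material from elsewhere in the same e-text, with the cross-reference
opened in a new window.

<line id="YV1.1">text taken over from an earlier hymn
  <cited>cf. <link id="RV1.1.1" SHOW="new">RV 1.1.1</link></cited>
</line>
-->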
**********************************************************************
ELEMENT: postcontent
COMMENT: In this location, at the end of the etext, the
bibliography, appendix (if any) and the readings for the MSS
variations are included.
**********************************************************************
<!ELEMENT postcontent ( bibliography, appendix*, mssvarsrc* ) >
**********************************************************************
ELEMENT: bibliography
COMMENT: By all means, the sources for your e-text, at the very
least, should be included.
**********************************************************************
<!ELEMENT bibliography ( citation )+ >
**********************************************************************
ELEMENT: citation
COMMENT: You can enter the editors and editions of MSS, or the
authors and titles of critical commentaries, etc.
**********************************************************************
<!ELEMENT citation ( #PCDATA | emph | link | br | title | editor |
author | copyright )* >
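<!-- EXAMPLE (illustrative sketch): a bibliography entry mixing tagged
parts with plain punctuation. The work and its author are invented.

<bibliography>
  <citation>
    <author><given>Jane</given><surname>Scholar</surname></author>,
    <title>A Sample Commentary</title>,
    <copyright><date>1998</date><city>Sample City</city></copyright>.
  </citation>
</bibliography>
-->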
**********************************************************************
ELEMENT: appendix
COMMENT: These can be just about anything. Begin each one with a
"head" giving its title.
**********************************************************************
<!ELEMENT appendix ( #PCDATA | head | emph | link | br | mssvar |
media | p )* >
**********************************************************************
ELEMENT: mssvarsrc
COMMENT: This is how to tag the specific text segments of
variant MSS readings; use this tag to identify each one. In
other words, the variant reading from MSS "A" (see "msspedigree"
tag) for passage 1.2.3 would be entered as the "id" below like
this: "A1.2.3".
**********************************************************************
<!ELEMENT mssvarsrc ( #PCDATA | br )* >
<!ATTLIST mssvarsrc
id ID #REQUIRED
SHOW ( new | embed | replace ) "embed"
>
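<!-- EXAMPLE (illustrative sketch): the readings of two manuscripts for
passage 1.2.3, following the id pattern described above. The wording of
each reading is a placeholder. The first entry takes the default SHOW
value, "embed".

<mssvarsrc id="A1.2.3">reading of passage 1.2.3 in MS A</mssvarsrc>
<mssvarsrc id="B1.2.3" SHOW="new">reading of passage 1.2.3 in MS B</mssvarsrc>
-->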