Updated for Production schedule with entity references in the teiHeader
DTD name changed to ATLAS_teixlite.dtd
Changes From Previous Drafts marked in RED
John Robert Gardner
XML Engineer
This is a final-draft encoding description for the ATLAS project. It will receive minor changes based upon anomalies encountered in journals other than those used in the test encoding sample (Proceedings and JBL). Certain key features in the following description are consistent for all journals, though the examples are based upon a single hypothetical edition of the Journal of the American Academy of Religion and/or the Journal of Biblical Literature.
<ref id="cov1.gif">outside-front</ref>
<p n="issn"> issn# </p>
<p n="date">date as printed on cover </p>
<p n="voliss">volume and issue, as printed on cover </p>
For the inside front, inside back, and outside back covers, follow this model (i.e., without the ISSN, date, and volume/issue data):
<ref id="cov2.gif">inside-front</ref>
<ref id="cov3.gif">inside-back</ref>
<ref id="cov4.gif">outside-back</ref>
Note that the ISSN, date, volume, and issue are given, but are identified only by attribute value so as not to violate validation or replicate the <titlePart> statement (cf. title page below).
The Table of Contents will be generated automatically by the software. Start tagging with the first article. NOTE: each article and review is tagged as a separate document instance.
German, French, and other European languages should be accurately represented with entities for grave, &c. Hebrew, Greek, &c. are to be transliterated per the table available separately from this document (http://rosetta.atla-certr.org/xml/entities/). The following standard entity sets may be needed (let me know if you don't already have access to any of these):
The general rule is that non-Roman script languages (e.g., Hebrew) are to be keyed according to the transliteration guide and tagged with the <foreign> tag, adding the "lang" attribute, e.g., <foreign lang="heb">. For language designations, use the 3-character abbreviations presented in ISO 639-2:1998 (http://www.indigo.ie/egt/standards/iso639/iso639-2-en.html). Roman scripts should use the proper entities for accent, umlaut, etc. as defined in the Latin I and Latin II sets. If a Roman script language such as German is used for a single term that is offset with italics or quote marks, follow the rules for Keywords and Quoted Terms noted below.
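As a hedged illustration of the rule above, a transliterated Hebrew term and an italicized German keyword might be keyed as follows. The sample sentence and the transliterated spelling are invented for illustration; the actual spelling must come from the project's transliteration guide, and "ger" is assumed here as the ISO 639-2 code for German.

```xml
<!-- Hypothetical sample text; the transliteration shown is illustrative only -->
<p>The Hebrew term <foreign lang="heb">berit</foreign> is discussed alongside
the German <hi rend="i" lang="ger">Erl&ouml;sung</hi>.</p>
```

The umlaut is keyed as the &ouml; entity rather than as a literal character, and the single German term follows the keyword convention (<hi> with a "lang" attribute) instead of <foreign>.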
<tei.2 id="n0021-9231_114_01_0385"> [this and other attr. explained below]
<teiHeader> {see section immediately below} </teiHeader>
<text>
<front>
<titlePage> {titlePage elements, see below} </titlePage>
</front>
<body> {follow DTD for body} </body>
<back> {back/bibl matter} </back>
</text>
</tei.2>
The teiHeader is explained immediately below and is templated for each article, with items for PDCC to enter marked by ***. Each article forms a distinct document file instance.
n0021-9231_114_01_0001.xml
This would be the file name for the first article of the first issue of the 114th volume of the journal whose ISSN is 0021-9231 (JBL). On those few occasions where two issues (or two volumes) are combined, the numbering should reflect the anomaly:
n0021-9231_114_0203_0001.xml
*******************
Note: the keyed files done by PDCC no longer
contain the complete header. The additional materials
are not keyed; instead they are referenced via the entity references
required below.
*********************
Sections marked in red indicate that an "external entity" reference will be typed in. The entity files are provided separately at the ATLAS web site, http://rosetta.atla-certr.org/CERTR/ATLAS/xml/. PDCC keys them as entity references (e.g., "&editionheader;") inside the appropriate elements, which carry fixed attributes. The entities cover all "respStmt" elements in the editionStmt element, the entire set of paragraphs under the encodingDesc element, and the profileDesc element.
Specifically, the following excerpts show which entities are to be entered and where. The following &editionheader; entity will take care of the <respStmt>, <resp>, and <name> elements required in the DTD:
<editionStmt n="editionheader"> <!-- *** PDCC enters: This date is the date of the edition
of the journal being encoded -->
<edition> <date>
***1998***
</date> </edition>
<!-- PDCC types only this "editionheader" entity reference here -->
&editionheader;</editionStmt>
The following &encoding; entity will take care of the <projectDesc>, <editorialDecl>, <refsDecl>, and various <p> elements required in the DTD:
<encodingDesc n="encoding"> <!-- PDCC only enters this "encoding" entity reference--> &encoding;</encodingDesc>
The following &profile; entity will take care of the <langUsage> and <language> elements required in the DTD:
<profileDesc n="profile"> <!-- PDCC only enters this "profile" entity reference -->&profile;
</profileDesc>
The #FIXED values for the "n" attribute in the <editionStmt>, <encodingDesc>, <profileDesc> elements remain as before. This remains unchanged in the DTD.
PDCC will have to enter the reference to the specific journal, ISSN, issue, etc., as denoted below by ***. Use the following template for each article (i.e., each file, as each article is to be its own document instance), with keying to be varied only in those fields denoted by ***. The ID attribute contains a canonical reference in the form (for example) "***n0002-7189_XXX_zz_yyyy", where 0002-7189 is the ISSN of the journal, XXX is the volume number of this issue, zz is the issue number, and yyyy is the page number of the original text in Arabic numerals. All of these are "0"-justified: pages are always 4-digit RJZF, volumes are always 3-digit RJZF, and issues are always 2-digit RJZF. To identify the article as a whole, the starting page number (4-digit RJZF) is given.
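To make the zero-justification concrete, here is a sketch of the id for a hypothetical article that begins on page 385 of volume 66, issue 1, of the journal with ISSN 0002-7189. The volume, issue, and page values are chosen purely for illustration.

```xml
<!-- volume 66 -> 066, issue 1 -> 01, starting page 385 -> 0385 -->
<tei.2 id="n0002-7189_066_01_0385">
  <!-- teiHeader and text follow here -->
</tei.2>
```

Each numeric field is padded on the left with zeros to its fixed width, so ids sort correctly and always have the same length for single-issue articles.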
<!DOCTYPE tei.2 SYSTEM "http://purl.org/CERTR/ATLAS/xml/ATLAS_teixlite.dtd">
<tei.2 id="n0002-7189_114_01_0385">
<teiHeader>
<fileDesc>
<titleStmt>
<!-- ***PDCC enters this title, vol, issue, and "electronic edition" for each journal -->
<title>
***Journal of the American Academy of Religion, Volume 66:1: electronic edition***
</title>
</titleStmt>
<editionStmt n="editionheader">
<!-- *** PDCC enters: This date is the date of the edition
of the journal being encoded -->
<edition> <date>
***1998***
</date> </edition>
<!-- PDCC types only this "editionheader" entity reference here -->
&editionheader;
</editionStmt>
<publicationStmt>
<!-- ***PDCC enters this title, distrib, address, for each journal -->
<publisher>
***American Academy of Religion***
</publisher>
<distributor>
***Scholars Press***
</distributor>
<address> <addrLine>
***P.O. Box 15399, Atlanta, GA 30333-0399***
</addrLine> </address>
<!-- ***PDCC enters: This date is the date of the encoding itself,
not the edition encoded -->
<date>
01-11-00
</date>
</publicationStmt>
<sourceDesc>
<!-- ***PDCC enters this title, vol, issue, date for each journal -->
<p>
***Journal of the American Academy of Religion, Volume 66, Number 1, 1998***
</p>
</sourceDesc>
</fileDesc>
<encodingDesc n="encoding">
<!-- PDCC only enters this "encoding" entity reference -->
&encoding;
</encodingDesc>
<profileDesc n="profile">
<!-- PDCC only enters this "profile" entity reference -->
&profile;
</profileDesc>
</teiHeader>
The "tei.2" element must contain the article's id. Again, the ID
attribute value MUST begin with the letter "n", followed by the
numerical value with no whitespace in between. The ID contains a
canonical reference in the form (for example) "***n0002-7189_xxx_zz_yyyy",
where 0002-7189 is the ISSN of the journal, xxx is the volume number
of this issue, zz is the issue, and yyyy is the page number of
the original text in Arabic numerals; all are RJZF ("0"-justified). Thus
for a simple article:
<tei.2
id="n0002-7189_XXX_zz_yyyy">
On those few occasions where two issues are combined
(or two volumes), the numbering should reflect this anomalously:
<tei.2 id="n0021-9231_114_0203_0001">
Review articles (article-length reviews, or surveys of literature) are treated like regular articles. If more than one review begins on the same page, the second is numbered "yyyya", the third "yyyyb," etc., "0"-justified. In addition, for Book Reviews, an "n" attribute is added to the tei.2 element, wherein the title of the book being reviewed is to be entered:
<tei.2 id="n0002-7189_XXX_zz_yyyy" n="Title of the Book Reviewed">
The "n" attribute for the "text" element, meanwhile, is set to the furnished "review" value option (note: "id" is no longer used in the "text" element):
<text n="review">
Cf. attribute list, the "n" value is required as follows:
<!ATTLIST text
id ID #IMPLIED
n (article | book | review | errata | supplement | index) #REQUIRED
lang IDREF #IMPLIED
rend CDATA #IMPLIED
decls IDREFS #IMPLIED >
A list of elements with #REQUIRED attributes which must follow this syntax is included:
As noted, the "id" attribute must be included on all <pb>, <tei.2>, <figure>, <anchor>, and <note> elements, as well as on <bibl> elements in a bibliography, so that "target" attributes in <xptr> and <xref> tags can point to them. Again, the ids for <note> and <bibl> elements differ in syntax from the ids for the other elements in the list above.
For <pb> elements, use this pattern: n0002-7189_vol_iss_page. Make an empty element as follows: <pb id="n0002-7189_066_01_0003" /> (page 3). Note that the page break goes at the top of the page, so the page number refers to the page that follows the tag. A <pb> element is required for every page, including advertisement pages and completely blank pages, even if no page number is printed on the page. For blank pages and advertisement pages, create a place-holder page-break tag (following the empty tag model above) with the proper ID.
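The page-break rules can be sketched for a hypothetical run of three pages, where page 3 carries text, page 4 is blank, and page 5 is an advertisement. The page contents and numbers are invented for illustration.

```xml
<!-- Hypothetical pages: 3 has text, 4 is blank, 5 is an advertisement -->
<pb id="n0002-7189_066_01_0003" />
<p>Text printed on page 3 ...</p>
<pb id="n0002-7189_066_01_0004" />
<pb id="n0002-7189_066_01_0005" />
```

The blank page and the advertisement page each still receive their own place-holder tag, so every page in the issue has exactly one id.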
For <bibl> elements, use this pattern: bnum, e.g., <bibl id="b2">. Note that the entries are numbered consecutively. If the bibliography comes in several parts, number all entries consecutively across all parts in one numerical sequence.
For <note> elements, use this pattern: nnum, e.g., <note id="n3">. This refers to note 3 in the article that begins on page 817. Articles often begin with unlabeled footnotes, usually describing the author. Use lowercase letters to designate these notes, e.g., <note id="n0002-7189_066_817_n_a">, the note at the bottom of the article beginning on page 817 that tells something about the author. If a second such note appears in an article, label it "b," and so forth.
For <tei.2> elements, use this pattern: n0002-7189_vol_iss_page, and include the attribute, e.g., <tei.2 id="n0002-7189_066_iss_0817">. This refers to the article that begins on page 817. NOTE: Pages are always 4-digit RJZF, volumes are always 3-digit RJZF, and issues are 2-digit RJZF.
For "ptr," the target points to the respective bibliography entry id if the article has a bibliography, just as is the case with "ref." So, for instance, if a given paragraph has ". . . .(Jones 1986:35)," it would be tagged as ". . .(<ref target="b15">Jones 1986</ref>:35)"; if it later had ". . . (36)," you would tag this as ". . .(<ptr target="b15" />36)," assuming that the 1986 book by Jones is entry #15 in the bibliography.
The element "xref," used to point to places beyond/outside the given article's instance requires one of three values for the "type" attribute: "pub" for a publication of any kind (book, article, film, etc.); "can" for a canonical citation (see immediately below); and "web" for any online resource.
IMPORTANT, for citations and notes: Using the determinations for titles listed below, after a title is identified, it is used in creating the id for any reference as follows: title.volume.kind.pages (the parts are joined with underscores, as in the examples below). In this syntax, "title" is the title--or abbreviation thereof--of the book or journal in the reference. "volume" is 3 digits: 000 for a single-volume book, otherwise the volume number within the series. "kind" is "bk" for a book, or "xx" for articles in a journal, where "xx" is the issue number, 2-digit RJZF (where "spring," "summer," "fall," and "winter" are used in lieu of issue numbers, "01," "02," "03," and "04" are the appropriate issue numbers). "pages" is four-digit RJZF, and its value is the first page of the citation; in the case of a reference to an entire book, "0000" is given. Cf. the following examples (note also the required "type" attribute):
<xref target="The Language and Linguistic Background of the Isaiah Scroll (1QISaa)_000_bk_0422" type="pub"> E. Y. Kutscher, <title level="m">The Language and Linguistic Background of the Isaiah Scroll (1QISa<emph rend="sup">a</emph>)</title> (STDJ 6; Leiden: Brill, 1974) 422 </xref>
<xref target="Lateranum_048_00_0070" type="pub"> J. A. Soggin, "Il 'segno di Giona' nel libro del profeta Giona," <title level="j">Lateranum</title> 48 (1982) 70–74 </xref>
NOTE: as in this case with Lateranum, where there is no specific issue number, "00" is given for the issue.
NOTE: where a title has special characters marked in the actual text with "emph" (such as the superscript "a" in <title level="m">1QISa<emph rend="sup">a</emph></title> above), remove the "emph" tags when building the target and just type the letters contained in the emph as normal characters.
NB: it is therefore necessary for the ATLA-CERTR team to write scripts which search for these "00" occasions and place them in a standard issue numbering framework.
A table of standardized abbreviations will be provided separately from
this document. See http://rosetta.atla-certr.org/xml/entities/ .
Conceptually, we are making a general categorical distinction between lexigraphic distinctions (e.g., one letter of a word in superscript) which are tagged with "emph" and the appropriate rend value; and semantic distinctions (e.g., a whole word in italics), to be marked with "hi" and the appropriate rend value. It is possible, therefore, that within a "hi" you might have one letter marked with "emph."
For proper names, the "name" tag is used, with the required "n" attribute set to "author," "topic" (some term from the article's title), or "place" (for geographical references).
Abbreviations are of two kinds, distinguished by a required "type" attribute: "acron" for an all-capital/uppercase acronym (unless in italics, in which case a title tag applies, usually for a journal), or "other" for any other abbreviation.
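A minimal sketch of the name and abbreviation conventions follows. The sample sentence, the author name, and the acronym are invented for illustration, and the element name "abbr" is an assumption based on common TEI Lite usage; confirm it against ATLAS_teixlite.dtd before keying.

```xml
<!-- Hypothetical text; "abbr" as the element name is an assumption -->
<p><name n="author">Jane Smith</name> presented the paper in
<name n="place">Jerusalem</name> at the annual
<abbr type="acron">XYZ</abbr> meeting
(<abbr type="other">cf.</abbr> the earlier report).</p>
```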
It is possible to distinguish titles from keywords (i.e., items to be tagged as <hi>) based on a few contextual norms. Titles are often distinguishable by the presence of surrounding double-quote marks or by italic or underline style, and are usually preceded by a preposition: e.g., "in Moby Dick," or "from Moby Dick we have," or "with Moby Dick." In addition, titles are also used as proper nouns to initiate a clause or begin a sentence: "Moby Dick contains a multitude of references . . . ." Finally, titles frequently follow a reference in the possessive: "Melville's Moby Dick . . . ." or "his Moby Dick . . . ." These contextual occasions should be tagged with "<title>," using the best possible value for the "level" attribute. NOTE: To distinguish between "Moby Dick" the character and Moby Dick the work title, look for font/context clues. Finally, abbreviations in italics are also likely to be titles of journals (with the level="j" attribute).
The second option is that a keyword is being identified, and so it should be tagged as <hi rend="i"> or <hi rend="u"> depending on the way the text is formatted. If it is found within a quoted passage, citation, or block quote, do not tag it separately as a keyword; instead provide a generic "emph" tag with a "rend" attribute denoting the format (e.g., as above, u/underline, i/italic, b/bold, u2/double-underline, sub/subscript, sup/superscript, strk/strike-through). In any case, if the term is foreign, be sure to add the "lang" attribute with a suitable value from the table above. Terms in bold should receive an "emph" tag, with the addition of the "lang" attribute as appropriate.
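The distinction between a keyword in running text and the same formatting inside a quotation might be keyed as in the following sketch. The sentences and the keyword are invented for illustration.

```xml
<!-- Keyword in running text, printed in italics in the source -->
<p>The notion of <hi rend="i">liminality</hi> is central here.</p>
<!-- Same italics inside a block quote: generic emph, not a keyword -->
<p>As the author puts it:
<q rend="block">The notion of <emph rend="i">liminality</emph> is central.</q></p>
```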
IMPORTANT: For titles, it is also required that the level be provided, e.g., the "m" for monograph (book, etc.), "a" (for article), etc.
IMPORTANT: All smart quotes--single or double (i.e., right and left hooked quotes)--are to be normalized to straight quotes.
Block quotes should be marked with the <q rend="block"> tag. Special attention needs to be paid to how block quotes interact with paragraphs. A block quote may appear in the middle of a paragraph or at the end of a paragraph (rarely at the beginning), and it may contain more than one paragraph itself (though typically it doesn't). A block quote that appears in the middle of a paragraph should be tagged like this:
<p>This is the paragraph leading up to the block quote, and here comes the quote: <q rend="block">This is a block quote. It is not a separate paragraph.</q> This is a continuation of the original paragraph. You can tell it is a continuation because the first word of the preceding sentence is not indented.</p>
If the block quote appeared at the end of the paragraph, this is what you would see:
<p>This is the beginning of the paragraph. It's followed by a block quote (without a colon this time--the punctuation can vary). <q rend="block">This is the block quote. It ends the paragraph.</q></p>
<p>This is a new paragraph.</p>
A block quote that contains one or more paragraphs will indicate this by indentation within the block quote itself. It should be tagged thus:
<p>Here's the beginning of the paragraph. It contains a block quote that is broken into two paragraphs. However, the block quote is still part of this paragraph. <q rend="block"><p>Note that the quote begins with a paragraph tag. Now here's another paragraph, still within the block quote.</p> <p>This is a second paragraph within the block quote. It's also the end of the block quote.</p></q> The paragraph continues, because there is no indentation in the source.</p>
<lg xml:space="preserve">
<l rend="verse" xml:space="preserve">A line of verse</l>
<l rend="verse" xml:space="preserve">A line of verse indented</l>
<l rend="verse" xml:space="preserve">A line of verse</l>
<l rend="verse" xml:space="preserve">A line of verse with stanza spacing</l>
</lg>
IMPORTANT: Preservation of indenting and stanza spacing is imperative. The "xml:space" attribute is FIXED at the "preserve" value for the "l" element. Set it to "preserve" on "lg" to preserve stanza and line spacing; "l" must likewise carry the "preserve" value to maintain proper indenting based upon the spaces, tabs, and hard returns keyed.
appropria<pb id="***n0002-7189_066_01_0657" />tion
It is also important that dashes ("em-dash") in a word or in sentence punctuation be designated by the proper &mdash; entity instead of the standard keyboard input of the dash or "minus" key.
<add place="supratext" resp="pub" n="map" id="n0021-9231_114_01_0395i">
<figure id="n0021-9231_114_01_0395i">Insert</figure>
</add>
The DTD has been revised to allow values of "map" or "other" for the required "n" attribute, and "pub" is the FIXED value for "resp," as only the publisher will be responsible for adding an insert. "place" is also FIXED as "supratext." The id is required and should match the id value of the "figure" element.