Encoding Guidelines

for The American Theological Library Association
Serials Project

Final Draft May 31, 2000


Updated for Production schedule with entity references in the teiHeader
DTD name changed to ATLAS_teixlite.dtd
Changes From Previous Drafts marked in RED

John Robert Gardner
XML Engineer

This final draft of the encoding description is for the ATLAS project. It will receive minor changes based upon anomalies encountered in journals other than those used in the test encoding sample (Proceedings and JBL). Certain key features in the following description are consistent for all journals, though the examples are based upon a single hypothetical edition of the Journal of the American Academy of Religion and/or the Journal of Biblical Literature.


The journal issues you receive will be encoded in XML using the TEI-Lite DTD (available from http://www.uic.edu/orgs/tei/lite/teixlite.dtd). The ATLAS project is using a trimmed-down version of this DTD called ATLAS_teixlite.dtd, available from the ATLA-CERTR web site: http://rosetta.atla-certr.org/CERTR/ATLAS/xml/. Each article is a separate XML file with its own TEI header.

Extent of Material to be Encoded

Articles, Book Reviews, and Books Received need to be entered and tagged. Outside and inside covers, advertisements, and the Table of Contents will be tagged with place-holders designating "outside-front," "outside-back," and so forth in the form of a <ref> tag, together with the publication title, ISSN, and volume/issue data. The link will point to a file in the same directory, named according to the scanning designations for the covers in the "Specifications for Image Scanning." E.g., for the outside front cover:
          <ref id="cov1.gif">outside-front</ref>
          <p n="issn"> issn# </p>
          <p n="date">date as printed on cover  </p>
          <p n="voliss">volume and issue, as printed on cover  </p>

Otherwise for the inside front, inside back, and outside back, follow this model (i.e., minus the issn, date, and volume/issue data):
          <ref id="cov2.gif">inside-front</ref>
          <ref id="cov3.gif">inside-back</ref>
          <ref id="cov4.gif">outside-back</ref>
Note that the ISSN, date, and volume/issue are given in plain <p> elements and are identified only by the value of the "n" attribute, so as not to violate validation or replicate the <titlePart> statement (cf. title page below).

The Table of Contents will be generated automatically by the software. Start tagging with the first article. NOTE: each article and review is tagged as a separate document instance.

Entity Sets and Foreign Languages

Foreign language items are to be tagged with the "foreign" tag. The required "rend" value must be filled in. The three-letter "lang" designation is to be obtained from the "B" category of ISO 639-2:1998 (http://www.indigo.ie/egt/standards/iso639/iso639-2-en.html). The "rend" value specifies either "script" for a non-Roman character script entry (such as Sanskrit/Devanagari characters) or "trans" for any language in Roman script (i.e., "English-type" letters, including transliterations). As Hebrew and Greek are the most likely non-Roman scripts, special "rend" options are included for each, per ISO 15924:1999 (draft, http://www.egt.ie/standards/iso15294/document/index.html): "Ell" for Greek and "Heb" for Hebrew. Note that, while default options are specified in the DTD, the Greek and Hebrew "rend" options are to have their first letter in upper case: Heb/Ell.

German, French, and other European languages should be accurately represented with entities for grave accents, &c. Hebrew, Greek, &c. are to be transliterated per the table available separately from this document (http://rosetta.atla-certr.org/xml/entities/). The following standard entity sets may be needed (let me know if you don't already have access to any of these):

Though some journals do not contain foreign language text in the original script, they often make use of transliteration extensively. Use the following entity sets for foreign languages and transliterations (you can find copies of these and other entity sets at http://rosetta.atla-certr.org/CERTR/ATLAS/xml/ ): Take special care with the transliterated texts to mark diacritical marks such as a dot above a letter, dot below a letter, macrons, accents, etc. In the transliteration of Arabic and other Semitic languages, be careful to distinguish the left half ring from the right half ring. Neither should be marked with an apostrophe or accent mark.

The general rule is that non-Roman script languages (e.g., Hebrew) are to be keyed according to the transliteration guide and tagged with the <foreign> tag, adding the "lang" attribute, e.g., <foreign lang="heb">. For language designations, use the 3-character abbreviations presented in ISO 639-2:1998 (http://www.indigo.ie/egt/standards/iso639/iso639-2-en.html). Roman scripts should use proper entities for accent, umlaut, etc. as defined in the Latin I and Latin II sets. If a Roman script language such as German is used for a single term that is offset with italics or quote marks, follow the rules for Keywords and Quoted Terms noted below.

Basic document structure

Following the TEIxLite DTD, the basic document structure is:
                <tei.2  id="n0021-9231_114_01_0385"> [this and other attr. explained below]
                        <teiHeader> {see section immediately below} </teiHeader>
                        <text>
                                <front>
                                        <titlePage> {titlePage elements, see below} </titlePage>
                                </front>
                                <body> {follow DTD for body} </body>
                                <back> {back/bibl matter} </back>
                        </text>
                </tei.2>
The teiHeader is explained immediately below and is templated--with items for PDCC to enter as marked by ***--for each article. Each article forms a distinct document file instance.

File Naming

To ensure compatibility and transparent linking for all ATLAS materials, file naming for each article/review (which, in turn, is a discrete document instance) is to reflect the same syntax as the "id" system described below.  Each instance is to be named according to its ISSN, volume, issue, and the number of the page on which the article begins, followed by the ".xml" extension: nXXXX-XXXX_nnn_zz_yyyy.xml, where "nnn" is the volume in 3-digit RJZF (right-justified, zero-filled) form, "zz" is the issue in 2-digit RJZF, and "yyyy" is the page number of the page on which the article begins in 4-digit RJZF.  Hence:
This would be the file name for the first article of the first issue of the 114th volume of the journal whose ISSN is 0021-9231 (JBL).  On those few occasions where two issues (or two volumes) are combined, the numbering should reflect this anomaly:
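The naming pattern described above can be sketched in Python as follows. This is only a minimal sketch, not part of the ATLAS tooling: the function name atlas_file_name and its parameters are hypothetical, the starting page 385 is an illustrative value, and treating a combined issue as two concatenated 2-digit issue numbers is an assumption inferred from the combined-issue id "n0021-9231_114_0203_0001" shown later in this document.

```python
def atlas_file_name(issn, volume, issues, first_page):
    """Build a file name of the form nXXXX-XXXX_nnn_zz_yyyy.xml.

    issues is a list of issue numbers; a combined issue such as 2/3 is
    passed as [2, 3] and written as "0203" (assumed convention).
    """
    # Each issue number is right-justified and zero-filled to 2 digits.
    issue_part = "".join(f"{issue:02d}" for issue in issues)
    # Volume is 3-digit RJZF, starting page is 4-digit RJZF.
    return f"n{issn}_{volume:03d}_{issue_part}_{first_page:04d}.xml"


print(atlas_file_name("0021-9231", 114, [1], 385))
print(atlas_file_name("0021-9231", 114, [2, 3], 1))
```

Dropping the ".xml" extension from the result gives the matching "id" value for the tei.2 element, since file names and ids share the same syntax.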


TEI Header

You can use the following as a template for the invocation of the DTD and the TEI header of each journal issue, according to the sample below from Jnl. of the Am. Acad. of Religion.

Note that the keyed files done by PDCC no longer contain the complete header: additional materials are not keyed but are instead referenced via the entity references required below.

Sections marked in red indicate where an "external entity" reference will be typed in.  The entity files have been provided separately at the ATLAS web site, http://rosetta.atla-certr.org/CERTR/ATLAS/xml/. They are tied to fixed attributes in the appropriate elements and are entered as entity references (e.g., "&editionheader;") during keying by PDCC. They cover all "respStmt" elements in the editionStmt element, the entire set of paragraphs under the encodingDesc element, and the profileDesc element.

Specifically, the following excerpts show which entities are to be entered and where.  The following &editionheader; entity will take care of the <respStmt>, <resp>, and <name> elements required in the DTD:

<editionStmt n="editionheader">

<!-- *** PDCC enters: This date is the date of the edition
of the journal being encoded -->
<!-- PDCC types only this "editionheader" entity reference here -->

The following &encoding; entity will take care of the <projectDesc>, <editorialDecl>, <refsDecl>, and various <p> elements required in the DTD:

<encodingDesc n="encoding">

<!-- PDCC only enters this "encoding" entity reference-->


The following &profile; entity will take care of the <langUsage> and <language> elements required in the DTD:

<profileDesc n="profile">

<!-- PDCC only enters this "profile" entity reference -->


The #FIXED values for the "n" attribute in the <editionStmt>, <encodingDesc>, and <profileDesc> elements remain unchanged in the DTD.

PDCC will have to enter the reference to the specific journal, ISSN, issue, etc., as denoted below by ***. Use the following template for each article (i.e., each file, as each article is to be its own document instance), with keying to be varied only in those fields denoted by ***. The ID attribute contains a canonical reference in the form (for example) "***n0002-7189_XXX_zz_yyyy" where 0002-7189 is the ISSN of the journal, XXX is the volume number of this issue, "zz" is the issue number, and yyyy is the page number of the original text in Arabic numerals. All of these are "0"-justified: pages are always 4-digit RJZF, volumes are always 3-digit RJZF, and issues are always 2-digit RJZF. For identifying the article as a whole, the page number of the starting page, in 4-digit RJZF, is given.

<!DOCTYPE tei.2 SYSTEM "http://purl.org/CERTR/ATLAS/xml/ATLAS_teixlite.dtd">

<tei.2 id="n0002-7189_114_01_0385">




<!-- ***PDCC enters this title, vol, issue, and 
"electronic edition" for each journal -->

***Journal of the American Academy of Religion, 
Volume 66:1: electronic edition***

<editionStmt n="editionheader">

<!-- *** PDCC enters: This date is the date of the edition
of the journal being encoded -->
<!-- PDCC types only this "editionheader" entity reference here -->

<!-- ***PDCC enters this title, distrib, address, 
for each journal -->

***American Academy of
***Scholars Press***

***P.O. Box 15399, Atlanta, GA

<!-- ***PDCC enters this date, which is the date of the encoding itself,
not the edition encoded -->

<!-- ***PDCC enters this title, vol, issue, date
 for each journal -->

***Journal of the American Academy of Religion, 
Volume 66, Number 1, 1998***



<encodingDesc n="encoding">

<!-- PDCC only enters this "encoding" entity reference-->


<profileDesc n="profile">

<!-- PDCC only enters this "profile" entity reference -->



Title Page

Follow TEI for the title page with all <docTitle> subelement structure. The title page is the first child of the <front> element. Information from the author's associated institution is included in an "address" sub-element to the docAuthor tag. The docTitle is the title of the particular article being encoded. The rest of the titlePage is per DTD as applicable.

IDs and Targets

IMPORTANT: All attribute values for "ID" or "id" must begin with the letter "n." This letter "n" is never to be followed by any whitespace. Also, and in general, ALL page numbers in any "id" attribute are to be 4-digit RJZF.

The "tei.2" element must contain the article's id (as noted below, "id" is no longer used in the "text" element). Again, the ID attribute value MUST begin with the letter "n", followed by the numerical value with no whitespace in between. The ID contains a canonical reference in the form (for example) "***n0002-7189_xxx_zz_yyyy" where 0002-7189 is the ISSN of the journal, xxx is the volume number of this issue, "zz" is the issue number, and yyyy is the page number of the original text in Arabic numerals; all are RJZF, i.e., "0"-justified. Thus for a simple article:
        <tei.2 id="n0002-7189_XXX_zz_yyyy">

On those few occasions where two issues are combined (or two volumes), the numbering should reflect this anomalously:

                           <tei.2 id="n0021-9231_114_0203_0001">
Review articles (article-length reviews, or surveys of literature) are treated like regular articles. If more than one review begins on the same page, the second is numbered "yyyya", the third "yyyyb," etc., with the page number still "0"-justified. In addition, for Book Reviews, an "n" attribute is added to the tei.2 element, in which the title of the book being reviewed is to be entered:
        <tei.2 id="n0002-7189_XXX_zz_yyyy" n="Title of the Book Reviewed">

The "n" attribute of the "text" element, meanwhile, is set to the furnished "review" value option (note that "id" is no longer used in the "text" element):
        <text n="review">
Cf. attribute list, the "n" value is required as follows:
<!ATTLIST text 
        id ID #IMPLIED
        n (article | book | review | errata | supplement | index) #REQUIRED
        lang IDREF #IMPLIED
        rend CDATA #IMPLIED
        decls IDREFS #IMPLIED>
A list of elements with #REQUIRED attributes which must follow this syntax is included: In each case, the required value will ultimately be of the ID data type (but for keying and early validating owing to book titles--see below--it is set to CDATA), and must begin with the letter "n" and follow this syntax. The "note" element begins with "n" but does not use this full syntax of ISSN &c.

As noted, the "id" attribute must be included on all <pb>, <tei.2>, <figure>, <anchor>, and <note> elements, as well as <bibl> elements in a bibliography, so that "target" attributes in <xptr> and <xref> tags can point to them. Again, the ids of <note> and <bibl> elements differ in syntax from the ids of the other elements in the list above.

For <pb> elements, use this pattern: n0002-7189_vol_iss_page, e.g., make an empty element as follows: <pb id="n0002-7189_066_01_0003" /> (page 3). Note that the page break goes at the top of the page, so the page number refers to the page that follows the tag. A <pb> element is required for every page, including pages with content (such as advertisements) on which no page number appears. For completely blank pages and advertisement pages, a place-holder page-break tag (following the empty tag model above) must be created with the proper ID.

For <bibl> elements, use this pattern: bnum, e.g., <bibl id="b2">. Note that the entries are numbered consecutively.  If the bibliography comes in several parts, number all entries consecutively across all parts in one numerical sequence.

For <note> elements, use this pattern: nnum, e.g., <note id="n3">. This refers to note 3 in the article that begins on page 817. Articles often begin with unlabeled footnotes, usually describing the author. Use lowercase letters to designate these notes, e.g., <note id="n0002-7189_066_817_n_a">, the note at the bottom of the article beginning on page 817 that tells something about the author. If a second such note appears in an article, label it "b," and so forth.

For <tei.2> elements, use this pattern: n0002-7189_vol_iss_page, and include the attribute, e.g., <tei.2 id="n0002-7189_066_iss_0817">. This refers to the article that begins on page 817. NOTE: Pages are always 4-digit RJZF, volumes are always 3-digit RJZF, and issues are 2-digit RJZF.
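As a quick check of the full-syntax ids used on <tei.2> and <pb> elements, a regular expression along the following lines could be run over keyed files. It is a sketch under stated assumptions rather than a project tool: the name is_full_atlas_id is hypothetical, and the allowances for a doubled volume or issue field (combined volumes/issues) and for one trailing lowercase letter (e.g., "a" for a second review on a page, "i" for an insert) are readings of the rules in this document. The short <note> and <bibl> ids and the ".gif" figure ids are deliberately out of scope.

```python
import re

# Assumed shape: "n" + ISSN + "_" + volume (3 digits, or 6 for two combined
# volumes) + "_" + issue (2 digits, or 4 for two combined issues) + "_" +
# 4-digit page, optionally followed by one lowercase letter.
FULL_ID = re.compile(r"n\d{4}-\d{3}[\dX]_(?:\d{3}){1,2}_(?:\d{2}){1,2}_\d{4}[a-z]?")


def is_full_atlas_id(value):
    """Return True if value looks like a full-syntax ATLAS id."""
    return FULL_ID.fullmatch(value) is not None


for candidate in ("n0002-7189_066_01_0003", "n0002-7189_66_1_3"):
    print(candidate, is_full_atlas_id(candidate))
```

The second candidate is rejected because its volume, issue, and page fields are not zero-filled to 3, 2, and 4 digits.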

REFs and PTRs

<ref> and <ptr/> (reference and pointer) tags are used throughout the text to create hyperlinks to articles, footnotes, bibliography entries, and other elements within the text of the given article. For both <ref> and <ptr/>, the value of the "target" attribute is the same as the "id" of the element to which it points. The difference between the two is that <ref> and </ref> surround text (e.g., a footnote label or a reference to a bibliography entry), whereas the <ptr/> element stands alone (it is represented in the text by a pointer icon, generated automatically by the software). The <ptr/> element is an empty element, and the value of its target attribute points to the id of the appropriate <bibl> element, which is normally the target of the most recent preceding <ref> element (usually in the same paragraph), unless the context dictates otherwise (e.g., a different author's work has since been referenced, thus changing the target context in the corresponding <bibl> element).

For "ptr," target points to the respective bibliography entry id if the article has a bibliography, just as is the case with ref.  So, for instance, if in a given paragraph you have ". . . (Jones 1986:35)," it would be tagged as ". . . (<ref target="b15">Jones 1986</ref>:35)" and if you later had ". . . (36)" you would tag this as ". . . (<ptr target="b15" />36)," assuming that the book by Jones in 1986 is entry #15 in the bibliography.
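Because every <ref> and <ptr/> target has to match the id of some element in the same article, a small consistency check can catch mis-keyed targets. The sketch below uses Python's standard xml.etree.ElementTree module; the function name unresolved_targets, the sample fragment, and the "Smith 1990" entry with target "b99" are hypothetical illustrations. It assumes a well-formed fragment with no unresolved entity references, so a complete keyed file containing references such as &editionheader; would need those entities resolved first.

```python
import xml.etree.ElementTree as ET


def unresolved_targets(xml_text):
    """Return the sorted ref/ptr target values that match no id in the fragment."""
    root = ET.fromstring(xml_text)
    # Collect every id declared anywhere in the fragment.
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    missing = set()
    for el in root.iter():
        target = el.get("target")
        # Only internal pointers are checked; xref/xptr point elsewhere.
        if el.tag in ("ref", "ptr") and target and target not in ids:
            missing.add(target)
    return sorted(missing)


sample = (
    '<div><p>(<ref target="b15">Jones 1986</ref>:35) ... '
    '(<ptr target="b15" />36) ... '
    '(<ref target="b99">Smith 1990</ref>:12)</p>'
    '<bibl id="b15">Jones 1986</bibl></div>'
)
print(unresolved_targets(sample))
```

In the sample, both pointers to "b15" resolve to the <bibl> entry, while "b99" has no matching id and is reported.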

The element "xref," used to point to places beyond/outside the given article's instance requires one of three values for the "type" attribute: "pub" for a publication of any kind (book, article, film, etc.); "can" for a canonical citation (see immediately below); and "web" for any online resource.

IMPORTANT, for citations and notes: Using the determinations for titles listed below, after a title is identified, it is used in creating the id for any reference as follows: title_volume_kind_pages. In this syntax, "title" is the title--or abbreviation thereof--of the book or journal in the reference. "volume" is 3 digits: 000 if it is a single-volume book, otherwise the volume number for the series. "kind" is "bk" for a book, or "xx" for articles in a journal, where "xx" is the issue number in 2-digit RJZF (where "spring," "summer," "fall," and "winter" are used in lieu of issue numbers, "01," "02," "03," and "04" are the appropriate issue numbers). "pages" is four-digit RJZF and gives the first page of the citation; in the case of a reference to an entire book, "0000" is given. Cf. the following examples (note also the required "type" attribute):

<xref target="The Language and Linguistic Background of the Isaiah Scroll (1QISaa)_000_bk_0422" type="pub">
E. Y. Kutscher, <title level="m">The Language and Linguistic Background of the Isaiah Scroll (1QISa<emph rend="sup">a</emph>)</title>
(STJD 6; Leiden: Brill, 1974) 422</xref>

<xref target="Lateranum_048_00_0070" type="pub">
J. A. Soggin, &ldquo;Il &lsquo;segno di Giona&rsquo; nel libro del profeta Giona,&rdquo; <title level="j">Lateranum</title> 48 (1982) 70&ndash;74</xref>

NOTE: as in this case with Lateranum, where there is no specific issue number, "00" is given for the issue.
NOTE: where a title has special characters marked in the actual text with "emph" (such as the superscript "a" in <title level="m">1QISa<emph rend="sup">a</emph></title> above), remove the "emph" tags in the target value and just type the letters contained in the emph as normal characters.

NB: it is therefore necessary for the ATLA-CERTR team to write scripts which search for these "00" occasions and place them in a standard issue numbering framework.
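The target syntax for "pub" references can be sketched as a small helper. This is an illustration only: the function name xref_target and its parameters are hypothetical, the underscore separator follows the two sample targets above, and the title is assumed to have been cleaned already (any "emph" tags removed) before it is passed in.

```python
# Season names stand in for issue numbers when a journal uses them.
SEASON_ISSUES = {"spring": "01", "summer": "02", "fall": "03", "winter": "04"}


def xref_target(title, volume=0, issue=None, first_page=0, is_book=False):
    """Build a target of the form title_volume_kind_pages."""
    if is_book:
        kind = "bk"
    elif issue is None:
        # No specific issue number is available for the journal article.
        kind = "00"
    elif isinstance(issue, str) and issue.lower() in SEASON_ISSUES:
        kind = SEASON_ISSUES[issue.lower()]
    else:
        kind = f"{int(issue):02d}"
    # Volume is 3-digit RJZF (000 for a single-volume book); page is 4-digit
    # RJZF (0000 when the reference is to an entire book).
    return f"{title}_{volume:03d}_{kind}_{first_page:04d}"


print(xref_target("Lateranum", volume=48, first_page=70))
print(xref_target("Moby Dick", is_book=True))
```

The first call reproduces the journal-article form with "00" for the missing issue number; the second shows a reference to an entire single-volume book.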

Canonical Citations

Biblical and other citations will take one of several forms: I Cor. 1.1; or I Cor. 1:1; or I Corinthians 1.1/1:1; or 1 Cor/Corinthians 1.1/1:1; Gen. 1:1/1.1; or Genesis 1.1/1:1; and so on. By default, any text string beginning with an Arabic or Roman numeral, followed by an additional text string beginning with a capitalized/upper case letter, followed by any string of Arabic numerals should be tagged with a <xref> tag as indicated above, with the target attribute value generated from the tagged text wherein: E.g., <xref target="1Cor1.1" type="can">I Cor. 1:1</xref>. A list of valid abbreviations for canonical references will be supplied separately. Here is a sample from the test document instance:

Table for standardized abbreviations to be provided separately to this document. See http://rosetta.atla-certr.org/xml/entities/ .

Keywords, Names, Abbreviations, and Quoted Terms

Keywords will be identified contextually based upon the following parameters. All terms offset by single " ' " or double " " " quote marks will be tagged as <hi> with the attribute 'rend="sq"' or 'rend="dq"' respectively. Terms found within quoted passages, citations, titles, or block quotes will not be tagged in this manner. Keywords designated by bold text will receive a <hi> tag with the attribute 'rend="b"'. Where "hi" is used, the "rend" attribute must be supplied wherein one of the following is found: dq, sq, u/underline, i/italic, b/bold, u2/double-underline, sub/subscript, sup/superscript, strk/strike-through.

Conceptually, we are making a general categorical distinction between lexigraphic distinctions (e.g., one letter of a word in superscript) which are tagged with "emph" and the appropriate rend value; and semantic distinctions (e.g., a whole word in italics), to be marked with "hi" and the appropriate rend value. It is possible, therefore, that within a "hi" you might have one letter marked with "emph."

For proper names, the "name" tag is used, with the required "n" attribute set to "author" for any author, "topic" for some term from the article's title, or "place" for a geographical reference.

Abbreviations are of two kinds, with a required "type" attribute allowing either "acron" for an all-capital/uppercase acronym (unless in italics, in which case a title tag applies, usually for a journal) or "other" for any other abbreviation.

Italics, Bold, and Underline & Titles

Work titles will usually have the first letter of the first word capitalized, in which case a <title> tag should be used. In the case of italic or underline text, the most likely option is that a work title is being referenced. Use the <title> tag and the "level" attribute where discernible from context: "a" for the title of a poem, article, or other subunit of a larger item such as a book; "m" for a book, collection, or other item published as a distinct item, including individual volumes of a series such as an encyclopedia; "j" for a journal title; "s" for a series title; and "u" for the title of unpublished material such as a thesis or dissertation.

It is possible to distinguish titles from keywords (i.e., items to be tagged as <hi>) based on a few contextual norms. Titles are often distinguishable by the presence of surrounding double-quote marks or italic or underline style, and are usually preceded by a preposition: e.g., "in Moby Dick," "from Moby Dick we have," or "with Moby Dick." In addition, titles are also used as proper nouns to initiate a clause or begin a sentence: "Moby Dick contains a multitude of references . . . ." Finally, titles frequently follow a reference in the possessive: "Melville's Moby Dick . . . ." or "his Moby Dick . . . ." These contextual occasions should be tagged with "<title>" with the best possible practice for the value of the "level" attribute. NOTE: To distinguish between "Moby Dick" the character and Moby Dick the work title, look for font/context clues. Finally, abbreviations in italics are also likely to be titles of journals (with level="j" attribute).

The second option is that a keyword is being identified, in which case it should be tagged as <hi rend="i"> or <hi rend="u"> depending on the way the text is formatted. If, however, it is found within a quoted passage, citation, or blockquote, do not tag it as a keyword; instead provide a generic "emph" tag with a "rend" attribute denoting the format (as above: u/underline, i/italic, b/bold, u2/double-underline, sub/subscript, sup/superscript, strk/strike-through). In any case, if the term is foreign, be sure to add the "lang" attribute with a suitable value from the table above. Terms in bold should receive an "emph" tag, with the addition of the "lang" attribute as appropriate.

IMPORTANT: For titles, it is also required that the level be provided, e.g., the "m" for monograph (book, etc.), "a" (for article), etc.

Quotations and Quotation Marks

All direct quotations should be marked as <q rend="inline">. If a word is emphasized for some reason using quotation marks, use <hi rend="dq"> for double quote, and <hi rend="sq"> for single quotes as noted above. Quotes marked with "q" elements have a required attribute of "rend" for "block," "other," or "poem." Poems use the "l" line tag.

IMPORTANT: All smart quotes--single or double (i.e., right and left hooked quotes)--are to be normalized to straight quotes.
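The quote normalization rule can be illustrated with a short Python sketch. The function name normalize_quotes is hypothetical, and the sketch only handles the four literal curly-quote characters; entity references such as &ldquo; that are already keyed in a file are left untouched.

```python
# Map left/right single and double "smart" quotes to straight quotes.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_quotes(text):
    """Return text with smart quotes replaced by straight quotes."""
    return text.translate(SMART_QUOTES)


print(normalize_quotes("\u201cMelville\u2019s \u2018Moby Dick\u2019\u201d"))
```

Note that the right single quote also serves as the typographic apostrophe, so apostrophes are normalized to straight apostrophes by the same mapping.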

Block quotes should be marked with the <q rend="block"> tag. Special attention needs to be paid to how block quotes interact with paragraphs. A block quote may appear in the middle of a paragraph or at the end of a paragraph (rarely at the beginning), and it may contain more than one paragraph itself (though typically it doesn't). A block quote that appears in the middle of a paragraph should be tagged like this:

<p>This is the paragraph leading up to the block quote, and here comes
the quote:
<q rend="block">This is a block quote.  It is not a separate
paragraph.</q>
This is a continuation of the original paragraph.  You can tell it is a
continuation because the first 
word of the preceding sentence is not indented.</p>
If the block quote appeared at the end of the paragraph, this is what you would see:
<p>This is the beginning of the paragraph.  It's followed by a block
quote (without a colon 
this time--the punctuation can vary).
<q rend="block">This is the block quote.  It ends the
paragraph.</q></p>
<p>This is a new paragraph.</p>
A block quote that contains one or more paragraphs will indicate this by indentation within the block quote itself. It should be tagged thus:
<p>Here's the beginning of the paragraph.  It contains a block quote that
is broken into two 
paragraphs.  However, the block quote is still part of this paragraph.
<q rend="block"><p>Note that the quote begins with a paragraph tag. 
Now here's 
another paragraph, still within the block quote.</p>
<p>This is a second paragraph within the block quote.  It's also the end
of the block 
quote.</p></q>
The paragraph continues, because there is no indentation in the
text that follows the quote.</p>

Poems and Lines of Verse

Lines of verse use the "lg" and "l" tag hierarchy.  The entire section will be marked with "lg" and individual lines will be marked with "l."  The "l" element also requires the "rend" attribute, to specify either "verse," "drama," or "other." For example:

                        <lg xml:space="preserve">
                            <l rend="verse" xml:space="preserve">A line of verse</l>
                                        <l rend="verse" xml:space="preserve">A line of verse indented</l>
                            <l rend="verse" xml:space="preserve">A line of verse</l>

                            <l rend="verse" xml:space="preserve">A line of verse with stanza spacing</l>
                        </lg>

IMPORTANT: Preservation of indenting and stanza spacing is imperative.  The "xml:space" attribute is FIXED at the "preserve" value on the "l" element, so that indenting based upon the spaces, tabs, and hard returns keyed is maintained; it should also be set to "preserve" on "lg" to preserve stanza and line spacing.


All hyphens at the end of lines should be removed, unless the word is normally hyphenated. If a word is broken across a page boundary by a hyphen, the hyphen should be replaced by a <pb> tag, like this (from pp. 656-657):
appropria<pb id="***n0002-7189_066_01_0657" />tion
It is also important that dashes ("em-dash") in a word or sentence punctuation be designated by the proper &#x2014; instead of the standard keyboard input of the dash or "minus" key.
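The page-boundary rule for hyphenated words can be sketched as follows. The function name join_across_page_break is hypothetical, and the sketch assumes the keyer has already decided that the trailing hyphen is a line-break hyphen rather than part of a normally hyphenated word; it makes no attempt to decide that automatically.

```python
def join_across_page_break(before, after, pb_id):
    """Replace a word-splitting hyphen at a page boundary with an empty <pb>."""
    head = before.rstrip()
    # Drop the line-break hyphen; the <pb> tag takes its place.
    if head.endswith("-"):
        head = head[:-1]
    return f'{head}<pb id="{pb_id}" />{after.lstrip()}'


print(join_across_page_break("appropria-\n", "tion", "n0002-7189_066_01_0657"))
```

The page id passed in is that of the page that begins after the break (page 657 in the example from pp. 656-657).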


On rare occasions, an issue (e.g., of Biblical Archeology) will have a loose insert such as a map. For keying, these will be handled with the "add" tag for additions. This tag will be inserted at the beginning of the first <p> child of the div1 element and shall contain a "figure" tag wrapped around the text string "Insert." Both carry the standard identifier with the letter "i" appended to the page number string; the page number is that of the first page of the article. PDCC will receive word from Quality Control and Digitization Coordination as to which article or articles a given insert is to be attached to.
<add place="supratext" resp="pub" n="map" id="n0021-9231_114_01_0395i">
<figure id="n0021-9231_114_01_0395i">Insert</figure>
</add>
The DTD has been revised to allow for values of "map" or "other" for required "n" attribute, and "pub" is the FIXED value for "resp" as only the publisher will be responsible for adding an insert. Place is also to be FIXED as "supratext." Id is required and should match the id value of the "figure" element.


Pages with images should have a "figure" tag such as the following, with the required "id" attribute value built from the ISSN, volume, issue, and page number in the standard 3-, 2-, and 4-digit RJZF format, followed by ".gif". The example below is a figure on page 395 of volume 114 of journal 0021-9231; its id points to the GIF of that entire page:
<figure id="n0021-9231_114_01_0395.gif"></figure>