- 3.1.Encoded Text Areas
- 3.2.Normalization of Characters and Spellings
- 3.2.1.Silent Normalizations
- 3.2.2.Non-Standard Characters
- 3.2.2.1.Documentation of Non-Standard Characters
- 3.2.3.Numerations
- 3.2.3.1.Pagination
- 3.2.3.2.Column Numbers
- 3.2.3.3.Marginal Numbers
- 3.2.4.Abbreviations and Printing Errors
- 3.2.4.1.Abbreviations
- 3.2.4.2.Corrections
- 3.3.Page, Column, and Line Breaks
- 3.4.Hyphenation
- 3.5.Loss of text and conjectures
- 3.6.Typographic Styles and Text Alignment
- 3.7.Graphic Elements
- 4.1.Structuring
- 4.1.1.Multi-Volume Works
- 4.1.2.General Structure of a Text
- 4.1.3.Title Page(s)
- 4.1.4.Structural Text Units
- 4.1.5.Typographic and Argumentative Paragraphs
- 4.1.6.Lists
- 4.1.7.Verse Text
- 4.1.8.Notes and Comments
- 4.1.8.1.Position
- 4.1.8.2.Symbols
- 4.2.Identification and Linking of Text Elements
- 4.2.1. xml:id
- 4.2.2.Cross-References
- 4.3.References and Semantic Text Enrichment
- 4.3.1.Attributes for References and Normalizations
- 4.3.1.1. ref
- 4.3.1.2. Variants of @ref
- 4.3.1.3. key
- 4.3.1.3.1.Normalization of Proper Names and Work Titles
- 4.3.1.4. sortKey
- 4.3.1.5. n
- 4.3.2.Internal References
- 4.3.2.1.Lemmata
- 4.3.2.2.Authors
- 4.3.2.3.Works
- 4.3.3.External Linked Data
- 4.3.3.1.Persons
- 4.3.3.2.Places
- 4.4.Bibliographic References
Edition Guidelines
For questions, please contact us at info.salamanca@adwmainz.de.
Andreas Wagner, Cindy Rico Carmona, Marie-Astrid Hugel - Last Updated: December 2024
1. Basics
1.1. Introduction
The documents published on this website have been encoded within the scope of the project "The School of Salamanca. A Digital Collection of Sources and a Dictionary of its Juridical-Political Language" of the Akademie der Wissenschaften und der Literatur | Mainz. Besides creating a dictionary, the project aims to provide a freely available and easily accessible collection of important texts from the discursive context of the so-called "School of Salamanca", which are being digitized for this purpose (usually in the form of their first edition). No critical revision has been undertaken, nor have manuscript sources been consulted. The total corpus of works can be found here.
1.2. About the Digital Edition
The digital edition is organized into two main sections based on content. The first section includes fully edited works, selected for their significance as the most important contributions of the School of Salamanca. The second section contains reference works, frequently cited by the authors.
In terms of technical treatment, fully edited works undergo thorough technical revisions and scientific corrections to ensure their suitability for online publication. These works are divided into two groups:
- Group A (training corpus) was selected as a representative corpus with texts of varying scope, text types, and languages. It consists of works that are both manually and technically corrected, serving as models from which linguistic resources are extracted for the development of automated correction tools.
- Group B includes the remaining works to be fully edited, to which only automated correction tools are applied.
In contrast, reference works receive basic technical editing and are published primarily as resources to support the research community. They are not subjected to the same level of thorough revision as the fully edited works, but they provide essential references and context for primary research.
The following table illustrates the plan of edition of these groups.
Plan of Edition: Annotation, Enrichment and Publication | Fully Edited Works Group A: Training Corpus | Fully Edited Works Group B | Reference Work |
---|---|---|---|
1. Facsimiles and Catalogue: Online Publication | ✓ | ✓ | ✓ |
2. Transcription in TEI-Tite through double keying and/or manually corrected OCR | ✓ | ✓ | ✓ |
3. Semiautomatic Structural Annotation and Unclear Resolution (First Round) | ✓ | ✓ | ✓ |
4. Automatic Abbreviation Annotation with Regex XSLT* | ✓ | ✓ | - |
5. Automatic TEI-Transformation: TEI-Tite to TEI-All | ✓ | ✓ | ✓ |
6. Automatic Transcribed Hyphenation Annotation | ✓ | ✓ | ✓ |
7. Automatic Special Characters Annotation | ✓ | ✓ | ✓ |
8. Automatic xml:id(s) Annotation | ✓ | ✓ | ✓ |
9. Automatic Unmarked Hyphenation Annotation with Dictionary* | ✓ | ✓ | ✓ |
10. Automatic Abbreviation Annotation with XML-Lists and XSLT* | ✓ | ✓ | - |
11. Automatic Hyphenated Abbreviation Annotation with Python* | ✓ | ✓ | - |
12. Manual Resolution of Remaining Unclear Cases (Second Round) | ✓ | ✓ | - |
13. Manual Editing (Remaining Abbreviation Resolution and Print Error Correction) | ✓ | - | - |
14. Online Publication: (text-image) reading view, full text search | ✓ | ✓ | ✓ |
15. Diplomatic View: close to the original source text | ✓ | ✓ | ✓ |
16. Constitutive View: normalized | ✓ | ✓/- | - |
* Tools, constantly being improved and derived from Group A
For validation and well-formedness control, schematrons and a schema (SvSal TEI) are employed, specifically designed to meet the research demands of the project.
The edition of the sources is thus not to be understood as a strictly sequential process of text preparation and enrichment leading to a single, "ultimately" published edition text. Rather, it makes use of the possibilities of the digital medium, in that already published works can be enriched further (possibly with the involvement of the larger scholarly community). At the same time, the multidimensionality of the research demands and the step-by-step (re-)editability of the texts pose challenges for editing and publishing; these are met, for instance, by documenting changes in the <teiHeader> under <revisionDesc>.
1.3. About the Edition Guidelines
The edition guidelines at hand (henceforth "the guidelines") are to be understood as XML encoding guidelines with the purpose of:
- documenting in a precise way the regulations and special features with regards to which the texts of the digital edition were created.
- describing the fundamental structure of the text encoding and markup. In case of potential uncertainties during preparation steps (Double Keying, semi-automatic corrections, and prearrangements) these guidelines are to be consulted.
- providing support during the process of scholarly annotation of the texts. Particularly in cases of doubt, the guidelines are meant to be the first reference.
- pointing out which of the recommendations stated here are to be understood as optional. This type of recommendation is marked by means of an OPTIONAL symbol. If not explicitly stated as optional, recommendations are mandatory.
The comprehensive SalTEI tagset cannot be covered completely at this point; for this purpose, please refer to the technical reference documents linked below. Rather, the guidelines aim to describe the applied standards of text encoding, such as the handling of special characters, abbreviations, footnotes or marginal notes, damage in the original copies, citations, annotations of person and place names, etc. They explain how special characters are represented, or how references to lemmata, works, or authors are realized by means of TEI XML.
In addition to the examples given below, exemplary use cases (in fact, all use cases) can be extracted automatically from our texts on our codesharing page. For that purpose, we have installed the convenient codesharing service developed by Martin Holmes for the Map of Early Modern London project.
1.4. Technical Documentation and Downloads
Definitions and documentations for the edition's TEI P5 adaptation are available in the form of the following data:
Furthermore, the following data are available for download:
- .xml (the documentation file at hand)
- .pdf (overview of the standard and non-standard characters utilized in this edition)
2. Text Editing: Statuses and Revision History
2.1. Editing Status
In the process of editing and annotation, texts pass through different processing statuses, which are not always strictly sequential; for instance, a text may pass through the h_revised and i_revised_approved statuses several times in the course of scholarly annotation and editing. These editing statuses also serve internally to indicate when a text is ready for publication. Once g_enriched_approved is applied, the text can be published.
- a_raw: The text, encoded through double keying and/or OCR, has been converted to a simple TEI file conforming to the schema specifications.
- b_cleared: Uncertainty marks by the typists have been resolved through comparison with the original reference.
- c_hyph_proposed: Propositions for hyphenations, corrections, and/or abbreviation expansions have been created.
- d_hyph_approved: Propositions for hyphenations, corrections, and/or abbreviation expansions have been accepted by the project's scholars and have been implemented in the text.
- e_emended_unenriched: Enrichment of the text (annotation of special characters, annotation of persons, places, bibliographic references, and linking to datasets of authors and of the dictionary) is in progress.
- f_enriched: Enrichment of the text has been done. Four-eye examination and correction of the enrichment annotations is still pending.
- g_enriched_approved: The examination and/or correction of the enrichment has been done. The text has been released officially, publicly and in a persistently quotable way and thus stands at the disposal of the scholarly community.
- h_revised: Further propositions for correction and enrichment of the text, collected during a qualifying period and potentially originating from third parties within the scholarly community, have been added to the text. A four-eye examination and correction of the further annotations is still pending.
- i_revised_approved: Further propositions for correction and enrichment of the text have been accepted by the project's scholars, the text has been released anew (i.e., with new persistent identifiers), although older versions are still available.
- z_final: The text has been "frozen" in the form of a version not further modifiable at the end of all editing phases and for the purpose of long-term archival and/or presentation of the project results.
2.2. Revision History
The revision history for a text records all changes to or within the TEI document and is maintained within the revisionDesc element of the teiHeader. The current editing status is stated in the status attribute of revisionDesc.
<revisionDesc status="g_enriched_approved"> <listChange> <change status="g_enriched_approved">Generated @xml:id.</change> <change status="g_enriched_approved">teiHeader update.</change> <change status="g_enriched_approved">Tagged special characters.</change> <change status="g_enriched_approved">Correct choice/(pb|cb|lb) pairings.</change> <change status="g_enriched_approved">Fixed order of break attributes (@rendition and @break) and removed whitespace before non-breaking elements.</change> <change status="g_enriched_approved">Post-correction fixes.</change> <change status="g_enriched_approved">Reduced excessive whitespace.</change> <change status="g_enriched_approved">Second round of corrections (CB).</change> <change status="g_enriched_approved">Reduced excessive whitespace.</change> <change status="g_enriched_approved">First round of corrections (CB).</change> <change status="f_enriched">Automatically expanded abbreviations (la-main).</change> <change status="f_enriched">Tag unmarked breaks (la).</change> <change status="f_enriched">Generated @xml:id.</change> <change status="f_enriched">Numbered lines.</change> <change status="f_enriched">Tagged special characters.</change> <change status="c_hyph_proposed">Annotated hyphenated breaks.</change> <change status="a_raw">Transformation from TEI-Tite to TEI-All.</change> <change status="a_raw">Added @targets to ref in summaries from milestones' @xml:id.</change> </listChange> </revisionDesc>
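As a rough illustration of how the status information above could be read programmatically, the following Python sketch extracts the current status and the sequence of statuses recorded in the change entries. It is not part of the project's tooling: the function names are hypothetical, the sample is an abridged copy of the example above, and the TEI namespace that real files would carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the revisionDesc example above (no TEI namespace, for brevity).
SAMPLE = """<revisionDesc status="g_enriched_approved">
  <listChange>
    <change status="g_enriched_approved">Generated @xml:id.</change>
    <change status="g_enriched_approved">teiHeader update.</change>
    <change status="f_enriched">Numbered lines.</change>
    <change status="a_raw">Transformation from TEI-Tite to TEI-All.</change>
  </listChange>
</revisionDesc>"""


def current_status(revision_desc_xml: str) -> str:
    """Return the editing status stated in revisionDesc/@status."""
    root = ET.fromstring(revision_desc_xml)
    return root.get("status", "")


def status_history(revision_desc_xml: str) -> list[str]:
    """Return the statuses of the change entries in document order,
    collapsing runs of identical consecutive statuses."""
    root = ET.fromstring(revision_desc_xml)
    history: list[str] = []
    for change in root.iter("change"):
        status = change.get("status")
        if status and (not history or history[-1] != status):
            history.append(status)
    return history


if __name__ == "__main__":
    print(current_status(SAMPLE))
    print(status_history(SAMPLE))
```

In the example above the most recent change comes first, so the history is returned from newest to oldest.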
3. Editorial Interventions
3.1. Encoded Text Areas
The parts of the text that are transcribed and encoded within the scope of the edition comprise, firstly, the "main area" of the original text, that is, the (potentially multi-column) part of text at the center of a page that does not itself enclose other parts of text on that page. Furthermore, from the marginal area surrounding this main area, marginal notes and footnotes are encoded, as well as page or folio numbers. However, running heads/titles, signature marks, and catchwords are not encoded. Likewise, handwritten elements (notes, additions, symbols, drawings, marks, etc.) occurring on printed pages are neither encoded nor documented.
For the annotation of graphic elements, please refer to the specific section.
3.2. Normalization of Characters and Spellings
3.2.1. Silent Normalizations
- Ligatures (, etc.) are silently resolved; however, the digraphs are adopted from the original text. The hereby underlying assumption is that these digraphs form an orthographic unit consisting of two letters and, accordingly, one sound, whereas ligatures can be regarded as a merely typographical phenomenon. The systematic differentiation in this case is not simple, but rather controversial; yet, in our case it only applies to these mentioned three digraphs. At any rate, both phenomena (digraphs and ligatures) are, furthermore, possibly subject to normalization or expansion (see below).
- The long, or descending, s (ſ) was handled depending on the origin of the double-keyed text transcriptions. How a text handles the spelling of s is documented within the teiHeader's encodingDesc. Furthermore, for texts in which ſ has not been resolved, a formal algorithm for converting ſ to s is provided; the reverse conversion from s to ſ, however, cannot be defined unambiguously in our case.
- The use of blank space was adjusted, as far as possible, to modern conventions, making clear the correct compounding and hyphenation of words, which is not necessarily visible in the print image of the original.
- Citations/quotations are represented by means of q or quote elements, respectively (cf. the TEI Guidelines); any quotation marks in the original text are thereby omitted.
- Identified transcription errors (introduced during the double keying phase of text encoding and not existent in the original source) are silently corrected in Group A.
3.2.2. Non-Standard Characters
Non-standard characters are encoded as Unicode characters as far as possible. Characters of the so-called "Codepage Latin-1", meaning the first two Unicode blocks [Basic Latin] and [Latin-1 Supplement] (hence, characters with a codepoint below x0100, the first character of the [Latin Extended-A] block), are encoded directly based on the Unicode chart, i.e., as a precomposed character.
Further Unicode characters are encoded either as precomposed characters as well, or as numeric character references (NCR) (for instance, &#x1EBD; stands for ẽ). In both cases, character combinations using combining characters such as accents, tildes, or apostrophes may be used (for instance, e followed by the combining tilde U+0303 yields ẽ). Generally, the edition's XML data are declared as <?xml encoding="UTF-8"?> (see TEI P5 Guidelines, vi.2).
If non-standard characters are not unambiguously definable as Unicode characters and are also not expressible as a combination of such characters, the recommendations of the Medieval Unicode Font Initiative (in version 4.0) apply, if possible.
At least all non-standard characters outside the character range described above are declared, with recourse to the respective TEI module, within the teiHeader's character declaration charDecl (even some characters within that range are declared if they are to be normalized for the reading view). The charDecl also states which replacement may be used as a normalized variant of a specific character. Accordingly, the rendering of the web application's reading view usually performs a replacement based on the information in the charDecl, although the replacement may also be taken from the content of the g tag (which is used for in-text references to non-standard characters) at the respective point in the text.
The declaration of a non-standard character and its mappings (to a standardized or normalized character, to an HTML entity, with regard to the MUFI recommendations, etc.) are obtained primarily from the ENRICH gBank; then, the "prerendered" Unicode characters (if available) or NCR entities stated in the relevant reference (or in the respective MUFI specification) are added to the declaration as mappings of the types composed and precomposed. Generally, compliance with the current MUFI recommendation is reviewed and the url value of the graphic element is updated.
Expansions of words that are abbreviated by means of non-standard characters are not resolved via the teiHeader's declaration of normalized/standardized variants, but are stated individually by the editor as a "genuine" abbreviation expansion by means of the expan element (within choice), without a g reference for the (now resolved) non-standard character in the expanded word. Alternatively, the abbreviation character may also be resolved within the text content of the g tag. For more detailed information about the handling of abbreviations, please refer to the specific section.
Examples of non-standard character declarations:
<teiHeader> <charDecl> <char xml:id="char2184"> <desc>LATIN ABBREVIATION SIGN SMALL CON</desc> <charProp> <unicodeName>entity</unicodeName> <value>conbase</value> </charProp> <mapping type="precomposed">ↄ</mapping> <mapping type="standardized">con</mapping> </char> <char xml:id="charebd1"> <desc>LATIN SMALL LETTER D ROTUNDA WITH DOT ABOVE</desc> <desc>LATIN SMALL LETTER D ROTUNDA + COMBINING DOT ABOVE</desc> <charProp> <localName>entity</localName> <value>drotdot</value> </charProp> <charProp> <localName>combined-entity</localName> <value> drotdot = drot + combdot </value> </charProp> <mapping type="MUFI" subtype="PUA">U+EBD1</mapping> <mapping type="MUFI" subtype="Combined">U+EBD1 = A77A + 0307</mapping> <mapping type="precomposed"></mapping> <mapping type="composed">ꝺ̇</mapping> <mapping type="standardized">d</mapping> <graphic mimeType="image/png" url="http://www.manuscriptorium.com/apps/gbank/data/mufi-graphic/ebd1.png"/> </char> <char xml:id="char0111"> <desc>LATIN SMALL LETTER D WITH STROKE</desc> <charProp> <unicodeName>entity</unicodeName> <value>dstrok</value> </charProp> <mapping type="precomposed">đ</mapping> <mapping type="composed">d̄</mapping> <mapping type="standardized">d</mapping> </char> <char xml:id="chari0303"> <desc>LATIN SMALL LETTER I WITH TILDE</desc> <mapping type="composed">ĩ</mapping> <mapping type="standardized">i</mapping> </char> <char xml:id="charp0301"> <desc>LATIN SMALL LETTER P WITH ACUTE ACCENT</desc> <mapping type="composed">ṕ</mapping> <mapping type="standardized">p</mapping> </char> <char xml:id="charp0307"> <desc>LATIN SMALL LETTER P WITH DOT ABOVE</desc> <mapping type="composed">ṗ</mapping> <mapping type="standardized">p</mapping> </char> <char xml:id="chara7590303"> <desc>LATIN SMALL LETTER Q WITH DIAGONAL STROKE AND TILDE</desc> <mapping type="composed">ꝙ̃</mapping> <mapping type="standardized">q</mapping> </char> <char xml:id="chare8bf"> <desc>LATIN SMALL LETTER Q LIGATED WITH FINAL ET</desc> <charProp> 
<localName>entity</localName> <value>q3app</value> </charProp> <mapping type="MUFI" subtype="PUA">U+E8BF</mapping> <mapping type="precomposed"></mapping> <mapping type="standardized">q</mapping> <graphic mimeType="image/png" url="http://www.manuscriptorium.com/apps/gbank/data/mufi-graphic/e8bf.png"/> </char> </charDecl> </teiHeader> <body> ... <choice><abbr>at<g ref="#chare8bf">q</g></abbr><expan resp="#AW" cert="high">atque</expan></choice> ... <g ref="#e665">p</g> ... </body>
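The replacement logic described in this section (look up the standardized mapping in the charDecl, otherwise fall back to the content of the g tag) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the web application's actual rendering code: the function names are hypothetical, the sample is a shortened version of the declarations above, and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of the xml:id attribute

# Shortened version of the example above (no TEI namespace, for brevity).
SAMPLE = """<TEI>
  <teiHeader>
    <charDecl>
      <char xml:id="chare8bf">
        <desc>LATIN SMALL LETTER Q LIGATED WITH FINAL ET</desc>
        <mapping type="standardized">q</mapping>
      </char>
      <char xml:id="char0111">
        <desc>LATIN SMALL LETTER D WITH STROKE</desc>
        <mapping type="precomposed">đ</mapping>
        <mapping type="standardized">d</mapping>
      </char>
    </charDecl>
  </teiHeader>
  <body><p>at<g ref="#chare8bf">q</g> <g ref="#e665">p</g></p></body>
</TEI>"""


def standardized_map(root: ET.Element) -> dict[str, str]:
    """Map '#<xml:id>' of each declared char to its 'standardized' mapping."""
    table: dict[str, str] = {}
    for char in root.iter("char"):
        char_id = char.get(f"{{{XML_NS}}}id")
        for mapping in char.findall("mapping"):
            if char_id and mapping.get("type") == "standardized":
                table["#" + char_id] = mapping.text or ""
    return table


def normalized_text(elem: ET.Element, table: dict[str, str]) -> str:
    """Return the text of elem with every g replaced by its standardized form.
    If a g points to an undeclared character, its own text content is used."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "g":
            parts.append(table.get(child.get("ref", ""), child.text or ""))
        else:
            parts.append(normalized_text(child, table))
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(normalized_text(root.find("body/p"), standardized_map(root)))
```

Note that this only covers character-level normalization; abbreviation expansion (e.g., atque) is handled separately through choice/expan, as explained above.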
3.2.2.1. Documentation of Non-Standard Characters
An overview of the complete set of non-standard characters encountered thus far within the scope of the digital edition, and of their respective encodings, is available in the Encoding Table of (Non-Standard) Characters.
The XML declarations of all these characters and their encodings are to be found in the following TEI file: specialchars.xml (The encodingDesc within this file is part of each TEI dataset for a work in this edition.)
With regards to the encoding of non-standard characters, abbreviations, and ligatures please also refer to the section above, and for the ("historic") practices of text encodings within the scope of this project during the transcription phase, see also the general transcription guidelines and the specific transcription guidelines (available in German only).
3.2.3. Numerations
3.2.3.1. Pagination
Page or folio numbers are encoded through the n attribute of the pb element (see also Page, column, and line breaks). Generally, page or folio numbers are adopted from the original source; however, if original numbers are incorrect or missing, the numeration is corrected or added, and the corrected or added number is marked by means of square brackets ("[ ]") within the value of n:
<pb n="[443]" facs="..." xml:id="..." resp="#DG" cert="high"/>
In case a work (or part of it) is paginated folio-wise, with only the front side ("recto") of a sheet being numbered, the (existing) folio number of the front side is complemented through an "r" suffix, whereas the (non-existing) back page ("verso") number is added by adopting the front page number and adding a "v" suffix:
<pb n="26r" facs="..." xml:id="..."/> ... <pb n="[26]v" facs="..." xml:id="..." resp="#DG" cert="high"/> ... <pb n="27r" facs="..." xml:id="..."/>
For unnumbered sections before the main part of the text (see also General Structure of a Work/Volume), a regularly incrementing pagination in Roman numerals is added (regardless of the type of numeration of the main part); for unnumbered parts after the main part, the (type of) numeration of the main part is continued in the way explained above (for example, marked through square brackets).
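The conventions for the n attribute of pb can be checked mechanically. The following Python sketch splits such a value into the number itself, an optional recto/verso suffix, and a flag telling whether the number was supplied or corrected by the editors (square brackets). The class and function names are hypothetical, and the treatment of Roman numerals is an assumption made for illustration only.

```python
import re
from typing import NamedTuple, Optional


class PageNumber(NamedTuple):
    number: str           # page or folio number as written, e.g. "443" or "iv"
    side: Optional[str]   # "r" or "v" for folio-wise numbering, otherwise None
    supplied: bool        # True if the number is enclosed in square brackets


# Optional "[", Arabic or Roman number, optional "]", optional r/v suffix.
# Caveat: an unbracketed Roman numeral ending in "v" (e.g. "xv") is read as a
# number without a side suffix.
PAGE_RE = re.compile(r"^(\[)?([0-9]+|[ivxlcdm]+)(\])?([rv])?$", re.IGNORECASE)


def parse_page_number(n: str) -> PageNumber:
    """Parse the value of pb/@n according to the conventions described above."""
    match = PAGE_RE.match(n.strip())
    # Brackets must come in pairs: either both present or both absent.
    if match is None or bool(match.group(1)) != bool(match.group(3)):
        raise ValueError(f"unexpected page number: {n!r}")
    opening, number, _closing, side = match.groups()
    return PageNumber(number, side, opening is not None)


if __name__ == "__main__":
    for value in ("[443]", "26r", "[26]v", "27r"):
        print(value, parse_page_number(value))
```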
3.2.3.2. Column Numbers
If existent in the original source, column numbers are encoded (in a way analogous to page numbers) through the n attribute of the cb element, possibly in a "normalized" form by means of added or corrected numbers (in case of partially lacking or incorrect numbers in the original source). If there is no numeration of columns in the original source, none is added in the TEI encoding either.
3.2.3.3. Marginal Numbers
Argumentative shifts in (the content of) the text, usually signified on a typographic level by means of marginal numbers or symbols, are encoded by means of milestone tags (see also Typographic and Argumentative Paragraphs). Marginal numbers signifying such shifts are recorded through the n attribute of the respective milestone tag, with clearly missing or incorrect numbers being added in a "normalized" form (i.e., marked by means of square brackets).
<milestone unit="article" n="[99]" rendition="#dagger" xml:id="..." resp="#DG" cert="high"/>
3.2.4. Abbreviations and Printing Errors
Abbreviations and printing errors are encoded and resolved within choice elements (and are thereby documented in the XML code, although not visible, for example, in the default reading view of the digital edition). These normalizations/corrections are always made on a per-token basis, resolving whole words instead of single characters. Within choice elements, the sub-elements recording the expansion of an abbreviated token (expan) or the correction of a token (corr) carry attributes stating the editor responsible for the normalization (resp) as well as her/his certainty (cert) regarding the resolved text.
If a page, column, or line break (see also Page, Column, and Line Breaks) occurs within a token to be expanded (encoded by means of the abbr element) or corrected (sic), the respective break is also recorded within the expanding (expan) or correcting (corr) element, at the point within the token that corresponds as closely as possible to the original position; a corresp attribute on this break element (pb, cb, lb) refers to the xml:id of the "original" break element.
3.2.4.1. Abbreviations
Abbreviations are marked through abbr elements and, together with their expanded form recorded in expan, embedded within choice elements. The editor responsible for the expansion states her/his certainty with regard to the expansion by means of a cert attribute:
<choice> <abbr>Reverẽdiss.</abbr> <expan resp="#AW" cert="high">Reverendissimum</expan> </choice>
The set of abbreviations also contains the "" and "" breviographs, either of them being resolved by means of the "" expansion.
IMPORTANT When tokens abbreviated through non-standard characters (see also Non-Standard Characters) are resolved, the respective non-standard character is stated only within the non-expanded element (abbr) by means of a g tag, not within the expanded element (expan), which contains the resolved token without the original non-standard character (or g tag, respectively).
... <choice> <abbr>at<g ref="#chare8bf">q</g></abbr> <expan resp="#AW" cert="high">atque</expan> </choice> ...
3.2.4.2. Corrections
Identified printing errors are encoded by means of sic (containing the erroneous form of the token) and corr (containing the corrected form) elements within a choice element:
<choice> <sic>Vitora</sic> <corr resp="#IC" cert="high">Vitoria</corr> </choice>
By contrast, identified transcription errors – resulting, for example, from the double keying of the text – are resolved silently.
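Because every abbreviation and correction is wrapped in a choice element, both a source-near text and a normalized text can be derived from the same encoding. The following Python sketch shows one possible way of doing this; it is an illustration only (hypothetical function name, TEI namespace omitted) and not the project's rendering pipeline. The sample combines the two choice examples given above.

```python
import xml.etree.ElementTree as ET

ORIGINAL_FORMS = {"abbr", "sic"}    # forms as found in the source
EDITED_FORMS = {"expan", "corr"}    # forms supplied by the editors

# The two examples from this section, placed in one paragraph.
SAMPLE = (
    '<p><choice><abbr>Reverẽdiss.</abbr>'
    '<expan resp="#AW" cert="high">Reverendissimum</expan></choice> '
    '<choice><sic>Vitora</sic>'
    '<corr resp="#IC" cert="high">Vitoria</corr></choice></p>'
)


def render(elem: ET.Element, constituted: bool) -> str:
    """Return the text of elem, picking one alternative from each choice.

    constituted=False keeps abbr/sic (close to the source);
    constituted=True keeps expan/corr (normalized)."""
    if elem.tag == "choice":
        skip = ORIGINAL_FORMS if constituted else EDITED_FORMS
        # Whitespace between the children of choice is formatting only.
        return "".join(render(c, constituted) for c in elem if c.tag not in skip)
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, constituted))
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    paragraph = ET.fromstring(SAMPLE)
    print(render(paragraph, constituted=False))
    print(render(paragraph, constituted=True))
```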
3.3. Page, Column, and Line Breaks
Page, column, and line breaks are marked by means of the empty elements pb, cb, or lb at the beginning of a page, column, or line, respectively. With page or column beginnings, original page numbers or (if existent) column numbers are recorded within the n attribute.
If the column layout changes within a page, the beginning of a multi-column layout is marked through an attribute type="start" in cb, its end through type="end" (these elements imply, at the same time, the beginning of the new/first column, or of the non-column layout, respectively).
pb, cb, and lb all carry xml:id attributes (see xml:id). With lb, the value of xml:id (more precisely, its last four alphanumeric characters) contains information in the form of a line numbering (with other elements, the final characters of xml:id contain no relevant information). The first of these four characters signifies the page layout area of the current line (0 = main area of the text, no columns; 1 = main area, first column; 2 = main area, second column, and so forth; m = marginal area); the remaining three characters state the position of the line relative to previous lines in the same area of the page layout. This type of line numbering serves primarily project-specific or corpus-linguistic purposes, but plays no role, for instance, in the display of the edition's reading view, where line numbers are not shown.
With regard to the positioning of the previously mentioned break elements, on which there currently appears to be no clear consensus in the TEI community, the following rule applies within the scope of this edition: If a page, column, or line beginning coincides with the beginning of an encoded "conceptual" text element (e.g., head, p, div, note, list, item, titlePage, titlePart, and others), the break element (pb, cb, lb) is positioned as a child within the structure of these elements (in the example below, pb and lb are placed inside head rather than before div). If more than one break element (also including "anchor" elements such as milestone) occurs at the same point, the following order applies: pb, cb, lb, milestone, other elements.
Consider the following, comprehensive example:
Consider the following, comprehensive example:
<div> <head> <pb facs="facs:W9998-B-0015" n="34" xml:id="W9998-02-pb-0015-d78a"/> <lb xml:id="W9998-02-0015-lb-0001"/><hi rendition="#r-center">Caput VIII</hi> </head><!-- heading centered above the two following columns --> <cb type="start" xml:id="W9998-02-0015-cb-66d7"/><!-- beginning of multi-column layout --> <lb xml:id="W9998-02-0015-lb-1001"/>... first line of first column ... <lb xml:id="W9998-02-0015-lb-1002"/>... second line of first column ... ... <cb xml:id="W9998-02-0015-cb-66d8"/> <lb xml:id="W9998-02-0015-lb-2001"/>... first line of second column ... ... <cb type="end" xml:id="W9998-02-0015-cb-66d9"/> <!-- end of multi-column layout --> <lb xml:id="W9998-02-0015-lb-0002"/> ... </div>
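The line numbering contained in the xml:id of lb elements can be decoded with a few lines of Python. The sketch below follows the rule stated above (first of the last four characters = layout area, remaining three = line position); the function name and the wording of the returned labels are hypothetical.

```python
def decode_line_id(xml_id: str) -> tuple[str, int]:
    """Decode the last four characters of an lb xml:id.

    Returns a label for the page layout area and the position of the line
    within that area."""
    code = xml_id[-4:]
    if len(code) != 4:
        raise ValueError(f"xml:id too short: {xml_id!r}")
    area, position = code[0], int(code[1:])  # int() raises ValueError on non-digits
    if area == "m":
        label = "marginal area"
    elif area == "0":
        label = "main area, no columns"
    elif area.isdigit():
        label = f"main area, column {area}"
    else:
        raise ValueError(f"unknown layout area {area!r} in {xml_id!r}")
    return label, position


if __name__ == "__main__":
    # xml:id values taken from the example above
    for line_id in ("W9998-02-0015-lb-0001",
                    "W9998-02-0015-lb-1002",
                    "W9998-02-0015-lb-2001"):
        print(line_id, decode_line_id(line_id))
```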
Blank pages (i.e., pages without any content) after the title page and before the last page are represented by means of <pb type="blank"/>; blank pages before the title page or after the last page of a work may be omitted altogether.
3.4. Hyphenation
Hyphens occurring at the end of lines are not retained in the text, but encoded by means of an attribute rendition="#hyphen" within the respective lb element. In the event of several immediately consecutive breaks (e.g., pb+cb+lb), this attribute is only set within the first such break element.
<lb xml:id="W0013-02-0927-lb-0131"/>Simonia omnis an sit iure di<lb break="no" rendition="#hyphen" xml:id="W0013-02-0927-lb-0132"/>uino prohibita, &amp; an ali<pb break="no" rendition="#hyphen" facs="facs:W0013-B-0928" n="[437]" xml:id="W0013-02-0928-pb-6c77"/><cb break="no" rendition="#noHyphen" xml:id="W0013-02-0928-cb-0d4b"/><lb break="no" rendition="#noHyphen" xml:id="W0013-02-0928-lb-0001"/>qua sit solùm iure positiuo. 12. 172
- Separations of syllables marked in the original source text by means of hyphens are encoded through a respective break element carrying a rendition attribute with the value #hyphen.
- Separations of syllables not marked in the source text by means of hyphens are encoded through the rendition attribute with the value #noHyphen within the respective break element.
- Furthermore, separations (regardless of whether they are marked in the source text) are annotated through an attribute break="no" within any intervening break elements. Here, break="no" signifies the coherence of a word divided by the respective break element(s). It is important that in such cases no whitespace (i.e., no blank, tab, or newline character) occurs outside of these elements; newlines are placed in the XML document within the pb, cb, and lb elements, preferably immediately after the element name and before the break attribute.
- By contrast, a "normal" line break not dividing a word is encoded by means of a simple lb element (see Page, Column, and Line Breaks).
3.5. Loss of text and conjectures
In case of text loss due to censorship on the scale of single characters, words, or a sentence, the respective range of text is marked through a del tag; text loss on the scale of longer passages is annotated by means of an empty delSpan tag at its beginning, with a spanTo attribute within delSpan referencing the xml:id of an (empty) anchor tag at the end of the passage. The cause attribute of del or delSpan, respectively, takes the value censorship.
Text passages barely legible or entirely illegible for reasons other than censorship are marked by means of damage or (analogous to delSpan, see above) damageSpan elements, with OPTIONAL an agent attribute stating the cause of damage (such as water, rubbing, tearing, ink, amongst others).
If the text within a passage of these types is still readable, it is annotated by means of an unclear element, whose reason attribute corresponds to the larger "surrounding" element in that it states either damage or deletion.
The editor records her/his responsibility for the passage within resp (using a #xx abbreviation of the name initials as value) and her/his certainty with regard to the resolved text within cert, which takes a value of high, medium, or low.
If the text is no longer readable, but the editing person has a readable substitution page from the same edition of the work at his/her disposal, IMPORTANT the text is encoded plainly, without any specific annotation. Instead, the use of pages from another original copy of the same edition is stated within the teiHeader (see Bibliographic Description).
If the text is no longer readable, but the editing person has a legible substitution page from another, differing edition of the work at his/her disposal, the source used is stated in the source attribute of the supplied element (which is used for encoding the supplied text, see below):
<unclear reason="damage" resp="#AW" cert="high">2 Ut ordinate proced <supplied resp="#AW" cert="high" source="Azpilcueta, Martin de: Manual de confessores y penitentes [...] - Anvers : Nucio, 1555, S. 67 /SBBpK"> amus ... </supplied> </unclear>
In this case (a readable substitution page from a different edition is on hand), or if the editing person has a conjecture about how to supply the missing text, the supplementation is added by means of the supplied element. The reason, resp, and cert attributes are used as described previously.
<div type="section" xml:id="..."> <p>... <damageSpan agent="water" spanTo="#W0998-00-0024-an-9fc7"/> <unclear reason="damage" resp="#AW" cert="high" xml:id="...">est, consequamur.</unclear> </p> </div> <div type="section" xml:id="..." n="De sacramenti nomine"> <p> <unclear reason="damage" resp="#AW" cert="high" xml:id="...">2 Ut ordinate proced<supplied resp="#AW" cert="high">amus</supplied> </unclear> <anchor xml:id="W0998-00-0024-an-9fc7"/> antequam seorsim ... </p> </div>
If the text is not readable and no conjecture can be made about its supplementation, the existing gap is marked by means of a gap element, whose attributes are again used as described previously.
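As a sketch of this last case (placeholder context; resp="#AW" is reused from the examples above purely as an illustrative initials code), an unrecoverable stretch inside a damaged passage might look as follows:

```xml
<!-- Illegible stretch with no basis for a conjecture -->
<p>... <gap reason="damage" resp="#AW"/> ...</p>
```

Unlike unclear or supplied, the gap element stays empty, since there is no text to record.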
3.6. Typographic Styles and Text Alignment
Typographic characteristics in the text are encoded by means of the rendition attribute. This comprises the following types of characteristics:
- Characters that are meaningful with regard to textual or argumentative structures, such as hyphens, asterisks, or daggers, are captured in different types of elements (e.g., milestone) by means of the rendition attribute. (With regard to non-standard characters without specific meaning, please refer to Non-Standard Characters.) Encoding typographic features within the respective (type of) element appears reasonable due to the immediate correlation between the character and the text phenomenon expressed by the respective element.
- Specific font formatting (such as initials, italics, recte, bold, small caps, superscript, subscript, or spaced) is usually encoded through hi elements with rendition attributes. This serves, firstly, as a simple rule for consistency and, secondly, expresses a logical separation between text phenomena (and the respective elements) of a rather "conceptual", structural nature (such as p or head) on the one side, and text phenomena and elements of a typographic nature on the other.
- Text alignment: Centered or right-aligned formatting of a text block is usually encoded through the hi element with a rendition attribute if the text block is a "simple" block (e.g., captured by means of a p element) without special formatting within the scope of this edition, and not a "conceptual" block (such as a head heading) with inherent formatting.
The following formatting values for rendition are currently available:
<styleDefDecl scheme="css"/> <tagsDecl> <rendition xml:id="hyphen">content:'-';</rendition> <!-- hyphen --> <rendition xml:id="noHyphen">content:'';</rendition> <!-- no hyphen within a coherent word --> <rendition xml:id="asterisk">content:'*';</rendition> <!-- asterisk --> <rendition xml:id="initCaps" scope="first-letter">font-size: xx-large;</rendition> <!-- initial --> <rendition xml:id="it">font-style: italic;</rendition> <!-- italics --> <rendition xml:id="b">font-weight: bold;</rendition> <!-- bold --> <rendition xml:id="sc">font-variant: small-caps;</rendition> <!-- small caps --> <rendition xml:id="sup">vertical-align: sup; font-size: .83em;</rendition> <!-- superscript --> <rendition xml:id="sub">vertical-align: sub; font-size: .83em;</rendition> <!-- subscript --> <rendition xml:id="spc">letter-spacing: 3px;</rendition> <!-- letter spacing --> <rendition xml:id="recte">font-style: normal;</rendition> <!-- recte, not italicized within italicized passage--> <rendition xml:id="r-center">text-align: center;</rendition> <!-- centered --> <rendition xml:id="right">text-align: right;</rendition> <!-- right-aligned --> <!-- <namespace name="http://www.tei-c.org/ns/1.0"> <tagUsage gi="lb" rendition="#noHyphen"/> <tagUsage gi="milestone" rendition="#asterisk"/> </namespace>--> </tagsDecl>
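As a sketch of how these declarations are referenced from the text (content and xml:id values are placeholders), the rendition attribute points to a declared rendition via its xml:id, prefixed with "#":

```xml
<!-- Italic passage with a stretch set upright (recte) inside it -->
<p xml:id="...">... <hi rendition="#it">... <hi rendition="#recte">...</hi> ...</hi> ...</p>

<!-- Centered "simple" text block: the alignment sits on hi, not on p -->
<p xml:id="..."><hi rendition="#r-center">...</hi></p>
```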
3.7. Graphic Elements
Larger images, illustrations, or graphics are not recorded as such but marked in the text by means of figure elements (as "placeholders"). Thus, these elements of the print image are not rendered in the reading view of the digital edition, although their occurrence is indicated as a "reference" for comparison with the image view of the corresponding facsimile page.
Smaller graphic elements such as (ornamental) asterisks, solid or curly lines, smaller illustrations, etc., that perform a structural function – for instance, as separators between headings and/or sections – are captured as:
<figure place="inline" type="ornament"/>
4. Structuring and Enrichment of the Works
4.1. Structuring
4.1.1. Multi-Volume Works
Multi-volume works are encoded as one XML dataset that contains groups (group) of texts (namely volumes: text with type="work_volume"). The overarching text element in the dataset of the multi-volume work is, in this case, of the type work_multivolume. With single-volume works, type has the value work_monograph.
Within the text element of a volume or work, respectively, the structure is further annotated by means of front, body, back, div, etc., as described in the following.
<TEI> <!-- dataset of a multi-volume work --> <teiHeader> ... </teiHeader> <text xml:lang="la" type="work_multivolume" xml:id="completeWork"> <group> <text xml:id="Vol01" xml:lang="la" type="work_volume" n="1"> <front><!-- front matter of the first volume --> ... </front> <body> <div type="lecture" xml:id="..." n="De potestate Ecclesiae I"> ... </div> </body> </text> <text xml:id="Vol02" xml:lang="la" type="work_volume" n="2"> <front xml:id="..."><!-- front matter of the second volume --> ... </front> </text> </group> </text> </TEI>
4.1.2. General Structure of a Text
On the highest structural level of a text (i.e., directly below the text element), its main part is embedded in the body element. The title page (see also Title Page(s)) and, potentially, further sections before the main part (such as devotions, prologues, tables of contents, statements about privileges, etc.) are embedded in the front (front matter) element; sections that may occur after the main part (indices, errata, tables of contents, etc.) are encoded within the back (back matter) element.
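A minimal skeleton of this top-level structure for a single-volume work might look as follows. The choice of div types is an assumption for illustration (the values are taken from the type list in Structural Text Units), and all xml:id values are elided:

```xml
<text xml:lang="la" type="work_monograph" xml:id="completeWork">
  <front xml:id="...">
    <!-- title page and other sections preceding the main part -->
    <titlePage xml:id="...">...</titlePage>
    <div type="contents" xml:id="...">...</div>
  </front>
  <body xml:id="...">
    <!-- main part of the work -->
    <div type="book" xml:id="...">...</div>
  </body>
  <back xml:id="...">
    <!-- sections following the main part -->
    <div type="index" xml:id="...">...</div>
  </back>
</text>
```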
4.1.3. Title Page(s)
Title pages are each (there may be several) encoded by means of the respective elements (described, for instance, in the TEI P5 guidelines). The title page as a whole is captured in a titlePage element. The complete title is annotated through the docTitle element, within which the title or the title's components, respectively, are annotated as one or several titlePart elements, with each titlePart taking a type attribute with one of the following values: main (main title of the work), sub (subtitle), alt (alternative title), desc (descriptive paraphrase of the work's/volume's content).
Further elements of the title page, if existing, are annotated through the following types of elements (the descriptions are taken from the respective part of the guidelines linked above):
- byline: "... the primary statement of responsibility given for a work".
- docAuthor: "contains the name of the author of the [work/volume]"; in its key attribute, the normalized form of the author's name is stated (see also the respective section).
- imprimatur: "contains a formal statement authorizing the publication of a work".
- docEdition: "contains an edition statement" (including, for instance, statements about the current edition in comparison with previous ones, although the "School of Salamanca" project usually collects a text's first edition).
- docImprint: "contains the imprint statement (place and date of publication, publisher name)"; docImprint may include docDate, see below.
- docDate: "contains the date of a [work/volume]" (potentially occurs within docImprint); the when attribute contains the year of publication as a four-digit number.
The metadata of a text encoded in this way, as given on its title page, may also serve for cross-checking with the Bibliographic Description of the work.
All text blocks on a title page not mentioned here are annotated as typographic paragraphs (p) by default.
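The following sketch shows how these elements might combine on a single title page. All text content and xml:id values are elided; the year in when is invented for illustration, and the key value is borrowed from the author examples elsewhere in these guidelines.

```xml
<titlePage xml:id="...">
  <docTitle xml:id="...">
    <titlePart type="main" xml:id="...">...</titlePart>
    <titlePart type="sub" xml:id="...">...</titlePart>
  </docTitle>
  <byline>...
    <docAuthor key="Vitoria, Francisco de">...</docAuthor>
  </byline>
  <imprimatur>...</imprimatur>
  <docImprint>...
    <!-- hypothetical year of publication, four digits -->
    <docDate when="1557">...</docDate>
  </docImprint>
</titlePage>
```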
4.1.4. Structural Text Units
The structural units of the text (e.g., lectures, books, chapters, questions) that are marked in the original text by means of headings or numerations, for instance, are annotated, OPTIONAL as far as possible, as div elements of different types. The naming of the types reflects an English terminology ("book", "part", "chapter", "question", "foreword", etc.) and basically follows the specifications given in the DFG-Viewer Strukturdatenset (German), although some of the elements described there were omitted and others were added. Details can be obtained from our TEI schema.
IMPORTANT In order to allow for a differentiated searchability of the texts, the following values of type always need to be stated: book, contained_work, corrigenda, contents, index, lecture, map, and part.
Each div element is completed, if possible, through a head element (heading) as its direct child.
IMPORTANT This means, by implication, that the headings of sections marked as div must not be encoded as head elements within other child elements of div such as list or lg.
OPTIONAL Very long headings may be stated in abbreviated form in the respective div element's n attribute.
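The placement rule for headings can be sketched as follows (placeholder content, elided xml:id and n values; "chapter" is one of the type names mentioned above):

```xml
<!-- Intended: the section heading is a direct child of the div -->
<div type="chapter" n="..." xml:id="...">
  <head xml:id="...">...</head>
  <list xml:id="...">
    <item xml:id="...">...</item>
  </list>
</div>

<!-- Not intended: the section heading is tucked into a child element of the div -->
<div type="chapter" xml:id="...">
  <list xml:id="...">
    <head xml:id="...">...</head>
    <item xml:id="...">...</item>
  </list>
</div>
```

A list may still carry a head of its own (see Lists); the rule only concerns the heading that belongs to the div as a whole.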
4.1.5. Typographic and Argumentative Paragraphs
The typographic paragraph divisions, marked in the source by means of vertical margins, first-line indentation, shorter line endings (not reaching the end of the justification), or others, are annotated as p paragraphs. Paragraph symbols ('¶'), where they exist, are not deleted.
Shifts from one argumentative paragraph to the next, usually marked through marginal numbers, "*" or "†" symbols, or others, are annotated by means of milestone elements. This applies both when such a shift occurs within the "ordinary" continuous text and when it coincides with the beginning of a typographic p paragraph (in the latter case, the milestone element is set as (one of) the first child elements of p, even before the first text node). The symbols are not encoded in the text, but rather in the rendition attribute of the respective milestone element.
<milestone unit="article" n="2" rendition="#dagger" xml:id="W0998-00-0099-mi-34ca"/>
4.1.6. Lists
Tables of contents, indices, dictionaries, and other types of list structures are annotated as list elements containing item elements (which, taking dictionaries as an example, may be used to annotate each term within the dictionary). Lists may be headed by titles (head) and are annotated as list with a type attribute containing a value such as contents, index, or dict. The annotation of the single (term) entries takes place on the lowest list/item level. If a list contains superior structures (e.g., the grouping of index entries by their initial characters), these groups of items are themselves annotated as (sub-)lists that contain the low-level items (and perhaps their own sub-headings) and that are part of a larger overall list; in this sense, lists may be nested.
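A sketch of such a nested index, grouped by initial letter, is given below. Content and xml:id values are placeholders. Wrapping each sub-list in an item of the overall list is an assumption that follows the standard TEI content model, in which a list cannot be a direct child of another list.

```xml
<list type="index" xml:id="...">
  <head xml:id="...">...</head>
  <item xml:id="...">
    <!-- sub-list grouping all entries with the same initial character -->
    <list xml:id="...">
      <head xml:id="...">A</head>
      <item xml:id="...">...</item>
      <item xml:id="...">...</item>
    </list>
  </item>
  <item xml:id="...">
    <list xml:id="...">
      <head xml:id="...">B</head>
      <item xml:id="...">...</item>
    </list>
  </item>
</list>
```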
4.1.7. Verse Text
OPTIONAL Text blocks consisting exclusively of verses (usually identifiable by deeper indentation than the surrounding text and, potentially, italicization) are annotated by means of the lg element. If a text block of this type contains clearly identifiable stanzas, the stanzas themselves are embedded in lg elements (within the overall lg). The single verse lines, which may extend beyond typographical line breaks (this may be marked in the original text through deeper indentation of the following typographic line), are each annotated through the l element.
<head> <lb xml:id="..."/>EIVSDEM AD LIBRVM.</head> <lg xml:id="..."> <l> <lb xml:id="..."/>I liber in lucem tineis blattísque sepultus: <lb xml:id="..."/>Iam ter quinque annos delituisse sat est. </l> <l> <lb xml:id="..."/>Iam p<g ref="#char0153">oe</g>nas patri nimium, nimium<g ref="#chare8bf0301">que</g> dedisti: <lb xml:id="..."/>Zoilus haud, qui te mordeat, ullus erit. </l> ... </lg>
4.1.8. Notes and Comments
Comments, bottom notes, or marginal notes are annotated through the note element, the place attribute of which states whether the note is a marginal (margin), bottom/foot (bottom), or other type of note. Several different cases are possible: The note may be identified in the text of the original source by means of a symbol (e.g., a superscript character, or an asterisk); this symbol may occur at the exact position in the text from which the note is referenced, at the beginning of the note, or at both positions.
4.1.8.1. Position
If the exact position of the note can be clearly identified, the note (including all its text and markup content) is encoded at this very position. Otherwise, the note is encoded (completely) at the end of the line occurring on the same height on the page as the note, and it obtains an attribute anchored="false".
If a page or column break occurs within the note, the respective element (pb or cb) refers to the xml:id of the corresponding break element (pb or cb) in the main area of the text by means of a sameAs attribute (see also xml:id).
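A sketch of this linking between the two break elements (the xml:id value is hypothetical and only imitates the id schema of this edition; other attribute values and content are elided):

```xml
<!-- Marginal note running across a page break -->
<note place="margin" xml:id="...">
  ... <pb sameAs="#W0002-00-0026-pb-0a1c"/> ...
</note>

<!-- Main text: the corresponding page break carries the xml:id -->
... <pb n="..." facs="..." xml:id="W0002-00-0026-pb-0a1c"/> ...
```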
4.1.8.2. Symbols
A symbol that references a note in the main area of the text is annotated by means of a ref element of type note-anchor and immediately precedes the note (see also Cross-References). The target attribute of ref refers to the xml:id of the note element which the symbol refers to.
A symbol identifying/labeling the note at the beginning of a note is solely encoded as the value of the n attribute of the note (and, hence, deleted from the text of the note).
<ref type="note-anchor" n="d" target="#W0998-00-0099-nm-363d"> <hi rendition="#sup">d</hi> </ref> <note place="margin" xml:id="W0998-00-0099-nm-363d" n="d"> refert <persName>Sixtus.</persName> et Tritem. ... </note>
4.2. Identification and Linking of Text Elements
4.2.1. xml:id
Generally, xml:id attributes are used to uniquely identify elements in the XML document (which is then utilized, not least, for linking and referencing text passages, or search results, in the web application). By virtue of the xml:id attribute, an element becomes "addressable" as the target of a link or a reference in the first place, while other attributes (such as ref or target) at the "jumping-off point" of the link/reference state the point to be linked to, or whether the "jumping-off point" is to be processed or rendered in a specific manner. Within the scope of this edition, a large part of the structural and semantic elements (such as div, milestone, head, p, note, item, or term) within/below the TEI text element receive an xml:id attribute by default.
The concrete values of xml:id are generally "standardized" in that they follow a specific schema/syntax. However, special rules apply for the two elements TEI and text:
TEI
The xml:id of the TEI element states the project-specific five-character ID of the work. If the dataset comprises only one volume of a multi-volume work, a suffix _VolXX is added, with "XX" signifying the two-digit volume number (potentially with a leading zero).
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:xi="http://www.w3.org/2001/XInclude" xml:id="W0066"> <TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:xi="http://www.w3.org/2001/XInclude" xml:id="W0014_Vol02">
text
The text element receives either the xml:id value completeWork if it is not part of a multi-volume work, or the xml:id value VolXX (with "XX" being the volume number, potentially containing a leading zero) if it comprises one volume of a multi-volume work.
<text xml:lang="la" type="work_monograph" xml:id="completeWork"> ... or ... <text xml:lang="la" type="work_volume" xml:id="Vol03">
The xml:id values of all elements within (or below, so to speak) the text element follow a consistent schema: the value always has 21 characters and consists of the following parts:
xml:id="[work ID]-[volume number]-[facsimile number]-[element code]-[alphan. code]"
Description:
- [work ID]: project-specific 5-character ID of the work (not the volume) that the element is to be found in, such as "W0002" or "W0013".
- [volume number]: 2-digit number of the volume within the work in which the element identified through the xml:id attribute is to be found; it is "00" in the case of a single-volume work, "01" for the first volume, "02" for the second volume, and so on.
- [facsimile number]: 4-digit value stating the number of the facsimile that corresponds to the element identified by the xml:id; the value follows the value of the facs attribute of the nearest previous page break (pb).
- [element code]: a 2-character abbreviation for the type of element that is to be identified, such as "pa" for p elements. For a complete list of all possible element codes, please see further below in this section.
- [alphan. code]: a 4-character alphanumeric value which primarily serves for the uniqueness of the xml:id value, and which only in the case of lb elements contains relevant information (in the form of line numeration, please see Page, Column, and Line Breaks).
The following elements receive an xml:id following the schema described here (the corresponding element codes are stated in quotation marks):
- front: "fm" (front matter)
- body: "tb" (text body)
- back: "bm" (back matter)
- titlePage: "tp"
- docTitle: "dt"
- titlePart: "tt"
- div: "dX" (division, with "X" as a numeric value stating the div's position in the overall div hierarchy: e.g., "d1" is a "top-level" div)
- head: "he" (heading)
- item: "it"
- list: "li"
- p: "pa" (paragraph)
- pb: "pb"
- lb: "lb"
- note: with attribute place="margin": "nm" (marginal note)
- milestone: "mi"
- lg: "lg"
- unclear: "un"
- persName: "pe"
- placeName: "pl"
- term: "te"
- title: "ti"
- supplied: "su"
<div xml:id="W0002-00-0010-d1-03eb" type="privileges"> <head xml:id="W0002-00-0010-he-03ea"> <pb n="[vii]" facs="facs:W0002-0010" xml:id="W0002-00-0010-pb-03ee"/> <lb xml:id="W0002-00-0010-lb-0001"/> Priuilegio del Rey de Portugal. </head> <p xml:id="W0002-00-0010-pa-040a"> <lb xml:id="W0002-00-0010-lb-0002"/> <hi rendition="#initCaps">E</hi><hi rendition="#it">V El Rey... ...
The consistent syntax of xml:id within the text area, allowing for a differentiation of uniquely identified elements with regard to their work, volume, facsimile number, and/or element type (or even, in the case of lb, with regard to its line number), is supposed to make such elements "addressable" and recognizable in a logically intuitive way (not least for persons editing the XML document).
4.2.2. Cross-References
Cross-references, where annotated, are generally encoded by means of the ref element and its target attribute. They occur, for example, in the form of short summary titles in the table of contents of works or work parts, respectively, or in the form of symbols referencing marginal notes. The type attribute of ref marks these differences; valid values are currently summary and note-anchor (and url and image, which are usually not relevant, however, in the works). The target attribute states the xml:id of the element that is to be referenced.
<list type="summaries"> <head>SUMMA.</head>... <item> ...esto. nume. <ref type="summary" target="#W0002-00-0025-mi-03fe">15.</ref> </item> ...</list> <p>... Ni es <milestone unit="article" rendition="#dagger" n="15" xml:id="W0002-00-0025-mi-03fe"/> contra razon, que vno...</p> <p>... <ref type="note-anchor" n="n" target="#W0002-00-0025-nm-0420"> <hi rendition="#sup">n</hi> </ref> <note place="margin" n="n" anchored="true" xml:id="W0002-00-0025-nm-0420"> <lb xml:id="..."/>Maior. in 4... </note> ...</p>
4.3. References and Semantic Text Enrichment
The edition of the "School of Salamanca" project links different internal datasets: works, dictionary articles, and authors. Furthermore, there are references to external data from authority files, leading out of the web application. For the latter purpose, the CERL-Thesaurus (Consortium of European Research Libraries), the Gemeinsame Normdatei der Deutschen Nationalbibliothek (GND) and, for place names, the Getty Thesaurus of Geographic Names ® Online are referenced. All these (internal and external) references make use of the xml:id attribute described above as well as the attributes ref, key, and sortKey, which are described in the following.
4.3.1. Attributes for References and Normalizations
4.3.1.1. ref
ref attributes are used to link entities occurring in the text to their specific dataset; this concerns, for instance, the linking of references to persons to the respective (project-specific) author dataset, of place names to the public authority file, etc. The attribute value carries a prefix, separated from the subsequent identifier by a colon, stating the place or authority file in which the entity is to be found: author:, work:, or lemma: refer to datasets for authors, works, or lemmata of the "School of Salamanca" project, whereas cerl:, getty:, and gnd: refer to external authority files. There can be multiple such references/keys within one attribute value, separated by blank characters; IMPORTANT the project-specific author dataset needs to be referenced, if existing, and OPTIONAL there should also be a reference to the CERL dataset (for persons) or to the Getty dataset (for locations). The web application currently uses the key stated in first position, so the internal keys (author:, work:, lemma:) should be mentioned first, potentially followed by the cerl: key, and then by further keys to authority files.
IMPORTANT Please note that the reference by means of the ref attribute, linking an element that is relevant in terms of content to a respective dataset, is conceptually different from the type of structural cross-reference described above; for instance, the latter is more or less explicitly stated as such in the original document (by means of page or paragraph numbers, etc.) and tagged through the ref element, while the former can, to a large extent, only be identified through scholarly study of the text (and is then, ultimately, encoded within the ref attribute).
4.3.1.2. Variants of @ref
Depending on the type of reference to be made, there are three different scenarios for using the ref attribute.
- 1. A place/element within the same dataset is referenced.
<!-- always without prefix (work:, lemma:, author:)--> <term ref="#W0998-00-0066-mi-3e52" key="utilitas" xml:id="W0998-00-0014-te-aa45"> context ... </term>
- 2. A different, project-specific dataset (or a place/entity therein) is referenced.
<!-- always with prefix (work:, lemma:, author:) and @xml:id of the dataset (in this case: A0001)--> <persName ref="author:A0001#A0001-pa-43fa" key="Vitoria, Francisco de" xml:id="W0998-00-0324-pe-7f6a"> FRANCISCI ... </persName>
- 3. A dataset/entity external to the project's digital edition is referenced. Please see External Linked Data.
4.3.1.3. key
key attributes are used for recording normalized variants of the entities annotated through the respective elements. If the normalized variant cannot be retrieved from the external authority file, the web application falls back to the value of this attribute. The following elements receive a key attribute applied in this way:
- persName
- docAuthor
- placeName
- title
- term
4.3.1.3.1. Normalization of Proper Names and Work Titles
OPTIONAL If there are established and well-known forms of work titles or proper names, they may be recorded in their normalized/modernized variant (in addition to the annotated, original variant of the text) by means of the key attribute in the elements persName, docAuthor, placeName, or title:
<persName ref="cerl:01302080" key="Hieronymus, Sophronius Eusebius" resp="#DG" cert="high"> Hierony. </persName> ... <bibl> <title key="De Republica, Liber III, § 11" resp="#DG" cert="high"> Lib. 3 de Republi. tit. 11. fol.78. </title> </bibl>
(With regard to proper names and work titles, please see also the sections on Persons, Places, and Bibliographic References. For the ref attribute in this example, please see also the section on ref attributes.)
4.3.1.4. sortKey
The sortKey attribute is used exclusively with bibl elements (hence, with external bibliographic references or references to internal works). It serves to group references to a specific work. Thus, the entry to be stated within sortKey consists of the author name, the "_" separator, and the work's (short) title. Blank space occurring in the name or title is omitted; instead, the following word is simply appended to the previous word, with its first letter in upper case ("camel case").
Examples:
<bibl sortKey="Mt_16">Matthei decimo sexto</bibl>, <bibl sortKey="ThomasAquinas_SThPrimaPars"> <author> <persName ref="cerl:cnp00396685" key="Thomas &lt;de Aquino&gt;"> <choice> <orig>S. Tho.</orig> <reg>Sanctus Thomas</reg> </choice> </persName> </author> 1.p.q.1.a.7.</bibl>
4.3.1.5. n
With numbered or named elements, the name or number of the element (i.e., the text passage) can be stated within the value of n. This functionality actually has nothing to do with the previously mentioned references and links, but is mentioned here in order to avoid ambiguities. It can be used, though, for stating page numbers (within the pb element, please see Pagination) or short titles (within div, for example, in the case of a long title in its head element, see also Structural Text Units).
<div type="foreword" xml:id="..." n="Praefatio (Boyer)">...</div> <div type="question" xml:id="..." n="Qu. 1 - An in Eccl. sit dignitas">...</div> <milestone unit="article" xml:id="..." n="De diversis acceptionibus"/> <pb facs="facs:W0065-B-0077" n="[2]" xml:id="..."/>
4.3.2. Internal References
Internal references designate links between datasets within the digital edition (see also Variants of @ref) and can be applied, for instance, in the following scenarios:
- References from a work to passages in other works, to authors of works of the digital edition, or to lemmata of the dictionary;
- Reference from a dictionary article (i.e., from a lemma dataset) to passages of other dictionary articles and/or to specific authors, or to passages in specific works of the source collection.
4.3.2.1. Lemmata
Lemmata (i.e., references to terms covered in the project's dictionary) are annotated in the work corpus specifically by means of the term element. The lemma dataset is referred to from the ref attribute of term. In the key attribute, the normalized form of the lemma's name (i.e., normalized according to the name in the dictionary article) is recorded.
An example (for illustrative purposes):
De <term ref="lemma:L0001" key="utilitas" xml:id="W0998-00-0034-te-7a8d">utilitate</term> ... <term ref="lemma:L0325" key="lumen supernaturale" xml:id="W0998-00-0087-te-445f">supernaturali lumine</term> et revelatio <lb/>ne cognita ad Deum et divina quaedam pertinent: <term ref="lemma:L0404" key="obiectum scientiae Theologiae" xml:id="W0998-00-0113-te-77da"> obiectum verum scien<lb break="no" rendition="#hyphen"/>tiae Theologiae </term>
4.3.2.2. Authors
The "authors" are those persons who authored one or several works of the corpus. If an author is mentioned (e.g., in the form of a clearly assignable citation or paraphrase), the mentioned proper name, or reference to the author, is annotated by means of the persName element. Accordingly, the reference to the respective dataset (within ref) and the normalized form (i.e., the name of the author according to the corresponding biographical article, stated in key) are recorded.
<persName ref="author:A0001 cerl:cnp01318674 gnd:118768735" key="Vitoria, Francisco de" xml:id="W0998-00-0245-pe-7ff8"> FRANCISCI DE VIctoria </persName>
The annotation of authors is to be differentiated from the annotation of "external" person names in that, with authors, the author's key (e.g., ref="author:A0001") must be stated. (With regard to external person names, see also Persons in the "External Linked Data" section.)
4.3.2.3. Works
The "works" are those works that are encoded within the scope of the digital edition of this project. Citations and other bibliographic references to these works, occurring in a text (of another work), are annotated by means of the bibl element. In the element's sortKey attribute, an identifying abbreviation consisting of the author name, the separator "_", and the (short) title of the referenced work is to be stated. The bibl element usually contains further sub-elements such as author (including persName, for which the respective section applies) and title.
... as <bibl sortKey="Vitoria_ComSTh"> <author> <persName ref="author:A0001" key="Vitoria, Francisco de">Vitoria</persName> </author> in <title ref="work:W0015#a6ef">Liber 1 Caput 1 Artikel 1</title> </bibl> explains ...
The annotation of citations of, or bibliographic references to, "external" authors (authoring works that are not included in the corpus at hand) is described in Bibliographic References.
4.3.3. External Linked Data
OPTIONAL "External linked data" refer to persons, places, or cited literature not encoded or described within the scope of this digital edition. In the process of content-related analysis and enrichment of the texts, these persons, places, or literary references are annotated successively and, potentially, only in certain parts of the texts. Consequently, the annotation may not be carried out to the full extent, but rather in the course of studying the source material. For specific works, or sequences in single works, in which persons, places, or bibliographic references are annotated rather exhaustively, information about these annotations needs to be stated in the teiHeader.
4.3.3.1. Persons
For the purposes of these guidelines, references to persons other than Authors are external references. With persons (i.e., persName elements), the ref attribute links to the CERL Thesaurus and, OPTIONAL, to the GND. In the key attribute, the normalized form of the person name proposed in the CERL database is stated, serving as a "fallback" solution in case the CERL server is not available.
<persName ref="cerl:cnp00396685 gnd:118622110" key="Thomas de Aquino">B. Thomas</persName>
4.3.3.2. Places
With places (placeName), the ref attribute references the Getty Thesaurus of Geographic Names® Online and, OPTIONAL, the GND authority files. The key attribute states the normalized place name proposed in the Getty dataset, serving as a "fallback" solution in case the Getty server is not available.
<placeName ref="getty:7002722" key="Athos">Monte Athon</placeName>
4.4. Bibliographic References
Citations and other references to external literature (not included in the corpus described here) occurring in the text are annotated by means of the bibl element (with regard to references to works of the corpus, please refer to Works). In the element's sortKey attribute, an identifying abbreviation consisting of the author name, the separator "_", and the (short) title of the referenced work is to be stated. The bibl element usually contains further sub-elements such as author or title.
<bibl sortKey="Fischer_Fische"> <author> <persName ref="cerl:99999" key="Fischer, Fritz">Fritz Fischer</persName> </author> <title>Meine Fische</title> </bibl>
5. Metadata in the teiHeader
In the teiHeader of a work, generic information (applying to all works of the corpus) is embedded by means of xi:include elements, which pull it in from the project's central documentation file for TEI metadata. This comprises the following statements from the teiHeader:
- editionStmt
- publicationStmt
- encodingDesc
- encodingDesc/editorialDecl: here it is specified whether the work is part of Group B or a reference work. If this description is not provided, the work is considered part of Group A.
<editorialDecl> […] <p xml:id="W0111_RW">Reference works contain automatic hyphenation of marked and unmarked words in the pb, cb and lb elements. Abbreviations are coded as they appear in the original.</p> </editorialDecl>
<editorialDecl> […] <p xml:id="W0074_AEW">Only automatically edited work: it contains automatic hyphenation of marked and unmarked words in the pb, cb and lb elements. Abbreviations are partially resolved.</p> </editorialDecl>
In the project's documentation file for TEI metadata, the non-standard characters declared in the charDecl are in turn included via xi:include from a dedicated non-standard character file.
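The inclusion mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual header: the file path meta/general-metadata.xml and the xpointer targets are hypothetical placeholders, and the sketch assumes that the included elements carry matching xml:id values in the central file. Only the element names and the use of xi:include are taken from the text.

```xml
<teiHeader xmlns:xi="http://www.w3.org/2001/XInclude">
  <fileDesc>
    <!-- statements specific to this work are written directly in the file -->
    <!-- generic statements: hypothetical path and IDs -->
    <xi:include href="meta/general-metadata.xml" xpointer="editionStmt"/>
    <xi:include href="meta/general-metadata.xml" xpointer="publicationStmt"/>
  </fileDesc>
  <!-- encodingDesc (including charDecl) comes from the central file as well -->
  <xi:include href="meta/general-metadata.xml" xpointer="encodingDesc"/>
</teiHeader>
```

An XInclude-aware processor replaces each xi:include element with the element it points to, so a change in the central file is picked up by every work that includes it.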
In contrast to this generic (i.e., corpus-wide) information, the following data are to be recorded in a work-specific way:
- titleStmt: "(...) groups information about the title of a work and those responsible for its content" (TEI Guidelines)
- sourceDesc: "(...) describes the source from which an electronic text was derived or generated (...)" (TEI Guidelines)
- revisionDesc: "(...) summarizes the revision history for a file" (TEI Guidelines). See also the section on Revision History.
5.1. Bibliographic Description
Bibliographic information about the original source used for encoding is recorded in the sourceDesc.
<sourceDesc> <biblStruct> <monogr> <author> <persName ref="author:0011 gnd:118944053 cerl:cnp01451608" key="Azpilcueta, Martin de"> <forename>Martin</forename> <nameLink>de</nameLink> <surname>Azpilcueta</surname> </persName> </author> <title type="short" level="m">Manual de confessores</title> <title type="main" level="m">Manval De Confessores Y Penitentes, Qve Clara Y Brevemente Contiene, La Vniversal Y Particular Decision De Qvasi Todas Las Dvdas, que en confessiones suelen ocurrir de los pecados, ... en cinco Comentarios de Vsura, Cambios, Symonia mental, Defension del proximo, De hurto notable, &amp; irregularidad ...</title> <title type="245a" level="m">Manual de confessores y penitentes : que clara y breuemente contiene la universal y particular decision de quasi todas las dudas ... </title> <imprint> <pubPlace role="firstEd" ref="getty:7010814" key="Coimbra">Coimbra</pubPlace> <date type="firstEd" when="1553">1553</date> <pubPlace role="thisEd" ref="getty:7002835" key="Salamanca">Salamanca</pubPlace> <date type="thisEd" when="1556">1556</date> <publisher n="firstEd"> <persName ref="gnd:1037601092" key="Barreira, João de"><!--not found in CERL--> <forename>João</forename> <nameLink>de</nameLink> <surname>Barreira</surname> </persName> <persName ref="cerl:cni00045922" key="Alvares, João"> <forename>João</forename> <surname>Álvares</surname> </persName> </publisher> <publisher n="thisEd"> <persName ref="gnd:1037609387" key="Portonariis, Andreas de"><!--not found in CERL--> <forename>Andrea</forename> <nameLink>de</nameLink> <surname>Portonarijs</surname> </persName> </publisher> </imprint> <extent> [16], 797 [i.e. 799] S. ; 4° </extent> </monogr> <note xml:id="ownerOfPrimarySource"> <ref type="institution" target="gnd:4313400-2">Universität Salamanca / Bibliothek</ref> <ref type="catLink" target="http://brumario.usal.es/record=b1857195~S1*spi#.VEdkThaq5OI"/> <!--further sources/institutions are to be stated here--> </note> </biblStruct> </sourceDesc>
With regard to the full title (title type="main"), the bibliographic description follows the guidelines "Alte Drucke" (German) of the Head Office of the Gemeinsamer Bibliotheksverbund (p. 6-7) and, analogously, the Mindestanforderungen für die autoptische Katalogisierung Alter Drucke (AAD) (German). As the example shows, an established, citable short title and an "Ansetzungssachtitel" (according to the guidelines) are stated along with the full title. Authors and places are annotated in accordance with the edition guidelines at hand.
The information about the original source may contain the following exception: if the digitized and encoded edition is not the first edition of the text, a publisher element carrying the n attribute with the value thisEd is added. When an edition other than the first one is encoded, the information about both relevant editions is thus recorded and displayed.
A similar exception applies to the datasets of multi-volume works:
<imprint> <pubPlace role="firstEd" ref="getty:7007856" key="Antwerpen">Antverpiae</pubPlace> <!--@key = the preferred Getty reference--> <date type="firstEd" when="1668">1668</date> <date type="summaryFirstEd" when="1668">1.1668 - 6.1686</date> <publisher n="firstEd"> <persName ref="cerl:cni00031626 gnd:123414245" key="Meurs, Jacob van"> <forename>Jacobus</forename> <surname>Meursius</surname> </persName> </publisher> </imprint>
Should the digitized series not be the one containing the first editions of the volumes, the date element with the type attribute value summaryFirstEd is complemented by a date element with the type attribute value summaryThisEd. This makes clear which volumes of the series have been encoded, and whether they are the first-edition volumes.
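As an illustration, the imprint of such a series might carry both summary dates side by side. The sketch below reuses the first-edition values from the multi-volume example above; the thisEd values (years and volume span) are invented placeholders and do not describe a real edition.

```xml
<imprint>
  <!-- first edition of the series, as in the example above -->
  <date type="firstEd" when="1668">1668</date>
  <date type="summaryFirstEd" when="1668">1.1668 - 6.1686</date>
  <!-- digitized series: hypothetical placeholder years -->
  <date type="thisEd" when="1700">1700</date>
  <date type="summaryThisEd" when="1700">1.1700 - 6.1718</date>
  <!-- pubPlace and publisher elements omitted for brevity -->
</imprint>
```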
In conjunction with the imprint information (printers/publishers, publication place and year), the institution in possession of the digital scans of the original and a link to the corresponding catalogue entry are recorded. Should pages be missing in this original source, they are substituted by those of other versions/editions as far as possible. The addition of "external" pages and their origin are documented within the sourceDesc in the form of a list.
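The guidelines do not show what such a list looks like. One possible shape, given purely as an assumption, is sketched below: the choice of list and item, the type value externalPages, and the bracketed placeholders are all hypothetical. The ref elements simply mirror the pattern used in the ownerOfPrimarySource note.

```xml
<sourceDesc>
  <biblStruct>
    <!-- bibliographic description and ownerOfPrimarySource note, as shown above -->
  </biblStruct>
  <!-- hypothetical structure: one item per substituted page -->
  <list type="externalPages">
    <item n="[page number]">
      <ref type="institution" target="[authority ID]">[institution holding the substitute copy]</ref>
      <ref type="catLink" target="[catalogue URL]"/>
    </item>
  </list>
</sourceDesc>
```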