xml:id and xml:lang attributes

Two attributes deserve particular attention: @xml:id and @xml:lang. Both can be added to almost any tag. @xml:id gives the element a unique identifier (identification), and @xml:lang declares the language of the element and of everything it contains (language).

@xml:id

Values of @xml:id must follow strict composition rules:

  • They cannot contain anything other than letters (upper and lower case), numbers, full stop (.), hyphen (-), and underscore (_).
  • They must begin with a letter.
  • To be meaningful, they should encode what the element refers to: the number of the verse or chapter, the language, etc.
  • If identifiers are added not only to the original text but also to its translations, the language code of the text must be the first part of the identifier. For example, @xml:id="la.4.236" identifies verse 4.236 of the work we are tagging in its Latin version. The same verse in the Spanish translation would carry the identifier @xml:id="es.4.236".
  • When identifiers are assigned to special tags, they must include the name of that tag, or its abbreviation, as text. This distinguishes them from the other identifiers attached to the same verse or paragraph, which all carry the same number.
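The first rules above can be checked mechanically. The following is a minimal sketch, assuming that "letters" means the ASCII letters A to Z and a to z and that the reference parts are joined with full stops as in "la.4.236"; the helper names `is_valid_id` and `make_id` are hypothetical and not part of TEI or of any editing tool.

```python
import re

# Assumption: "letters" = ASCII A-Z / a-z. First character must be a letter;
# the rest may only be letters, digits, ".", "-" or "_".
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_id(value: str) -> bool:
    """Return True if value follows the @xml:id composition rules."""
    return ID_PATTERN.fullmatch(value) is not None


def make_id(lang: str, *parts: object) -> str:
    """Hypothetical helper: build an id such as 'la.4.236' from a language
    code followed by reference numbers (book, verse, ...)."""
    value = ".".join([lang, *(str(p) for p in parts)])
    if not is_valid_id(value):
        raise ValueError(f"invalid @xml:id: {value!r}")
    return value


print(make_id("la", 4, 236))    # la.4.236
print(make_id("es", 4, 236))    # es.4.236
print(is_valid_id("4.236"))     # False: starts with a digit
print(is_valid_id("la 4.236"))  # False: contains a space
```

Putting the language code first is what makes "la.4.236" and "es.4.236" distinct while still sharing the verse number.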

Some tags refer to elements defined elsewhere in the document through their @xml:id. The attribute that points to such an element must reproduce the value of that @xml:id exactly, preceded by the "#" symbol.
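A broken pointer (a typo in the id, or a missing "#") is easy to overlook by eye. The sketch below, assuming a simplified TEI fragment and that the pointing attributes of interest are @target and @corresp, lists every pointer that does not resolve to an existing @xml:id. The function name `unresolved_pointers` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified, hypothetical fragment: the first note points to an existing
# line, the second one to an id that does not exist.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <l xml:id="la.4.236">...</l>
      <note target="#la.4.236">...</note>
      <note target="#la.4.999">...</note>
    </body>
  </text>
</TEI>"""


def unresolved_pointers(root, attrs=("target", "corresp")):
    """Return pointer values that do not match any @xml:id in the tree."""
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for el in root.iter():
        for attr in attrs:
            value = el.get(attr)
            if value is None:
                continue
            # An attribute may hold several space-separated pointers.
            # Only internal "#id" references are accepted in this sketch.
            for pointer in value.split():
                if not pointer.startswith("#") or pointer[1:] not in ids:
                    missing.append(pointer)
    return missing


root = ET.fromstring(SAMPLE)
print(unresolved_pointers(root))  # ['#la.4.999']
```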

@xml:lang

The @xml:lang attribute defines the language of the tag and of everything it contains. For example, a <text> defined as follows:

<text type="translation" xml:lang="es"> … contents … </text>

introduces the whole translation of our original text, which in turn is identified by:

<text type="source" xml:lang="grc"> … contents … </text>
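Because @xml:lang applies to the tag and to everything inside it, the language of any nested element is the value on the element itself or, failing that, on its nearest ancestor. A minimal sketch of that lookup, assuming a simplified fragment in which a <group> wraps the source and the translation (real TEI files also need a header); the function `effective_lang` is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: the last <seg> overrides the inherited "es".
SAMPLE = """<group>
  <text type="source" xml:lang="grc"><p><seg>...</seg></p></text>
  <text type="translation" xml:lang="es"><p><seg>...</seg><seg xml:lang="la">...</seg></p></text>
</group>"""


def effective_lang(root, target):
    """Return the xml:lang in force for target: its own value, otherwise the
    nearest ancestor's, or None if nothing declares one."""
    # ElementTree has no parent links, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parent.get(node)
    return None


root = ET.fromstring(SAMPLE)
print([effective_lang(root, s) for s in root.iter("seg")])  # ['grc', 'es', 'la']
```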

The language codes used in our editions are:

  • Latin: "la"
  • Ancient Greek: "grc"
  • Spanish: "es"
  • English: "en"
  • Italian: "it"
  • German: "de"
  • Modern Greek: "el"
  • French: "fr"
  • Galician: "gl"
  • Basque: "eu"
  • Catalan: "ca"
  • Portuguese: "pt"

The same codes also serve as the first part of the @xml:id values assigned to segments in order to align the text with its translation.
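Since the language code is the only part that differs between "la.4.236" and "es.4.236", segments can be paired by the rest of the identifier. The sketch below assumes that every @xml:id starts with one of the codes listed above followed by a full stop; the sample document and the names `split_id` and `align` are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Language codes from the list above.
LANG_CODES = {"la", "grc", "es", "en", "it", "de", "el", "fr", "gl", "eu", "ca", "pt"}

SAMPLE = """<group>
  <text type="source" xml:lang="la">
    <seg xml:id="la.4.236">...</seg><seg xml:id="la.4.237">...</seg>
  </text>
  <text type="translation" xml:lang="es">
    <seg xml:id="es.4.236">...</seg><seg xml:id="es.4.237">...</seg>
  </text>
</group>"""


def split_id(value):
    """Split 'la.4.236' into ('la', '4.236')."""
    lang, _, ref = value.partition(".")
    if lang not in LANG_CODES or not ref:
        raise ValueError(f"no known language prefix in {value!r}")
    return lang, ref


def align(root, source_lang, target_lang):
    """Pair source and translation ids that share the same reference."""
    by_ref = {}  # reference -> {language code: full id}
    for el in root.iter():
        value = el.get(XML_ID)
        if value is None:
            continue
        lang, ref = split_id(value)
        by_ref.setdefault(ref, {})[lang] = value
    return [
        (langs[source_lang], langs[target_lang])
        for langs in by_ref.values()
        if source_lang in langs and target_lang in langs
    ]


root = ET.fromstring(SAMPLE)
print(align(root, "la", "es"))
# [('la.4.236', 'es.4.236'), ('la.4.237', 'es.4.237')]
```

Segments present in only one language are simply left out of the result, which makes gaps in the alignment easy to spot.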

For the full list of language codes, consult the Language Subtag Registry on the official website of IANA (Internet Assigned Numbers Authority).
