Level 1
Particularities of the printing technology and typographical aspects are not taken into
account and are not documented in the Ground Truth corpus. Normalization is
carried out to a greater extent. The following characters are normalized:
- long-s to round-s
- umlauts (e above a vowel) to äöüÄÖÜ
- sz to ß
- Virgule (/) to comma
- Quotation marks are transferred to today's use and are not differentiated
- Separators are transferred to today's use and are not differentiated
- the round-r in connection with c is expanded to etc.
- The reproduction of spaces is limited to the separation of words.
- Punctuation marks are always used in conjunction with the preceding word.
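
The rules above can be sketched as a small normalization function. This is only an illustrative sketch, not part of the guidelines: the function name `normalize_level1` is hypothetical, and the input encoding is an assumption (long-s as U+017F, the superscript e as the combining character U+0364, round-r as U+A75B, and "sz" as long-s followed by z). Quotation marks and separators are left out because the list does not say which target characters to use.

```python
import re
import unicodedata

# Assumed input encodings (not specified in the guidelines):
#   U+017F  long s
#   U+0364  combining small e above a vowel
#   U+A75B  round r (r rotunda)
REPLACEMENTS = [
    ("\u017fz", "\u00df"),   # long s + z -> ß (must run before the long-s rule)
    ("\u017f", "s"),         # long s -> round s
    ("\ua75bc.", "etc."),    # round r + c -> etc.
    ("\u0364", "\u0308"),    # combining e -> combining diaeresis
]


def normalize_level1(text: str) -> str:
    """Apply a subset of the Level 1 normalizations to one line of text."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    # Compose vowel + combining diaeresis into ä, ö, ü, Ä, Ö, Ü.
    text = unicodedata.normalize("NFC", text)
    # Virgule to comma.
    text = re.sub(r"\s*/\s*", ", ", text)
    # Spaces only separate words: collapse runs and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Punctuation attaches to the preceding word.
    text = re.sub(r" ([,.;:!?])", r"\1", text)
    return text


if __name__ == "__main__":
    sample = "da\u017fz  Wa\u017f\u017fer / vnd die Bru\u0364der \ua75bc."
    print(normalize_level1(sample))  # daß Wasser, vnd die Brüder etc.
```

The replacement order matters in this sketch: the "sz" rule has to run before the general long-s rule, otherwise the long-s would already have been replaced and the ß would never be produced.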