Ground Truth Guidelines
OCR-D: DFG-funded Initiative for Optical Character Recognition Development
How to Transcribe in Level 2
1. If the text to be transcribed can be recorded with Unicode characters, these must be used exclusively.
2. If the character can only be formed by combining two characters, this combination must be used.
3. Apart from the vocal ligatures (vowel ligatures such as æ and œ), all ligatures are split. Typographical peculiarities are to be documented as formatting details. This includes all non-vocal ligatures.
4. If the character cannot be formed from a combination of several characters and a MUFI equivalent exists, use MUFI.
5. If options 1, 2, and 4 are not possible, a code definition shall be used in consultation with the OCR-D Coordination project, following the joint agreements reached in major international projects such as IMPACT, EEBO, and ECCO.
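
The rules above can be illustrated with a minimal Python sketch. It is not part of the guidelines or of any OCR-D tool. The function name `transcribe_level2`, the two lookup tables, and the note labels are hypothetical. The sketch assumes that the vocal ligatures are æ, Æ, œ and Œ, that NFC normalization is an acceptable way to prefer a single Unicode character where one exists, and that a code point in the Unicode Private Use Area (where MUFI places many of its characters) should merely be flagged for manual checking. The ligature table is deliberately incomplete.

```python
import unicodedata

# Assumption: the "vocal" ligatures that stay intact are these vowel ligatures.
KEPT_LIGATURES = {"æ", "Æ", "œ", "Œ"}

# Illustrative, incomplete table of non-vocal ligatures and their split forms.
SPLIT_LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "\u017ft",  # long s + t
    "\ufb06": "st",
}


def transcribe_level2(text):
    """Return (transcription, notes) for a raw string.

    notes is a list of (position_in_input, label, character) tuples that a
    transcriber could use to record formatting details or to review
    Private Use Area characters by hand.
    """
    pieces = []
    notes = []
    for pos, ch in enumerate(text):
        if ch in KEPT_LIGATURES:
            # Vocal ligatures are kept as single characters.
            pieces.append(ch)
        elif ch in SPLIT_LIGATURES:
            # Non-vocal ligatures are split; the ligature itself is noted
            # so it can be documented as a formatting detail.
            pieces.append(SPLIT_LIGATURES[ch])
            notes.append((pos, "ligature", ch))
        elif unicodedata.category(ch) == "Co":
            # Private Use Area code point: possibly a MUFI character.
            # The sketch cannot verify that, so it only flags the character.
            pieces.append(ch)
            notes.append((pos, "private-use", ch))
        else:
            pieces.append(ch)
    # NFC composes base + combining mark into one character where Unicode
    # defines one, and leaves the combination untouched where it does not.
    return unicodedata.normalize("NFC", "".join(pieces)), notes


text, notes = transcribe_level2("\ufb01sch")
print(text)  # fisch
```

A combination with no precomposed form, such as `a` followed by U+0364 (combining small letter e), passes through unchanged, which matches the rule that the combination must be used when no single character exists. Cases that need a project-specific code definition are outside what this sketch can decide.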