ocrd_validators package
Validators for various OCR-D related data structures.
-
class ocrd_validators.ParameterValidator(ocrd_tool)
Bases: ocrd_validators.json_validator.JsonValidator
JsonValidator validating parameters against ocrd-tool.json.
Construct a ParameterValidator.
- Parameters
ocrd_tool (dict) – Parsed ocrd-tool.json.
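To illustrate what validating parameters against ocrd-tool.json involves, here is a minimal, stdlib-only sketch. It is not ParameterValidator's implementation (which builds on JsonValidator). The helper `validate_parameters` and the example parameters `level` and `dpi` are hypothetical. It assumes the `parameters` layout from the ocrd-tool.json schema shown under OcrdToolValidator (`type`, `enum`, `default`, `required`) and, as a further assumption, rejects parameter names the tool does not declare.

```python
# Maps the ocrd-tool.json "type" names to Python types.
_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_parameters(tool_params: dict, given: dict) -> list:
    """Hypothetical helper: check `given` against a tool's "parameters" section.

    Missing parameters that declare a default get it filled in (in place);
    the return value is a list of error messages (empty if valid).
    """
    errors = []
    # Assumption: undeclared parameter names are rejected.
    for name in given:
        if name not in tool_params:
            errors.append(f"unknown parameter {name!r}")
    for name, spec in tool_params.items():
        if name not in given:
            if "default" in spec:
                given[name] = spec["default"]
            elif spec.get("required", False):
                errors.append(f"missing required parameter {name!r}")
            continue
        value = given[name]
        type_ok = isinstance(value, _TYPES[spec["type"]])
        # bool is a subclass of int in Python, so exclude it from "number"
        if spec["type"] == "number" and isinstance(value, bool):
            type_ok = False
        if not type_ok:
            errors.append(f"{name!r} should be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name!r} must be one of {spec['enum']}")
    return errors


# Usage with two hypothetical parameters:
tool_params = {
    "level": {"type": "string", "enum": ["region", "line"], "default": "line", "description": "Segmentation level"},
    "dpi": {"type": "number", "default": 300, "description": "Pixel density"},
}
user_params = {"dpi": 150}
print(validate_parameters(tool_params, user_params))  # []
print(user_params)  # {'dpi': 150, 'level': 'line'}
```

Filling in defaults while validating keeps the caller from having to merge defaults separately; a JSON Schema library would be the more complete route.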
-
class ocrd_validators.WorkspaceValidator(resolver, mets_url, src_dir=None, skip=None, download=False, page_strictness='strict', page_coordinate_consistency='poly')
Bases: object
Validates an OCR-D/METS workspace against the specs.
Construct a new WorkspaceValidator.
- Parameters
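resolver (Resolver) – Resolver
mets_url (string) – URL of the METS file
src_dir (string) – Directory containing the METS file
skip (list) – Tests to skip
download (boolean) – Whether to download files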
page_strictness ("strict"|"lax"|"fix"|"off") –
page_coordinate_consistency ("poly"|"baseline"|"both"|"off") –
-
static check_file_grp(workspace, input_file_grp=None, output_file_grp=None, page_id=None, report=None)
Return a report on whether the input_file_grp group(s) exist in workspace.mets and the output_file_grp group(s) do not. To be run before processing.
- Parameters
workspace (Workspace) –
input_file_grp (list|string) –
output_file_grp (list|string) –
page_id (list|string) –
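The core rule behind check_file_grp can be sketched on plain sets of file group names: every input group must already exist, and no output group may exist yet. This is a simplified illustration, not the method's code. `check_file_grp_sketch` is hypothetical, treating a string argument as comma-separated is an assumption, and the documented signature also accepts page_id, which this sketch ignores.

```python
def _as_list(grps):
    """Accept None, a list, or a comma-separated string (assumed convention)."""
    if grps is None:
        return []
    if isinstance(grps, str):
        return [g for g in grps.split(",") if g]
    return list(grps)


def check_file_grp_sketch(existing_grps, input_file_grp=None, output_file_grp=None):
    """Hypothetical, simplified pre-processing check on file group names.

    Every input group must already exist; no output group may exist yet.
    Returns a list of error messages (empty if processing may start).
    """
    errors = []
    for grp in _as_list(input_file_grp):
        if grp not in existing_grps:
            errors.append(f"input file group {grp!r} is missing")
    for grp in _as_list(output_file_grp):
        if grp in existing_grps:
            errors.append(f"output file group {grp!r} already exists")
    return errors


existing = {"OCR-D-IMG", "OCR-D-BIN"}
print(check_file_grp_sketch(existing, "OCR-D-IMG", "OCR-D-SEG"))  # []
# ["input file group 'OCR-D-GT' is missing", "output file group 'OCR-D-BIN' already exists"]
print(check_file_grp_sketch(existing, "OCR-D-IMG,OCR-D-GT", ["OCR-D-BIN"]))
```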
-
static validate(*args, **kwargs)
Validates the workspace of a METS URL against the specs.
- Parameters
resolver (ocrd.Resolver) – Resolver
mets_url (string) – URL of the METS file
src_dir (string, None) – Directory containing the METS file
skip (list) – Tests to skip. One or more of ‘mets_unique_identifier’, ‘mets_file_group_names’, ‘mets_files’, ‘pixel_density’, ‘dimension’, ‘url’
download (boolean) – Whether to download files
- Returns
report (ValidationReport) – Report on the validity
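How a skip list interacts with a set of named checks can be sketched as a small registry. This is only an illustration of the parameter's semantics: `run_checks` and the dummy checks are hypothetical, the check names are the ones listed for skip above, and rejecting unknown names is an assumption of this sketch.

```python
from typing import Callable, Dict, Iterable, List

# Check names accepted by the `skip` parameter, as listed above.
CHECK_NAMES = (
    "mets_unique_identifier",
    "mets_file_group_names",
    "mets_files",
    "pixel_density",
    "dimension",
    "url",
)


def run_checks(checks: Dict[str, Callable[[], List[str]]],
               skip: Iterable[str] = ()) -> List[str]:
    """Hypothetical runner: call each registered check unless it is skipped.

    Each check returns a list of error messages; the results are concatenated
    in the order of CHECK_NAMES.
    """
    skip = set(skip)
    unknown = skip - set(CHECK_NAMES)
    if unknown:
        raise ValueError(f"unknown check name(s): {sorted(unknown)}")
    errors: List[str] = []
    for name in CHECK_NAMES:
        if name in checks and name not in skip:
            errors.extend(checks[name]())
    return errors


# Usage with two dummy checks: skipping 'pixel_density' hides its error.
checks = {
    "pixel_density": lambda: ["image has no pixel density"],
    "url": lambda: [],
}
print(run_checks(checks))                          # ['image has no pixel density']
print(run_checks(checks, skip=["pixel_density"]))  # []
```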
-
class ocrd_validators.PageValidator
Bases: object
Validator for OcrdPage (see ocrd_models.ocrd_page).
-
static validate(filename=None, ocrd_page=None, ocrd_file=None, page_textequiv_consistency='strict', page_textequiv_strategy='first', check_baseline=True, check_coords=True)
Validates a PAGE file for consistency, given by filename, as an OcrdFile, or as an OcrdPage passed directly.
- Parameters
filename (string) – Path to the PAGE file
ocrd_page (OcrdPage) – OcrdPage instance
ocrd_file (OcrdFile) – OcrdFile instance wrapping OcrdPage
page_textequiv_consistency (string) – ‘strict’, ‘lax’, ‘fix’ or ‘off’
page_textequiv_strategy (string) – Currently only ‘first’
check_baseline (bool) – whether Baseline must be fully within TextLine/Coords
check_coords (bool) – whether *Region/TextLine/Word/Glyph must each be fully contained within Border/*Region/TextLine/Word, resp.
- Returns
report (ValidationReport) – Report on the validity
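The idea behind page_textequiv_consistency can be sketched in plain Python: each element's text should equal the joined text of its children. This is a simplified model, not PageValidator's code. The `Element` class and `check_consistency` function are hypothetical, the joining convention (lines joined by a newline, words by a space, glyphs without a separator) is an assumption, text is taken from the first TextEquiv as with page_textequiv_strategy='first', and only the 'strict', 'fix' and 'off' modes are modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed separator when concatenating the children's text at each level.
JOINERS = {"TextRegion": "\n", "TextLine": " ", "Word": ""}


@dataclass
class Element:
    """Hypothetical stand-in for a PAGE element and its TextEquiv strings."""
    kind: str  # "TextRegion", "TextLine", "Word" or "Glyph"
    text_equivs: List[str] = field(default_factory=list)
    children: List["Element"] = field(default_factory=list)

    def text(self) -> Optional[str]:
        # strategy 'first': only the first TextEquiv counts
        return self.text_equivs[0] if self.text_equivs else None


def check_consistency(elem: Element, mode: str = "strict") -> List[str]:
    """Compare each element's text with the joined text of its children.

    'strict' reports mismatches, 'fix' overwrites the parent's text,
    'off' does nothing; any other mode behaves like 'strict' here.
    """
    errors: List[str] = []
    if mode == "off":
        return errors
    # Recurse first so that, in 'fix' mode, repairs propagate bottom-up.
    for child in elem.children:
        errors.extend(check_consistency(child, mode))
    if elem.kind not in JOINERS or not elem.children:
        return errors
    child_texts = [c.text() for c in elem.children]
    if any(t is None for t in child_texts):
        return errors
    joined = JOINERS[elem.kind].join(child_texts)
    if elem.text() is not None and elem.text() != joined:
        if mode == "fix":
            elem.text_equivs[0] = joined
        else:
            errors.append(f"{elem.kind}: {elem.text()!r} != {joined!r}")
    return errors


words = [Element("Word", ["Hello"]), Element("Word", ["World"])]
line = Element("TextLine", ["Hello Wrld"], words)
region = Element("TextRegion", ["Hello World"], [line])
print(check_consistency(region))         # reports the line and the region
print(check_consistency(region, "fix"))  # [] after repairing bottom-up
print(line.text_equivs)                  # ['Hello World']
```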
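A rough sketch of the check_baseline idea: parse PAGE-style points strings ("x,y x,y ...") and test whether every Baseline vertex lies inside the TextLine polygon using ray casting. The helper names are hypothetical, and testing vertices only is a simplification: for non-convex polygons it is necessary but not sufficient, so a thorough check would compare the full geometries.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_points(points: str) -> List[Point]:
    """Parse a PAGE-style points string such as '0,0 100,0 100,50 0,50'."""
    return [tuple(float(v) for v in pair.split(",")) for pair in points.split()]


def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Ray casting; points exactly on the boundary count as inside."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # On-boundary test: collinear with the edge and within its bounding box
        cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
        if cross == 0 and min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
            return True
        # Toggle when a horizontal ray to the right crosses this edge
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def baseline_within(baseline: str, line_coords: str) -> bool:
    """Simplified check: every baseline vertex lies inside the line polygon."""
    poly = parse_points(line_coords)
    return all(point_in_polygon(p, poly) for p in parse_points(baseline))


line_coords = "0,0 100,0 100,50 0,50"
print(baseline_within("5,40 95,40", line_coords))   # True
print(baseline_within("5,40 120,40", line_coords))  # False
```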
-
class ocrd_validators.OcrdToolValidator(schema, validator_class=<class 'jsonschema.validators.create.<locals>.Validator'>)
Bases: ocrd_validators.json_validator.JsonValidator
JsonValidator validating against the ocrd-tool.json schema.
Construct a JsonValidator.
- Parameters
schema (dict) –
validator_class (Draft4Validator|DefaultValidatingDraft4Validator) –
-
static validate(obj, schema={'additionalProperties': False, 'description': 'Schema for tools by OCR-D MP', 'properties': {'dockerhub': {'description': 'DockerHub image', 'type': 'string'}, 'git_url': {'description': 'Github/Gitlab URL', 'format': 'url', 'type': 'string'}, 'tools': {'additionalProperties': False, 'patternProperties': {'ocrd-.*': {'additionalProperties': False, 'properties': {'categories': {'description': 'Tools belong to this categories, representing modules within the OCR-D project structure', 'items': {'enum': ['Image preprocessing', 'Layout analysis', 'Text recognition and optimization', 'Model training', 'Long-term preservation', 'Quality assurance'], 'type': 'string'}, 'type': 'array'}, 'description': {'description': 'Concise description what the tool does'}, 'executable': {'description': 'The name of the CLI executable in $PATH', 'type': 'string'}, 'input_file_grp': {'description': 'Input fileGrp@USE this tool expects by default', 'items': {'pattern': '^OCR-D-[A-Z0-9-]+$', 'type': 'string'}, 'type': 'array'}, 'output_file_grp': {'description': 'Output fileGrp@USE this tool produces by default', 'items': {'pattern': '^OCR-D-[A-Z0-9-]+$', 'type': 'string'}, 'type': 'array'}, 'parameters': {'description': 'Object describing the parameters of a tool. Keys are parameter names, values sub-schemas.', 'patternProperties': {'.*': {'additionalProperties': False, 'properties': {'cacheable': {'default': False, 'description': "If parameter is reference to file: Whether the file should be cached, e.g. because it is large and won't change.", 'type': 'boolean'}, 'content-type': {'description': 'If parameter is reference to file: Media type of the file', 'type': 'string'}, 'default': {'description': 'Default value when not provided by the user'}, 'description': {'description': 'Concise description of syntax and semantics of this parameter'}, 'enum': {'description': 'List the allowed values if a fixed list.', 'type': 'array'}, 'format': {'description': 'Subtype, such as `float` for type `number` or `uri` for type `string`.'}, 'required': {'description': 'Whether this parameter is required', 'type': 'boolean'}, 'type': {'description': 'Data type of this parameter', 'enum': ['string', 'number', 'boolean', 'object', 'array'], 'type': 'string'}}, 'required': ['description', 'type'], 'type': 'object'}}, 'type': 'object'}, 'steps': {'description': 'This tool can be used at these steps in the OCR-D functional model', 'items': {'enum': ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis'], 'type': 'string'}, 'type': 'array'}}, 'required': ['description', 'steps', 'executable', 'categories', 'input_file_grp'], 'type': 'object'}}, 'type': 'object'}, 'version': {'description': 'Version of the tool, expressed as MAJOR.MINOR.PATCH.', 'pattern': '^[0-9]+\\.[0-9]+\\.[0-9]+$', 'type': 'string'}}, 'required': ['version', 'git_url', 'tools'], 'type': 'object'})
Validate against the ocrd-tool.json schema.
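The default schema above can be read as a handful of concrete rules. The stdlib-only sketch below checks a subset of them: the top-level required keys, the MAJOR.MINOR.PATCH version pattern, tool names matching 'ocrd-.*', each tool's required keys, and the '^OCR-D-[A-Z0-9-]+$' file group pattern. The function `check_ocrd_tool` is hypothetical and ignores the rest of the schema; for full validation a JSON Schema validator is the appropriate tool.

```python
import re

# Rules taken from the default schema shown above (a subset only).
TOP_REQUIRED = ("version", "git_url", "tools")
TOOL_REQUIRED = ("description", "steps", "executable", "categories", "input_file_grp")
VERSION_RE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+$")
# JSON Schema patterns are unanchored, hence re.search for 'ocrd-.*'.
TOOL_NAME_RE = re.compile(r"ocrd-.*")
FILE_GRP_RE = re.compile(r"^OCR-D-[A-Z0-9-]+$")


def check_ocrd_tool(obj: dict) -> list:
    """Hypothetical stdlib-only check of a few ocrd-tool.json schema rules."""
    errors = [f"missing required property {key!r}" for key in TOP_REQUIRED if key not in obj]
    if "version" in obj and not VERSION_RE.match(str(obj["version"])):
        errors.append(f"version {obj['version']!r} is not MAJOR.MINOR.PATCH")
    for name, tool in obj.get("tools", {}).items():
        if not TOOL_NAME_RE.search(name):
            errors.append(f"tool name {name!r} does not match 'ocrd-.*'")
        errors.extend(f"{name}: missing {key!r}" for key in TOOL_REQUIRED if key not in tool)
        for grp in tool.get("input_file_grp", []) + tool.get("output_file_grp", []):
            if not FILE_GRP_RE.match(grp):
                errors.append(f"{name}: file group {grp!r} does not match {FILE_GRP_RE.pattern}")
    return errors


example = {
    "version": "1.0.0",
    "git_url": "https://example.org/repo",
    "tools": {
        "ocrd-dummy": {
            "description": "Copies input to output",
            "steps": ["preprocessing/optimization"],
            "executable": "ocrd-dummy",
            "categories": ["Image preprocessing"],
            "input_file_grp": ["OCR-D-IMG"],
        }
    },
}
print(check_ocrd_tool(example))  # []
```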
-
class ocrd_validators.OcrdResourceListValidator(schema, validator_class=<class 'jsonschema.validators.create.<locals>.Validator'>)
Bases: ocrd_validators.json_validator.JsonValidator
JsonValidator validating against the resource_list.yml schema.
Construct a JsonValidator.
- Parameters
schema (dict) –
validator_class (Draft4Validator|DefaultValidatingDraft4Validator) –
-
static validate(obj, schema={'additionalProperties': False, 'patternProperties': {'^ocrd-.*': {'items': {'additionalProperties': False, 'properties': {'description': {'description': 'A description of the resource', 'type': 'string'}, 'name': {'description': 'Name to store the resource as', 'type': 'string'}, 'parameter_usage': {'default': 'as-is', 'description': 'Defines how the parameter is to be used', 'enum': ['as-is', 'without-extension'], 'type': 'string'}, 'path_in_archive': {'default': '.', 'description': 'if type is archive, the resource is at this location in the archive', 'type': 'string'}, 'size': {'description': 'Size of the resource in bytes', 'type': 'number'}, 'type': {'default': 'file', 'description': 'Type of the URL', 'enum': ['file', 'github-dir', 'tarball'], 'type': 'string'}, 'url': {'description': 'URLs of all components of this resource', 'type': 'string'}, 'version_range': {'default': '>= 0.0.1', 'description': 'Range of supported versions, syntax like in PEP 440', 'type': 'string'}}, 'required': ['url', 'description', 'name', 'size'], 'type': 'object'}, 'type': 'array'}}, 'type': 'object'})
Validate against the resource_list.schema.yml schema.
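Several properties in the resource-list schema above carry defaults. A minimal sketch of what validating a single entry and filling in those defaults amounts to; `normalize_resource` is a hypothetical helper, the required keys, defaults and enums are copied from the schema, and a complete resource list would be a mapping from executable names matching '^ocrd-.*' to arrays of such entries.

```python
# Defaults, required keys and enums copied from the schema shown above.
RESOURCE_DEFAULTS = {
    "type": "file",
    "parameter_usage": "as-is",
    "path_in_archive": ".",
    "version_range": ">= 0.0.1",
}
RESOURCE_REQUIRED = ("url", "description", "name", "size")
RESOURCE_TYPES = ("file", "github-dir", "tarball")
USAGE_VALUES = ("as-is", "without-extension")
# additionalProperties is False, so only these eight keys are allowed.
ALLOWED_KEYS = set(RESOURCE_DEFAULTS) | set(RESOURCE_REQUIRED)


def normalize_resource(entry: dict) -> tuple:
    """Hypothetical helper: return (entry with defaults filled in, list of errors)."""
    errors = [f"missing {key!r}" for key in RESOURCE_REQUIRED if key not in entry]
    errors += [f"unexpected key {key!r}" for key in entry if key not in ALLOWED_KEYS]
    result = {**RESOURCE_DEFAULTS, **entry}
    if result["type"] not in RESOURCE_TYPES:
        errors.append(f"type must be one of {RESOURCE_TYPES}")
    if result["parameter_usage"] not in USAGE_VALUES:
        errors.append(f"parameter_usage must be one of {USAGE_VALUES}")
    return result, errors


entry = {
    "url": "https://example.org/model.tar.gz",
    "description": "Demo model",
    "name": "demo",
    "size": 1024,
    "type": "tarball",
}
resource, problems = normalize_resource(entry)
print(problems)                     # []
print(resource["path_in_archive"])  # .
```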
-
class ocrd_validators.OcrdZipValidator(resolver, path_to_zip)
Bases: object
Validate conformance with BagIt and the OCR-D BagIt profile.
- See:
- Parameters
resolver (Resolver) – resolver
path_to_zip (string) – Path to the OCRD-ZIP file
-
validate(skip_checksums=False, skip_bag=False, skip_unzip=False, skip_delete=False, processes=2)
Validate an OCRD-ZIP file for profile, bag and workspace conformance.
- Parameters
skip_bag (boolean) – Whether to skip all checks of manifests and files
skip_checksums (boolean) – Whether to omit checksum checks but still check basic BagIt conformance
skip_unzip (boolean) – Whether the OCRD-ZIP is unzipped, i.e. a directory
skip_delete (boolean) – Whether to skip deleting the unpacked OCRD-ZIP dir after validation
processes (integer) – Number of processes used for checksum validation
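The checksum part of OCRD-ZIP validation comes down to BagIt manifest verification: each line of a payload manifest holds a checksum and a file path relative to the bag root. A minimal sketch, assuming the bag is already unpacked and uses a SHA-512 manifest named manifest-sha512.txt; `verify_manifest` is hypothetical, and it parallelizes with threads rather than the processes mentioned for the processes parameter.

```python
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def _sha512(path: Path) -> str:
    """Hash a file in 1 MiB chunks."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(bag_dir: str, workers: int = 2) -> list:
    """Check every '<checksum> <path>' line of manifest-sha512.txt in a bag.

    Returns a list of error messages; an empty list means all payload files match.
    """
    bag = Path(bag_dir)
    entries = []
    for line in (bag / "manifest-sha512.txt").read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, relpath = line.split(maxsplit=1)
            entries.append((checksum.lower(), relpath.strip()))

    def check(entry):
        checksum, relpath = entry
        path = bag / relpath
        if not path.is_file():
            return f"{relpath}: file missing"
        if _sha512(path) != checksum:
            return f"{relpath}: checksum mismatch"
        return None

    # Threads stand in for worker processes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [msg for msg in pool.map(check, entries) if msg]


# Usage: build a tiny bag in a temporary directory and verify it.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "mets.xml").write_bytes(b"<mets/>")
    digest = hashlib.sha512(b"<mets/>").hexdigest()
    (Path(tmp) / "manifest-sha512.txt").write_text(f"{digest}  data/mets.xml\n", encoding="utf-8")
    print(verify_manifest(tmp))  # []
```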
-
class ocrd_validators.XsdValidator(schema_url)
Bases: object
XML Schema validator.
Construct an XsdValidator.
- Parameters
schema_url (str) – URI of XML schema to validate against.
-
class ocrd_validators.XsdMetsValidator(schema_url)
Bases: ocrd_validators.xsd_validator.XsdValidator
XML Schema validator.
Construct an XsdValidator.
- Parameters
schema_url (str) – URI of XML schema to validate against.
-
class ocrd_validators.XsdPageValidator(schema_url)
Bases: ocrd_validators.xsd_validator.XsdValidator
XML Schema validator.
Construct an XsdValidator.
- Parameters
schema_url (str) – URI of XML schema to validate against.
Submodules
- ocrd_validators.constants module
- ocrd_validators.json_validator module
- ocrd_validators.ocrd_tool_validator module
- ocrd_validators.ocrd_zip_validator module
- ocrd_validators.page_validator module
- ocrd_validators.parameter_validator module
- ocrd_validators.resource_list_validator module
- ocrd_validators.workspace_validator module
- ocrd_validators.xsd_mets_validator module
- ocrd_validators.xsd_page_validator module
- ocrd_validators.xsd_validator module