ocrd_models.ocrd_mets module

API to METS

class ocrd_models.ocrd_mets.OcrdMets(**kwargs)[source]

Bases: ocrd_models.ocrd_xml_base.OcrdXmlDocument

API to a single METS file

static empty_mets(now=None)[source]

Create an empty METS file from bundled template.

property unique_identifier

Get the unique identifier by looking through mods:identifier

See specs for details.

property agents

List all ocrd_models.ocrd_agent.OcrdAgent instances.

add_agent(*args, **kwargs)[source]

Add an ocrd_models.ocrd_agent.OcrdAgent to the list of agents in the metsHdr.

property file_groups

List the @USE of all mets:fileGrp entries.
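
As an illustration of what this property reports, the following minimal sketch reads the @USE attributes directly from a METS document with the standard library. It does not use ocrd_models; the sample document and the helper name list_file_groups are hypothetical and only serve the example.

```python
import xml.etree.ElementTree as ET

# Namespace of METS elements
NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical minimal METS document, for illustration only
METS_XML = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-IMG"/>
    <mets:fileGrp USE="OCR-D-SEG"/>
  </mets:fileSec>
</mets:mets>"""


def list_file_groups(mets_xml):
    """Return the @USE of every mets:fileGrp, in document order."""
    root = ET.fromstring(mets_xml)
    return [grp.get("USE") for grp in root.iterfind(".//mets:fileGrp", NS)]


file_groups = list_file_groups(METS_XML)
print(file_groups)
```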

find_all_files(*args, **kwargs)[source]

Like find_files() but return a list of all results.

Equivalent to list(self.find_files(...))

find_files(ID=None, fileGrp=None, pageId=None, mimetype=None, url=None, local_only=False)[source]

Search mets:file entries in this METS document and yield results.

The ID, fileGrp, url and mimetype parameters can each be either a literal string, or a regular expression if the string starts with // (double slash).

If it is a regex, the leading // is removed and candidates are matched against the regex with re.fullmatch. If it is a literal string, comparison is done with string equality.
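
The literal-versus-regex rule above can be sketched in a few lines. This is a standalone illustration of the described matching behaviour, not the library's own code; the helper name matches is hypothetical.

```python
import re


def matches(query, candidate):
    """Compare candidate against query using the '//' prefix convention.

    A query starting with '//' is treated as a regular expression that must
    match the whole candidate; any other query is compared for equality.
    """
    if query.startswith("//"):
        # Strip the leading '//' and require a full match
        return re.fullmatch(query[2:], candidate) is not None
    return query == candidate


print(matches("OCR-D-IMG", "OCR-D-IMG"))     # literal, equal
print(matches("//OCR-D-.*", "OCR-D-IMG"))    # regex covering the whole string
print(matches("//OCR-D", "OCR-D-IMG"))       # regex covering only a prefix
```

Because the comparison uses re.fullmatch, a pattern that matches only part of the candidate is not a hit, so a prefix search needs a trailing .* in the pattern.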

The pageId parameter supports a numeric range operator, written as two dots (..). For example, to find all files in pages PHYS_0001 to PHYS_0003, pass PHYS_0001..PHYS_0003, which is expanded to PHYS_0001,PHYS_0002,PHYS_0003.
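
The range expansion can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes both ends of a range share the same prefix and end in a zero-padded number, and the helper name expand_page_range is hypothetical.

```python
import re


def expand_page_range(spec):
    """Expand entries like 'PHYS_0001..PHYS_0003' in a comma-separated list.

    Assumption: both ends of a range share a prefix and end in digits;
    the zero-padding width is taken from the start of the range.
    """
    result = []
    for part in spec.split(","):
        if ".." not in part:
            # Plain page ID, keep as-is
            result.append(part)
            continue
        start, end = part.split("..", 1)
        m_start = re.fullmatch(r"(.*?)(\d+)", start)
        m_end = re.fullmatch(r"(.*?)(\d+)", end)
        if not m_start or not m_end or m_start.group(1) != m_end.group(1):
            raise ValueError(f"Cannot expand range {part!r}")
        prefix = m_start.group(1)
        width = len(m_start.group(2))
        first, last = int(m_start.group(2)), int(m_end.group(2))
        for number in range(first, last + 1):
            result.append(f"{prefix}{number:0{width}d}")
    return result


pages = expand_page_range("PHYS_0001..PHYS_0003")
print(",".join(pages))
```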

Keyword Arguments
  • ID (string) – @ID of the mets:file

  • fileGrp (string) – @USE of the mets:fileGrp to list files of

  • pageId (string) – @ID of the corresponding physical mets:structMap entry (physical page)

  • url (string) – @xlink:href (URL or path) of mets:FLocat of mets:file

  • mimetype (string) – @MIMETYPE of mets:file

  • local_only (boolean) – Whether to restrict results to local files in the filesystem

Yields

ocrd_models.ocrd_file.OcrdFile instances

add_file_group(fileGrp)[source]

Add a new mets:fileGrp.

Parameters

fileGrp (string) – @USE of the new mets:fileGrp.

rename_file_group(old, new)[source]

Rename a mets:fileGrp by changing the @USE from old to new.

remove_file_group(USE, recursive=False, force=False)[source]

Remove a mets:fileGrp (single fixed @USE or multiple regex @USE)

Parameters
  • USE (string) – @USE of the mets:fileGrp to delete. Can be a regex if prefixed with //

  • recursive (boolean) – Whether to recursively delete each mets:file in the group

  • force (boolean) – Do not raise an exception if mets:fileGrp does not exist

add_file(fileGrp, mimetype=None, url=None, ID=None, pageId=None, force=False, local_filename=None, ignore=False, **kwargs)[source]

Instantiate and add a new ocrd_models.ocrd_file.OcrdFile.

Parameters

fileGrp (string) – @USE of mets:fileGrp to add to

Keyword Arguments
  • mimetype (string) – @MIMETYPE of the mets:file to use

  • url (string) – @xlink:href (URL or path) of the mets:file to use

  • ID (string) – @ID of the mets:file to use

  • pageId (string) – @ID in the physical mets:structMap to link to

  • force (boolean) – Whether to add the file even if a mets:file with the same @ID already exists.

  • ignore (boolean) – Do not look for existing files at all. Shift responsibility for preventing errors from duplicate ID to the user.

  • local_filename (string) –

remove_file(*args, **kwargs)[source]

Delete each mets:file matching the query. Accepts the same arguments as find_files().

remove_one_file(ID)[source]

Delete an existing ocrd_models.ocrd_file.OcrdFile.

Parameters

ID (string) – @ID of the mets:file to delete

Returns

The old ocrd_models.ocrd_file.OcrdFile reference.

property physical_pages

List all page IDs (the @ID of each physical mets:structMap mets:div)

get_physical_pages(for_fileIds=None)[source]

List all page IDs (the @ID of each physical mets:structMap mets:div), optionally for a subset of mets:file @ID for_fileIds.
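
To make the relation between pages and files concrete, the sketch below reads the physical mets:structMap with the standard library. It does not use ocrd_models, and the sample document and the helper name list_physical_pages are hypothetical. The shape of the filtered result (one page ID per requested file ID, None if no page references it) is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical minimal METS document, for illustration only
METS_XML = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="PHYS_0001">
        <mets:fptr FILEID="IMG_0001"/>
        <mets:fptr FILEID="SEG_0001"/>
      </mets:div>
      <mets:div TYPE="page" ID="PHYS_0002">
        <mets:fptr FILEID="IMG_0002"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def list_physical_pages(mets_xml, for_file_ids=None):
    """List page IDs, optionally one per requested mets:file @ID."""
    root = ET.fromstring(mets_xml)
    pages = root.findall(
        './/mets:structMap[@TYPE="PHYSICAL"]//mets:div[@TYPE="page"]', NS)
    if for_file_ids is None:
        return [page.get("ID") for page in pages]
    # Assumption: result is aligned with for_file_ids, None where unresolved
    result = [None] * len(for_file_ids)
    for page in pages:
        for fptr in page.iterfind("mets:fptr", NS):
            file_id = fptr.get("FILEID")
            if file_id in for_file_ids:
                result[for_file_ids.index(file_id)] = page.get("ID")
    return result


all_pages = list_physical_pages(METS_XML)
some_pages = list_physical_pages(METS_XML, ["IMG_0002", "SEG_0001"])
print(all_pages)
print(some_pages)
```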

set_physical_page_for_file(pageId, ocrd_file, order=None, orderlabel=None)[source]

Set the physical page ID (@ID of the physical mets:structMap mets:div entry) corresponding to the mets:file ocrd_file, creating all structures if necessary.

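Parameters
  • pageId (string) – @ID of the physical mets:structMap mets:div entry to use

  • ocrd_file – the mets:file (an ocrd_models.ocrd_file.OcrdFile) to assign to the page
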
Keyword Arguments
  • order (string) – @ORDER to use

  • orderlabel (string) – @ORDERLABEL to use

get_physical_page_for_file(ocrd_file)[source]

Get the physical page ID (@ID of the physical mets:structMap mets:div entry) corresponding to the mets:file ocrd_file.

remove_physical_page(ID)[source]

Delete page (physical mets:structMap mets:div entry @ID) ID.

merge(other_mets, fileGrp_mapping=None, after_add_cb=None, **kwargs)[source]

Add all files from other_mets.

Accepts the same kwargs as find_files()

Keyword Arguments
  • fileGrp_mapping (dict) – Map other_mets fileGrp to fileGrp in this METS

  • after_add_cb (function) – Callback received after file is added to the METS
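
The roles of fileGrp_mapping and after_add_cb can be illustrated with a toy model in which files are plain dictionaries. This is not the library's code: the helper merge_files is hypothetical, and two behaviours are assumptions made for the sketch, namely that groups absent from the mapping keep their name and that the callback receives the newly added file.

```python
def merge_files(own_files, other_files, fileGrp_mapping=None, after_add_cb=None):
    """Copy files from other_files into own_files, renaming groups on the way.

    Toy model: each file is a dict with 'ID' and 'fileGrp' keys.
    """
    mapping = fileGrp_mapping or {}
    for src in other_files:
        # Assumption: unmapped groups keep their original name
        dest = dict(src, fileGrp=mapping.get(src["fileGrp"], src["fileGrp"]))
        own_files.append(dest)
        if after_add_cb is not None:
            # Assumption: the callback is handed the file that was just added
            after_add_cb(dest)
    return own_files


added_ids = []
merged = merge_files(
    own_files=[{"ID": "IMG_0001", "fileGrp": "OCR-D-IMG"}],
    other_files=[
        {"ID": "SEG_0001", "fileGrp": "SEG"},
        {"ID": "OCR_0001", "fileGrp": "OCR-D-OCR"},
    ],
    fileGrp_mapping={"SEG": "OCR-D-SEG"},
    after_add_cb=lambda f: added_ids.append(f["ID"]),
)
print([(f["ID"], f["fileGrp"]) for f in merged])
```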