ocrd_utils.str module

Utility functions for strings, paths and URLs.

ocrd_utils.str.assert_file_grp_cardinality(grps, n, msg=None)

Assert that a string of comma-separated fileGrps contains exactly n entries.

ocrd_utils.str.concat_padded(base, *args)

Concatenate a string and one or more zero-padded 4-digit numbers.
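To illustrate the padding behaviour, here is a minimal standalone sketch. It is not the library's implementation: the name concat_padded_sketch, the underscore separator, and the handling of string arguments are assumptions made for this example.

```python
def concat_padded_sketch(base, *args):
    """Append each argument to `base`, zero-padding numbers to 4 digits.

    Assumption: parts are joined with '_' and string arguments are
    appended unchanged.
    """
    result = base
    for arg in args:
        if isinstance(arg, str):
            # Strings are appended as-is.
            result = "%s_%s" % (result, arg)
        else:
            # Integers are rendered with at least 4 digits.
            result = "%s_%04d" % (result, arg)
    return result


print(concat_padded_sketch("OCR-D-IMG", 5))          # OCR-D-IMG_0005
print(concat_padded_sketch("OCR-D-IMG", 5, "crop"))  # OCR-D-IMG_0005_crop
```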

ocrd_utils.str.get_local_filename(url, start=None)

Return local filename, optionally relative to start

Parameters
  • url (string) – filename or URL

  • start (string) – Base path to remove from the filename. An exception is raised if start is not a prefix of url
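The prefix handling described above can be sketched as follows. This is a hedged approximation, not the library's code: the function name get_local_filename_sketch, the choice of ValueError, and the treatment of the file:// scheme and of http(s) URLs are assumptions.

```python
def get_local_filename_sketch(url, start=None):
    """Return a local filename for `url`, optionally relative to `start`."""
    # Assumption: remote URLs have no local filename.
    if url.startswith(("http://", "https://")):
        raise ValueError("Not a local file: %s" % url)
    # Assumption: a file:// scheme prefix is simply dropped.
    if url.startswith("file://"):
        url = url[len("file://"):]
    if start:
        if not url.startswith(start):
            raise ValueError("%s is not a prefix of %s" % (start, url))
        # Also drop the path separator that follows the base path.
        if not start.endswith("/"):
            start += "/"
        url = url[len(start):]
    return url


print(get_local_filename_sketch("file:///data/ws/img/0001.tif", start="/data/ws"))
# img/0001.tif
```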

ocrd_utils.str.is_local_filename(url)

Whether a url is a local filename.

ocrd_utils.str.is_string(val)

Return whether a value is a str.

ocrd_utils.str.make_file_id(ocrd_file, output_file_grp)

Derive a new file ID for an output file from an existing input file ocrd_file and the name of the output file’s fileGrp/@USE, output_file_grp. If ocrd_file’s ID contains the input file’s fileGrp name, replace that name with output_file_grp. Otherwise, combine output_file_grp with the position of ocrd_file within the input fileGrp (as a fallback counter), and increment the counter until the resulting ID no longer conflicts with an existing one.
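The derivation rule can be sketched on plain strings. This is an illustration only: the real function takes an ocrd_file object, whereas make_file_id_sketch and its parameters input_id, input_file_grp, position and existing_ids are hypothetical stand-ins, and the "_%04d" suffix format is an assumption.

```python
def make_file_id_sketch(input_id, input_file_grp, position,
                        output_file_grp, existing_ids):
    """Derive an output file ID following the two-step rule above."""
    # Case 1: the input ID embeds the input fileGrp name, so swap it.
    if input_file_grp in input_id:
        return input_id.replace(input_file_grp, output_file_grp)
    # Case 2: fall back to the output fileGrp plus a counter that
    # starts at the file's position within the input fileGrp.
    counter = position
    candidate = "%s_%04d" % (output_file_grp, counter)
    while candidate in existing_ids:
        # Bump the counter until the ID is unused.
        counter += 1
        candidate = "%s_%04d" % (output_file_grp, counter)
    return candidate


print(make_file_id_sketch("OCR-D-IMG_0001", "OCR-D-IMG", 0, "OCR-D-BIN", set()))
# OCR-D-BIN_0001
print(make_file_id_sketch("page0001", "OCR-D-IMG", 3, "OCR-D-BIN", {"OCR-D-BIN_0003"}))
# OCR-D-BIN_0004
```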

ocrd_utils.str.nth_url_segment(url, n=-1)

Return the n-th /-delimited segment of a URL-like string (by default, the last one)

Parameters
  • url (string) – URL-like string to split

  • n (integer) – index of segment, default: -1
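A minimal sketch of segment selection by index, under stated assumptions: nth_url_segment_sketch is a hypothetical name, trailing slashes are stripped before splitting, and an out-of-range index yields an empty string. The library may behave differently in these edge cases.

```python
def nth_url_segment_sketch(url, n=-1):
    """Return the n-th '/'-delimited segment of `url` (default: last)."""
    # Assumption: ignore trailing slashes so the last segment is not empty.
    segments = url.rstrip("/").split("/")
    try:
        return segments[n]
    except IndexError:
        # Assumption: an out-of-range index gives an empty string.
        return ""


print(nth_url_segment_sketch("http://example.org/a/b/c.xml"))      # c.xml
print(nth_url_segment_sketch("http://example.org/a/b/c.xml", -2))  # b
```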

ocrd_utils.str.parse_json_string_or_file(*values)

Parse a string as either the path to a JSON object or a literal JSON object.

Empty strings are equivalent to ‘{}’
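The path-or-literal decision can be sketched for a single value as below. This is an assumption-laden illustration, not the library's code: parse_json_string_or_file_sketch is a hypothetical name, it accepts one value rather than *values, and it decides between path and literal by checking whether a file with that name exists.

```python
import json
import os


def parse_json_string_or_file_sketch(value):
    """Parse `value` as a JSON file path or as literal JSON."""
    # Empty (or whitespace-only) input is treated like '{}'.
    if value.strip() == "":
        return {}
    # Assumption: an existing file wins over interpreting the string as JSON.
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as handle:
            return json.load(handle)
    return json.loads(value)


print(parse_json_string_or_file_sketch(""))                # {}
print(parse_json_string_or_file_sketch('{"level": 2}'))    # {'level': 2}
```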

ocrd_utils.str.parse_json_string_with_comments(val)

Parse a string of JSON interspersed with #-prefixed full-line comments
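One way to handle full-line comments is to drop every line whose first non-blank character is # before handing the rest to the JSON parser. The sketch below assumes exactly that; parse_json_string_with_comments_sketch is a hypothetical name and the library's own approach may differ.

```python
import json


def parse_json_string_with_comments_sketch(val):
    """Parse JSON after removing '#'-prefixed full-line comments."""
    kept = [
        line for line in val.splitlines()
        # Only whole-line comments are removed; a '#' inside a value stays.
        if not line.lstrip().startswith("#")
    ]
    return json.loads("\n".join(kept))


text = """{
  # threshold for binarization
  "threshold": 0.5,
  "tag": "a#b"
}"""
print(parse_json_string_with_comments_sketch(text))
# {'threshold': 0.5, 'tag': 'a#b'}
```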

ocrd_utils.str.remove_non_path_from_url(url)

Remove everything from URL after path.
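As a rough sketch, "everything after the path" can be read as the query string and the fragment. The helper below, with the hypothetical name remove_non_path_from_url_sketch, assumes that reading and simply cuts at the first ? and #; it keeps the scheme and host untouched.

```python
def remove_non_path_from_url_sketch(url):
    """Cut off the query string and fragment of a URL-like string."""
    # Drop the query part, if any.
    url = url.split("?", 1)[0]
    # Drop the fragment part, if any.
    url = url.split("#", 1)[0]
    return url


print(remove_non_path_from_url_sketch("https://example.org/a/b.xml?x=1#top"))
# https://example.org/a/b.xml
```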

ocrd_utils.str.safe_filename(url)

Sanitize input to be safely used as the basename of a local file.
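A common way to sanitize a basename is to replace every run of characters outside a conservative whitelist with a single placeholder. The sketch below assumes an ASCII alphanumeric whitelist and a dot as the placeholder; both choices, and the name safe_filename_sketch, are assumptions for this example rather than the library's documented behaviour.

```python
import re


def safe_filename_sketch(url):
    """Replace each run of non-alphanumeric characters with a single dot."""
    return re.sub(r"[^A-Za-z0-9]+", ".", url)


print(safe_filename_sketch("http://example.org/a b.tif"))
# http.example.org.a.b.tif
```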