ocrd.cli.workspace module
OCR-D CLI: workspace management
ocrd workspace
Working with workspace
ocrd workspace [OPTIONS] COMMAND [ARGS]...
Options
- -d, --directory <WORKSPACE_DIR>: Changes the workspace folder location [default: METS_URL directory or .]
- -M, --mets-basename <mets_basename>: METS file basename. Deprecated, use --mets/--directory
- -m, --mets <METS_URL>: The path/URL of the METS file [default: WORKSPACE_DIR/mets.xml]
- --backup: Back up mets.xml whenever it is saved.
Environment variables
- WORKSPACE_DIR: Provides a default for -d
add
Add a file or http(s) URL FNAME to METS in a workspace. If FNAME is not an http(s) URL and is not a workspace-local existing file, try to copy to workspace.
ocrd workspace add [OPTIONS] FNAME
Options
- -G, --file-grp <FILE_GRP>: Required. fileGrp USE
- -i, --file-id <FILE_ID>: Required. ID for the file
- -m, --mimetype <TYPE>: Media type of the file. Guessed from extension if not provided
- -g, --page-id <PAGE_ID>: ID of the physical page
- -C, --check-file-exists: Whether to ensure FNAME exists
- --ignore: Do not check whether file exists.
- --force: If file with ID already exists, replace it. No effect if --ignore is set.
Arguments
- FNAME: Required argument
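As an illustration of what "guessed from extension" can mean for --mimetype, here is a minimal sketch using Python's standard mimetypes module. This is an assumption about the general approach only, not the lookup table that ocrd itself uses, and guess_media_type is a hypothetical helper name.

```python
import mimetypes

def guess_media_type(fname, default="application/octet-stream"):
    """Guess a media type from the file extension (hypothetical helper)."""
    media_type, _encoding = mimetypes.guess_type(fname)
    # Fall back to a generic type when the extension is unknown.
    return media_type or default

print(guess_media_type("OCR-D-IMG/page_0001.tif"))
print(guess_media_type("OCR-D-IMG/page_0001.png"))
```

Passing --mimetype explicitly avoids relying on any such guess when the extension is ambiguous or missing.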
backup
Backing up and restoring workspaces - dev edition
ocrd workspace backup [OPTIONS] COMMAND [ARGS]...
bulk-add
Add files in bulk to an OCR-D workspace.
FILE_GLOB can either be a shell glob expression or a list of files.
--regex is applied to the absolute path of every file in FILE_GLOB and can define named groups that can be used in --page-id, --file-id, --mimetype, --url and --file-grp by referencing the named group 'grp' in the regex as '{{ grp }}'.
Example:
    ocrd workspace bulk-add \
        --regex '^.*/(?P<fileGrp>[^/]+)/page_(?P<pageid>.*)\.(?P<ext>[^.]*)$' \
        --file-id 'FILE_{{ fileGrp }}_{{ pageid }}' \
        --page-id 'PHYS_{{ pageid }}' \
        --file-grp "{{ fileGrp }}" \
        --url '{{ fileGrp }}/FILE_{{ pageid }}.{{ ext }}' \
        path/to/files/*/*.*
ocrd workspace bulk-add [OPTIONS] FILE_GLOB...
Options
- -r, --regex <regex>: Required. Regular expression matching the FILE_GLOB filesystem paths to define named captures usable in the other parameters
- -m, --mimetype <mimetype>: Media type of the file. If not provided, guess from filename
- -g, --page-id <page_id>: Physical page ID of the file
- -i, --file-id <file_id>: Required. ID of the file
- -u, --url <url>: Required. Local filesystem path in the workspace directory (copied from source file if different)
- -G, --file-grp <file_grp>: Required. File group USE of the file
- -n, --dry-run: Don't actually do anything to the METS or filesystem, just preview
- -I, --ignore: Disable checking for existing file entries (faster)
- -f, --force: Replace existing file entries with the same ID (no effect if --ignore is also set)
- -s, --skip: Skip files not matching --regex (instead of failing)
Arguments
- FILE_GLOB: Required argument(s)
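The way named groups from --regex feed the '{{ grp }}' placeholders in bulk-add can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code ocrd runs: expand and plan_entry are hypothetical helpers, the sample path is made up, and the dot before the extension is written as an escaped `\.` so that the extension is captured separately from the page ID.

```python
import re

# Pattern in the style of the bulk-add example, with the dot before the
# extension escaped.
PATH_REGEX = re.compile(
    r"^.*/(?P<fileGrp>[^/]+)/page_(?P<pageid>.*)\.(?P<ext>[^.]*)$"
)
# Matches placeholders such as '{{ pageid }}'.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def expand(template, groups):
    """Replace every '{{ name }}' in template with groups[name]."""
    return PLACEHOLDER.sub(lambda m: groups[m.group(1)], template)

def plan_entry(path):
    """Return the values one file would get, or None if the path does not match."""
    match = PATH_REGEX.match(path)
    if match is None:
        return None
    groups = match.groupdict()
    return {
        "file_id": expand("FILE_{{ fileGrp }}_{{ pageid }}", groups),
        "page_id": expand("PHYS_{{ pageid }}", groups),
        "file_grp": expand("{{ fileGrp }}", groups),
        "url": expand("{{ fileGrp }}/FILE_{{ pageid }}.{{ ext }}", groups),
    }

# Hypothetical absolute path of one file matched by FILE_GLOB.
print(plan_entry("/data/path/to/files/OCR-D-IMG/page_0001.tif"))
```

Running the real command with --dry-run is the safer way to preview what would be added, since it exercises the actual implementation.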
clone
Create a workspace from METS_URL and return the directory
METS_URL can be a URL, an absolute path or a path relative to $PWD. If METS_URL is not provided, use --mets accordingly. METS_URL can also be an OAI-PMH GetRecord URL wrapping a METS file.
ocrd workspace clone [OPTIONS] METS_URL [WORKSPACE_DIR]
Options
- -f, --clobber-mets: Overwrite existing METS file
- -a, --download: Download all files and change location in METS file after cloning
Arguments
- METS_URL: Required argument
- WORKSPACE_DIR: Optional argument
find
Find files.
(If any FILTER starts with //, then its remainder will be interpreted as a regular expression.)
ocrd workspace find [OPTIONS]
Options
- -i, --file-id <FILTER>: ID
- -g, --page-id <FILTER>: Page ID
- -m, --mimetype <FILTER>: Media type to look for
- -G, --file-grp <FILTER>: fileGrp USE
- -k, --output-field <output_field>: Output field. Repeat for multiple fields, will be joined with tab
  Options: url | mimetype | pageId | ID | fileGrp | basename | basename_without_extension | local_filename
- --download: Download found files to workspace and change location in METS file
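The FILTER convention for find (a leading // switches from literal comparison to a regular expression) can be sketched as follows. This is an illustration only: matches is a hypothetical helper, and the use of a full match rather than a substring search is an assumption about the semantics, not something stated in this reference.

```python
import re

REGEX_PREFIX = "//"

def matches(filter_value, candidate):
    """Compare candidate against a literal filter or a //-prefixed regex."""
    if filter_value.startswith(REGEX_PREFIX):
        pattern = filter_value[len(REGEX_PREFIX):]
        # Assumption: the whole value must match the pattern.
        return re.fullmatch(pattern, candidate) is not None
    return filter_value == candidate

# Hypothetical fileGrp USE values in a workspace.
file_groups = ["OCR-D-IMG", "OCR-D-IMG-BIN", "OCR-D-SEG-PAGE"]
print([g for g in file_groups if matches("OCR-D-IMG", g)])
print([g for g in file_groups if matches("//OCR-D-IMG.*", g)])
```

The same prefix convention is described for the ID and GROUP arguments of the removal commands.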
init
Create a workspace with an empty METS file in --directory.
ocrd workspace init [OPTIONS] [DIRECTORY]
Options
- -f, --clobber-mets: Clobber mets.xml if it exists
Arguments
- DIRECTORY: Optional argument
merge
Merges this workspace with the workspace that contains METS_PATH
The --file-id, --page-id, --mimetype and --file-grp options have
the same semantics as in ocrd workspace find, see ocrd workspace find --help
for an explanation.
ocrd workspace merge [OPTIONS] METS_PATH
Options
- --copy-files, --no-copy-files: Copy files as well [default: True]
- --fileGrp-mapping <filegrp_mapping>: JSON object mapping src to dest fileGrp
- -i, --file-id <FILTER>: ID
- -g, --page-id <FILTER>: Page ID
- -m, --mimetype <FILTER>: Media type to look for
- -G, --file-grp <FILTER>: fileGrp USE
Arguments
- METS_PATH: Required argument
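The value of --fileGrp-mapping is a JSON object whose keys are source fileGrp names and whose values are destination names. A small sketch of building such a value and applying it follows. The group names are made up, map_file_grp is a hypothetical helper, and leaving unmapped groups unchanged is an assumption rather than documented behaviour.

```python
import json

# Hypothetical mapping from source to destination fileGrp names.
mapping = {"OCR-D-IMG": "OCR-D-IMG-OTHER", "OCR-D-SEG": "OCR-D-SEG-OTHER"}

# String that could be passed as the --fileGrp-mapping value.
mapping_json = json.dumps(mapping)
print(mapping_json)

def map_file_grp(src, mapping_json):
    """Return the destination fileGrp for src (unchanged if not mapped)."""
    return json.loads(mapping_json).get(src, src)

print(map_file_grp("OCR-D-IMG", mapping_json))
print(map_file_grp("OCR-D-OCR", mapping_json))
```

On the command line, the JSON string needs shell quoting, for example single quotes around the whole object.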
prune-files
Removes mets:files that point to non-existing local files
(If any FILTER starts with //, then its remainder will be interpreted as a regular expression.)
ocrd workspace prune-files [OPTIONS]
Options
- -G, --file-grp <FILTER>: fileGrp USE
- -m, --mimetype <FILTER>: Media type to look for
- -g, --page-id <FILTER>: Page ID
- -i, --file-id <FILTER>: ID
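The idea behind prune-files, dropping entries whose local file is missing, can be sketched like this. It is a simplified model under stated assumptions: entries are plain (ID, location) tuples instead of mets:file elements, prune is a hypothetical helper, and any location containing "://" is treated as remote and kept.

```python
import os
import tempfile

def prune(entries, workspace_dir):
    """Keep remote entries and local entries whose file exists."""
    kept = []
    for file_id, location in entries:
        # Assumption: locations with a URL scheme are not local files.
        is_remote = "://" in location
        if is_remote or os.path.exists(os.path.join(workspace_dir, location)):
            kept.append((file_id, location))
    return kept

with tempfile.TemporaryDirectory() as workspace_dir:
    # Create one of the two local files referenced below.
    open(os.path.join(workspace_dir, "present.tif"), "w").close()
    entries = [
        ("FILE_0001", "present.tif"),
        ("FILE_0002", "missing.tif"),
        ("FILE_0003", "https://example.org/remote.tif"),
    ]
    print(prune(entries, workspace_dir))
```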
remove
Delete files (given by their ID attribute ID).
(If any ID starts with //, then its remainder will be interpreted as a regular expression.)
ocrd workspace remove [OPTIONS] [ID]...
Options
- -k, --keep-file: Do not delete file from file system
- -f, --force: Continue even if mets:file or file on file system does not exist
Arguments
- ID: Optional argument(s)
remove-group
Delete fileGrps (given by their USE attribute GROUP).
(If any GROUP starts with //, then its remainder will be interpreted as a regular expression.)
ocrd workspace remove-group [OPTIONS] [GROUP]...
Options
- -r, --recursive: Delete any files in the group before the group itself
- -f, --force: Continue removing even if the group or the files it contains are not found in METS
- -k, --keep-files: Do not delete files from file system
Arguments
- GROUP: Optional argument(s)
rename-group
Rename fileGrp (USE attribute OLD to NEW).
ocrd workspace rename-group [OPTIONS] OLD NEW
Arguments
- OLD: Required argument
- NEW: Required argument
set-id
Set METS ID.
If one of the supported identifier mechanisms is used, will set this identifier.
Otherwise will create a new <mods:identifier type="purl">{{ ID }}</mods:identifier>.
ocrd workspace set-id [OPTIONS] ID
Arguments
- ID: Required argument
validate
Validate a workspace
METS_URL can be a URL, an absolute path or a path relative to $PWD. If not given, use --mets accordingly.
Check that the METS and its referenced file contents abide by the OCR-D specifications.
ocrd workspace validate [OPTIONS] [METS_URL]
Options
- -a, --download: Download all files
- -s, --skip <skip>: Tests to skip
  Options: imagefilename | dimension | mets_unique_identifier | mets_file_group_names | mets_files | pixel_density | page | page_xsd | mets_xsd | url
- --page-textequiv-consistency, --page-strictness <page_textequiv_consistency>: How strictly to check PAGE multi-level textequiv consistency
  Options: strict | lax | fix | off
- --page-coordinate-consistency <page_coordinate_consistency>: How strictly to check PAGE multi-level coordinate consistency
  Options: poly | baseline | both | off
Arguments
- METS_URL: Optional argument