ocrd.cli.workspace module
OCR-D CLI: workspace management
ocrd workspace
Working with workspace
ocrd workspace [OPTIONS] COMMAND [ARGS]...
Options
-d, --directory <WORKSPACE_DIR>
    Changes the workspace folder location [default: METS_URL directory or .]
-M, --mets-basename <mets_basename>
    METS file basename. Deprecated, use --mets/--directory
-m, --mets <METS_URL>
    The path/URL of the METS file [default: WORKSPACE_DIR/mets.xml]
--backup
    Backup mets.xml whenever it is saved.
Environment variables
WORKSPACE_DIR
    Provide a default for -d
add
Add a file or http(s) URL FNAME to METS in a workspace. If FNAME is not an http(s) URL and is not a workspace-local existing file, try to copy to workspace.
ocrd workspace add [OPTIONS] FNAME
Options
-G, --file-grp <FILE_GRP>
    Required. fileGrp USE
-i, --file-id <FILE_ID>
    Required. ID for the file
-m, --mimetype <TYPE>
    Media type of the file. Guessed from extension if not provided
-g, --page-id <PAGE_ID>
    ID of the physical page
-C, --check-file-exists
    Whether to ensure FNAME exists
--ignore
    Do not check whether file exists.
--force
    If file with ID already exists, replace it. No effect if --ignore is set.
Arguments
FNAME
    Required argument
backup
Backing up and restoring workspaces - dev edition
ocrd workspace backup [OPTIONS] COMMAND [ARGS]...
bulk-add
Add files in bulk to an OCR-D workspace.
FILE_GLOB can either be a shell glob expression or a list of files.
--regex is applied to the absolute path of every file in FILE_GLOB and can define named groups that can be used in --page-id, --file-id, --mimetype, --url and --file-grp by referencing the named group 'grp' in the regex as '{{ grp }}'.
Example:
ocrd workspace bulk-add \
    --regex '^.*/(?P<fileGrp>[^/]+)/page_(?P<pageid>.*)\.(?P<ext>[^.]*)$' \
    --file-id 'FILE_{{ fileGrp }}_{{ pageid }}' \
    --page-id 'PHYS_{{ pageid }}' \
    --file-grp "{{ fileGrp }}" \
    --url '{{ fileGrp }}/FILE_{{ pageid }}.{{ ext }}' \
    path/to/files/*/*.*
ocrd workspace bulk-add [OPTIONS] FILE_GLOB...
Options
-r, --regex <regex>
    Required. Regular expression matching the FILE_GLOB filesystem paths to define named captures usable in the other parameters
-m, --mimetype <mimetype>
    Media type of the file. If not provided, guess from filename
-g, --page-id <page_id>
    Physical page ID of the file
-i, --file-id <file_id>
    Required. ID of the file
-u, --url <url>
    Required. Local filesystem path in the workspace directory (copied from source file if different)
-G, --file-grp <file_grp>
    Required. File group USE of the file
-n, --dry-run
    Don’t actually do anything to the METS or filesystem, just preview
-I, --ignore
    Disable checking for existing file entries (faster)
-f, --force
    Replace existing file entries with the same ID (no effect when --ignore is set, too)
-s, --skip
    Skip files not matching --regex (instead of failing)
Arguments
FILE_GLOB
    Required argument(s)
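The named-group substitution described for --regex can be pictured with a small Python sketch. This is not the OCR-D implementation: the helper names `expand_template` and `preview_entry` and the sample path are hypothetical, and the sketch assumes the dot before the file extension in the example regex is meant literally (written as `\.`).

```python
import re

# Regex from the bulk-add example; the dot before the extension is escaped (assumption).
PATTERN = re.compile(r"^.*/(?P<fileGrp>[^/]+)/page_(?P<pageid>.*)\.(?P<ext>[^.]*)$")

# Templates copied from the example command line.
TEMPLATES = {
    "file_id": "FILE_{{ fileGrp }}_{{ pageid }}",
    "page_id": "PHYS_{{ pageid }}",
    "file_grp": "{{ fileGrp }}",
    "url": "{{ fileGrp }}/FILE_{{ pageid }}.{{ ext }}",
}


def expand_template(template, groups):
    """Replace every '{{ name }}' placeholder with the capture of that name."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: groups[m.group(1)], template)


def preview_entry(path):
    """Return the values one file would receive, or None if the path does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    groups = match.groupdict()
    return {key: expand_template(tpl, groups) for key, tpl in TEMPLATES.items()}


if __name__ == "__main__":
    # Hypothetical absolute path of one file matched by FILE_GLOB.
    print(preview_entry("/data/path/to/files/OCR-D-IMG/page_0001.tif"))
```

Running a preview like this on a few paths before the real call serves a similar purpose to --dry-run: it shows which ID, page, group and target path each file would get.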
clone
Create a workspace from METS_URL and return the directory
METS_URL can be a URL, an absolute path or a path relative to $PWD. If METS_URL is not provided, use --mets accordingly. METS_URL can also be an OAI-PMH GetRecord URL wrapping a METS file.
ocrd workspace clone [OPTIONS] METS_URL [WORKSPACE_DIR]
Options
-f, --clobber-mets
    Overwrite existing METS file
-a, --download
    Download all files and change location in METS file after cloning
Arguments
METS_URL
    Required argument
WORKSPACE_DIR
    Optional argument
find
Find files.
(If any FILTER starts with //, then its remainder will be interpreted as a regular expression.)
ocrd workspace find [OPTIONS]
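The // prefix rule can be sketched in Python as follows. The function name `matches_filter` is hypothetical, and two details are assumptions rather than documented behaviour: that a regular expression has to match the whole value, and that a FILTER without the prefix is compared as a literal string.

```python
import re

REGEX_PREFIX = "//"


def matches_filter(value, filter_):
    """Check a single METS attribute value against a FILTER string."""
    if filter_.startswith(REGEX_PREFIX):
        # Everything after the '//' prefix is treated as a regular expression.
        # Assumption: the expression has to match the entire value.
        return re.fullmatch(filter_[len(REGEX_PREFIX):], value) is not None
    # Assumption: without the prefix, the filter is a literal string.
    return value == filter_


if __name__ == "__main__":
    print(matches_filter("OCR-D-IMG-BIN", "//OCR-D-IMG.*"))  # regex filter
    print(matches_filter("OCR-D-IMG-BIN", "OCR-D-IMG"))      # literal filter
```

On a shell command line, a filter such as `//OCR-D-IMG.*` is best quoted so that the shell does not expand the `*`.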
Options
-i, --file-id <FILTER>
    ID
-g, --page-id <FILTER>
    Page ID
-m, --mimetype <FILTER>
    Media type to look for
-G, --file-grp <FILTER>
    fileGrp USE
-k, --output-field <output_field>
    Output field. Repeat for multiple fields, will be joined with tab
    Options: url | mimetype | pageId | ID | fileGrp | basename | basename_without_extension | local_filename
--download
    Download found files to workspace and change location in METS file
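Because the fields requested with -k are joined with a tab, the output is easy to split again in a script. A minimal sketch, assuming one line of output per matching file (not stated above); `parse_find_output` and the sample data are hypothetical.

```python
def parse_find_output(text, fields):
    """Turn tab-separated `ocrd workspace find` output into a list of dicts."""
    records = []
    for line in text.splitlines():
        if not line:
            continue  # ignore blank lines
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError(
                f"expected {len(fields)} fields, got {len(values)}: {line!r}"
            )
        records.append(dict(zip(fields, values)))
    return records


if __name__ == "__main__":
    # Hypothetical output of: ocrd workspace find -k ID -k mimetype -k local_filename
    sample = "FILE_0001\timage/tiff\tOCR-D-IMG/FILE_0001.tif\n"
    print(parse_find_output(sample, ["ID", "mimetype", "local_filename"]))
```

The `fields` list should name the same fields, in the same order, as the -k options passed on the command line.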
init
Create a workspace with an empty METS file in --directory.
ocrd workspace init [OPTIONS] [DIRECTORY]
Options
-f, --clobber-mets
    Clobber mets.xml if it exists
Arguments
DIRECTORY
    Optional argument
merge
Merges this workspace with the workspace that contains METS_PATH
The --file-id, --page-id, --mimetype and --file-grp options have the same semantics as in ocrd workspace find, see ocrd workspace find --help for an explanation.
ocrd workspace merge [OPTIONS] METS_PATH
Options
--copy-files, --no-copy-files
    Copy files as well
    Default: True
--fileGrp-mapping <filegrp_mapping>
    JSON object mapping src to dest fileGrp
-i, --file-id <FILTER>
    ID
-g, --page-id <FILTER>
    Page ID
-m, --mimetype <FILTER>
    Media type to look for
-G, --file-grp <FILTER>
    fileGrp USE
Arguments
METS_PATH
    Required argument
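Since --fileGrp-mapping expects a JSON object, generating it with a JSON serializer avoids quoting mistakes. A minimal sketch that only builds the argument list and executes nothing; the helper name `merge_command`, the fileGrp names and the METS path are hypothetical.

```python
import json


def merge_command(mets_path, filegrp_mapping=None, copy_files=True):
    """Build an argument list for `ocrd workspace merge` (does not run it)."""
    args = ["ocrd", "workspace", "merge"]
    args.append("--copy-files" if copy_files else "--no-copy-files")
    if filegrp_mapping:
        # Serialize the src -> dest fileGrp mapping as a JSON object.
        args += ["--fileGrp-mapping", json.dumps(filegrp_mapping)]
    args.append(mets_path)
    return args


if __name__ == "__main__":
    # Hypothetical mapping: rename the source group while merging.
    print(merge_command("other/mets.xml", {"OCR-D-IMG": "OCR-D-IMG-OTHER"}))
```

The resulting list can be handed to `subprocess.run` directly, so the JSON string never has to pass through shell quoting.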
prune-files
Removes mets:files that point to non-existing local files
(If any FILTER starts with //, then its remainder will be interpreted as a regular expression.)
ocrd workspace prune-files [OPTIONS]
Options
-G, --file-grp <FILTER>
    fileGrp USE
-m, --mimetype <FILTER>
    Media type to look for
-g, --page-id <FILTER>
    Page ID
-i, --file-id <FILTER>
    ID
remove
Delete files (given by their ID attribute ID).
(If any ID starts with //, then its remainder will be interpreted as a regular expression.)
ocrd workspace remove [OPTIONS] [ID]...
Options
-k, --keep-file
    Do not delete file from file system
-f, --force
    Continue even if mets:file or file on file system does not exist
Arguments
ID
    Optional argument(s)
remove-group
Delete fileGrps (given by their USE attribute GROUP).
(If any GROUP starts with //, then its remainder will be interpreted as a regular expression.)
ocrd workspace remove-group [OPTIONS] [GROUP]...
Options
-r, --recursive
    Delete any files in the group before the group itself
-f, --force
    Continue removing even if group or containing files not found in METS
-k, --keep-files
    Do not delete files from file system
Arguments
GROUP
    Optional argument(s)
rename-group
Rename fileGrp (USE attribute) from OLD to NEW.
ocrd workspace rename-group [OPTIONS] OLD NEW
Arguments
OLD
    Required argument
NEW
    Required argument
set-id
Set METS ID.
If one of the supported identifier mechanisms is used, will set this identifier.
Otherwise will create a new <mods:identifier type="purl">{{ ID }}</mods:identifier>.
ocrd workspace set-id [OPTIONS] ID
Arguments
ID
    Required argument
validate
Validate a workspace
METS_URL can be a URL, an absolute path or a path relative to $PWD. If not given, use --mets accordingly.
Check that the METS and its referenced file contents abide by the OCR-D specifications.
ocrd workspace validate [OPTIONS] [METS_URL]
Options
-a, --download
    Download all files
-s, --skip <skip>
    Tests to skip
    Options: imagefilename | dimension | mets_unique_identifier | mets_file_group_names | mets_files | pixel_density | page | page_xsd | mets_xsd | url
--page-textequiv-consistency, --page-strictness <page_textequiv_consistency>
    How strict to check PAGE multi-level textequiv consistency
    Options: strict | lax | fix | off
--page-coordinate-consistency <page_coordinate_consistency>
    How fierce to check PAGE multi-level coordinate consistency
    Options: poly | baseline | both | off
Arguments
METS_URL
    Optional argument