Processor resource discovery #559

Merged on Jan 26, 2021 (74 commits)
Changes from 3 commits
74 commits
2257876
utils: implement resource lookup logic
kba Aug 10, 2020
a7b8001
utils: list_all_resources to list all processor resources
kba Aug 10, 2020
7e362d7
ocrd_utils.constants: XDG_CACHE_HOME
kba Aug 10, 2020
78e84a2
list_all_resources: also look in XDG_CACHE_HOME
kba Aug 10, 2020
5c75f40
Processor: implement resolve_resource and list_all_resources
kba Aug 10, 2020
bb63210
resolve_resource: also look in XDG_CACHE_HOME
kba Aug 10, 2020
297a0f3
Processor: fix signature for list_resource_candidates
kba Aug 10, 2020
c999229
initial test of list_resource_candidates
kba Aug 25, 2020
479cedd
test_os: not all test environments have VIRTUAL_ENV set
kba Oct 13, 2020
247bf4d
Merge branch 'master' into resolve-files
kba Oct 19, 2020
8e0b595
Merge branch 'master' into resolve-files
kba Oct 27, 2020
0d97c2d
wip
kba Oct 27, 2020
a54b5bf
Merge branch 'master' into resolve-files
kba Dec 11, 2020
6e76653
fixes merge error {f,}chmod
kba Dec 11, 2020
afcd117
run non-logging unit tests with standard $HOME
kba Dec 15, 2020
a3226b1
implement -C/-L cmdline flags
kba Dec 16, 2020
c2e0460
schema for resource list
kba Dec 21, 2020
5a6ccf3
implement foundation of ocrd resmgr
kba Dec 21, 2020
128f6b7
ocrd resmgr list-{installed,available} same output
kba Dec 21, 2020
0f42da6
resmgr: basic downloading of urls of files
kba Dec 22, 2020
b63b4d8
resmgr: support parameter_usage different from resource name
kba Dec 22, 2020
3f0eeac
add more models to resource_list.yml
kba Dec 22, 2020
6e6e424
resmgr: simplify resource typing
kba Dec 22, 2020
5843e1b
resmgr: support tarball downloads
kba Dec 22, 2020
0edac70
search for resources only on top-level
kba Dec 22, 2020
f96ce5e
use resmgr in Processor.resolve_resource
kba Dec 22, 2020
fa90f1b
simplify Processor.resolve_resource, delegate to resmgr as much as po…
kba Dec 22, 2020
89f77f0
resmgr: add anybaseocr resources
kba Dec 23, 2020
18009a7
resmgr download: show progressbar, add size to resource list
kba Dec 23, 2020
2df4c22
fix resmgr test
kba Dec 23, 2020
e8d0e0f
resmgr download: * to download all resources for this model
kba Dec 23, 2020
1fd35b9
:package: pre-release 2.22.0b1
kba Dec 28, 2020
849de10
new PAGE XML user method get_AllTextLine
kba Dec 30, 2020
bf47a07
update assets
kba Dec 30, 2020
db36dd3
kraken resources
kba Dec 30, 2020
a571b82
:package: pre-release 2.22.0b2
kba Dec 30, 2020
3aa60a8
reslist: use name w/o slash
kba Dec 30, 2020
4bf12fb
:package: pre-release v2.22.0b3
kba Dec 31, 2020
2c26eb0
Update ocrd/ocrd/processor/base.py
kba Jan 4, 2021
02e6415
rename PAGE method get_AllTextLine{,s}
kba Jan 4, 2021
4def1d9
OcrdPage.get_AllTextLines: support region_order, stub for textline_order
kba Jan 4, 2021
7911603
ocrd resmgr list-installed: look in fs for candidates
kba Jan 5, 2021
54a214a
resource_list.yml: typo: ocrd{,-cis}-ocropy-recognize
kba Jan 5, 2021
e33346b
resmgr list-installed: create stub in user resource list for unregist…
kba Jan 5, 2021
3ee66ce
resmgr: use last URL segment as the resource name
kba Jan 5, 2021
b21d462
resmgr: unquote URL encoded path
kba Jan 5, 2021
d8d97af
resmgr: use GET instead of HEAD for content-length
kba Jan 6, 2021
509200c
resmgr: support "download" (=copying) of local files
kba Jan 7, 2021
cbbc09a
resmgr, introduce intermediary "ocrd-resource" dir
kba Jan 12, 2021
7b1b6c9
default to VIRTUAL_ENV sharedir
kba Jan 12, 2021
565ba38
resmgr: save stub on download
kba Jan 12, 2021
012e49e
get_AllTextLines: implement textlineOrder
bertsky Jan 12, 2021
199b430
resmgr: ocrd-resources also for list_resource_candidates
kba Jan 12, 2021
8349807
resmgr: add @stweil's ONB model to list
kba Jan 14, 2021
7840b5b
resmgr: when wildcard downloading, omit ??? user entries
kba Jan 18, 2021
5038005
add a config file $XDG_CONFIG_HOME/ocrd.yml
kba Jan 19, 2021
2ab2151
ocrd resmgr: use resource_location from config for default
kba Jan 19, 2021
4687886
config: merge with default config for updated config
kba Jan 19, 2021
032929e
move config file to $XDG_CONFIG_HOME/ocrd/config.yml for consistency
kba Jan 19, 2021
c6a53b0
resource manager: methods to resolve resource dirs
kba Jan 19, 2021
a741a72
Merge branch 'master' into resolve-files
kba Jan 20, 2021
53a591d
:package: v2.22.0b4
kba Jan 20, 2021
a05ecf4
fix ocrd_config test
kba Jan 20, 2021
61a8845
config: mkdir -p $(basename)
kba Jan 20, 2021
fd8ca26
:bug: resmgr: virtualenv location was missing "share"
kba Jan 21, 2021
a3cff9e
resmgr: show shorthand location in list-installed
kba Jan 21, 2021
9280ef4
remove virtualenv, introduce /usr/local/share
kba Jan 22, 2021
9cb058a
:fire: remove configuration file
kba Jan 22, 2021
7e26a07
resmgr: lookup in XDG_DATA_HOME and absolute path only
kba Jan 22, 2021
22fb2c6
resmgr download: be stricter about uninstalled processors
kba Jan 25, 2021
2b3cb64
resmgr download "*"
kba Jan 25, 2021
ac74c3d
Update ocrd/ocrd/resource_manager.py
kba Jan 25, 2021
134a0c1
Update ocrd/ocrd/resource_manager.py
kba Jan 25, 2021
a5858ec
allow "from ocrd import OcrdResourceManager"
kba Jan 25, 2021
11 changes: 2 additions & 9 deletions ocrd/ocrd/cli/resmgr.py
@@ -9,18 +9,11 @@
 from ocrd_utils import (
     initLogging,
     getLogger,
-    VIRTUAL_ENV,
-    RESOURCE_LOCATIONS,
-    XDG_CACHE_HOME,
-    XDG_CONFIG_HOME,
-    XDG_DATA_HOME
+    RESOURCE_LOCATIONS
 )
 from ocrd_validators import OcrdZipValidator

 from ..resource_manager import OcrdResourceManager
-from ..config import load_config_file
-
-config = load_config_file()

 def print_resources(executable, reslist, resmgr):
     print('%s' % executable)
@@ -64,7 +57,7 @@ def list_installed(executable=None):
 @resmgr_cli.command('download')
 @click.option('-n', '--any-url', help='Allow downloading/copying unregistered resources', is_flag=True)
 @click.option('-o', '--overwrite', help='Overwrite existing resources', is_flag=True)
-@click.option('-l', '--location', help='Where to store resources', type=click.Choice(RESOURCE_LOCATIONS), default=config.resource_location, show_default=True)
+@click.option('-l', '--location', help='Where to store resources', type=click.Choice(RESOURCE_LOCATIONS), default='data', show_default=True)
 @click.argument('executable', required=True)
 @click.argument('url_or_name', required=True)
 def download(any_url, overwrite, location, executable, url_or_name):
31 changes: 0 additions & 31 deletions ocrd/ocrd/config.py

This file was deleted.

1 change: 0 additions & 1 deletion ocrd/ocrd/processor/base.py
@@ -26,7 +26,6 @@
     initLogging,
     list_resource_candidates,
     list_all_resources,
-    XDG_CACHE_HOME
 )
 from ocrd_validators import ParameterValidator
 from ocrd_models.ocrd_page import MetadataItemType, LabelType, LabelsType
22 changes: 5 additions & 17 deletions ocrd/ocrd/resource_manager.py
@@ -12,11 +12,10 @@

 from ocrd_validators import OcrdResourceListValidator
 from ocrd_utils import getLogger
-from ocrd_utils.constants import HOME, XDG_CACHE_HOME, XDG_CONFIG_HOME, XDG_DATA_HOME, VIRTUAL_ENV
+from ocrd_utils.constants import HOME, XDG_DATA_HOME, XDG_CONFIG_HOME
 from ocrd_utils.os import list_all_resources, pushd_popd

 from .constants import RESOURCE_LIST_FILENAME, RESOURCE_USER_LIST_COMMENT
-from .config import load_config_file

 class OcrdResourceManager():

@@ -71,9 +70,7 @@ def list_installed(self, executable=None):
             # resources we know about
             all_executables = list(self.database.keys())
             # resources in the file system
-            parent_dirs = [join(x, 'ocrd-resources') for x in [XDG_CACHE_HOME, XDG_CONFIG_HOME, XDG_DATA_HOME]]
-            if VIRTUAL_ENV:
-                parent_dirs += [join(VIRTUAL_ENV, 'share', 'ocrd-resources')]
+            parent_dirs = [join(x, 'ocrd-resources') for x in [XDG_DATA_HOME, '/usr/local/share']]
             for parent_dir in parent_dirs:
                 if Path(parent_dir).exists():
                     all_executables += [x for x in listdir(parent_dir) if x.startswith('ocrd-')]
@@ -135,25 +132,16 @@ def find_resources(self, executable=None, name=None, url=None, database=None):
         return ret

     def location_to_resource_dir(self, location):
-        return join(VIRTUAL_ENV, 'share', 'ocrd-resources') if location == 'virtualenv' and VIRTUAL_ENV else \
-            join(XDG_CACHE_HOME, 'ocrd-resources') if location == 'cache' else \
+        return '/usr/local/share/ocrd-resources' if location == 'system' else \
             join(XDG_DATA_HOME, 'ocrd-resources') if location == 'data' else \
-            join(XDG_CONFIG_HOME, 'ocrd-resources') if location == 'config' else \
             getcwd()

     def resource_dir_to_location(self, resource_path):
         resource_path = str(resource_path)
-        return 'virtualenv' if VIRTUAL_ENV and resource_path.startswith(join(VIRTUAL_ENV, 'share', 'ocrd-resources')) else \
-            'cache' if resource_path.startswith(join(XDG_CACHE_HOME, 'ocrd-resources')) else \
+        return 'system' if resource_path.startswith('/usr/local/share/ocrd-resources') else \
             'data' if resource_path.startswith(join(XDG_DATA_HOME, 'ocrd-resources')) else \
-            'config' if resource_path.startswith(join(XDG_CONFIG_HOME, 'ocrd-resources')) else \
             resource_path

-    @property
-    def default_resource_dir(self):
-        config = load_config_file()
-        return self.location_to_resource_dir(config.resource_location)
-
     def parameter_usage(self, name, usage='as-is'):
         if usage == 'as-is':
             return name
@@ -189,8 +177,8 @@ def download(
         self,
         executable,
         url,
+        basedir,
         overwrite=False,
-        basedir=XDG_CACHE_HOME,
         name=None,
         resource_type='file',
         path_in_archive='.',
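The location handling that remains after this change can be sketched as a standalone snippet. This is an illustration only, not the code of `OcrdResourceManager`: the function names `location_to_dir` and `dir_to_location` are hypothetical, and the module-level constants are assumptions that mirror the values visible in the diff (`system`, `data`, and the current directory as the fallback).

```python
import os
from os.path import expanduser, join

# Assumption: same fallback as in ocrd_utils.constants ($XDG_DATA_HOME or ~/.local/share)
XDG_DATA_HOME = os.environ.get('XDG_DATA_HOME', join(expanduser('~'), '.local', 'share'))
SYSTEM_DIR = '/usr/local/share/ocrd-resources'
DATA_DIR = join(XDG_DATA_HOME, 'ocrd-resources')

def location_to_dir(location):
    """Map a location keyword to a directory; any other value means the current directory."""
    if location == 'system':
        return SYSTEM_DIR
    if location == 'data':
        return DATA_DIR
    return os.getcwd()

def dir_to_location(resource_path):
    """Inverse mapping by path prefix; paths outside the known dirs are returned unchanged."""
    resource_path = str(resource_path)
    if resource_path.startswith(SYSTEM_DIR):
        return 'system'
    if resource_path.startswith(DATA_DIR):
        return 'data'
    return resource_path
```

Note that the inverse mapping never yields `cwd`: a path that is in neither known directory is passed through as-is, which matches the fallthrough in `resource_dir_to_location`.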
2 changes: 1 addition & 1 deletion ocrd_modelfactory/ocrd_modelfactory/__init__.py
@@ -10,7 +10,7 @@
 from PIL import Image

 from ocrd_utils import VERSION, MIMETYPE_PAGE
-from ocrd_models import OcrdExif, OcrdConfig
+from ocrd_models import OcrdExif
 from ocrd_models.ocrd_page import PcGtsType, PageType, MetadataType, parse

 __all__ = [
1 change: 0 additions & 1 deletion ocrd_models/ocrd_models/__init__.py
@@ -2,7 +2,6 @@
 APIs and schemas for various file formats in the OCR domain.
 """
 from .ocrd_agent import OcrdAgent
-from .ocrd_config import OcrdConfig
 from .ocrd_exif import OcrdExif
 from .ocrd_file import OcrdFile
 from .ocrd_mets import OcrdMets
25 changes: 0 additions & 25 deletions ocrd_models/ocrd_models/ocrd_config.py

This file was deleted.

2 changes: 0 additions & 2 deletions ocrd_utils/ocrd_utils/__init__.py
@@ -80,8 +80,6 @@
     LOG_FORMAT,
     LOG_TIMEFMT,
     VERSION,
-    VIRTUAL_ENV,
-    XDG_CACHE_HOME,
     XDG_CONFIG_HOME,
     XDG_DATA_HOME)

6 changes: 1 addition & 5 deletions ocrd_utils/ocrd_utils/constants.py
@@ -18,10 +18,8 @@
     'REGEX_FILE_ID',
     'RESOURCE_LOCATIONS',
     'VERSION',
-    'VIRTUAL_ENV',
     'XDG_CONFIG_HOME',
     'XDG_DATA_HOME',
-    'XDG_CACHE_HOME',
 ]

 VERSION = get_distribution('ocrd_utils').version
@@ -106,7 +104,5 @@
 HOME = expanduser('~')
 XDG_DATA_HOME = environ['XDG_DATA_HOME'] if 'XDG_DATA_HOME' in environ else join(HOME, '.local', 'share')
 XDG_CONFIG_HOME = environ['XDG_CONFIG_HOME'] if 'XDG_CONFIG_HOME' in environ else join(HOME, '.config')
-XDG_CACHE_HOME = environ['XDG_CACHE_HOME'] if 'XDG_CACHE_HOME' in environ else join(HOME, '.cache')
-VIRTUAL_ENV = environ.get('VIRTUAL_ENV', None)

-RESOURCE_LOCATIONS = ['virtualenv', 'cwd', 'cache', 'config', 'data']
+RESOURCE_LOCATIONS = ['data', 'cwd', 'system']
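The two remaining XDG constants follow the same pattern: use the environment variable if it is set, otherwise fall back to a directory below the home directory. A minimal sketch of that pattern, assuming a hypothetical helper `xdg_dir` that takes the environment as an explicit mapping so it can be tried without touching the real environment:

```python
from os.path import expanduser, join

def xdg_dir(env, variable, *default_parts):
    """Return env[variable] if present, else the default path below the home directory."""
    return env.get(variable, join(expanduser('~'), *default_parts))

# With an empty environment the defaults from constants.py apply
data_home = xdg_dir({}, 'XDG_DATA_HOME', '.local', 'share')
# An explicitly set variable takes precedence over the default
config_home = xdg_dir({'XDG_CONFIG_HOME': '/etc/xdg'}, 'XDG_CONFIG_HOME', '.config')
```

`dict.get` with a default behaves like the `environ[...] if ... in environ else ...` expressions above, including for a variable that is set to an empty string.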
20 changes: 8 additions & 12 deletions ocrd_utils/ocrd_utils/os.py
@@ -18,7 +18,7 @@

 from atomicwrites import atomic_write as atomic_write_, AtomicWriter

-from .constants import XDG_DATA_HOME, XDG_CONFIG_HOME, XDG_CACHE_HOME
+from .constants import XDG_DATA_HOME

 def abspath(url):
     """
@@ -69,11 +69,8 @@ def list_resource_candidates(executable, fname, cwd=getcwd(), is_file=False, is_
     processor_path_var = '%s_PATH' % executable.replace('-', '_').upper()
     if processor_path_var in environ:
         candidates += [join(x, fname) for x in environ[processor_path_var].split(':')]
-    if 'VIRTUAL_ENV' in environ:
-        candidates.append(join(environ['VIRTUAL_ENV'], 'share', 'ocrd-resources', executable, fname))
     candidates.append(join(XDG_DATA_HOME, 'ocrd-resources', executable, fname))
-    candidates.append(join(XDG_CONFIG_HOME, 'ocrd-resources', executable, fname))
-    candidates.append(join(XDG_CACHE_HOME, 'ocrd-resources', executable, fname))
+    candidates.append(join('/usr/local/share/ocrd-resources', executable, fname))
     if is_file:
         candidates = [c for c in candidates if Path(c).is_file()]
     if is_dir:
@@ -94,13 +91,12 @@ def list_all_resources(executable):
         for processor_path in environ[processor_path_var].split(':'):
             if isdir(processor_path):
                 candidates += list(scandir(processor_path))
-    if 'VIRTUAL_ENV' in environ:
-        sharedir = join(environ['VIRTUAL_ENV'], 'share', 'ocrd-resources', executable)
-        if isdir(sharedir):
-            candidates += list(scandir(sharedir))
-    for xdgdir in [join(d, 'ocrd-resources', executable) for d in [XDG_DATA_HOME, XDG_CONFIG_HOME, XDG_CACHE_HOME]]:
-        if isdir(xdgdir):
-            candidates += list(scandir(xdgdir))
+    datadir = join(XDG_DATA_HOME, 'ocrd-resources', executable)
+    if isdir(datadir):
+        candidates += list(scandir(datadir))
+    systemdir = join('/usr/local/share/ocrd-resources', executable)
+    if isdir(systemdir):
+        candidates += list(scandir(systemdir))
     return [x.path for x in candidates]

 # ht @pabs3
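The resulting lookup order in `list_resource_candidates` can be sketched without filesystem access. The function name `sketch_resource_candidates` and the explicit `env` mapping are hypothetical. How the first, cwd-based candidate is built is not visible in the hunk, so `join(cwd, fname)` below is an assumption; the remaining entries follow the diff.

```python
from os.path import expanduser, join

def sketch_resource_candidates(executable, fname, cwd, env):
    """Return candidate paths for fname in lookup order, without checking that they exist."""
    xdg_data_home = env.get('XDG_DATA_HOME', join(expanduser('~'), '.local', 'share'))
    # Assumption: the current directory is tried first
    candidates = [join(cwd, fname)]
    # e.g. ocrd-dummy -> OCRD_DUMMY_PATH, a colon-separated list of directories
    path_var = '%s_PATH' % executable.replace('-', '_').upper()
    if path_var in env:
        candidates += [join(d, fname) for d in env[path_var].split(':')]
    # Per-user data directory, then the system-wide directory
    candidates.append(join(xdg_data_home, 'ocrd-resources', executable, fname))
    candidates.append(join('/usr/local/share/ocrd-resources', executable, fname))
    return candidates

env = {'OCRD_DUMMY_PATH': '/a:/b', 'XDG_DATA_HOME': '/data'}
for path in sketch_resource_candidates('ocrd-dummy', 'model.bin', '/work', env):
    print(path)
```

The virtualenv, `XDG_CONFIG_HOME` and `XDG_CACHE_HOME` entries no longer appear, so a resource is searched in at most four kinds of places.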
2 changes: 0 additions & 2 deletions ocrd_validators/ocrd_validators/__init__.py
@@ -6,7 +6,6 @@
     'WorkspaceValidator',
     'PageValidator',
     'OcrdToolValidator',
-    'OcrdConfigValidator',
     'OcrdResourceListValidator',
     'OcrdZipValidator',
     'XsdValidator',
@@ -18,7 +17,6 @@
 from .workspace_validator import WorkspaceValidator
 from .page_validator import PageValidator
 from .ocrd_tool_validator import OcrdToolValidator
-from .ocrd_config_validator import OcrdConfigValidator
 from .resource_list_validator import OcrdResourceListValidator
 from .ocrd_zip_validator import OcrdZipValidator
 from .xsd_validator import XsdValidator
1 change: 0 additions & 1 deletion ocrd_validators/ocrd_validators/constants.py
@@ -20,7 +20,6 @@

 OCRD_TOOL_SCHEMA = yaml.safe_load(resource_string(__name__, 'ocrd_tool.schema.yml'))
 RESOURCE_LIST_SCHEMA = yaml.safe_load(resource_string(__name__, 'resource_list.schema.yml'))
-CONFIG_SCHEMA = yaml.safe_load(resource_string(__name__, 'ocrd_config.schema.yml'))
 OCRD_BAGIT_PROFILE = yaml.safe_load(resource_string(__name__, 'bagit-profile.yml'))

 BAGIT_TXT = 'BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8'
7 changes: 0 additions & 7 deletions ocrd_validators/ocrd_validators/ocrd_config.schema.yml

This file was deleted.

22 changes: 0 additions & 22 deletions ocrd_validators/ocrd_validators/ocrd_config_validator.py

This file was deleted.

2 changes: 1 addition & 1 deletion repo/spec
Submodule spec updated 1 files
+2 −1 ocrd_tool.md
17 changes: 0 additions & 17 deletions tests/test_ocrd_config.py

This file was deleted.

12 changes: 1 addition & 11 deletions tests/utils/test_os.py
@@ -13,19 +13,11 @@ class TestOsUtils(TestCase):
     def setUp(self):
         self.maxDiff = None
         self.tempdir_path = mkdtemp()
-        self.tempdir_venv = mkdtemp()
         ENV['OCRD_DUMMY_PATH'] = self.tempdir_path
-        self.VIRTUAL_ENV = ENV.get('VIRTUAL_ENV')
-        ENV['VIRTUAL_ENV'] = self.tempdir_venv

     def tearDown(self):
         rmtree(self.tempdir_path)
-        rmtree(self.tempdir_venv)
         del ENV['OCRD_DUMMY_PATH']
-        if self.VIRTUAL_ENV:
-            ENV['VIRTUAL_ENV'] = self.VIRTUAL_ENV
-        else:
-            del ENV['VIRTUAL_ENV']

     def test_resolve_basic(self):
         def dehomify(s):
@@ -37,10 +29,8 @@ def dehomify(s):
         self.assertEqual(cands, [join(x, fname) for x in [
             dehomify(join(getcwd(), 'ocrd-resources')),
             dehomify(self.tempdir_path),
-            dehomify(join(self.tempdir_venv, 'share', 'ocrd-resources', 'ocrd-dummy')),
             '$HOME/.local/share/ocrd-resources/ocrd-dummy',
-            '$HOME/.config/ocrd-resources/ocrd-dummy',
-            '$HOME/.cache/ocrd-resources/ocrd-dummy',
+            '/usr/local/share/ocrd-resources/ocrd-dummy',
         ]])