Support ICU tokenizer #307

Closed
mausch opened this issue Mar 3, 2022 · 1 comment · Fixed by #309

Comments

@mausch
Contributor

mausch commented Mar 3, 2022

Hi and thank you for creating and maintaining these docker images!

On to the issue 🙂

Setting NOMINATIM_TOKENIZER=icu on image tag 4.0-d880386e3e7833363dab5b5b37fa72f6d65c9766 crashes the import with:

2022-03-03 15:39:05: Setting up tokenizer
.........................
Traceback (most recent call last):
  File "/usr/local/bin/nominatim", line 11, in <module>
    exit(cli.nominatim(module_dir='/usr/local/lib/nominatim/module',
  File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 235, in nominatim
    return parser.run(**kwargs)
  File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 96, in run
    return args.command.run(args)
  File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/setup.py", line 101, in run
    tokenizer = SetupAll._get_tokenizer(args.continue_at, args.config)
  File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/setup.py", line 171, in _get_tokenizer
    return tokenizer_factory.create_tokenizer(config)
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/factory.py", line 59, in create_tokenizer
    tokenizer.init_new_db(config, init_db=init_db)
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/icu_tokenizer.py", line 46, in init_new_db
    self.loader = ICURuleLoader(config)
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/icu_rule_loader.py", line 48, in __init__
    self._setup_analysis()
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/icu_rule_loader.py", line 128, in _setup_analysis
    self.analysis[name] = TokenAnalyzerRule(section, self.normalization_rules)
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/icu_rule_loader.py", line 156, in __init__
    analysis_mod = importlib.import_module(module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/nominatim/lib-python/nominatim/tokenizer/token_analysis/generic.py", line 9, in <module>
    import datrie
ModuleNotFoundError: No module named 'datrie'

The ICU tokenizer was introduced in Nominatim 4.0.0: https://nominatim.org/2021/11/03/release-40.html

As far as I understand, this new tokenizer makes the custom Postgres module obsolete, which means we can deploy Nominatim on managed Postgres instances, e.g. AWS RDS.

Maybe the python3-datrie package just needs to be added to the Dockerfile around here?

python3-icu git \
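
To find out whether anything besides datrie is missing, a small check like the one below could be run inside the container before starting the import. This is only a sketch: the helper name `missing_modules` is made up for illustration, and the candidate list is an assumption that covers just the two modules mentioned in this issue (`datrie` from the traceback and `icu`, the import name of PyICU from python3-icu). It is not a complete list of what the ICU tokenizer needs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found.

    Uses importlib.util.find_spec, which locates a module without
    importing it. Only top-level names (no dots) should be passed in.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: these are only the modules named in this issue,
    # not a verified list of the ICU tokenizer's dependencies.
    candidates = ["datrie", "icu"]

    missing = missing_modules(candidates)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
    else:
        print("All candidate modules can be found.")
```

Any module reported as missing would then need its package added to the Dockerfile next to python3-icu.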

@leonardehrenfried
Collaborator

Yes, I think that you need to add python3-datrie and perhaps a few others.

PRs on this are very welcome!
