Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide external PostgreSQL support #245

Closed
sghaida opened this issue Jul 1, 2021 · 18 comments
Closed

provide external PostgreSQL support #245

sghaida opened this issue Jul 1, 2021 · 18 comments

Comments

@sghaida
Copy link

sghaida commented Jul 1, 2021

Is your feature request related to a problem? Please describe.
this feature request is all about maintainability of the database installation, backup

Describe the solution you'd like
i would like to contribute a PR to support connectivity to external Database and to be able to do so i need to update some ENVs and initialization scripts of the container along with the Dockerfile also it would be nice to contribute this pr to you so i don't need to maintain different fork for every version update. in the PR i will maintain both ways of deployments (Local DB, Remote DB).

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
on cloud provider mounting the DB on external volume using NFS will make sense but then we need to worry about the database backup using some external scripts , in case the volume got corrupted then we can recover.

Additional context
the reason for that, im trying to run the container on AWS ECS but running DB inside the container is limiting, and time consuming on initialization, not to mention mounting DB on external volume is not reliable, in terms if the NFS volume got corrupted

@leonardehrenfried
Copy link
Collaborator

leonardehrenfried commented Jul 1, 2021

Well, I would not be opposed to a feature where you override the postgres connection string via an env variable. Be aware that in such a case an empty, unused postgres instance would still be running in the container but that wouldn't be too resource intensive. Lets see what the other maintainers think.

However, have you seen the comments in this similar request? #238

Seems that you need filesystem access to the host where the database is deployed which rules out RDS, doesn't it?

@robjuz
Copy link

robjuz commented Jul 20, 2021

@sghaida Just override the NOMINATIM_DATABASE_DSN variable.

I'm doing so in my Helm chart and it works with an external database.

@mausch
Copy link
Contributor

mausch commented Mar 17, 2022

Something weird happens when initialising the container with NOMINATIM_DATABASE_DSN...
It creates and populates the database but then fails at

sudo -u nominatim nominatim admin --check-database
as if the variable were not set?

e.g.

docker run -it --rm \
  -e PBF_URL=https://download.geofabrik.de/europe/monaco-latest.osm.pbf \
  -e NOMINATIM_DATABASE_DSN="pgsql:dbname=nominatim;hostaddr=192.168.0.3;user=my_user;password=my_pass" \
  -p 8080:8080 \
  --name nominatim \
  mediagis/nominatim:4.0-d880386e3e7833363dab5b5b37fa72f6d65c9766

Relevant output:

...
+ sudo -E -u nominatim nominatim import --osm-file /nominatim/data.osm.pbf --threads 16

2022-03-17 11:45:01: Using project directory: /nominatim
2022-03-17 11:45:01: Creating database
2022-03-17 11:45:03: Setting up country tables
2022-03-17 11:45:05: Importing OSM data file
2022-03-17 11:45:05  osm2pgsql version 1.5.1
2022-03-17 11:45:05  Database version: 10.12 (Debian 10.12-2.pgdg90+1)
2022-03-17 11:45:05  PostGIS version: 2.5
2022-03-17 11:45:05  Parsing gazetteer style file '/usr/local/etc/nominatim/import-full.style'.
...
2022-03-17 11:45:14: Starting indexing rank (0 to 4) using 16 threads
...
2022-03-17 11:45:30: Done 2168/2168 in 8 @ 258.961 per second - FINISHED rank 30
...
2022-03-17 11:45:35: Recompute word counts
2022-03-17 11:45:35: Setup website at /nominatim/website
+ '[' -f tiger-nominatim-preprocessed.csv.tar.gz ']'
+ sudo -u nominatim nominatim admin --check-database
2022-03-17 11:45:35: Using project directory: /nominatim
2022-03-17 11:45:35: Checking database
Checking database connection ... Failed
Cannot connect to database: FATAL:  database "nominatim" does not exist

Hints:

* Is the database server started?
* Check the NOMINATIM_DATABASE_DSN variable in your local .env
* Try connecting to the database with the same settings

Project directory: /nominatim
Current setting of NOMINATIM_DATABASE_DSN: pgsql:dbname=nominatim

@leonardehrenfried
Copy link
Collaborator

leonardehrenfried commented Mar 17, 2022

Just guessing: the last line shows the value of NOMINATIM_DATABASE_DSN cut off after the first semicolon. Did you not copy the complete line or the bug somewhere in this area?

@mausch
Copy link
Contributor

mausch commented Mar 17, 2022

That's the whole output, I'm not cutting it off.
It looks like it's using the default value i.e. that command doesn't seem to be picking up the env var?

@mausch
Copy link
Contributor

mausch commented Mar 17, 2022

Seems that you need filesystem access to the host where the database is deployed which rules out RDS, doesn't it?

This requirement should go away with #307 .
I just need to connect to an external Postgres instance to test it 🙂

@leonardehrenfried
Copy link
Collaborator

You could try removing the line with the database check and see what happens. Perhaps there is a bug in it. Or raise a ticket with upstream nominatim.

@mausch
Copy link
Contributor

mausch commented Mar 17, 2022

Made it work in #308

leonardehrenfried added a commit that referenced this issue Mar 17, 2022
#245 Pass environment variables to all sudo commands
@leonardehrenfried
Copy link
Collaborator

@mausch Now that #308 has been merged, would you be able to provide a little bit of documentation on how to do it?

I think a paragraph in this ticket would be enough.

@mausch
Copy link
Contributor

mausch commented Mar 18, 2022

Of course!

Here's an example command:

docker run -it --rm \
  -e PBF_URL=https://download.geofabrik.de/europe/monaco-latest.osm.pbf \
  -e NOMINATIM_TOKENIZER=icu \
  -e NOMINATIM_DATABASE_DSN="pgsql:dbname=nominatim;hostaddr=192.168.0.3;user=my_user;password=my_pass" \
  -e PGHOSTADDR=192.168.0.3 \
  -e PGDATABASE=nominatim \
  -e PGUSER=my_user \
  -e PGPASSWORD=my_pass \
  -p 8080:8080 \
  --name nominatim \
  mediagis/nominatim:4.0-2a43ad71ad58d3b86f0b23a535b84f71f68a53ab

At the moment you have to duplicate the connection details since NOMINATIM_DATABASE_DSN is used for the nominatim commands and PGDATABASE et al are used for psql.
Maybe in the future there could be a script parsing NOMINATIM_DATABASE_DSN into PGDATABASE and the rest.
This helm chart for example works the other way around: it builds the DSN from individual settings.

As you said above Postgres is still started within the container but it's not used. Maybe in the future the scripts could detect NOMINATIM_DATABASE_DSN and not start Postgres if present.

Also note that when using an external Postgres it's up to you to make sure that the tokenizer is correctly configured (either use ICU or install the tokenizer on your Postgres separately).

leonardehrenfried added a commit that referenced this issue Jul 21, 2022
Point documentation to issues, closes #245
@MALKARAJ
Copy link

Screenshot from 2022-08-21 23-04-06
Getting password error even though the password is correct

@mausch
Copy link
Contributor

mausch commented Aug 21, 2022

@MALKARAJ if you're trying to connect it to a postgres instance running outside the nominatim container then the hostaddr can't be 127.0.0.1

@inovramadani
Copy link

inovramadani commented Nov 30, 2023

Of course!

Here's an example command:

docker run -it --rm \
  -e PBF_URL=https://download.geofabrik.de/europe/monaco-latest.osm.pbf \
  -e NOMINATIM_TOKENIZER=icu \
  -e NOMINATIM_DATABASE_DSN="pgsql:dbname=nominatim;hostaddr=192.168.0.3;user=my_user;password=my_pass" \
  -e PGHOSTADDR=192.168.0.3 \
  -e PGDATABASE=nominatim \
  -e PGUSER=my_user \
  -e PGPASSWORD=my_pass \
  -p 8080:8080 \
  --name nominatim \
  mediagis/nominatim:4.0-2a43ad71ad58d3b86f0b23a535b84f71f68a53ab

At the moment you have to duplicate the connection details since NOMINATIM_DATABASE_DSN is used for the nominatim commands and PGDATABASE et al are used for psql. Maybe in the future there could be a script parsing NOMINATIM_DATABASE_DSN into PGDATABASE and the rest. This helm chart for example works the other way around: it builds the DSN from individual settings.

As you said above Postgres is still started within the container but it's not used. Maybe in the future the scripts could detect NOMINATIM_DATABASE_DSN and not start Postgres if present.

Also note that when using an external Postgres it's up to you to make sure that the tokenizer is correctly configured (either use ICU or install the tokenizer on your Postgres separately).

@mausch how can this config work if I add mounted volume "nominatim-data" so I don't need to re-import every time I restart/rebuild the service? Question in my head is how to connect that mounted volume to external postgres? Or is it simply nominatim can detect if it's external postgres then no need to re-import? Sorry for many confusions I have here.

@mausch
Copy link
Contributor

mausch commented Nov 30, 2023

@inovramadani Not sure why you're mounting a volume or what you plan to put in it... I recommend posting on https://github.com/mediagis/nominatim-docker/discussions with more details.

@asdfklgash
Copy link

If you want to connect to a hostname use PGHOST instead of PGHOSTADDR!

@MatsGej
Copy link

MatsGej commented Jan 19, 2024

I tried moving the posgresql database used to Azure Postgresql, but ran into this errror:
pg_restore: error: could not execute query: ERROR: extension "hstore" is not allow-listed for "azure_pg_admin" users in Azure Database for PostgreSQL

The error message you're seeing indicates that the hstore extension, which is required by the public.location_property_osmline table, is not allowed for azure_pg_admin users in Azure Database for PostgreSQL.

Azure Database for PostgreSQL has a list of allowed extensions, and hstore is not included in this list for azure_pg_admin users. This is a restriction imposed by Azure for security and stability reasons.

I want to move the database to Azure Postgresql. Do yo have any suggestions?

The way I did this was to use pgadmin to create a backup of the generated postgresql database in docker and then I tried importing it into the Azure Postgresql Flexible database. I was then going to change the docker version so I could move it to Azure Container Apps. Maybe there are better ways? But I want to use the Azure Postgresql database since I have other stuff in it already! And I have to pay for it anyway.

@mtmail
Copy link
Contributor

mtmail commented Jan 19, 2024

This github issue was closed 18 months ago. Problems with Azure are outside the scope of Nominatim.

@aferrar
Copy link

aferrar commented Feb 21, 2024

I am following the steps outlined in this issue and am running into the following error. I can gladly create a new issue but I feel it is applicable to this thread. Please let me know if you'd like me to move it. Thank you

2024-02-20 17:26:47: Using project directory: /nominatim
2024-02-20 17:26:47: Warming database caches
Traceback (most recent call last):
File "/usr/local/bin/nominatim", line 12, in
exit(cli.nominatim(module_dir='/usr/local/lib/nominatim/module',
File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 225, in nominatim
return get_set_parser().run(**kwargs)
File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 121, in run
return args.command.run(args)
File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/admin.py", line 60, in run
return self._warm(args)
File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/admin.py", line 95, in _warm
api.reverse((random.uniform(-90, 90), random.uniform(-180, 180)),
File "/usr/local/lib/nominatim/lib-python/nominatim/api/core.py", line 610, in reverse
return self._loop.run_until_complete(self._async_api.reverse(coord, **params))
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/nominatim/lib-python/nominatim/api/core.py", line 201, in reverse
async with self.begin() as conn:
File "/usr/lib/python3.10/contextlib.py", line 199, in aenter
return await anext(self.gen)
File "/usr/local/lib/nominatim/lib-python/nominatim/api/core.py", line 140, in begin
await self.setup_database()
File "/usr/local/lib/nominatim/lib-python/nominatim/api/core.py", line 101, in setup_database
async with engine.begin() as conn:
File "/usr/lib/python3/dist-packages/sqlalchemy/ext/asyncio/base.py", line 60, in aenter
return await self.start(is_ctxmanager=True)
File "/usr/lib/python3/dist-packages/sqlalchemy/ext/asyncio/engine.py", line 609, in start
await self.conn.start(is_ctxmanager=is_ctxmanager)
File "/usr/lib/python3/dist-packages/sqlalchemy/ext/asyncio/engine.py", line 154, in start
await (greenlet_spawn(self.sync_engine.connect))
File "/usr/lib/python3/dist-packages/sqlalchemy/util/_concurrency_py3k.py", line 123, in greenlet_spawn
result = context.switch(*args, **kwargs)
File "/usr/lib/python3/dist-packages/sqlalchemy/future/engine.py", line 406, in connect
return super(Engine, self).connect()
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 3204, in connect
return self._connection_cls(self, close_with_result=close_with_result)
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 96, in init
else engine.raw_connection()
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 3283, in raw_connection
return self._wrap_pool_connect(self.pool.connect, _connection)
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 3250, in _wrap_pool_connect
return fn()
File "/usr/lib/python3/dist-packages/sqlalchemy/pool/base.py", line 310, in connect
return _ConnectionFairy._checkout(self)
File "/usr/lib/python3/dist-packages/sqlalchemy/pool/base.py", line 868, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/lib/python3/dist-packages/sqlalchemy/pool/base.py", line 476, in checkout
rec = pool.do_get()
File "/usr/lib/python3/dist-packages/sqlalchemy/pool/impl.py", line 145, in do_get
with util.safe_reraise():
File "/usr/lib/python3/dist-packages/sqlalchemy/util/langhelpers.py", line 70, in exit
compat.raise
(
File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 207, in raise

raise exception
File "/usr/lib/python3/dist-packages/sqlalchemy/pool/impl.py", line 143, in _do_get
return self._create_connection()
File "/usr/lib/python3/dist-packages/sqlalchemy/pool/base.py", line 256, in _create_connection
return _ConnectionRecord(self)
File "/usr/lib/python3/dist-packages/sqlalchemy/pool/base.py", line 371, in init
self.__connect()
File "/usr/lib/python3/dist-packages/sqlalchemy/pool/base.py", line 665, in connect
with util.safe_reraise():
File "/usr/lib/python3/dist-packages/sqlalchemy/util/langhelpers.py", line 70, in exit
compat.raise
(
File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 207, in raise

raise exception
File "/usr/lib/python3/dist-packages/sqlalchemy/pool/base.py", line 661, in __connect
self.dbapi_connection = connection = pool._invoke_creator(self)
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/create.py", line 590, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 597, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/usr/lib/python3/dist-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 758, in connect
await_only(self.asyncpg.connect(*arg, **kw)),
TypeError: connect() got an unexpected keyword argument 'hostaddr'
See https://nominatim.org/release-docs/latest/admin/Import/#wikipediawikidata-rankings

Checking indexing status ... OK
Checking that database indexes are complete ... OK
Checking that all database indexes are valid ... OK
Checking TIGER external data table. ... not applicable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants