Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refuse to import data when there is an osm2pgsql_properties table #2085

Closed
wants to merge 1 commit into from

Conversation

joto
Copy link
Collaborator

@joto joto commented Sep 24, 2023

This checks stops users from clobbering an existing database.

@rustprooflabs
Copy link
Contributor

Will this require manually deleting the table or will a --force option (or similar) be offered?

@joto
Copy link
Collaborator Author

joto commented Sep 24, 2023

@rustprooflabs As the moment there is no --force option or something like that but you have to either drop the database and create a new one or delete the osm2pgsql_properties table. But that's certainly something we could think about. I am worried though that people will just get into the habit of using --force and then it didn't buy us much. What do you think?

I realize that this is quite a change compared to earlier behaviour, so maybe it is the wrong move here. I created this because I have been bitten twice already with this recently. The reason this is a bit more of a problem now than it was before is the osm2pgsql_replication table. It is created first thing in the database, even before, for instance, the input file is opened. So it is easy to clobber that table with an invalid osm2pgsql command line.

@mboeringa
Copy link

@joto, I am not sure if this is the right move. This actually kind of reverts an earlier fix.

Yes, with the increasing complexity of osm2pgsql, it will inevitably be easier to clobber things up. But I am afraid with all the new stuff, including the new "Themepark" I saw announced, this is virtually unavoidable. We're way past the point of osm2pgsql being a tool you can just "hit and run" (if it ever was with amount of command line options).

(Desirable) Flexibility comes at a price...

If you do intend to change this, I second @rustprooflabs 's request for some --force option. I also think some strong recommendation to properly script stuff to avoid mayhem might be in order... Personally, I can afford to loose stuff because I run a test instance only, but that likely will not be the case for the majority of users...

Copy link
Collaborator

@pnorman pnorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm uncertain on this, but if we do it, we need at the least a command line option to remove the table to do a new import, and possibly a --force option.

A lot of my transform development involves reimporting over an existing DB.

@rustprooflabs
Copy link
Contributor

I am worried though that people will just get into the habit of using --force and then it didn't buy us much.

I'm not too worried about other people's habits. 😄 Seriously though, I think if there's a good error message (Error: Cannot import because of existing data. Run with --force to override) people will default to the "safe" approach especially if that's what the docs recommend. It's easy enough to re-run and add the additional switch when force is needed.

I created this because I have been bitten twice already with this recently. ... So it is easy to clobber that table with an invalid osm2pgsql command line.

I ran into this exact problem in rustprooflabs/pgosm-flex#313, I do like the default-safe option. I think this type of protection is a good idea. However, if it's decided not to add this type of check I am also okay. I run essentially all of my OSM imports via PgOSM Flex and have already added the protection I want there. If it's added here, I just need to update to pass that new option along to osm2pgsql appropriately.

@joto
Copy link
Collaborator Author

joto commented Oct 31, 2023

So it looks like people would be okay with this if we have a command line option to get the old behaviour, i.e. do not do this check. But --force would be the wrong name for that. We are not "forcing" anything.

What should we name this option? --ignore-existing? --overwrite-dabase? I am not really happy with any of these names, because we are only checking the properties table and ignore all the other stuff that's in the database. So its not that we clean up everything if we don't use this option.

I am not sure the behaviour of osm2pgsql is sufficiently clear (either with or without that option) so maybe we need a different solution?

@rustprooflabs
Copy link
Contributor

Why is --force wrong? If the default behavior changes to refuse to continue if the table exists, it seems the non-default option is to force it to proceed.

If --force is a no-go, I prefer --ignore-existing over --overwrite-database. I don't like --overwrite-database at all since my databases always have OSM + something else.

@joto
Copy link
Collaborator Author

joto commented Nov 2, 2023

Why is --force wrong?

It is just much too unspecific. What is being "forced" here? I know having a -f or --force option is kind of a standard with Unix commands and that's why we have been calling it that in this discussion, but I believe that many users will not understand it.

@lonvia
Copy link
Collaborator

lonvia commented Nov 7, 2023

Not sure what to do with this one.

I very much agree that the current bahaviour of osm2pgsql is dangerous and it should rather be conservative about overwriting existing data. But we've had this behaviour for a long time and I'm fully aware that many users rely on it. The properties table does change the game a little bit: you cannot do two imports with different prefixes on the same database anymore without breaking something. That was not the case before. The recommended work-around is to use schemas but, again, we do have some historic baggage here.

If we do change the behaviour, then I'd rather do it without introducing an additional command line. We already have far too many of them and you can simply overwrite the check by running a psql -d whatever -c 'DROP TABLE osm2pgsql_properties. Then the user at least knows what they are getting into.

This checks stops users from clobbering an existing database.
@rustprooflabs
Copy link
Contributor

The properties table does change the game a little bit: you cannot do two imports with different prefixes on the same database anymore without breaking something. That was not the case before. The recommended work-around is to use schemas but, again, we do have some historic baggage here.

I guess I'm not aware of the baggage with schemas. How does this interact with flex styles that save tables to schemas? A good portion of what I've found useful lately is loading a variety of
different data, with different styles, to different schemas. The styles I use set schema dynamically.

There are already limitations around multiple sources within one DB with replication (e.g. osmcode/pyosmium#214), I'd prefer not to see multiple sources further limited. If I have to drop the properties tables to load into multiple schemas I can, but that seems to dilute the purpose of that table in this scenario.

Would it be possible to define which schema the properties table is saved in? This seems like it would still allow supporting multiple imports to a single database w/out losing the addition of safety against accidentally doing bad things. Another possible approach could be changing the table from a tall table (2 columns, many rows) into wide structure. Either via dedicated columns or a JSON/JSONB column? This would allow adding a schema_name column to allow support of that.

I'd rather do it without introducing an additional command line.

Understandable, I'm okay with your suggestion to manually delete the table when needed.

@joto
Copy link
Collaborator Author

joto commented Dec 8, 2023

(Ups. What I wrote here before was wrong.)

You can use the --schema or the --middle-schema option to set where the properties table goes. So everything that was possible before is still possible.

@joto
Copy link
Collaborator Author

joto commented Feb 12, 2024

Looks like we are not 100% sure to go forward with this and the PR is out of date by now.

I am closing this PR maybe we'll revisit this issue later on.

@joto joto closed this Feb 12, 2024
@joto joto deleted the refuse-import branch February 12, 2024 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants