Prevent overlong tag strings #26
This is a problem I encountered myself while testing the implementation of #7, and I have seen it cause trouble for a few users before. Most of these users use a translation file to avoid merging. That results in invalid OSM files, which may be acceptable if you understand what you're doing. I agree we need a solution for this, but I wasn't really sure which road to take. There are several solutions:
For the last solution I was considering providing a number of predefined translation files in the ogr2osm-translations project for some general use cases; limiting the length could be one of them. The downside is that translation files are difficult to use, since they require users to be programmers. So maybe I favour the second solution, and I'd add it in TranslationBase.merge_tags to make sure any new DataWriter outputs the correct value. In that case the performance still has to be verified, though.
That's also what I did, but the result not only becomes invalid, it becomes broken (closed lines will no longer be closed). That's why I would also go for option 2.
If I can support you, feel free to ask, but you know the code much better than me, so... When implementing this I also suggest considering #28 (which can also be considered a breaking change).
It is indeed cumbersome. I was hoping to gain some performance: once the string is too long, there would be no need to reconstruct the whole tags dictionary. On the other hand, there is a lot of overhead in testing for the length. I'll have to dig somewhat deeper and test a few things out first, which will take some time. Since it does not alter the functionality compared to how it works now in your PR, it is better to do this later. Meanwhile I added a new parameter I'll think about the location of the PLACEHOLDER constant. On one hand it is fine, but I don't really like the import of
hey, thanks for taking this on. These commits look great.
An implementation after the full tag has been constructed (as it is in this PR) makes it a bit easier to cut the string at an arbitrary location (instead of on a per-value basis). I tried to update the documentation. Feel free to comment/adapt.
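To illustrate the difference between the two approaches, here is a minimal sketch and not the code from this PR. The function names `cut_joined` and `cut_per_value`, the `,` separator, and the exact truncation behaviour are assumptions made for illustration. Only the names `PLACEHOLDER` and `MAX_TAG_LENGTH` and the 255-character limit come from the discussion.

```python
MAX_TAG_LENGTH = 255  # limit mentioned in the PR description
PLACEHOLDER = "..."   # marker appended to a shortened value


def cut_joined(values, sep=",", max_len=MAX_TAG_LENGTH):
    """Join all values first, then cut the finished string at an arbitrary position."""
    joined = sep.join(values)
    if len(joined) <= max_len:
        return joined
    return joined[: max_len - len(PLACEHOLDER)] + PLACEHOLDER


def cut_per_value(values, sep=",", max_len=MAX_TAG_LENGTH):
    """Keep only whole values and stop before the limit would be exceeded."""
    joined = sep.join(values)
    if len(joined) <= max_len:
        return joined
    # Reserve room for the separator and the trailing placeholder.
    budget = max_len - len(sep) - len(PLACEHOLDER)
    kept = []
    length = 0
    for value in values:
        extra = len(value) + (len(sep) if kept else 0)
        if length + extra > budget:
            break
        kept.append(value)
        length += extra
    return sep.join(kept + [PLACEHOLDER])
```

Cutting the joined string uses the full 255 characters but may split a value in the middle. The per-value variant never splits a value, at the cost of extra bookkeeping and a possibly shorter result.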
I added that functionality (think it was missing before?).
I collected some anecdotal evidence with a GeoPackage of roughly 5 million elements and saw no significant time difference. I think it's only a single computation against all tags.
I did, it works:
I had the same feeling. Could this be a global constant? Another thing is the duplicate code between the PBF and the XML writers.
Thanks for your modifications. I realized too late that I didn't implement the unlimited length case, and the same is true for the issues with PbfDataWriter. I was clearly too tired to continue.
LGTM, thanks for the friendly collaboration!
When merging tags of duplicate elements, the tag string may exceed OSM's maximum allowed length of 255 characters. I guess this could also happen if the original source has attributes longer than 255 characters?
This PR prevents such tags by replacing overlong tags with `...`. I'm not sure if the static vars (`PLACEHOLDER` and `MAX_TAG_LENGTH`) are stored at a reasonable location. Feel free to request any changes!
This PR additionally adds info on how to run the tests. I have run the tests and they succeed.
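For readers who want a concrete picture, the idea can be sketched roughly as follows. This is an illustration, not the PR's actual code. The helper name `limit_tag_lengths` is hypothetical, and the sketch assumes that an overlong value is cut and ends with the placeholder so that the result is exactly `MAX_TAG_LENGTH` characters long.

```python
MAX_TAG_LENGTH = 255  # maximum tag value length in OSM, per the description above
PLACEHOLDER = "..."   # marks a value that has been shortened


def limit_tag_lengths(tags, max_len=MAX_TAG_LENGTH, placeholder=PLACEHOLDER):
    """Return a copy of `tags` in which no value is longer than `max_len`."""
    limited = {}
    for key, value in tags.items():
        if len(value) > max_len:
            # Leave room for the placeholder so the result fits the limit.
            value = value[: max_len - len(placeholder)] + placeholder
        limited[key] = value
    return limited


tags = {"name": "x" * 300, "highway": "residential"}
limited = limit_tag_lengths(tags)
print(len(limited["name"]), limited["highway"])
```

The function returns a new dictionary and leaves the input untouched, which keeps it usable from either writer without side effects.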
This PR originates from this tool, where I ran into the overlong string issue here.
PS: Very nice tool you (all) created!