Issues with encoding in windows batch #8

Closed
Meibes opened this issue Jun 10, 2021 · 4 comments
Comments

@Meibes

Meibes commented Jun 10, 2021

Hi,

I was trying to use ogr2osm in a Windows batch file but ran into a lot of encoding problems: the batch run always created ANSI-encoded files, whereas my workflow needs UTF-8 encoded files. I managed to solve my issue by changing the following line:
self.f = open(self.filename, 'w', buffering = -1)
to
self.f = open(self.filename, 'w', buffering = -1, encoding="utf-8")

There is already a parameter called "encoding", but it seems to be used only for the source file. Could we extend "encoding" to cover the destination file as well, or introduce another parameter for that? What are your thoughts? Alternatively, do you have a tip for forcing the Windows batch to output UTF-8 without changing ogr2osm?

Thanks for this awesome tool =)
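On the last question (forcing UTF-8 without changing ogr2osm), one option that is not discussed in this thread is Python's UTF-8 Mode, available from Python 3.7: setting the environment variable PYTHONUTF8=1 (for example with `set PYTHONUTF8=1` in the batch file) makes UTF-8 the default text encoding for open(). The sketch below only demonstrates the effect on a child interpreter; the helper name is hypothetical and nothing here is part of ogr2osm.

```python
import os
import subprocess
import sys


def child_default_encoding_in_utf8_mode() -> str:
    """Start a child interpreter with PYTHONUTF8=1 and report its default text encoding."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = "1"  # enable Python UTF-8 Mode for the child process only
    result = subprocess.run(
        [sys.executable, "-c",
         "import locale; print(locale.getpreferredencoding(False))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # The spelling differs between Python versions ("UTF-8" vs "utf-8"), so normalise it.
    return result.stdout.strip().lower()


print(child_default_encoding_in_utf8_mode())
```

This changes the default for every file the script opens without an explicit encoding, so it is a workaround at the environment level rather than a fix in the tool.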

@roelderickx
Owner

Thanks for your bug report. This issue looks like a duplicate of pnorman#15 but your solution is different and you have found a testcase where the current method has issues.

Some observations:

  • The encoding parameter of ogr2osm only specifies the encoding of the input file, not the encoding of the output file.
  • The documentation of the Python open() function specifies that the default encoding is used when the encoding parameter is omitted or None. This default is platform dependent. I can't test it on Windows, but at least on Linux it is UTF-8.
  • Although the OSM wiki page clearly suggests it, there is no strict obligation for an osm file to be encoded in UTF-8.
  • According to the W3C recommendation for XML, the expected encoding is UTF-8 if neither a byte order mark nor an encoding is specified, as is currently the case for ogr2osm.
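
The second observation can be checked on any machine with a small probe. This is a minimal sketch, not ogr2osm code: it opens a scratch file without an encoding argument and reports which encoding Python picked. The helper name is hypothetical.

```python
import os
import tempfile


def default_open_encoding() -> str:
    """Return the encoding open() selects when no encoding argument is passed."""
    path = os.path.join(tempfile.mkdtemp(), "probe.txt")
    with open(path, "w") as f:  # deliberately no encoding=... here
        return f.encoding


print(default_open_encoding())
```

The printed value depends on the platform and locale settings: typically a UTF-8 variant on Linux and a code page such as cp1252 on a Western European Windows installation.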

Given the last observation, ogr2osm is supposed to output UTF-8 at the moment, translating from the input file encoding where necessary. To obtain consistent behaviour across operating systems it is therefore necessary to pass encoding='utf-8', as you suggested. I would then also explicitly specify the encoding in the header, i.e. <?xml version="1.0" encoding="utf-8"?>.
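
As an illustration of both points together (explicit encoding on open() plus an explicit XML declaration), here is a minimal sketch. It is not the actual ogr2osm writer: the function name, the element layout and the sample attribute values are assumptions made for the example.

```python
import os
import tempfile
from xml.sax.saxutils import quoteattr

XML_HEADER = '<?xml version="1.0" encoding="utf-8"?>\n'


def write_osm_stub(path: str, name: str) -> None:
    """Write a tiny XML document as UTF-8, independent of the platform default."""
    with open(path, "w", buffering=-1, encoding="utf-8") as f:
        f.write(XML_HEADER)
        f.write('<osm version="0.6">\n')
        # quoteattr() escapes the value and adds the surrounding quotes.
        f.write(f'  <node id="-1" lat="0" lon="0"><tag k="name" v={quoteattr(name)}/></node>\n')
        f.write("</osm>\n")


demo_path = os.path.join(tempfile.mkdtemp(), "stub.osm")
write_osm_stub(demo_path, "東京")
with open(demo_path, "rb") as f:
    print(f.read())
```

Reading the file back in binary mode shows the raw bytes, so the result can be compared across operating systems without a second decoding step getting in the way.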

I can confirm the testcases still pass on Linux with your suggested modification. Can you verify if the testcases pass under Windows as well?

@Meibes
Author

Meibes commented Jun 11, 2021

Thanks for the fast answer!

  • As far as I know, both Linux and Mac use UTF-8 as their default encoding, while Windows uses ANSI / Windows-1252 (at least in the German version of Windows).
  • It seems some OSM-tools do write UTF-8 in the header, here is an example of Overpass:

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 0.7.56.9 76e5016d">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base="2021-06-09T08:10:43Z"/>

After making these changes everything runs smoothly in the batch.

@roelderickx
Owner

Ok. I am not sure if the cram tests can be run as is under Windows, but can you try to convert at least test/shapefiles/japanese.shp and confirm if the formatted result matches test/japanese.xml?

In the test script the output is formatted using xmllint before comparing:

ogr2osm --encoding shift_jis --gis-order -f test/shapefiles/japanese.shp
xmllint --format japanese.osm > japanese.xml
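
Where xmllint is not at hand, a rough first check is whether the generated file decodes as UTF-8 at all. The sketch below is an assumption-laden stand-in and not part of the test suite: the helper name and the sample files are hypothetical, and passing it does not show that the content matches test/japanese.xml.

```python
import os
import tempfile


def is_valid_utf8(path: str) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Demo: the same word written in two encodings.
tmp_dir = tempfile.mkdtemp()
for codec in ("utf-8", "cp1252"):
    sample = os.path.join(tmp_dir, f"sample-{codec}.txt")
    with open(sample, "wb") as f:
        f.write("Straße".encode(codec))
    print(codec, is_valid_utf8(sample))
```

Pure ASCII files pass this check under either encoding, so it is only informative for files that contain non-ASCII characters, such as the Japanese test data.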

roelderickx added a commit that referenced this issue Jun 12, 2021
@roelderickx
Owner

Meanwhile I managed to test the modification on Windows; the test is conclusive. The proposed changes have been merged into master.
Thanks @Meibes for your investigation.
