Commit

Always set encoding to UTF-8 for OSM data
brendan-ward committed Aug 19, 2023
1 parent 03e0e77 commit d4630f3
Showing 2 changed files with 8 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/source/known_issues.md
@@ -102,7 +102,8 @@ df = read_dataframe(path, INTERLEAVED_READING=True)

We recommend the following to sidestep performance issues:

- always download remote OSM data sources to local files before attempting
- download remote OSM data sources to local files before attempting
  to read
- the `use_arrow=True` option may speed up reading from OSM files
- if possible, use a different tool such as `ogr2ogr` to translate the OSM
  data source into a more performant format for reading by layer, such as GPKG
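
The first two recommendations can be sketched roughly as follows. This is a minimal illustration, not part of the commit: `download_to_local` and `read_osm_layer` are hypothetical helper names, the `"points"` layer is only an example of an OSM layer, and `use_arrow=True` assumes `pyarrow` is installed.

```python
import shutil
import urllib.request
from pathlib import Path


def download_to_local(url: str, dest: Path) -> Path:
    """Hypothetical helper: copy a remote data source to a local file."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def read_osm_layer(path: Path, layer: str = "points"):
    """Hypothetical helper: read one layer of a local OSM file via Arrow."""
    # Imported lazily so the download helper works without pyogrio installed.
    from pyogrio import read_dataframe

    # use_arrow=True may speed up reading; it assumes pyarrow is available.
    return read_dataframe(path, layer=layer, use_arrow=True)
```

Typical usage would be to call `download_to_local(...)` once and then pass the returned path to `read_osm_layer(...)` for each layer of interest, so that repeated reads never touch the network.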
6 changes: 6 additions & 0 deletions pyogrio/_io.pyx
@@ -438,13 +438,19 @@ cdef detect_encoding(OGRDataSourceH ogr_dataset, OGRLayerH ogr_layer):
    -------
    str or None
    """

    if OGR_L_TestCapability(ogr_layer, OLCStringsAsUTF8):
        return 'UTF-8'

    driver = get_driver(ogr_dataset)
    if driver == 'ESRI Shapefile':
        return 'ISO-8859-1'

    if driver == "OSM":
        # always set OSM data to UTF-8
        # per https://help.openstreetmap.org/questions/2172/what-encoding-does-openstreetmap-use
        return "UTF-8"

    return None
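
For readers who want to follow the fallback order without the Cython and OGR details, the decision logic in this hunk can be mirrored in plain Python. This is only a sketch under stated assumptions: `detect_encoding_sketch` is a hypothetical name, and the `strings_as_utf8` flag stands in for the result of `OGR_L_TestCapability(ogr_layer, OLCStringsAsUTF8)`.

```python
from typing import Optional


def detect_encoding_sketch(strings_as_utf8: bool, driver: str) -> Optional[str]:
    """Hypothetical pure-Python mirror of the fallback order in the diff."""
    # A layer that reports UTF-8 string support takes priority over the driver.
    if strings_as_utf8:
        return "UTF-8"

    # Shapefiles without that capability fall back to ISO-8859-1.
    if driver == "ESRI Shapefile":
        return "ISO-8859-1"

    # New in this commit: OSM data is always treated as UTF-8.
    if driver == "OSM":
        return "UTF-8"

    # No encoding could be determined for any other driver.
    return None
```

The ordering matters: the capability check runs first, so the driver-specific branches only apply when the layer does not already report UTF-8 strings.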

