Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixup_shp_columnnames silently fails with windows-1258 encoding #6394

Closed
mtnorthcott opened this issue Sep 1, 2020 · 1 comment · Fixed by #6438
Closed

fixup_shp_columnnames silently fails with windows-1258 encoding #6394

mtnorthcott opened this issue Sep 1, 2020 · 1 comment · Fixed by #6438
Assignees
Labels
minor A low priority issue which might affect only some users and /or not the main functionality
Milestone

Comments

@mtnorthcott
Copy link
Contributor

Expected Behavior

fixup_shp_columnnames() renames shapefile column names containing windows-1258 characters. geonode.tests.integration.GeoNodeMapTest.test_layer_zip_upload_non_utf8 passes on the premise that the column names have been altered.

Actual Behavior

Travis CI (GDAL 1.11)

An exception is thrown when executing inLayerDefn.GetFieldDefn(i).GetName() on line 1158

Traceback (most recent call last):
  File "/home/travis/build/GeoNode/geonode/geonode/utils.py", line 1158, in fixup_shp_columnnames
    field_name = inLayerDefn.GetFieldDefn(i).GetName()
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/osgeo/ogr.py", line 3475, in GetName
    return _ogr.FieldDefn_GetName(self, *args)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 2: invalid start byte

Ubuntu 18.04 (GDAL 2.2.x)

The call to inLayerDefn.GetFieldDefn(i).GetName() succeeds and the function proceeds to the next block but the call to inDataSource.ExecuteSQL(qry) on line 1203 causes a segmentation fault and no further tests are run.

Ubuntu 18.04 w/ Ubuntu GIS PPA (GDAL 2.4.x)

The call to inLayerDefn.GetFieldDefn(i).GetName() succeeds and the function proceeds to the next block but the call to inDataSource.ExecuteSQL(qry) on line 1203 causes a TypeError which reads

Traceback (most recent call last):
  File "test.py", line 15, in <module>
    dataset.ExecuteSQL(qry)
  File "/usr/local/lib/python3.7/dist-packages/osgeo/ogr.py", line 1093, in ExecuteSQL
    return _ogr.DataSource_ExecuteSQL(self, *args, **kwargs)
TypeError: in method 'DataSource_ExecuteSQL', argument 2 of type 'char const *'

Diagnosis

  • Travis CI passes on GDAL 1.11 because inLayerDefn.GetFieldDefn(i).GetName() is contained within a try/except block and, on exception thrown, the function returns the tuple (True, None, None). The test passes despite not having renamed any column names.
  • The call to ch.decode("utf-8", "surrogateescape") on line 1173 always throws an exception as ch is a Python str type, not a bytes object. The except block is always executed because of this so has_ch is True and the loop exits after the first iteration.
  • The test fails with GDAL 2.2/2.4 as the subsequent query is executed with invalid characters.
  • When using GDAL 2.2.x, the segmentation fault causes the test and all subsequent tests to fail.
  • When using GDAL 2.4.x, the TypeError is caught but rethrown so there is no return value and the test fails.
  • In all cases with windows-1258 encoding, no columns are renamed.

Proposed Fix

I have devised a form of fix to get my GNIP-75 PR running under Travis CI, as this involves an upgrade to bionic and therefore GDAL 2.4.x. However, the fix is still relevant for any installation methods that use GDAL 2.4.x.

The call to ch.decode("utf-8", "surrogateescape") on line 1173 always throws an exception as ch is a Python str type, not a bytes object. The except block is always executed because of this so has_ch is True and the loop exits after the first iteration.

See 86a7214#diff-aadb2492556667858a9abd91a23ef330R1170

When using GDAL 2.4.x, the TypeError is caught but rethrown so there is no return value and the test fails.

This can be restored to identical functionality as Travis CI by returning the same tuple in an except TypeError: block. See 86a7214#diff-aadb2492556667858a9abd91a23ef330R1196

In all cases with windows-1258 encoding, no columns are renamed.

Currently, I do not have a solution for this underlying problem. Proposals welcome here.

Steps to Reproduce the Problem

  1. Start GeoNode with docker-compose or paver on the master branch.
  2. Execute ./manage.py test geonode.tests.integration.GeoNodeMapTest.test_layer_zip_upload_non_utf8 locally or within the django4geonode Docker container
  3. Observe test results. These vary with GDAL version (gdal-config --version to fetch version)

Alternatively, execute these calls on the ming_female_1.shp shapefile (one from the test) within a Docker container so that you can vary the GDAL version.

Specifications

  • GeoNode version: 3.0dev on master
  • Installation method (manual, GeoNode Docker, SPCGeoNode Docker): manual, Docker
  • Platform: Ubuntu 18.04 LTS 64-bit
  • Additional details:
@t-book t-book added this to the 3.1 milestone Sep 1, 2020
@t-book t-book added the minor A low priority issue which might affect only some users and /or not the main functionality label Sep 1, 2020
@srtonz
Copy link

srtonz commented Sep 3, 2020

@mtnorthcott it'd be interesting to log the value of qry here https://github.com/GeoNode/geonode/blob/master/geonode/utils.py#L1202 before the error is thrown. Wondering if there's an encoding issue with the query passed to ogr here.

There may also be other ways to rename a shapefile column, see for example https://gis.stackexchange.com/a/326864

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor A low priority issue which might affect only some users and /or not the main functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants