gpq convert output of Overture parquet files cannot be read by GDAL #102
Comments
I also tried the |
Thanks for the report, @geographika. It may be that gdal cannot read Parquet files with v2 data pages. I can try writing an older version. |
Thanks for the reply @tschaub and for the tool! |
@geographika - This may or may not be the same issue you are experiencing, but it looks to me like OGR/GDAL cannot read a field that is a list of structs. I'll ticket this on the GDAL repo to get more info, but it may be that GDAL doesn't handle all of Parquet's logical field types. I see that OSGeo/gdal#8262 is related. I ticketed this issue as OSGeo/gdal#8606 to get some more info. |
Thanks @tschaub for following up on this. Just to note GDAL reads the raw Overture parquet files fine - as a table with records, but once converted to GeoParquet the file "loads" but is empty. Parquet: File converted using gpq to Geoparquet: |
@geographika - I agree that there is something odd going on. But I'm tempted to believe that it has to do with OGR trying to ignore those logical field types that it cannot handle. Specifically, it does not currently read logical lists where the elements are groups. The I downloaded one of the building Parquet files and named it
ogr2ogr no-unhandled.parquet input.parquet
After this, I can create a new, valid GeoParquet file with this:
gpq convert no-unhandled.parquet no-unhandled-geo.parquet
And then I can verify that OGR can read it with this:
ogrinfo no-unhandled-geo.parquet -al |
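The three commands in the comment above can be chained from a small script. This is only a sketch: the function names `workaround_commands` and `run_workaround` are made up for illustration, the file names are the ones used in the comment, and it assumes `ogr2ogr`, `gpq` and `ogrinfo` are all on the PATH.

```python
import shlex
import subprocess
from typing import List


def workaround_commands(src: str) -> List[List[str]]:
    """Return the three commands from the comment as argv lists.

    `src` is the raw Overture Parquet file. The intermediate and output
    names follow the `no-unhandled` naming used in the comment.
    """
    stripped = "no-unhandled.parquet"
    geo = "no-unhandled-geo.parquet"
    return [
        # ogr2ogr takes the destination first, then the source, so this
        # rewrites the file without the fields OGR cannot handle.
        ["ogr2ogr", stripped, src],
        # Convert the rewritten file to GeoParquet.
        ["gpq", "convert", stripped, geo],
        # Check that OGR can read the converted file.
        ["ogrinfo", geo, "-al"],
    ]


def run_workaround(src: str) -> None:
    """Run each step in order and stop at the first failing command."""
    for argv in workaround_commands(src):
        print("$", shlex.join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_workaround("input.parquet")` would run the steps for real. Nothing is executed until that call is made.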
OSGeo/gdal#8608 adds support to OGR to read columns that have a list of structs (like the Overture data). I've created apache/arrow#38503 in hopes of tracking down the remaining incompatibility. |
@geographika - The v0.21.0 release has a fix that should address the issue converting Overture data ( With one of the above Overture files (named
# gpq version
0.21.0
# gpq convert input.parquet input-geo.parquet
# gpq describe input-geo.parquet
gpq describe input-geo.parquet
╭──────────────────────┬─────────┬─────────────────────────────────┬────────────┬─────────────┬──────────┬────────────────┬────────┬────────╮
│ COLUMN               │ TYPE    │ ANNOTATION                      │ REPETITION │ COMPRESSION │ ENCODING │ GEOMETRY TYPES │ BOUNDS │ DETAIL │
├──────────────────────┼─────────┼─────────────────────────────────┼────────────┼─────────────┼──────────┼────────────────┼────────┼────────┤
│ categories           │         │ group                           │ 0..1       │             │          │                │        │        │
│ level                │ int32   │ int(bitwidth=32, issigned=true) │ 0..1       │ zstd        │          │                │        │        │
│ socials              │         │ list                            │ 0..1       │             │          │                │        │        │
│ subType              │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ numFloors            │ int32   │ int(bitwidth=32, issigned=true) │ 0..1       │ zstd        │          │                │        │        │
│ entityId             │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ class                │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ sourceTags           │         │ map                             │ 0..1       │             │          │                │        │        │
│ localityType         │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ emails               │         │ list                            │ 0..1       │             │          │                │        │        │
│ drivingSide          │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ adminLevel           │ int32   │ int(bitwidth=32, issigned=true) │ 0..1       │ zstd        │          │                │        │        │
│ road                 │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ isoCountryCodeAlpha2 │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ isoSubCountryCode    │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ updateTime           │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ wikidata             │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ confidence           │ double  │                                 │ 0..1       │ zstd        │          │                │        │        │
│ defaultLanguage      │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ brand                │         │ group                           │ 0..1       │             │          │                │        │        │
│ addresses            │         │ list                            │ 0..1       │             │          │                │        │        │
│ names                │         │ group                           │ 0..1       │             │          │                │        │        │
│ isIntermittent       │ boolean │                                 │ 0..1       │ zstd        │          │                │        │        │
│ connectors           │         │ list                            │ 0..1       │             │          │                │        │        │
│ surface              │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ version              │ int32   │ int(bitwidth=32, issigned=true) │ 0..1       │ zstd        │          │                │        │        │
│ phones               │         │ list                            │ 0..1       │             │          │                │        │        │
│ id                   │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ geometry             │ binary  │                                 │ 0..1       │ zstd        │ WKB      │                │        │        │
│ context              │ binary  │ string                          │ 0..1       │ zstd        │          │                │        │        │
│ height               │ double  │                                 │ 0..1       │ zstd        │          │                │        │        │
│ maritime             │ boolean │                                 │ 0..1       │ zstd        │          │                │        │        │
│ sources              │         │ list                            │ 0..1       │             │          │                │        │        │
│ websites             │         │ list                            │ 0..1       │             │          │                │        │        │
│ isSalt               │ boolean │                                 │ 0..1       │ zstd        │          │                │        │        │
│ bbox                 │         │ group                           │ 1          │             │          │                │        │        │
├──────────────────────┼─────────┴─────────────────────────────────┴────────────┴─────────────┴──────────┴────────────────┴────────┴────────┤
│ Rows                 │ 815104                                                                                                             │
│ Row Groups           │ 1                                                                                                                  │
│ GeoParquet Version   │ 1.0.0                                                                                                              │
╰──────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
# ogrinfo input-geo.parquet -al
Warning 1: Field brand.names.common of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.official of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.alternate of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.short of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field addresses of unhandled type list<element: struct<freeform: string, locality: string, postCode: string, region: string, country: string>> ignored
Warning 1: Field names.common of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.official of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.alternate of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.short of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field sources of unhandled type list<element: struct<property: string, dataset: string, recordId: string, confidence: double>> ignored
INFO: Open of `input-geo.parquet'
using driver `Parquet' successful.
Layer name: input-geo
Geometry: Unknown (any)
Feature Count: 815104
Extent: (-179.760039, -62.216330) - (179.962804, 72.784394)
Layer SRS WKT:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Geometry Column = geometry
categories.main: String (0.0)
categories.alternate: StringList (0.0)
level: Integer (0.0)
socials: StringList (0.0)
subType: String (0.0)
numFloors: Integer (0.0)
entityId: String (0.0)
class: String (0.0)
sourceTags: String(JSON) (0.0)
localityType: String (0.0)
emails: StringList (0.0)
drivingSide: String (0.0)
adminLevel: Integer (0.0)
road: String (0.0)
isoCountryCodeAlpha2: String (0.0)
isoSubCountryCode: String (0.0)
updateTime: String (0.0)
wikidata: String (0.0)
confidence: Real (0.0)
defaultLanguage: String (0.0)
brand.wikidata: String (0.0)
isIntermittent: Integer(Boolean) (0.0)
connectors: StringList (0.0)
surface: String (0.0)
version: Integer (0.0)
phones: StringList (0.0)
id: String (0.0)
context: String (0.0)
height: Real (0.0)
maritime: Integer(Boolean) (0.0)
websites: StringList (0.0)
isSalt: Integer(Boolean) (0.0)
bbox.minx: Real (0.0)
bbox.maxx: Real (0.0)
bbox.miny: Real (0.0)
bbox.maxy: Real (0.0)
OGRFeature(input-geo):0
categories.main (String) = (null)
categories.alternate (StringList) = (null)
level (Integer) = (null)
socials (StringList) = (null)
subType (String) = (null)
numFloors (Integer) = (null)
entityId (String) = (null)
class (String) = (null)
sourceTags (String(JSON)) = (null)
localityType (String) = (null)
emails (StringList) = (null)
drivingSide (String) = (null)
adminLevel (Integer) = (null)
road (String) = (null)
isoCountryCodeAlpha2 (String) = (null)
isoSubCountryCode (String) = (null)
updateTime (String) = 2020-03-20T18:08:51.000Z
wikidata (String) = (null)
confidence (Real) = (null)
defaultLanguage (String) = (null)
brand.wikidata (String) = (null)
isIntermittent (Integer(Boolean)) = (null)
connectors (StringList) = (null)
surface (String) = (null)
version (Integer) = 0
phones (StringList) = (null)
id (String) = w783118772@1
context (String) = (null)
height (Real) = (null)
maritime (Integer(Boolean)) = (null)
websites (StringList) = (null)
isSalt (Integer(Boolean)) = (null)
bbox.minx (Real) = 56.6205337
bbox.maxx (Real) = 56.6207643
bbox.miny (Real) = 54.3153349
bbox.maxy (Real) = 54.3154768
POLYGON ((56.6205337 54.3154585,56.6205677 54.3153349,56.6207643 54.3153532,56.6207304 54.3154768,56.6205337 54.3154585))
# ... etc.
It looks like after OSGeo/gdal#8608 is released, those warnings will go away and the additional columns will be read as well. |
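To see at a glance which columns OGR skipped, the `Warning 1: Field ... of unhandled type ... ignored` lines in the output above can be collected with a few lines of Python. This is a rough sketch that assumes the warning wording stays exactly as pasted above. The helper name `ignored_fields` is hypothetical, and the sample text is copied from that output.

```python
import re
from typing import Dict

# Matches lines such as:
#   Warning 1: Field names.common of unhandled type list<...> ignored
_IGNORED = re.compile(r"^Warning 1: Field (\S+) of unhandled type (.+) ignored$")


def ignored_fields(ogrinfo_output: str) -> Dict[str, str]:
    """Map each ignored field name to the type OGR reported for it."""
    found = {}
    for line in ogrinfo_output.splitlines():
        match = _IGNORED.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found


# Two warnings and one unrelated line taken from the output above.
sample = """\
Warning 1: Field names.common of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field sources of unhandled type list<element: struct<property: string, dataset: string, recordId: string, confidence: double>> ignored
INFO: Open of `input-geo.parquet'
"""

for name, arrow_type in sorted(ignored_fields(sample).items()):
    print(name, "->", arrow_type)
```

Running the same function on output from a GDAL build that includes OSGeo/gdal#8608 should, if the fix works as described, return an empty mapping.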
@tschaub - many thanks for following this up and releasing the new version - much appreciated.
$env:PATH += ";D:\Tools\gpq-windows-amd64."
gpq version
# 0.21.0
gpq convert D:\Data\type=administrativeBoundary\part-00018-87dd7d19-acc8-4d4f-a5ba-20b407a79638.c000.zstd.parquet D:\Data\test.geo.parquet --from=parquet --to=geoparquet
gpq validate D:\Data\test.geo.parquet
# Summary: Passed 20 checks.
ogrinfo D:\Data\test.geo.parquet -al
#INFO: Open of `/data/overture/test.geo.parquet'
# using driver `Parquet' successful.
#
#Layer name: test.geo
#Geometry: Unknown (any)
#Feature Count: 13455 |
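Since the original symptom was a file that "loads" but is empty, a scripted check of the `Feature Count` line may be handy when converting many files. The sketch below only parses text that has already been captured from `ogrinfo`. The name `feature_count` is hypothetical, and the optional leading `#` is tolerated only because the output above was pasted as comments.

```python
import re
from typing import Optional

# Accepts "Feature Count: 13455" as well as the commented form "#Feature Count: 13455".
_COUNT = re.compile(r"^#?\s*Feature Count:\s*(\d+)")


def feature_count(ogrinfo_output: str) -> Optional[int]:
    """Return the first feature count found in ogrinfo output, or None."""
    for line in ogrinfo_output.splitlines():
        match = _COUNT.match(line.strip())
        if match:
            return int(match.group(1))
    return None


captured = """\
#Layer name: test.geo
#Geometry: Unknown (any)
#Feature Count: 13455
"""
count = feature_count(captured)
print("features:", count)
```

A result of `None` or `0` would suggest the converted file still cannot be read properly.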
I was testing the Overture Maps data and realised it is only available in Parquet and not GeoParquet format. As I understand it, this is a use case for gpq, as mentioned in #57.
The tool runs fine and seems to produce output, but I cannot read this output using GDAL. Apologies if this is user error or should be a GDAL issue instead - please close if this is the case.
Full steps to recreate below (note I was using gpq on a Windows machine, and testing the output on both Windows and Linux).
Download data:
Run conversion:
QGIS opens the file but the attribute table is empty. Testing with ogrinfo:
Trying to read the data gives the likely cause of the issue:
ERROR 1: ReadNext() failed: Malformed levels. min: 2 max: 2 out of range. Max Level: 1
Testing with the GDAL validate script from here
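For readers unfamiliar with the `Malformed levels` message: it reads like a range check on Parquet definition or repetition levels, where every decoded level must lie between 0 and the column's maximum level. The sketch below mimics such a check in Python purely to illustrate what the numbers in the message mean. It is not the Arrow or GDAL code, and `check_levels` is a hypothetical name.

```python
from typing import Sequence


def check_levels(levels: Sequence[int], max_level: int) -> None:
    """Raise if any level falls outside the range 0..max_level."""
    if not levels:
        return
    lo, hi = min(levels), max(levels)
    if lo < 0 or hi > max_level:
        raise ValueError(
            f"Malformed levels. min: {lo} max: {hi} out of range. "
            f"Max Level: {max_level}"
        )


# Levels within range pass silently.
check_levels([0, 1, 1], max_level=1)

# Levels of 2 against a maximum of 1 produce a message like the one reported.
try:
    check_levels([2, 2], max_level=1)
except ValueError as err:
    print(err)
```

In this reading, the reader expected a maximum level of 1 for the column but found levels of 2 in the data, which points to a mismatch between the schema and the data pages rather than to bad geometry.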