Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: bump duckdb-rs version #174

Merged
merged 1 commit into from
Nov 19, 2024
Merged

Conversation

DeoLeung
Copy link
Contributor

@DeoLeung DeoLeung commented Nov 8, 2024

Ticket(s) Closed

What

support reading and writing geoparquet

Why

with duckdb_spatial>=v1.1.0 geometry column in parquet will be read/write as geometry instead of blob

How

no explicit change of sql is needed

Tests

a read/write parquet test should be added later

@CLAassistant
Copy link

CLAassistant commented Nov 8, 2024

CLA assistant check
All committers have signed the CLA.

@DeoLeung DeoLeung changed the title bump duckdb-rs version chore: bump duckdb-rs version Nov 8, 2024
@philippemnoel
Copy link
Collaborator

This PR would need documentation so people know how to use it

@DeoLeung
Copy link
Contributor Author

This PR would need documentation so people know how to use it

just fix the packages and now i can build it and test it locally

it seems there's no geometry type cell.rs, currently it still read the geometry column as text :(

@philippemnoel
Copy link
Collaborator

This PR would need documentation so people know how to use it

just fix the packages and now i can build it and test it locally

it seems there's no geometry type cell.rs, currently it still read the geometry column as text :(

Yes, that’s because duckdb-rs is on duckdb 1.0.0, not 1.1.0 which it seems like it needs. We would need you to submit a PR to our fork to duckdb-rs to upgrade it as well there, then update the commit here.

@DeoLeung
Copy link
Contributor Author

This PR would need documentation so people know how to use it

just fix the packages and now i can build it and test it locally
it seems there's no geometry type cell.rs, currently it still read the geometry column as text :(

Yes, that’s because duckdb-rs is on duckdb 1.0.0, not 1.1.0 which it seems like it needs. We would need you to submit a PR to our fork to duckdb-rs to upgrade it as well there, then update the commit here.

from https://github.com/paradedb/duckdb-rs/tree/main/crates/libduckdb-sys it seems it's already on v1.1.1

@philippemnoel
Copy link
Collaborator

This PR would need documentation so people know how to use it

just fix the packages and now i can build it and test it locally
it seems there's no geometry type cell.rs, currently it still read the geometry column as text :(

Yes, that’s because duckdb-rs is on duckdb 1.0.0, not 1.1.0 which it seems like it needs. We would need you to submit a PR to our fork to duckdb-rs to upgrade it as well there, then update the commit here.

from https://github.com/paradedb/duckdb-rs/tree/main/crates/libduckdb-sys it seems it's already on v1.1.1

Ah yes! You are right! Oops, I had no idea. Then, I'm not sure... Are you sure this is supported? Do we need to do anything to enable it?

@philippemnoel
Copy link
Collaborator

Could you please sign the CLA and I think this will be good to go :)

@DeoLeung
Copy link
Contributor Author

Could you please sign the CLA and I think this will be good to go :)

done :)

for the geoparquet reference, i think it need some extra data type mapping (especially postgis geometry)

duckdb/duckdb#12503
duckdb/pg_duckdb#141

@philippemnoel
Copy link
Collaborator

Could you please sign the CLA and I think this will be good to go :)

done :)

for the geoparquet reference, i think it need some extra data type mapping (especially postgis geometry)

duckdb/duckdb#12503 duckdb/pg_duckdb#141

We ship with Postgis support and the spatial extension is enabled in pg_analytics. It was added a while back. Does that give you enough?

@DeoLeung
Copy link
Contributor Author

Could you please sign the CLA and I think this will be good to go :)

done :)
for the geoparquet reference, i think it need some extra data type mapping (especially postgis geometry)
duckdb/duckdb#12503 duckdb/pg_duckdb#141

We ship with Postgis support and the spatial extension is enabled in pg_analytics. It was added a while back. Does that give you enough?

not enough at the moment.
we want to store the analytic data into s3 in parquet format, most of them are gis data, with newly duckdb, we could dump them as geoparquet, and read it back directly as geometry data type

load spatial;
create table gis_test as select st_makepoint(127.0, 30.0) as geometry;
copy  gis_test to '~/Downloads/gis_test.parquet' (format parquet);
create table gis2 as select * from '~/Downloads/gis_test.parquet';
describe table gis2;
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varcharvarcharvarcharvarcharvarcharvarchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ geometry    │ GEOMETRY    │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘

but in pg_analytics, it still read it as text column

CREATE FOREIGN TABLE gis2 () server parquet_server options(files '~/Downloads/gis_test.parquet');
select * from information_schema.columns where table_schema = 'public' and table_name = 'gis2' and column_name = 'geometry';
select column_name, data_type from information_schema.columns where table_schema = 'public' and table_name = 'gis2' and column_name = 'geometry';
 column_name | data_type 
-------------+-----------
 geometry    | text
(1 row)

@philippemnoel
Copy link
Collaborator

Could you please sign the CLA and I think this will be good to go :)

done :)
for the geoparquet reference, i think it need some extra data type mapping (especially postgis geometry)
duckdb/duckdb#12503 duckdb/pg_duckdb#141

We ship with Postgis support and the spatial extension is enabled in pg_analytics. It was added a while back. Does that give you enough?

not enough at the moment. we want to store the analytic data into s3 in parquet format, most of them are gis data, with newly duckdb, we could dump them as geoparquet, and read it back directly as geometry data type

load spatial;
create table gis_test as select st_makepoint(127.0, 30.0) as geometry;
copy  gis_test to '~/Downloads/gis_test.parquet' (format parquet);
create table gis2 as select * from '~/Downloads/gis_test.parquet';
describe table gis2;
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varcharvarcharvarcharvarcharvarcharvarchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ geometry    │ GEOMETRY    │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘

but in pg_analytics, it still read it as text column

CREATE FOREIGN TABLE gis2 () server parquet_server options(files '~/Downloads/gis_test.parquet');
select * from information_schema.columns where table_schema = 'public' and table_name = 'gis2' and column_name = 'geometry';
select column_name, data_type from information_schema.columns where table_schema = 'public' and table_name = 'gis2' and column_name = 'geometry';
 column_name | data_type 
-------------+-----------
 geometry    | text
(1 row)

Can you confirm the duckDB version you are using separately? And can you confirm the GIS extension is loaded in pg_analytics?

@DeoLeung
Copy link
Contributor Author

confirm duckdb version 1.1.1 and 1.1.3 works with geoparquet

confirm postgis is loaded in postgres
but I'm not sure how to check if duckdb spatial is enabled in pg_analytics :(

tried select duckdb_execute($$load spatial$$); with no luck

@philippemnoel
Copy link
Collaborator

confirm duckdb version 1.1.1 and 1.1.3 works with geoparquet

confirm postgis is loaded in postgres but I'm not sure how to check if duckdb spatial is enabled in pg_analytics :(

tried select duckdb_execute($$load spatial$$); with no luck

I think you would need to duckdb_execute this:

SELECT extension_name, installed, description
FROM duckdb_extensions();

https://duckdb.org/docs/extensions/overview.html

@philippemnoel philippemnoel merged commit 9578517 into paradedb:dev Nov 19, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Geoparquet support
3 participants