Support Struct columns #5

adriangb · 2023-02-28T18:30:39Z

What should we do with these? Custom types in Postgres are messy because we need to know (1) the name to emit DDL and (2) the OIDs of the inner types, which is messy when dealing with nested types like arrays. Maybe we should copy these over as JSONB?

adriangb · 2023-03-09T01:34:18Z

So I think the thing to do here is ask the user to get the OIDs for us:

needed_metadata = analyze_metadata(schema)

metada = await some_util_funcion_per_db_driver(needed_metada, some_connection)

encoder = Encoder(schema, metadata)

The key is that the rust package remain sans-io and with no dependency on a particular database driver (which would make it much harder to be language agnostic).

adriangb · 2023-05-08T11:50:16Z

Making progress here. We can now specify the encoder to use for each column. We just need the utility function.

jpfeuffer · 2025-02-14T15:07:02Z

Hi @adriangb ! Thank you for this helpful little lib.

I was investigating the problem a little and I am not sure if you really need the type names or their OIDs for an export into Postgres' binary format (anymore). I think restrictions on matches between expected server-side OID and OIDs in the pg.bin dump were loosened around v14: https://www.postgresql.org/message-id/E1jxxnr-0003Mn-S8%40gemulon.postgresql.org
Since then, only mismatches in predefined system-level OIDs (<10000) error out (which is fine for our purposes since you registered them for all equivalent pyarrow types already). Same for array types.

Therefore, I believe (as a first step or fallback) you could just put dummy OIDs (>10000, maybe better >16384) in your Encoder(Builder) for a pa.Struct or the pa.List Encoder when it encounters a pa.Struct as inner_type.

For pg versions <14 or as safe default behaviour, one could indeed require a dict[pa.struct, tuple[OID, OID]] (or Map equiv. in rust) that is used during the traversal of the column type hierarchy (where the tuple describes the type OID and corresponding array_type OID).
Note that by looking at the source code, you actually do not need any OID for the first level of a record/composite type column (only the first children and every struct below).
I don't think we would ever need the type names as in the DDL except if we want the some_util_funcion_per_db_driver function to work with that to make it a bit easier.

What do you think about working on this together? My rust is unfortunately not very good (we will need some recursion for the struct encoders) but I could do testing and an example for the analyze_metadata and util_functions.

adriangb · 2025-02-14T16:37:22Z

That's really interesting news! I would be happy to support only Postgres > 14.

The recursion shouldn't be too hard. Do we need arbitrary recursion, or just one level?

jpfeuffer · 2025-02-14T17:53:27Z

I think ideally full recursion but it only recurses whenever you encounter a struct or a list (which you kind of handled already).
Regarding total size that needs to be put in the beginning of a binary struct encoding for an entry: it should be possible to keep a total byte count while recursing down but you could also do it like duckdb and save the start position when starting to insert in the buffer, do a start-end when the insert for an entry is done, go back to the start position and fill those bytes. Not sure what fits the current style best.

adriangb · 2025-02-14T18:05:48Z

We can do full recursion. IIRC I used the DuckDB way for Lists and other variable length types - I would do the same thing.

adriangb mentioned this issue Mar 19, 2023

JSONB support #8

Closed

adriangb added the good first issue Good for newcomers label May 8, 2023

adriangb removed the good first issue Good for newcomers label Jun 15, 2023

adriangb mentioned this issue Jun 16, 2023

How should I handle arrays of structs, #17

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support Struct columns #5

Support Struct columns #5

adriangb commented Feb 28, 2023

adriangb commented Mar 9, 2023 •

edited

Loading

adriangb commented May 8, 2023

jpfeuffer commented Feb 14, 2025

adriangb commented Feb 14, 2025

jpfeuffer commented Feb 14, 2025

adriangb commented Feb 14, 2025

Support Struct columns #5

Support Struct columns #5

Comments

adriangb commented Feb 28, 2023

adriangb commented Mar 9, 2023 • edited Loading

adriangb commented May 8, 2023

jpfeuffer commented Feb 14, 2025

adriangb commented Feb 14, 2025

jpfeuffer commented Feb 14, 2025

adriangb commented Feb 14, 2025

adriangb commented Mar 9, 2023 •

edited

Loading