
support PG array dimensionality #411

Merged
merged 7 commits into gabledata:main from ad/pg-support-dimensionality on Jan 3, 2024

Conversation

@alexdemeo (Contributor) commented Dec 22, 2023

Add array support to the Postgres reader/converter, with dimensionality read from pg_attribute.attndims. The value is fetched by joining on information_schema.columns.column_name = pg_attribute.attname.

This is a follow-up to #410, which lays a tiny bit of groundwork for #264 (comment).

@alexdemeo alexdemeo marked this pull request as ready for review December 22, 2023 23:31
Build a list type with `ndims` dimensions containing nullable `base_type` as the innermost value type.
"""
if ndims == 0:
    return UnionType(types=[NullType(), base_type])
criccomini (Contributor) commented on the diff:

I'm curious about this one. It seems right, but I'm not 100% sure. As I read it, there are a few things:

  1. DbapiConverter handles root-level NULLABLE fields (https://github.com/recap-build/recap/blob/main/recap/converters/dbapi.py#L15-L16)
  2. This code here handles NULLABLE items in a PG ARRAY field.

I think this is the right behavior. But I'm curious: are PG arrays always allowed to contain NULLs in their dimensional values? I couldn't find good docs on this, and I haven't tested it out.

alexdemeo (Contributor, Author) replied:

I did some testing and digging, and as far as I can tell the answer is yes: the innermost value can always be null. Enforcing non-null elements requires adding some sort of CHECK-constraint validation (https://stackoverflow.com/a/59421233), which seems like a pretty challenging rabbit hole of digging through information_schema.check_constraints.
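For readers following the thread, here is a minimal sketch of how such a helper might walk `ndims` down to the base case shown in the diff above. The helper name `build_list_type` and the example at the bottom are illustrative, and the constructor keywords (`ListType(values=...)`, `IntType(bits=...)`) assume recap's types module; the merged code may differ.

```python
from recap.types import IntType, ListType, NullType, RecapType, UnionType


def build_list_type(base_type: RecapType, ndims: int) -> RecapType:
    """Build a list type with `ndims` dimensions containing nullable
    `base_type` as the innermost value type."""
    if ndims == 0:
        # Base case from the diff: the innermost values may be NULL.
        return UnionType(types=[NullType(), base_type])
    # Each remaining dimension wraps the next-inner type in another ListType.
    return ListType(values=build_list_type(base_type, ndims - 1))


# INTEGER[][] with pg_attribute.attndims = 2 becomes
# list<list<union<null, int32>>>.
two_d_ints = build_list_type(IntType(bits=32), ndims=2)
```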

@@ -8,32 +8,22 @@
FloatType,
IntType,
ListType,
ProxyType,
criccomini (Contributor) commented on the diff:

Niiiice. Took me a sec to grok why we didn't need this anymore. Fully walking the n_dimensions means we don't need self-references. Awesome.

One question/nuance here: the PG dimensions are just a suggestion.

The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.

https://www.postgresql.org/docs/current/arrays.html

So the question is, do we want Recap to reflect the DB's data or its schema? My implementation (with ProxyType) reflected the data. Yours changes it to reflect the schema. Perhaps we want it configurable, with one of them as the default? WDYT?
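To see the quoted behavior in isolation, a minimal demonstration (the connection string and table name are placeholders, assuming psycopg2 against a local test database):

```python
import psycopg2

# Placeholder connection details for a local test database.
conn = psycopg2.connect("dbname=test user=postgres host=localhost")
with conn, conn.cursor() as cur:
    cur.execute("CREATE TEMP TABLE dims_demo (vals INTEGER[][])")
    # Despite the two-dimensional declaration, PostgreSQL happily accepts a
    # one-dimensional array: the declared dimensionality is documentation only.
    cur.execute("INSERT INTO dims_demo (vals) VALUES (%s)", ([1, 2, 3],))
    cur.execute("SELECT vals FROM dims_demo")
    print(cur.fetchone())  # ([1, 2, 3],)
```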

alexdemeo (Contributor, Author) replied:

I like to think the schema is the beacon of truth for what the user intends for the column. If users are using the column differently than the schema's representation, they should fix the schema. But I could see past mistakes leading to a situation where this isn't true, which would then lead to Recap constructing a false narrative about the data. I think making it configurable makes sense. Maybe default to ProxyType, since that's the safer assumption? Would we want to add config params to the PostgresqlConverter constructor?

criccomini (Contributor) replied:

Yeah, can you add a param to the init to configure it? Defaulting to proxy is safer, as you say.
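A rough sketch of what the configurable behavior could look like, using the `ignore_array_dimensionality` flag name from the diff further down; the class name and the fallback single-level list (standing in for the original ProxyType self-reference) are illustrative, not the merged code:

```python
from recap.types import ListType, NullType, RecapType, UnionType


class PostgresqlConverterSketch:
    """Illustrative sketch only; not the code that was merged."""

    def __init__(self, ignore_array_dimensionality: bool = True) -> None:
        self.ignore_array_dimensionality = ignore_array_dimensionality

    def build_array_type(self, base_type: RecapType, ndims: int) -> RecapType:
        nullable = UnionType(types=[NullType(), base_type])
        if self.ignore_array_dimensionality:
            # "Reflect the data": model the column as a single list level and
            # ignore attndims. (The pre-PR implementation used a ProxyType
            # self-reference here; that wiring is omitted from this sketch.)
            return ListType(values=nullable)
        # "Reflect the schema": nest one ListType per declared dimension,
        # treating attndims = 0 as a single dimension.
        result: RecapType = nullable
        for _ in range(max(ndims, 1)):
            result = ListType(values=result)
        return result
```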

information_schema.columns.*,
pg_attribute.attndims
FROM information_schema.columns
JOIN pg_attribute on information_schema.columns.column_name = pg_attribute.attname
criccomini (Contributor) commented on the diff:

I don't think this is enough. I think you need to join on table, schema, and column. It appears pg_attribute doesn't have those:

               Table "pg_catalog.pg_attribute"
     Column     |   Type    | Collation | Nullable | Default 
----------------+-----------+-----------+----------+---------
 attrelid       | oid       |           | not null | 
 attname        | name      |           | not null | 
 atttypid       | oid       |           | not null | 
 attstattarget  | integer   |           | not null | 
 attlen         | smallint  |           | not null | 
 attnum         | smallint  |           | not null | 
 attndims       | integer   |           | not null | 
 attcacheoff    | integer   |           | not null | 
 atttypmod      | integer   |           | not null | 
 attbyval       | boolean   |           | not null | 
 attalign       | "char"    |           | not null | 
 attstorage     | "char"    |           | not null | 
 attcompression | "char"    |           | not null | 
 attnotnull     | boolean   |           | not null | 
 atthasdef      | boolean   |           | not null | 
 atthasmissing  | boolean   |           | not null | 
 attidentity    | "char"    |           | not null | 
 attgenerated   | "char"    |           | not null | 
 attisdropped   | boolean   |           | not null | 
 attislocal     | boolean   |           | not null | 
 attinhcount    | integer   |           | not null | 
 attcollation   | oid       |           | not null | 
 attacl         | aclitem[] |           |          | 
 attoptions     | text[]    | C         |          | 
 attfdwoptions  | text[]    | C         |          | 
 attmissingval  | anyarray  |           |          | 

So perhaps we need another join here as well?

Per ChatGPT:

[Screenshot of ChatGPT's answer, 2023-12-23 11:52 AM]

alexdemeo (Contributor, Author) replied:

Ah, nice catch. I'll add in the other joins.
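For reference, one way those joins could be scoped (a sketch, not necessarily the exact query the PR ended up with): pg_attribute identifies its table via attrelid, so going through pg_class and pg_namespace lets the query match information_schema.columns on schema, table, and column name.

```python
# Sketch of a dimension-aware column query scoped to schema, table, and column.
# The %s placeholder assumes a psycopg2-style DB-API cursor, e.g.
# cursor.execute(COLUMN_QUERY, (table_name,)).
COLUMN_QUERY = """
    SELECT
        information_schema.columns.*,
        pg_attribute.attndims
    FROM information_schema.columns
    JOIN pg_namespace
        ON pg_namespace.nspname = information_schema.columns.table_schema
    JOIN pg_class
        ON pg_class.relnamespace = pg_namespace.oid
        AND pg_class.relname = information_schema.columns.table_name
    JOIN pg_attribute
        ON pg_attribute.attrelid = pg_class.oid
        AND pg_attribute.attname = information_schema.columns.column_name
    WHERE information_schema.columns.table_name = %s
"""
```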

test_bit_array BIT(8)[]
test_bit_array BIT(8)[],
test_int_array_2d INTEGER[][],
test_text_array_3d TEXT[][][]
criccomini (Contributor) commented on the diff:

Do you mind adding a NOT NULL array as well? I realized we haven't tested that.
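For example, a column along these lines would cover it (the table and column names are hypothetical, and the expected shapes are an assumption based on DbapiConverter only adding the outer union-with-null wrapper for nullable columns):

```python
# Hypothetical test DDL including a NOT NULL array column.
CREATE_ARRAYS_DDL = """
    CREATE TABLE test_arrays (
        test_int_array_2d INTEGER[][],
        test_text_array_3d TEXT[][][],
        test_int_array_not_null INTEGER[] NOT NULL
    )
"""
# Expected shapes with dimensionality enforced (assumption, not verified
# against the merged tests):
#   test_int_array_not_null -> list<union<null, int32>>, no outer null wrapper
#   test_int_array_2d       -> union<null, list<list<union<null, int32>>>>
```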

@criccomini (Contributor) commented:

This looks great. I think we're ready to merge after you add the ProxyType config stuff!

def __init__(self, namespace: str = DEFAULT_NAMESPACE) -> None:
def __init__(
    self,
    ignore_array_dimensionality: bool = True,
alexdemeo (Contributor, Author) commented on the diff:

I wonder if the negation, `support_array_dimensionality=False` or `include_array_dimensions=False`, would be more intuitive for the user.

criccomini (Contributor) replied:

My vote is for `enforce_array_dimensions`.
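Whichever name lands, usage would look something like this (reusing the PostgresqlConverterSketch class from the earlier sketch; the merged parameter name may differ from every candidate discussed here):

```python
# Default: ignore the declared dimensionality (the safer, ProxyType-style behavior).
converter = PostgresqlConverterSketch()

# Opt in to honoring pg_attribute.attndims when building nested list types.
strict_converter = PostgresqlConverterSketch(ignore_array_dimensionality=False)
```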

@criccomini (Contributor) commented:

Sorry for the delayed review; things have been busy on the home front. 🎄

I'll try and get to this over the next few days.

@criccomini criccomini merged commit 16754f0 into gabledata:main Jan 3, 2024
3 checks passed
@criccomini (Contributor) commented:

@alexdemeo 0.9.6 is up on pypi with this change! Thanks!

@alexdemeo alexdemeo deleted the ad/pg-support-dimensionality branch January 3, 2024 20:58