Data Representations #2523

Closed · aljungberg wants to merge 11 commits
Conversation

@aljungberg (Contributor) commented Oct 21, 2022

Data Representations

Present and receive API fields with custom formatting rules.

For certain APIs, the default conversion to and from JSON performed by PostgreSQL may not be right. For example you might wish to represent dates in UNIX epoch timestamp format, fixed precision decimals as strings rather than JSON floats, colours as CSS hex strings, binary blobs as base64 data and so forth. Perhaps you even need different representations in different API endpoints.

Data Representations allow you to create custom, two-way convertible representations of your data on a per field level using standard PostgreSQL functions. Just create a domain, which we can view as a kind of type alias that does nothing by default, and then define casts to and from JSON. (Normally PostgreSQL ignores domain casts, but Data Representations detects those casts and applies them automatically as needed.)
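
For instance, here is a minimal sketch of the base64 case mentioned above; the domain, function and table names are illustrative, not taken from this PR:

-- Hypothetical example: expose a bytea column as base64 in JSON, in both directions.
CREATE DOMAIN public.b64_blob AS bytea;

-- Formatter: bytea -> JSON string (strip the newlines encode() inserts every 76 chars).
CREATE OR REPLACE FUNCTION json(public.b64_blob) RETURNS json AS $$
    SELECT to_json(replace(encode($1, 'base64'), E'\n', ''));
$$ LANGUAGE SQL IMMUTABLE;

-- Parser: JSON string -> bytea (decode() ignores whitespace in base64 input).
CREATE OR REPLACE FUNCTION b64_blob(json) RETURNS public.b64_blob AS $$
    SELECT decode($1 #>> '{}', 'base64');
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (public.b64_blob AS json) WITH FUNCTION json(public.b64_blob) AS IMPLICIT;
CREATE CAST (json AS public.b64_blob) WITH FUNCTION b64_blob(json) AS IMPLICIT;

-- Apply it to an (illustrative) column; the stored data stays plain bytea.
ALTER TABLE attachment ALTER COLUMN payload TYPE public.b64_blob;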

Features:

  • Lets you define custom data types and attendant formatters and parsers.
  • Use them in your base tables to apply custom representations globally in all APIs using those tables.
  • Or create views with different representations for the same underlying columns.
  • Presents data: fields are transparently formatted into JSON when reading, for example with GET.
  • Receives data: custom data types in incoming JSON payloads are automatically parsed on POST or PATCH.
  • Filters: query parameters sent as plain text are parsed as well, if you also specify a text -> custom type cast.
  • Works for RETURNING operations, such as when you request the full body of your patched resource with return=representation.
  • Works for batch editing.
  • Completely opt in. If you don't want it, don't define any domain casts. (There's no reason to have such casts in ordinary usage, since PostgreSQL itself ignores them.)
  • Fully integrated: unlike a naive approach where you create a view with computed columns to format the data the way you want it, with Data Representations you don't lose the ability to write to those fields. They are two-way.
  • High performance: since your filter values are parsed into the native data type before being applied, you don't need a second index on the computed presentation value. Querying is as fast as querying the underlying true data type.

Example

PostgreSQL has a built-in formatter for dates:

curl "http://mypostgrest/todo?select=id,due_at&id=eq.1"
[{"id":1, "due_at":"2018-01-02T00:00:00+00:00"}]

Let's replace the +00:00 suffix with Z, an equivalent but shorter ISO 8601 notation:

CREATE DOMAIN public.isodate AS timestamp with time zone;
CREATE OR REPLACE FUNCTION json(public.isodate) RETURNS json AS $$
    SELECT to_json(replace(to_json($1)#>>'{}', '+00:00', 'Z'));
$$ LANGUAGE SQL IMMUTABLE;
CREATE CAST (public.isodate AS json) WITH FUNCTION json(public.isodate) AS IMPLICIT;
ALTER TABLE todo ALTER COLUMN due_at TYPE public.isodate;

The underlying data is unchanged since our custom domain is just an alias for timestamp with time zone.

# ... after reloading PostgREST to refresh the schema cache ...
curl "http://mypostgrest/todo?select=id,due_at&id=eq.1"
[{"id":1, "due_at":"2018-01-02T00:00:00Z"}]

Let's look at some colours.

curl "http://mypostgrest/todo?select=id,label_color&id=eq.3"
{"id":3, "label_color":123456}

In this case we store our colour as an integer, so it's being formatted as an integer by PostgreSQL. Let's consider that an implementation detail we wish to hide from our API consumers.

CREATE DOMAIN public.color AS INTEGER CHECK (VALUE >= 0 AND VALUE <= 16777215);

CREATE OR REPLACE FUNCTION color(json) RETURNS public.color AS $$
    SELECT color($1 #>> '{}');
$$ LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION color(text) RETURNS public.color AS $$
    SELECT ('x' || lpad((CASE WHEN SUBSTRING($1::text, 1, 1) = '#' THEN SUBSTRING($1::text, 2) ELSE $1::text END), 8, '0'))::bit(32)::int;
$$ LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION json(public.color) RETURNS json AS $$
    SELECT
        CASE WHEN $1 IS NULL THEN to_json(''::text)
        ELSE to_json('#' || lpad(upper(to_hex($1)), 6, '0'))
        END;
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (public.color AS json) WITH FUNCTION json(public.color) AS IMPLICIT;
CREATE CAST (json AS public.color) WITH FUNCTION color(json) AS IMPLICIT;
# ... after reloading PostgREST to refresh the schema cache ...
curl "http://mypostgrest/todo?select=id,label_color&id=eq.3"
{"id":3, "label_color":"#01E240"}

curl -X PATCH "http://mypostgrest/todo?select=id,label_color&id=eq.2" \
     -H 'Content-Type: application/json' \
     -H 'Accept: application/json' \
     -d '{"label_color": "#221100"}'

curl "http://mypostgrest/todo?select=id,label_color&id=eq.2"
[{"id":2, "label_color": "#221100"}]

To the outside world it looks as if your column type were a string of hexadecimal values, while under the hood it's an integer that even comes with a built-in check constraint for validity. Your data is clean and space efficient, yet presented and interpreted in a user-friendly manner.

What about filtering? For that we need one last converter. Since query parameters are given as strings, let's make a text->color cast. We already have the function for it.

CREATE CAST (text AS public.color) WITH FUNCTION color(text) AS IMPLICIT;

Now we can filter (note that # must be percent-encoded as %23 in the URL):

curl "http://mypostgrest/todo?select=id,label_color&label_color=eq.%23221100"
[{"id":2, "label_color": "#221100"}]

Under the hood the value is first converted to its underlying integer type before the table is queried, which ensures maximum performance. Without Data Representations it would be easy to end up in a situation where PostgreSQL instead had to format every row's colour as a hexadecimal string and then compare, making index usage hard.
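
As a hedged illustration (not PostgREST's literal generated SQL, and assuming an ordinary index on label_color), the two query shapes compare roughly like this:

-- With the data rep: the literal is parsed once and compared in the native integer type,
-- so a plain index on label_color can be used.
EXPLAIN SELECT id FROM todo WHERE label_color = color('#221100');

-- Without it: every row has to be formatted as hex text before comparing,
-- which defeats that index.
EXPLAIN SELECT id FROM todo WHERE '#' || lpad(upper(to_hex(label_color)), 6, '0') = '#221100';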

TODO

  • Docs. The above can be used as a starting point for docs.

This pull request implements #2310. I renamed the concept from "transformers" to "data representations" which seemed more intuitive.

Future Direction

Some things that could be implemented in the future:

  • Custom representations for CSV and binary content types.
  • Method to activate data representations without changing column types (when that isn't convenient).

@steve-chavez (Member) commented Oct 21, 2022

Needs to be merged with main because of @steve-chavez 's recent refactoring.

Ah sorry about that, I don't have more big refactors planned. Rebasing on main would be great so we can check how the test suite does.

Looks like an interesting feature! 👀

@wolfgangwalther (Member) left a comment

This is great stuff. Looking at the higher level architecture, I see we are storing those parsers and formatters in dbTables right now. This will prevent us from doing the same thing for computed columns and/or RPCs, correct?

Computed columns could be solved by making them "first-class" columns in terms of the schema cache - i.e. parse those functions and add them as columns in the cache. This would also improve the openapi output, where those functions are missing right now. That's unrelated to this PR, though.

But with the current structure we could not support RPCs, I think. Would RPC support be useful? I think so. At least for the output part, but probably also for parsing input arguments. We should maybe think about RPCs from the beginning, because that might affect the overall architecture.

Review comments (outdated, resolved): src/PostgREST/Config.hs, src/PostgREST/DbStructure.hs, src/PostgREST/Error.hs, src/PostgREST/Query/QueryBuilder.hs
@aljungberg (Contributor, Author) commented Oct 24, 2022

This is great stuff. Looking at the higher level architecture, I see we are storing those parsers and formatters in dbTables right now. This will prevent us from doing the same thing for computed columns and/or RPCs, correct?

Thanks!

Since the result of allTables is a TableMap, the specifications will be loaded there initially even if we later also do RPC, I suppose. For RPC we'd need a separate query.

Computed columns could be solved by making them "first-class" columns in terms of the schema cache - i.e. parse those functions and add them as columns in the cache. This would also improve openapi output, where those functions lack right now. That's unrelated to this PR, though.

Oh yes, I didn't know computed columns don't go into the cache! My vote for what it's worth would be to consider them normal columns, perhaps just with a flag indicating they are read-only? (Although even that isn't certain since I guess you could make computed columns read-write with INSTEAD OF triggers in PostgreSQL. But I digress.)

But with the current structure we could not support RPCs, I think. Would RPC support be useful? I think so. At least for the output part, but probably also to parse input for arguments. We should maybe think RPCs from the beginning, because that might affect the overall architecture.

I think the need is slightly less pressing since with RPC you can already do whatever function calls you need to parse your inputs and format your outputs when you define your function. And if your function can't be changed you can define a second function to call the first with the appropriate layer of conversions.

So RPC data representations are more of a nice-to-have than something that enables entirely new use cases. That said, it would be intuitive and probably also convenient to have that support. And if we do add it, it should definitely cover both input and output, for the same reason: intuitiveness.

In order to add such support, my first instinct would perhaps be to split dbFieldRepresentations into two:

  1. a map of qualified identifier to column names and types
  2. a map of column types to converter(s)

Since we already know the types of a proc both in and out we could just look up each type in the second map to figure what functions to apply. For tables we'd continue to rely on the first map since we need that to build the json_to_recordset (unless we do something with TableMap instead as discussed elsewhere in the PR).

@wolfgangwalther (Member)

Since the result of allTables is a TableMap, the specifications will be loaded there initially even if we later also do RPC, I suppose. For RPC we'd need a separate query.

I think we should get all the formatters/parsers in a separate query. Right now we get them together with the column definitions. This has two drawbacks:

  • For a single type / cast, we possibly fetch this information many times: once for every column that has this type.
  • Our haskell typing isn't 100% right at the moment. We could, in theory, represent different columns of the same type with different parsers in haskell. But we don't want to allow that. So we should have a typing that represents this, as suggested in a different comment above: storing the data type in the columns list and then having a type -> parser/formatter mapping would do that (a sketch of such a catalog query is below).
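
To make that concrete, here is a sketch of the kind of catalog query that could fetch all such casts in one go, keyed by type rather than per column (this is not PostgREST's actual schema-cache query):

SELECT
    c.castsource::regtype AS source_type,
    c.casttarget::regtype AS target_type,
    c.castfunc::regproc   AS cast_function
FROM pg_cast c
WHERE c.castfunc <> 0  -- skip binary-coercible and I/O-conversion casts, which have no function
  AND (c.castsource IN (SELECT oid FROM pg_type WHERE typtype = 'd')
    OR c.casttarget IN (SELECT oid FROM pg_type WHERE typtype = 'd'));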

I think the need is slightly less pressing since with RPC you can already do whatever function calls you need to parse your inputs and format your outputs when you define your function. And if your function can't be changed you can define a second function to call the first with the appropriate layer of conversions.

I think it's very convenient to return table types (row types for existing tables) from RPCs - and then potentially use further embedding on the result. This would not be possible, if I wanted to return JSON from the RPC to make the transform myself. So having those formatters available for RPCs would be really helpful.

Since we already know the types of a proc both in and out we could just look up each type in the second map to figure what functions to apply. For tables we'd continue to rely on the first map since we need that to build the json_to_recordset (unless we do something with TableMap instead as discussed elsewhere in the PR).

Ah yes, that's basically what I suggested above, too.

@aljungberg (Contributor, Author) commented Nov 25, 2022

This PR has basically been rewritten from scratch. My initial work happened to coincide with a big refactoring, and since there were open questions about the actual mechanism of the functionality, a 'big rewrite' ended up making sense.

This PR depends on #2542.

  • Fully future compatible with RPC support. I haven't tested it, but making data representations work for RPC may only take a dozen lines of code consisting of resolver bits.
  • All data representations are loaded with a single query into a single mapping in the schema cache.
  • The core is now a map of user defined type a -> type b converters.
  • This concept obviously will work fine with RPC.
  • But it will also fit conceptually very nicely with inserting CSV payload using csv->myfield converters and so forth.
  • The new CoercibleField type represents use-case specific references to named values.
  • Execution plans use CoercibleField (almost) everywhere instead of Field -- conceptually a Field is now user input yet to be sanitised and prepped for use, while CoercibleField is a part of a plan to reference some value by name with the attendant type and coercion information necessary for that reference.
  • Added support for data representations in ANY queries, e.g. label_color=in.(#000100,#01E240).
  • Support for views.
  • New unit tests covering large number of potential edge cases like data representations over joins, computed columns in views, updating through views, compound filters and many more.
  • Lots of internal documentation and comments.

@aljungberg force-pushed the representations branch 2 times, most recently from aaf4618 to 00a84c9 on November 25, 2022 19:05
@aljungberg (Contributor, Author) commented Nov 28, 2022

This PR has been rebased on the latest update to #2542.

The failing tests may be caused by this unexpected behaviour (tested on Postgres 9.6):

SELECT due_at AS "due_at" 
FROM json_to_recordset ((SELECT json_build_array('{"due_at":"2018-01-03T11:00:00+00"}'::json))) AS _ ("due_at" json);

In recent versions due_at is parsed as the JSON value "2018-01-03T11:00:00+00" (a single string as the root). But it seems like Postgres 9.6 somehow parses the JSON incorrectly and transmogrifies the "2018-..." bit internally into 2018- and then tries to parse that as JSON. So we get:

Query Error: error: invalid input syntax for type json

Indeed, this otherwise identical query does not cause an error (although it incorrectly gets due_at as the integer 2018):

SELECT due_at AS "due_at" 
FROM json_to_recordset ((SELECT json_build_array('{"due_at":"2018"}'::json))) AS _ ("due_at" json);

DB Fiddle. A quick look around suggests early versions of json_to_recordset had some very simple heuristics for extracting nested JSON (see the discussion here for instance). This is a bit of a blocker for data reps. We need an efficient way to extract keys from the main object without the value becoming corrupted, so we can apply our custom data rep parsers instead of the implicit parser.

Easy solution is to make data reps for payload parsing require Postgres 13 or above. Thoughts?

Edit: another option would be to read the nested values as text (confirmed this does not error out) and ask the user to provide a text parser instead of a JSON parser. This would be illogical (a text parser to parse JSON), technically incorrect (somejson::text != somejson), and a surprising difference between versions of Postgres. But it would "work" for some definition of working and the tests would pass.
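
For reference, a sketch of that workaround using the same query shape as above; the nested value comes out as text, so a JSON string loses its surrounding quotes:

SELECT due_at AS "due_at"
FROM json_to_recordset ((SELECT json_build_array('{"due_at":"2018-01-03T11:00:00+00"}'::json))) AS _ ("due_at" text);
-- due_at comes back as the plain text 2018-01-03T11:00:00+00 rather than a json value.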

@aljungberg (Contributor, Author) commented Nov 28, 2022

So the error for Postgres 10 and above is actually something else, it seems. A minor error-message change plus the lack of support for arrays of domain types break the ANY support.

@aljungberg force-pushed the representations branch 3 times, most recently from 3eef88f to c47bc5c on November 28, 2022 17:48
@aljungberg (Contributor, Author) commented Nov 29, 2022

Summary of the last set of updates:

  • some fixes to improve backwards compatibility with older versions of Postgres
  • also simply disabled some failing tests with the intent to document that data representations require PostgreSQL 11 or newer (trying to use an older version will work in many cases but not all; it fails in safe, if somewhat non-obvious, ways)
  • added more unit tests and improved code comments
  • this PR now (very slightly) increases test coverage for the code-base as a whole

Most notably, I have resolved a hard-to-exercise edge case when using computed relations. I made a mistake when rewriting the PR so that data representations were no longer applied to fields read via a computed relation. This has now been fixed and given test coverage.

@steve-chavez (Member)

@aljungberg Could you rebase this on main 🙏 ? This will help me in reviewing as it will remove some commits that were already merged on #2542.

@steve-chavez (Member)

@aljungberg More of a high level question. Since you mention plans for binary types:

Some things that could be implemented in the future:
Custom representations for CSV and binary content types.

On #1582 (comment) we're thinking of using pg AGGREGATEs to output custom binary types. Do you see that feature conflicting with what you have in mind? Or would it play along? Ideally we'd maintain a cohesive feature set.

@wolfgangwalther Also, on #1582 (comment) you mentioned the possibility of using a custom function to parse input from the client - would data representations work better for that case? Or would both features be complementary?

@aljungberg (Contributor, Author)

@aljungberg Could you rebase this on main 🙏 ? This will help me in reviewing as it will remove some commits that were already merged on #2542.

Sure, no problem. I'm pencilling this in for next week (other work permitting). I'll take a look at your aggregate question too. Thanks for picking this up!

aljungberg added a commit to Screenly/postgrest that referenced this pull request Jan 23, 2023
… fields.

See PR PostgREST#2523. Most notable code changes:

- Load data representation casts into schema cache.
- Data representations for reads, filters, inserts, updates, views, over joins.
- `CoercibleField` represents name references in queries where coercion may be needed.
- `ResolverContext` help facilitate field resolution during planning.
- Planner 'resolves' names in the API query and pairs them with any implicit conversions to be used in the query builder stage.
- Tests for all of the above.
@aljungberg (Contributor, Author)

Here's the update that hopefully ties up all known loose ends, @steve-chavez. Summary:

  • Rebased on main
  • (Mostly) squashed the PR
  • Wrote a change log entry
  • There are some use-cases which I think demonstrate the benefit of this PR well (base64 binary fields, UNIX timestamps, JSON incompatible numbers). Since I'd like to use them in the documentation for the feature, I added unit tests for each so we don't have to worry about the docs disagreeing with reality
  • Also fixed the failing idempotence test

(The only remaining failure is that thing where the code coverage checker flags all data fields, even if they're in fact exercised by tests.)

@steve-chavez (Member) commented Jan 26, 2023

Default PostgreSQL date formatting is not fully ISO8601 compliant since it doesn't use T for the separator.
curl "http://mypostgrest/todo?select=id,due_at&id=eq.1"
[{"id":1, "due_at":"2018-01-02 00:00:00+00""}]

@aljungberg Is the above true? I'm trying to reproduce it as:

create table qux(id bigint, due_at timestamptz);
insert into qux values (1, now());

curl 'localhost:3000/qux'
[{"id":1,"due_at":"2023-01-26T18:30:16.697152-05:00"}]

The T is already included. I haven't been able to get the whitespace instead of the T (tested on pg versions 9 through 15).

@steve-chavez (Member)

Just finished trying the features this brings, really cool!

So it seems "data representations" would be superior to computed columns, right? Because we can insert/update through them and they don't require an extra index for searching.

I guess computed columns would only make sense in cases where users can't or don't want to modify a table.


Will review the code next.

@steve-chavez (Member)

Also, just remembered a previous case where this was needed: #1597. Basically it required calling to_timestamp() before inserting into a timestamp column.

The solution was RPC, but this seems better as it doesn't require repeating the INSERT statement.
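
For example (an illustrative sketch, not something in this PR; the domain and function names are made up), the #1597 case could be covered with a data rep along these lines:

CREATE DOMAIN public.unix_ts AS timestamp with time zone;

-- Parse incoming epoch seconds into a real timestamp.
CREATE OR REPLACE FUNCTION unix_ts(json) RETURNS public.unix_ts AS $$
    SELECT to_timestamp(($1 #>> '{}')::double precision);
$$ LANGUAGE SQL IMMUTABLE;

-- Present the stored timestamp back as epoch seconds.
CREATE OR REPLACE FUNCTION json(public.unix_ts) RETURNS json AS $$
    SELECT to_json(extract(epoch FROM $1));
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (json AS public.unix_ts) WITH FUNCTION unix_ts(json) AS IMPLICIT;
CREATE CAST (public.unix_ts AS json) WITH FUNCTION json(public.unix_ts) AS IMPLICIT;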


It also seems useful for the PostGIS geometry type, which doesn't have a json -> geometry conversion. For inserting we resort to using its string representation as documented here: https://postgrest.org/en/stable/how-tos/working-with-postgresql-data-types.html#postgis

Which brings me to a question... could we extend this feature to also work for base types? It would not only be useful for geometry but also for bigint (which is problematic for JS clients because of number precision). Or would that cause issues?
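
As an aside, bigint can already get the string treatment with the feature as it stands by wrapping it in a domain. A sketch with illustrative names:

CREATE DOMAIN public.big_id AS bigint;
CREATE OR REPLACE FUNCTION json(public.big_id) RETURNS json AS $$
    SELECT to_json($1::text);   -- emit as a JSON string so JS clients keep full precision
$$ LANGUAGE SQL IMMUTABLE;
CREATE OR REPLACE FUNCTION big_id(json) RETURNS public.big_id AS $$
    SELECT ($1 #>> '{}')::bigint;
$$ LANGUAGE SQL IMMUTABLE;
CREATE CAST (public.big_id AS json) WITH FUNCTION json(public.big_id) AS IMPLICIT;
CREATE CAST (json AS public.big_id) WITH FUNCTION big_id(json) AS IMPLICIT;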

…nsformation.

- With the previous method, very long queries such as `ANY (ARRAY[test.color('000100'), test.color('CAFE12'), test.color('01E240'), ...` could be generated. Consider the case where the parser function name is 45 characters and there's a hundred literals. That's 4.5kB of SQL just for the function name alone!
- New version uses `unnest`: `ANY (SELECT test.color(unnest('{000100,CAFE12,01E240,...}'::text[]))` to produce a much shorter query.
- This is likely to be more performant and either way much more readable and debuggable in the logs.
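
For illustration, a hand-written approximation of the new filter shape for label_color=in.(#000100,#01E240,#CAFE12), reusing the color example from earlier (this is not PostgREST's literal generated SQL):

SELECT id, label_color
FROM todo
WHERE label_color = ANY (SELECT color(unnest('{#000100,#01E240,#CAFE12}'::text[])));
-- Each literal is parsed once by color(text); the comparison stays on the native integer type.
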
@aljungberg (Contributor, Author)

  • Updated to include latest main.

@aljungberg (Contributor, Author) commented Mar 15, 2023

Reading more in #1582 I see I may inadvertently have intersected with your prior work in some regards. As you rightfully pointed out, the idea of using aggregates there is quite related.

I think data reps solve similar issues without requiring create aggregate privileges (which you can't get in some cloud-hosted versions of Postgres). You could select your source data, cast that result set into a data rep domain of an aggregate nature and implement it as a regular Postgres function, delivering binary data as the return value.

So your GET /lines example could be implemented as a VIEW that selects into a record set and casts that to twkb_lines, a custom domain over a test.lines set. Then combine that with a data rep transform, twkb_lines -> bytea. Haven’t tested it but it seems like it should work. Both approaches can exist at once. #1582 might have better performance while data reps require fewer privileges and maybe fewer functions (but you get a VIEW instead). Plus I like that you could also define twkb_lines -> json and twkb_lines -> xml etc and not have to change your view. (When we add more ways to activate data reps maybe the VIEW can go away too. I quite like that simplicity.)

@steve-chavez (Member)

Both approaches can exist at once

Yes. I'm close to finishing #1582 (which has stalled for years already). Let me finish that one, and then I'll also help with merging this PR; we'll also figure out how the features can play together.

@steve-chavez (Member)

FYI, Wolfgang is proposing an idea with DOMAINs for #1582.

@aljungberg (Contributor, Author)

FYI, Wolfgang is proposing an idea with DOMAINs for #1582.

Thanks for the heads up. That actually would work really well, posted my thoughts over there.

steve-chavez pushed a commit to steve-chavez/postgrest that referenced this pull request Jun 23, 2023
… fields.

See PR PostgREST#2523. Most notable code changes:

- Load data representation casts into schema cache.
- Data representations for reads, filters, inserts, updates, views, over joins.
- `CoercibleField` represents name references in queries where coercion may be needed.
- `ResolverContext` help facilitate field resolution during planning.
- Planner 'resolves' names in the API query and pairs them with any implicit conversions to be used in the query builder stage.
- Tests for all of the above.
- More consistent naming (TypedX -> CoercibleX).

New: unit tests for more data representation use cases; helpful as examples as well.

New: update CHANGELOG with data representations feature description.

Fixed failing idempotence test.

New: replace date formatter test with one that does something.

Fixup: inadvertent CHANGELOG change after rebase.

Cleanup: `tfName` -> `cfName` and related.

Document what IRType means.

Formatting.

New: use a subquery to interpret `IN` literals requiring data rep transformation.

- With the previous method, very long queries such as `ANY (ARRAY[test.color('000100'), test.color('CAFE12'), test.color('01E240'), ...` could be generated. Consider the case where the parser function name is 45 characters and there's a hundred literals. That's 4.5kB of SQL just for the function name alone!
- New version uses `unnest`: `ANY (SELECT test.color(unnest('{000100,CAFE12,01E240,...}'::text[]))` to produce a much shorter query.
- This is likely to be more performant and either way much more readable and debuggable in the logs.
@imageck commented Jun 27, 2023

I see that #2542 was merged. I don't know which version of PostgREST Supabase deploys but I hope the bug I'm encountering has been fixed.

Since I've added a counter column of type uint, which is just a domain with DEFAULT 0, NOT NULL & CHECK (VALUE >= 0) constraints, I can't insert a new row from the client without setting it to a non-null value explicitly. All requests are rejected with this error message: domain uint does not allow null values.

After a bit of digging, I've found that PostgREST constructs a complex query from incoming requests. Here is one such query that was logged:

WITH pgrst_source AS (
  WITH pgrst_payload AS (SELECT $1 AS json_data),
  pgrst_body AS (
    SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val
    FROM pgrst_payload
  )
  INSERT INTO "public"."collections"("cat", "desc", "name")
  SELECT "cat", "desc", "name"
  FROM json_populate_recordset(null::"public"."collections", (SELECT val FROM pgrst_body)) _
  RETURNING "public"."collections".*
)
SELECT
  '' AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  array[]::text[] AS header,
  coalesce((json_agg(_postgrest_t)->0)::text, 'null') AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status
FROM (SELECT "collections".* FROM "pgrst_source" AS "collections") _postgrest_t

json_populate_recordset() looked suspect to me immediately. I ran this query:

SELECT "cat", "desc", "name"
  FROM json_populate_recordset(null::"public"."collections", '[{"name":"test book","desc":"","cat":"books"}]');

…and of course I got the same error. json_populate_recordset() fills the missing key–value pairs with nulls, right?
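
For what it's worth, a minimal reproduction along those lines; the counter column name and exact table shape are assumptions based on the description above:

CREATE DOMAIN uint AS integer DEFAULT 0 NOT NULL CHECK (VALUE >= 0);
CREATE TABLE collections (name text, "desc" text, cat text, counter uint);

-- The "counter" key is missing from the payload, so it is filled with NULL from the
-- null base record, and the domain's NOT NULL constraint rejects it at this point,
-- before any column or domain DEFAULT gets a chance to apply.
SELECT "cat", "desc", "name"
  FROM json_populate_recordset(null::collections, '[{"name":"test book","desc":"","cat":"books"}]');
-- ERROR:  domain uint does not allow null values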

Anyway, since it was replaced with json_to_recordset() and there were other related changes, is it safe to say the bug has been fixed? Are domain constraints respected (especially the default value) in the latest version?

@wolfgangwalther (Member) commented Jun 27, 2023

@imageck

…and of course I got the same error. json_populate_recordset() fills the missing key–value pairs with nulls, right?

And did you get a better result for json_to_recordset()? I don't think so, because they behave the same.

is it safe to say the bug has been fixed? Are domain constraints respected (especially the default value) in the latest version?

No, I don't think so. This does not seem to be connected to this PR either. While we do a lot of stuff with domains in the context of data representations, considering the DEFAULT value for a domain has neither been discussed nor implemented anywhere. In fact, I didn't even know about a default value for domains until now.

Could you please open a separate issue with this with a minimal, reproducible example etc.?

@imageck commented Jun 27, 2023

And did you get a better result for json_to_recordset()? I don't think so, because they behave the same.

I'm coming back with the results and nope, json_to_recordset() did not do any better.

No, I don't think so. This does not seem to be connected to this MR either. While we do a lot of stuff with domains in the context of data representations, considering the DEFAULT value for a domain has neither been discussed nor implemented anywhere. In fact, I didn't even know about a default value for domains until now.
Could you please open a separate issue with this with a minimal, reproducible example etc.?

Yes, my apologies. I just wasn't sure whether to open a new issue. It should be fairly easy to reproduce. Let me come up with an example.

Thanks.

steve-chavez pushed a commit that referenced this pull request Jun 29, 2023
@steve-chavez (Member)

🚀 Merged on #2839
