Null filtering on Embedded Resources bug #2800

Etwenn · 2023-05-24T12:35:56Z

Environment

PostgreSQL version: 14
PostgREST version: 11.0.1
Operating system: Windows

Description of issue

Trying to use new feature : Null filtering on Embedded Resources

Table & Data :

Table1 :

row	FieldA	Table2(FieldB) -> foreign key to Table2
1	valueA1	valueB1
2	valueA2	valueB2

Table2

row	FieldB	FieldC	FieldD
1	valueB1	valueC1	valueD1
2	valueB2	ValueC2	NULL

doing :
GET /table1?select=*,table2(*)&table2=not.is.null

Result :

[
    {
        "FieldA": "valueA1",
        "Table2": {
            "FieldB": "valueB1",
            "FieldC": "valueC1",
            "FieldD": "valueD1"
        }
    }
]

expected :

[
    {
        "FieldA": "valueA1",
        "Table2": {
            "FieldB": "valueB1",
            "FieldC": "valueC1",
            "FieldD": "valueD1"
        }
    },
    {
        "FieldA": "valueA2",
        "Table2": {
            "FieldB": "valueB2",
            "FieldC": "valueC2",
            "FieldD": NULL
        }
    }
]

I'm not getting the second row because FieldD is null

it seems to be a problem from the postgresql's request,
here the one i got through the log : (simplified)

select * table1
left join lateral (
	select * from table2
	where id = table1.table2_id ) as "table1_table2_id" on true
where "table1_table2_id" is not null

Any idea how i could get a workaround ?

any idea how i could bypass this problem ?

The text was updated successfully, but these errors were encountered:

laurenceisla · 2023-05-24T17:49:24Z

Problem

It looks like a bug. Here's a reproducible example:

create table api.table2 (
  field_b text primary key,
  field_c text,
  field_d text
);

create table api.table1 (
  field_a text primary key ,
  field_b text references api.table2(field_b)
);

insert into api.table2(field_b, field_c, field_d)
VALUES ('b1', 'c1', 'd1'),
       ('b2', 'c2', null);

insert into api.table1(field_a, field_b)
VALUES ('a1', 'b1'),
       ('a2', 'b2');

 GET /table1?select=*,table2(*)&table2=not.is.null

From your log:

select * table1
left join lateral (
	select * from table2
	where id = table1.table2_id ) as "table1_table2_id" on true
where "table1_table2_id" is not null

Looks like the WHERE is not filtering an empty row, but every column of the table, so it's the same as doing:

where table1_table2_id.* is not null
-- or
where table1_table2_id.field_b is not null and table1_table2_id.field_c is not null and table1_table2_id.field_d is not null

That's why it does not show rows that have null values in any of those columns.

Also, this issue only happens with To-One relationships. When doing a To-Many embedding, it works as expected.

Workaround

This is equivalent to an inner join, so, for now, keep using !inner as a workaround:

 GET /table1?select=*,table2!inner(*)

Solution

This part of the query needs to change to fix the issue:

postgrest/src/PostgREST/Query/QueryBuilder.hs

Line 81 in 0092500

    
           ) aggAlias $ if relJoinType == Just JTInner then SQL.sql aggAlias <> " IS NOT NULL" else "TRUE")

Maybe adding the primary key in the filter or verifying if the row is null:

where "table1_table2_id.field_b" is not null
-- or
where row_to_json(table1_table2_id.*) is not null

Etwenn · 2023-05-25T08:11:44Z

@laurenceisla Thanks for the answer,
actually i wanted to use the not.is.null with the OR to perform some tricky request between different field from related table or not
I came to the problem i described in #2563
And you figured out the exact same conclusion i've got 😞

#2563 (comment)

wolfgangwalther · 2023-05-25T09:55:28Z

This is how PostgreSQL behaves for IS NULL / IS NOT NULL checks on row-valued expressions. The docs:

If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. Because of this behavior, IS NULL and IS NOT NULL do not always return inverse results for row-valued expressions; in particular, a row-valued expression that contains both null and non-null fields will return false for both tests.

I'm not sure whether it makes sense for PostgREST to try to change how PostgreSQL behaves here. The query string filter &table2=not.is.null is exactly the equivalent of table2 IS NOT NULL - and that's how PostgreSQL behaves.

However, I see that the PostgREST docs are describing it like this:

For example, doing actors=not.is.null returns the same result as actors!inner(*)

If that's the goal, then actors=not.is.null must be rewritten to NOT (actors IS NULL), which is the inverse of actors IS NULL and not the same as actors IS NOT NULL. Or alternatively, from the PostgreSQL docs above:

In some cases, it may be preferable to write row IS DISTINCT FROM NULL or row IS NOT DISTINCT FROM NULL, which will simply check whether the overall row value is null without any additional tests on the row fields.

In any case the difference (or similarity, depending on which approach is taken) between actors=not.is.null and actors IS NOT NULL should be pointed out in the docs.

steve-chavez · 2023-09-14T20:56:46Z

For example, doing actors=not.is.null returns the same result as actors!inner(*)
If that's the goal, then actors=not.is.null must be rewritten to NOT (actors IS NULL)

Yeah, I think we should do it like that. It's more intuitive for the API client and not much work for us too.

laurenceisla · 2023-09-15T00:18:30Z

After doing some tests, I checked that the NOT (table IS NULL) does not work as expected either. Using the previous example #2800 (comment), this query works correctly after the change:

curl "http://localhost:3000/table1?select=*,table2(*)&table2=not.is.null

But if we select only one column that has a null value, that is:

curl "http://localhost:3000/table1?select=*,table2(field_c)&table2=not.is.null

Then the result will be the same as table IS NOT NULL. The only option that gives us the expected result seems to be IS DISTINCT FROM NULL instead, as per the docs:

In some cases, it may be preferable to write row IS DISTINCT FROM NULL or row IS NOT DISTINCT FROM NULL, which will simply check whether the overall row value is null without any additional tests on the row fields.

~~This would be in conflict with the ?table=not.is.null syntax... maybe needs to be changed to ?table=isdistinct.null?~~
Edit:Scratch that, we could document the above and still keep the idea of expecting a "not null entire row" for the user.

laurenceisla added the bug label May 24, 2023

laurenceisla mentioned this issue May 25, 2023

Related and/or conditions #2563

Open

laurenceisla mentioned this issue Sep 14, 2023

fix: Error when requesting count with Null embed filters #2930

Merged

laurenceisla mentioned this issue Sep 15, 2023

fix: Bug when Null Filtering on embedded resources #2951

Merged

laurenceisla closed this as completed in #2951 Sep 15, 2023

laurenceisla mentioned this issue Sep 26, 2023

Top-level filtering of one-to-many embedded resources with is.null and not.is.null silently malfunctions #2965

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Null filtering on Embedded Resources bug #2800

Null filtering on Embedded Resources bug #2800

Etwenn commented May 24, 2023

laurenceisla commented May 24, 2023 •

edited

Loading

Etwenn commented May 25, 2023

wolfgangwalther commented May 25, 2023

steve-chavez commented Sep 14, 2023

laurenceisla commented Sep 15, 2023 •

edited

Loading

Null filtering on Embedded Resources bug #2800

Null filtering on Embedded Resources bug #2800

Comments

Etwenn commented May 24, 2023

Environment

Description of issue

laurenceisla commented May 24, 2023 • edited Loading

Problem

Workaround

Solution

Etwenn commented May 25, 2023

wolfgangwalther commented May 25, 2023

steve-chavez commented Sep 14, 2023

laurenceisla commented Sep 15, 2023 • edited Loading

laurenceisla commented May 24, 2023 •

edited

Loading

laurenceisla commented Sep 15, 2023 •

edited

Loading