Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null filtering on Embedded Resources bug #2800

Closed
Etwenn opened this issue May 24, 2023 · 5 comments · Fixed by #2951
Closed

Null filtering on Embedded Resources bug #2800

Etwenn opened this issue May 24, 2023 · 5 comments · Fixed by #2951
Labels

Comments

@Etwenn
Copy link

Etwenn commented May 24, 2023

Environment

  • PostgreSQL version: 14
  • PostgREST version: 11.0.1
  • Operating system: Windows

Description of issue

Trying to use new feature : Null filtering on Embedded Resources

Table & Data :

Table1 :

row FieldA Table2(FieldB) -> foreign key to Table2
1 valueA1 valueB1
2 valueA2 valueB2

Table2

row FieldB FieldC FieldD
1 valueB1 valueC1 valueD1
2 valueB2 ValueC2 NULL

doing :
GET /table1?select=*,table2(*)&table2=not.is.null

Result :

[
    {
        "FieldA": "valueA1",
        "Table2": {
            "FieldB": "valueB1",
            "FieldC": "valueC1",
            "FieldD": "valueD1"
        }
    }
]

expected :

[
    {
        "FieldA": "valueA1",
        "Table2": {
            "FieldB": "valueB1",
            "FieldC": "valueC1",
            "FieldD": "valueD1"
        }
    },
    {
        "FieldA": "valueA2",
        "Table2": {
            "FieldB": "valueB2",
            "FieldC": "valueC2",
            "FieldD": NULL
        }
    }
]

I'm not getting the second row because FieldD is null

it seems to be a problem from the postgresql's request,
here the one i got through the log : (simplified)

select * table1
left join lateral (
	select * from table2
	where id = table1.table2_id ) as "table1_table2_id" on true
where "table1_table2_id" is not null

Any idea how i could get a workaround ?

any idea how i could bypass this problem ?

@laurenceisla
Copy link
Member

laurenceisla commented May 24, 2023

Problem

It looks like a bug. Here's a reproducible example:

create table api.table2 (
  field_b text primary key,
  field_c text,
  field_d text
);

create table api.table1 (
  field_a text primary key ,
  field_b text references api.table2(field_b)
);

insert into api.table2(field_b, field_c, field_d)
VALUES ('b1', 'c1', 'd1'),
       ('b2', 'c2', null);

insert into api.table1(field_a, field_b)
VALUES ('a1', 'b1'),
       ('a2', 'b2');
 GET /table1?select=*,table2(*)&table2=not.is.null

From your log:

select * table1
left join lateral (
	select * from table2
	where id = table1.table2_id ) as "table1_table2_id" on true
where "table1_table2_id" is not null

Looks like the WHERE is not filtering an empty row, but every column of the table, so it's the same as doing:

where table1_table2_id.* is not null
-- or
where table1_table2_id.field_b is not null and table1_table2_id.field_c is not null and table1_table2_id.field_d is not null

That's why it does not show rows that have null values in any of those columns.

Also, this issue only happens with To-One relationships. When doing a To-Many embedding, it works as expected.

Workaround

This is equivalent to an inner join, so, for now, keep using !inner as a workaround:

 GET /table1?select=*,table2!inner(*)

Solution

This part of the query needs to change to fix the issue:

) aggAlias $ if relJoinType == Just JTInner then SQL.sql aggAlias <> " IS NOT NULL" else "TRUE")

Maybe adding the primary key in the filter or verifying if the row is null:

where "table1_table2_id.field_b" is not null
-- or
where row_to_json(table1_table2_id.*) is not null

@Etwenn
Copy link
Author

Etwenn commented May 25, 2023

@laurenceisla Thanks for the answer,
actually i wanted to use the not.is.null with the OR to perform some tricky request between different field from related table or not
I came to the problem i described in #2563
And you figured out the exact same conclusion i've got 😞

#2563 (comment)

@wolfgangwalther
Copy link
Member

This is how PostgreSQL behaves for IS NULL / IS NOT NULL checks on row-valued expressions. The docs:

If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. Because of this behavior, IS NULL and IS NOT NULL do not always return inverse results for row-valued expressions; in particular, a row-valued expression that contains both null and non-null fields will return false for both tests.

I'm not sure whether it makes sense for PostgREST to try to change how PostgreSQL behaves here. The query string filter &table2=not.is.null is exactly the equivalent of table2 IS NOT NULL - and that's how PostgreSQL behaves.

However, I see that the PostgREST docs are describing it like this:

For example, doing actors=not.is.null returns the same result as actors!inner(*)

If that's the goal, then actors=not.is.null must be rewritten to NOT (actors IS NULL), which is the inverse of actors IS NULL and not the same as actors IS NOT NULL. Or alternatively, from the PostgreSQL docs above:

In some cases, it may be preferable to write row IS DISTINCT FROM NULL or row IS NOT DISTINCT FROM NULL, which will simply check whether the overall row value is null without any additional tests on the row fields.

In any case the difference (or similarity, depending on which approach is taken) between actors=not.is.null and actors IS NOT NULL should be pointed out in the docs.

@steve-chavez
Copy link
Member

For example, doing actors=not.is.null returns the same result as actors!inner(*)
If that's the goal, then actors=not.is.null must be rewritten to NOT (actors IS NULL)

Yeah, I think we should do it like that. It's more intuitive for the API client and not much work for us too.

@laurenceisla
Copy link
Member

laurenceisla commented Sep 15, 2023

After doing some tests, I checked that the NOT (table IS NULL) does not work as expected either. Using the previous example #2800 (comment), this query works correctly after the change:

curl "http://localhost:3000/table1?select=*,table2(*)&table2=not.is.null

But if we select only one column that has a null value, that is:

curl "http://localhost:3000/table1?select=*,table2(field_c)&table2=not.is.null

Then the result will be the same as table IS NOT NULL. The only option that gives us the expected result seems to be IS DISTINCT FROM NULL instead, as per the docs:

In some cases, it may be preferable to write row IS DISTINCT FROM NULL or row IS NOT DISTINCT FROM NULL, which will simply check whether the overall row value is null without any additional tests on the row fields.

This would be in conflict with the ?table=not.is.null syntax... maybe needs to be changed to ?table=isdistinct.null?
Edit:Scratch that, we could document the above and still keep the idea of expecting a "not null entire row" for the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

4 participants