Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rel8 relies on undefined behavior of ORDER BY #151

Open
bitc opened this issue Jan 6, 2022 · 4 comments
Open

rel8 relies on undefined behavior of ORDER BY #151

bitc opened this issue Jan 6, 2022 · 4 comments

Comments

@bitc
Copy link

bitc commented Jan 6, 2022

Hello, this library is really cool. The best feature is that it can return trees of data instead of just tables.

But I noticed that the SQL it generates relies on undefined (or unspecified?) behavior of PostgreSQL, with regards to ORDER BY and LIMIT and ARRAY_AGG.

Here is an example query with ORDER BY and LIMIT:

SELECT name
FROM author
ORDER BY name
LIMIT 5

If this is changed to:

SELECT name
FROM
    (SELECT *
    FROM author
    ORDER BY name
    ) T1
LIMIT 5

then this is no longer the same query. Even though the inner SELECT has an ORDER BY clause, once it is selected from by the outer select, it is treated as an unordered relation, and the rows may result in any order. So the LIMIT clause will choose an arbitrary 5 rows. In practice, it seems that PostgreSQL will always returned the "expected" results for this query, but it is not guaranteed. The general rule of thumb is that LIMIT clause only makes sense when attached directly to an ORDER BY clause

And now for ARRAY_AGG there is a similar issue.

Here is an example query:

SELECT ARRAY_AGG(name)
FROM
    (SELECT *
    FROM author
    ORDER BY name
    ) T1;

This query doesn't do what we want. PostgreSQL aggregate functions aren't influenced by the ORDER BY clause that we have here, and so the result can be that the elements will be in any arbitrary order. The correct version of this query is this:

SELECT ARRAY_AGG(name ORDER BY name)
FROM
    (SELECT *
    FROM author
    ) T1;

If we want a LIMIT clause then we actually need two ORDER BY clauses like this:

SELECT ARRAY_AGG(name ORDER BY name)
FROM
    (SELECT *
    FROM author
    ORDER BY name
    LIMIT 5
    ) T1;

Also in these examples, even with the "incorrect" ARRAY_AGG query, it seems that in practice PostgreSQL will always return the "expected" result, but again this is not guaranteed.

I am not sure when these types of "incorrect" queries are likely to give incorrect results in practice (it may be never), but a good guess is when the queries become very complicated (go beyond the from_collapse_limit) or involve parallel scans.

@ocharles
Copy link
Contributor

ocharles commented Jan 6, 2022

I understand exactly what you're saying, and don't disagree with it. You could probably escalate this issue to Opaleye (cc @tomjaguarpaw), as the same issue exists there, too. In practice, I guess we're "lucky"

@tomjaguarpaw
Copy link
Contributor

Yes, this as an Opaleye issue. @bitc can you open an issue on the Opaleye tracker? (You can paste the contents of you message here almost verbatim.)

https://github.com/tomjaguarpaw/haskell-opaleye/issues/new

@bitc
Copy link
Author

bitc commented Jan 9, 2022

I have opened issue tomjaguarpaw/haskell-opaleye#538 against haskell-opaleye

I suppose this issue can be closed now.

@ocharles
Copy link
Contributor

I'm gonna keep it open, because I believe Rel8 has its own complications here (for example, adding CAST() around a final SELECT would technically re-introduce this problem).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants