-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reliance on undefined behavior of ORDER BY #538
Comments
Interesting, thank you for reporting this. So is it true that attaching |
My understanding is that
But there may be other scenarios I'm not aware of. In any case, I would be happy to hear more about all this from a real SQL/PostgreSQL expert. |
After a bit of research, I've discovered that at least MS SQL Server does not allow it, and gives the following error:
( Here is the sqlfiddle I used: http://sqlfiddle.com/#!18/a7540/47886 Finding official documentation about the issues I have raised is tricky, but I've found some stack overflow answers :) like this one:
And some more discussion here: https://dba.stackexchange.com/questions/184149/is-it-really-possible-that-the-order-will-not-be-guaranteed-for-this-particular And also wikipedia, from https://en.wikipedia.org/wiki/Order_by (which sadly does not supply any official references for this claim):
|
That's very interesting. Thank you for digging further. If someone can come up with a reproducible test case then I will fix this. On the other hand, without a reproducible test case I fear that any such "bug fix" is liable to get undone by accident. |
This is a copy of the issue that I opened against the rel8 repo here: circuithub/rel8#151
Hello, this library is really cool. The best feature is that it can return trees of data instead of just tables.
But I noticed that the SQL it generates relies on undefined (or unspecified?) behavior of PostgreSQL, with regards to ORDER BY and LIMIT and ARRAY_AGG.
Here is an example query with ORDER BY and LIMIT:
If this is changed to:
then this is no longer the same query. Even though the inner SELECT has an ORDER BY clause, once it is selected from by the outer select, it is treated as an unordered relation, and the rows may result in any order. So the LIMIT clause will choose an arbitrary 5 rows. In practice, it seems that PostgreSQL will always returned the "expected" results for this query, but it is not guaranteed. The general rule of thumb is that LIMIT clause only makes sense when attached directly to an ORDER BY clause
And now for ARRAY_AGG there is a similar issue.
Here is an example query:
This query doesn't do what we want. PostgreSQL aggregate functions aren't influenced by the ORDER BY clause that we have here, and so the result can be that the elements will be in any arbitrary order. The correct version of this query is this:
If we want a LIMIT clause then we actually need two ORDER BY clauses like this:
Also in these examples, even with the "incorrect" ARRAY_AGG query, it seems that in practice PostgreSQL will always return the "expected" result, but again this is not guaranteed.
I am not sure when these types of "incorrect" queries are likely to give incorrect results in practice (it may be never), but a good guess is when the queries become very complicated (go beyond the
from_collapse_limit
) or involve parallel scans.The text was updated successfully, but these errors were encountered: