Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Result of fix changes on subsequent runs (fix is non-deterministic) #2134

Closed
juhoautio opened this issue Dec 18, 2021 · 53 comments · Fixed by #2546
Closed

Result of fix changes on subsequent runs (fix is non-deterministic) #2134

juhoautio opened this issue Dec 18, 2021 · 53 comments · Fixed by #2546
Labels
bug Something isn't working core Issues relating to core design or architecture of SQLFluff

Comments

@juhoautio
Copy link
Contributor

juhoautio commented Dec 18, 2021

Expected Behaviour

Running sqlfluff fix my_query.sql -f modifies query format – but running it again on the same file should not make any further changes.

Observed Behaviour

Summary: The result changes considerably also on the 2nd run. On further runs the result keeps changing with a minor whitespace change.

The behavior is demonstrated in this branch: https://github.com/juhoautio/sqlfluff/commits/non_deterministic_fix_bug

Console output for each command is included below.

1st run of sqlfluff fix

$ python -m sqlfluff fix my_query.sql -f
==== finding fixable violations ====
== [my_query.sql] FAIL
L:   3 | P:  20 | L010 | Keywords must be consistently upper case.
L:   4 | P:  20 | L010 | Keywords must be consistently upper case.
L:   5 | P:  23 | L010 | Keywords must be consistently upper case.
L:   7 | P:  16 | L008 | Commas should be followed by a single whitespace unless
                       | followed by a comment.
L:   9 | P:   2 | L003 | Indentation not consistent with line #8
L:   9 | P:   2 | L022 | Blank line expected but not found after CTE closing
                       | bracket.
L:  10 | P:   6 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  11 | P:  10 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  12 | P:  10 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  13 | P:  10 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  13 | P:  10 | L030 | Function names must be consistently lower case.
L:  14 | P:   6 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  14 | P:  30 | L031 | Avoid aliases in from clauses and join conditions.
L:  15 | P:  15 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  15 | P:  15 | L051 | INNER JOIN must be fully qualified.
L:  15 | P:  38 | L031 | Avoid aliases in from clauses and join conditions.
L:  16 | P:  15 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  17 | P:  15 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  17 | P:  15 | L051 | INNER JOIN must be fully qualified.
L:  17 | P:  44 | L031 | Avoid aliases in from clauses and join conditions.
L:  18 | P:  15 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  19 | P:  17 | L003 | Line over-indented compared to line #8
L:  20 | P:   6 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  21 | P:   8 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  22 | P:   8 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  23 | P:   6 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  24 | P:   2 | L003 | Indentation not consistent with line #8
L:  25 | P:   2 | L003 | Indentation not consistent with line #8
L:  25 | P:   2 | L022 | Blank line expected but not found after CTE closing
                       | bracket.
L:  26 | P:   6 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  27 | P:  10 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  27 | P:  23 | L010 | Keywords must be consistently upper case.
L:  28 | P:  10 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  28 | P:  23 | L010 | Keywords must be consistently upper case.
L:  29 | P:  10 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  29 | P:  29 | L006 | Missing whitespace before -
L:  29 | P:  29 | L006 | Missing whitespace after -
L:  29 | P:  42 | L010 | Keywords must be consistently upper case.
L:  30 | P:  10 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  30 | P:  30 | L006 | Missing whitespace before -
L:  30 | P:  30 | L006 | Missing whitespace after -
L:  30 | P:  41 | L006 | Missing whitespace before /
L:  30 | P:  41 | L006 | Missing whitespace after /
L:  30 | P:  52 | L006 | Missing whitespace before *
L:  30 | P:  52 | L006 | Missing whitespace after *
L:  30 | P:  58 | L010 | Keywords must be consistently upper case.
L:  30 | P:  78 | L016 | Line is too long
L:  31 | P:   6 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  31 | P:  26 | L011 | Implicit/explicit aliasing of table.
L:  31 | P:  26 | L031 | Avoid aliases in from clauses and join conditions.
L:  32 | P:   6 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  32 | P:  31 | L011 | Implicit/explicit aliasing of table.
L:  32 | P:  31 | L031 | Avoid aliases in from clauses and join conditions.
L:  33 | P:   6 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  34 | P:   6 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  35 | P:   2 | L003 | Indentation not consistent with line #8
L:  36 | P:   1 | L022 | Blank line expected but not found after CTE closing
                       | bracket.
L:  39 | P:   9 | L030 | Function names must be consistently lower case.
L:  40 | P:  23 | L005 | Commas should not have whitespace directly before them.
L:  40 | P:  33 | L030 | Function names must be consistently lower case.
L:  42 | P:  13 | L030 | Function names must be consistently lower case.
L:  46 | P:   7 | L003 | Indentation not hanging or a multiple of 4 spaces
L:  46 | P:  36 | L010 | Keywords must be consistently upper case.
L:  46 | P:  45 | L048 | Missing whitespace after '2'
L:  55 | P:   1 | L022 | Blank line expected but not found after CTE closing
                       | bracket.
L:  57 | P:   9 | L030 | Function names must be consistently lower case.
L:  64 | P:  13 | L030 | Function names must be consistently lower case.
L:  69 | P:   1 | L022 | Blank line expected but not found after CTE closing
                       | bracket.
==== fixing violations ====
68 fixable linting violations found
FORCE MODE: Attempting fixes...
Persisting Changes...
== [my_query.sql] PASS
Done. Please check your files to confirm.
  [16 unfixable linting violations found]

Resulting diff here: juhoautio@bc60c57

Many formatting fixes, as expected. The result looks good!

2nd run of sqlfluff fix

$ python -m sqlfluff fix my_query.sql -f
==== finding fixable violations ====
== [my_query.sql] FAIL
L:  31 | P:  87 | L016 | Line is too long
L:  33 | P: 100 | L016 | Line is too long
L:  51 | P:  47 | L048 | Missing whitespace after '2'
==== fixing violations ====
3 fixable linting violations found
FORCE MODE: Attempting fixes...
Persisting Changes...
== [my_query.sql] PASS
Done. Please check your files to confirm.
  [2 unfixable linting violations found]

Resulting diff here: juhoautio@148c91d

The formatting still changes greatly. I'm not sure if this is for better or worse, but that's a matter of taste. The problem is that formatting changes at all.

3rd run of sqlfluff fix

$ python -m sqlfluff fix my_query.sql -f
==== finding fixable violations ====
== [my_query.sql] FAIL
L:  57 | P:  47 | L048 | Missing whitespace after '2'
==== fixing violations ====
1 fixable linting violations found
FORCE MODE: Attempting fixes...
Persisting Changes...
== [my_query.sql] PASS
Done. Please check your files to confirm.
  [2 unfixable linting violations found]

Resulting diff here: juhoautio@befff64

Now formatting doesn't change much but it still does: redundant whitespace is added.

4th run of sqlfluff fix

$ python -m sqlfluff fix my_query.sql -f
==== finding fixable violations ====
== [my_query.sql] FAIL
L:  57 | P:  47 | L048 | Missing whitespace after '2'
==== fixing violations ====
1 fixable linting violations found
FORCE MODE: Attempting fixes...
Persisting Changes...
== [my_query.sql] PASS
Done. Please check your files to confirm.
  [2 unfixable linting violations found]

Resulting diff here: juhoautio@d9bcbf6

Adds yet another redundant space character. I suspect it would keep adding it until infinity if sqlfluff fix is run again.

Steps to Reproduce

Get my_query.sql from here:
juhoautio@2b99acd

(This query is an obfuscated version of a query that executes successfully in AWS Athena)

Run python -m sqlfluff fix my_query.sql -f and check the file for changes after each execution.

Dialect

Not specified. Dialects are not important here: the expectation here is that sqlfluff can do generic autoformatting on any sql file and the result shouldn't change after the initial run.

Version

To make sure that these bugs haven't been fixed recently but are not in a release yet, I ran this with local dev version of sqlfluff.

The version I used was the latest commit in main branch: 7fd452d

$ tox -e py --devenv .venv
$ source .venv/bin/activate
$ python -m sqlfluff --version
python -m sqlfluff, version 0.9.0
$ python --version
Python 3.7.5

Not using dbt.

Configuration

N/A

@juhoautio juhoautio added the bug Something isn't working label Dec 18, 2021
@jpy-git
Copy link
Contributor

jpy-git commented Dec 18, 2021

@juhoautio Which dialect is this query for, I get a parsing error on your query at line 46 if I just use ANSI (default)? You should see it if you run python -m sqlfluff lint my_query.sql

image

Not sure if relevant to the actual issue but would be good to clear this up incase it it causing issues

@jpy-git
Copy link
Contributor

jpy-git commented Dec 18, 2021

Actually yes, that repeating whitespace issue is because that line doesn't parse.
If i change the query to something that parses in the ANSI dialect then you don't get the repeating whitespace additions.
image

Rules can't properly handle unparseable sections since they won't follow the same logical structure as a properly formed parse tree.

@barrywhart I notice that since we only show the fixable lints when running sqlfluff fix it's hard for users to know that it's just a parse error.
I think my preference here would be to introduce a general check on LintFixes to see if they are applied within an unparseable segment and if so throw them away? Basically the same as we do for fixes to templated segments, because anything that follows from an unparseable segment is pretty much nonsense?
Alternatively we would need to add some output to the CLI to let users who just use sqlfluff fix know about parsing/templating errors.

@juhoautio
Copy link
Contributor Author

This query is primarily for AWS Athena but also works on Presto. Neither of those seem to be supported dialects.

But as I wrote about my expectation:

Dialects are not important here: the expectation here is that sqlfluff can do generic autoformatting on any sql file and the result shouldn't change after the initial run.

Is this a valid assumption?

@jpy-git
Copy link
Contributor

jpy-git commented Dec 18, 2021

for the "line is too long" lints that get raised twice they are from these two lines:
image

Minimum reproducible example:

SELECT
    abs(round(a.metricx-b.metricx)) as col_c_rel_diff,
    abs((round(a.metricx-b.metricx)/a.metricx)*100) as metric_x_rel_diff
FROM foo_bar_report a
LEFT JOIN xxx_yyy_report b
ON a.event_date = b.event_date;

Seems like there's multiple fixes (namely L006 and L031) being applied to these two segments that push the line length over the threshold for L016, which doesn't get picked up to the subsequent run.

After the first fix run it looks like this, which should understandably trigger L016.

SELECT
    abs(round(foo_bar_report.metricx - xxx_yyy_report.metricx)) AS col_c_rel_diff,
    abs((round(foo_bar_report.metricx - xxx_yyy_report.metricx) / foo_bar_report.metricx) * 100) AS metric_x_rel_diff
FROM foo_bar_report
LEFT JOIN xxx_yyy_report
    ON foo_bar_report.event_date = xxx_yyy_report.event_date;

L006 and L031 are working exactly as they should, so the main question is why does L016 not just get applied at the next iteration of the fix loop?

@barrywhart thoughts? My initial guess would be the that L006 and L031 are maybe applying their fixes at weird levels of the parse tree which stops L016 from working properly. That would explain why it then works on a subsequent sqlfluff fix since the tree gets reparsed.

@jpy-git
Copy link
Contributor

jpy-git commented Dec 18, 2021

This query is primarily for AWS Athena but also works on Presto. Neither of those seem to be supported dialects.

But as I wrote about my expectation:

Dialects are not important here: the expectation here is that sqlfluff can do generic autoformatting on any sql file and the result shouldn't change after the initial run.

Is this a valid assumption?

@juhoautio well dialects are important since the structure of the sql varies depending on them. ANSI doesn't know how to interpret that specific line, so might as well be a random line of python in there. As explained here we should probably not be trying to apply fixes to a section we can't parse.

FYI on complexity of parsing SQL in SQLFluff vs Python in Black. Python has one dialect (different versions but the syntax is mostly the same) and Python progamatically provides its own AST. SQL on the other hand has a lot of different dialects (postgres, tsql, bigquery, etc) each with their own interpretation of how to structure a query and custom grammar. Additionally the various SQL dialects do not provide their AST so part of this project also requires building out the dialects.

@juhoautio
Copy link
Contributor Author

Sure, I understand that by specifying a dialect it will be possible to do more. But my comment was about how to deal with parsing issues. And as you said:

we should probably not be trying to apply fixes to a section we can't parse

That's exactly what I was expecting :)

@jpy-git
Copy link
Contributor

jpy-git commented Dec 18, 2021

Cool makes sense, I think it would be a sensible approach and hopefully not too tough to implement 👍

As for the line length being highlighted on the second run, I'm pretty sure it's due to an issue (#1304 and #1668 N.B. not the same lint error but related conceptually) with fixes messing up the parse tree. In theory L016 would be picked up in subsequent fix loops and so fixed in a single sqlfluff fix run (i.e. the desired non-deterministic behaviour), but the parse tree after the fixes is not the same as the parse tree if we were to fully reparse the fixed sql between loops. The maintainers are aware of this so will need a more general approach to fix this issue.

Let's give @barrywhart some time to read and give a second opinion 😄

@juhoautio
Copy link
Contributor Author

A general question inspired by black: if black's auto-formatting produces a different result on the 2nd run, black gives up and fails with an error, leaving the problematic file untouched. Is sqlfluff supposed to also work like that? If not, why? In this case it doesn't seem so. I'm talking about an isolated check on top of everything that would be done by SQLFluff itself so that user doesn't need to do it.

@tunetheweb
Copy link
Member

I can understand the desire to have that, and it makes sense in many ways. However I have had issues in the past where two runs the settled and so it would be a shame to not have got the benefit of those fixes because it didn’t settled after just one run.

But the downside of that is if it never settles (like in your case) then it’s pointless continuing to change. The problem is finding those cases and following the Black way of doing this may be a bit naive for us for now.

I think ultimately we need some changes to rules for cases like this. A few rules can conflict with each other, and some rules so only a partial fix (e.g. a syntax change), expecting the whitespace rules to format it correctly later on a subsequent run. That’s the ultimate root cause of this but for good reason (we don’t want to have to embed the whitespace rules in amongst ever other rule). We’ve talked for a bit now about having a more generic whitespace fixer that lives out side the rules and could be called by each rule but have not done the work yet.

IIRC SQLFluff internally calls 10 iterations for each run. So that should be enough. Maybe short term we could increase that to 20 (to catch a few more cases like I suggested in my first paragraph) and then bomb out with no change if not settled by run 20? Not sure how difficult that would be to do though as not looked at that code previously.

@sqlfluff sqlfluff deleted a comment from juhoautio Dec 18, 2021
@sqlfluff sqlfluff deleted a comment from juhoautio Dec 18, 2021
@jpy-git
Copy link
Contributor

jpy-git commented Dec 18, 2021

2 things:

@juhoautio

A general question inspired by black: if black's auto-formatting produces a different result on the 2nd run, black gives up and fails with an error, leaving the problematic file untouched. Is sqlfluff supposed to also work like that? If not, why? In this case it doesn't seem so. I'm talking about an isolated check on top of everything that would be done by SQLFluff itself so that user doesn't need to do it.

I don't believe Black remembers what it did in a previous run of black <file> (please provide link to relevant code if you feel otherwise), it just gets it right the first time.

@tunetheweb
we're not proposing throwing away any changes. We are proposing a) don't apply fixes to unparsable sections and b) fix the parse tree between iterations so that we catch these extra errors within a single run of sqlfluff fix (it's not that we don't have enough iterations but that the parse tree after a fix != the parse tree of a fixed sql that's been reparsed from scratch on a subsequent sqlfluff fix) (@barrywhart is well aware of this issue and already has discussed potential fixes so don't think we need to retread here, it does highlight that it is a big issue that we need to address though).

The two issues in this query are summarised in comments here and here

Suggest that this is a topic better suited for maintainers meeting.

(to keep this issue focussed I've removed the hive discussion)

@juhoautio
Copy link
Contributor Author

(Thanks a lot for the comments!)

@tunetheweb:

However I have had issues in the past where two runs the settled and so it would be a shame to not have got the benefit of those fixes because it didn’t settled after just one run.

Yes. So it can be left to the user to choose what to do. In the usual case I could imagine doing this:

  • Run fix on the file in some kind of forced mode that modifies the file anyway (if this doesn't exist it could be added)
  • Add comments to disable sqlfluff on problematic parts of the query or the entire file (is that supported?)
  • Commit the changes
  • Submit an issue about the bug

Eventually if the issue is fixed the manual disabling rules can be removed from the file.

In general it's very problematic if sqlfluff doesn't settle already on the first execution of fix (I mean the command. Internal iteration rounds can be as many as needed) without raising an error, because then we end up with problems like this:

  • User runs eg. precommit hook that applies sqlfluff fix on all files and commits the result
  • CI run in a PR does the same, but a build check fails because the build notices that after fixing there's diff wrt the committed files

@juhoautio
Copy link
Contributor Author

@jpy-git:

I don't believe Black remembers what it did in a previous run of black (please provide link to relevant code if you feel otherwise), it just gets it right the first time.

I don't believe either that black remembers what previous execution of executing black did. But my understanding (based on watching the PyCon 2019 talk by Łukasz Langa, the author of black) is that internally black always runs the full process twice for this particular reason. It checks that the result doesn't change on the 2nd run. If it does -> error out. This guarantees deterministic results. If you would run black again, there would be no difference.

If there's reason to doubt that, please let me know and I'll try to find a more fresh source to back that.

@tunetheweb
Copy link
Member

@juhoautio doesn’t the fact the CI changes it raise the issue to the user in a more obvious way rather ? So they can manually fix the SQL and/or exempt the unstable lines (we do support this btw). Some might argue that is better than just not applying any fixes.

@jpy-git
Copy link
Contributor

jpy-git commented Dec 18, 2021

@juhoautio as mentioned in the original slack response that is exactly what SQLFluff does. If the mentioned parse tree issue was fixed then this would start working exactly as both you, me, black, and sqlfluff would expect.

in theory SQLFluff is deterministic.
It loops multiple times over the parse tree to apply fixes and stops when the parse tree is no longer changed between loops i.e. subsequent lints should leave the tree unchanged.
It should abort the changes if it reaches some upper limit on the fix loops to avoid making repeated fixes like you describe.

@juhoautio
Copy link
Contributor Author

juhoautio commented Dec 18, 2021

@tunetheweb

Ok, in my opinion the issue should be raised already when sqlfluff fix is run locally. I actually fail to see how raising it on CI makes it "more obvious" (not sure what you mean by that?)

User should be able to know already in local dev environment that the committed code passes all checks before pushing to remote. First of all that allows avoiding some redundant steps of waiting, debugging, fixing, pushing, communicating..

If an issue is raised only at CI it causes slower turnaround, it may distract other developers without a good reason ("reviewer: this PR looks good otherwise but the build failed!"), and debugging the issue in general is not as nicely isolated because initially you don't know if there is something wrong or different in the CI setup vs. the developer's local environment and you may need to dig into the CI logs to see what exactly failed, as opposed to seeing the error in local dev console (this is rather minor though – you will see it as soon as you run fix again locally).

edit: @jpy-git with that, sounds perfect. Sorry, I misunderstood the scope of "parse tree issue". I thought that fixing that would not yet globally guarantee determinism :)

@juhoautio
Copy link
Contributor Author

Regarding what to do if fix is found to not settle – I proposed in #2134 (comment) that the file would not be touched at all, and an error would be raised. However, maybe a best effort strategy could be more useful in case of sqlfluff: keep the changes from the first round (or even the 2nd) and raise an error. That way user could immediately see if there are still helpful format fixes without having to manually run fix with some kind of force flag (this also helps keeping the CLI simpler). So, I would vote for keeping the formatting changes, if possible. The important change in behavior would be to raise en error though. I'm repeating myself a bit, but then user could 1. commit the format changes and/or apply some manual fixes 2. mark the problematic file/section as ignored. 3. After that fix would not change anything and it would not raise an error either.

@tunetheweb
Copy link
Member

tunetheweb commented Dec 18, 2021

Ok, in my opinion the issue should be raised already when sqlfluff fix is run locally. I actually fail to see how raising it on CI makes it "more obvious" (not sure what you mean by that?)

Agreed. An in general it does. It should be exceptional times when it does not do that, which implies one of the following:

  1. SQLFluff is incredibly busted and unstable. I do not believe this is the case in general, but there are certainly cases that are not covered (and we want to know about them!)
  2. You are using an unsupported dialect. Well I'm sorry but if a dialect is not covered then there are no guarantees!
  3. The SQL is so badly written that is needs a LOT of work to clean it up. In this case the developer is not confirming to the internal coding style. A linter is basically a basic syntax checker and style guide enforcer. Of course it would be ideal if it can be automatically enforced by the linter but if the code is so far off what's expected that that makes it require extra effort on the developer's side then I'm not going to lose that much sleep about it to be honest. Either the developer should code closer to the preferred style for that project and have the linter only do minor corrections, or the style guide/SQFluff config is wrong and it should be corrected to align to this project. The point of a linter IMHO is not only to enforce style, but also to teach that.

And what I mean by making it more obvious is that something has broken. Silently rolling back the change could "hide" the error.

I'd certainly like this non-deterministic nature to happen less (ideally not at all), but we've lots to do here in this project and all volunteers so need to prioritise. Back to your original question from Slack:

Does sqlfluff have a general goal of being deterministic in the sense that the result of autoformatting doesn't change on subsequent runs?

The answer IMHO is: Yes it has that general goal but it is not 100% guaranteed. Maybe it should be as you suggest but that's not the case at the moment.

User should be able to know already in local dev environment that the committed code passes all checks before pushing to remote. First of all that allows avoiding some redundant steps of waiting, debugging, fixing, pushing, communicating..

If an issue is raised only at CI it causes slower turnaround, it may distract other developers without a good reason ("reviewer: this PR looks good otherwise but the build failed!"), and debugging the issue in general is not as nicely isolated because initially you don't know if there is something wrong or different in the CI setup vs. the developer's local environment and you may need to dig into the CI logs to see what exactly failed, as opposed to seeing the error in local dev console (this is rather minor though – you will see it as soon as you run fix again locally).

Absolutely agree. And as I say this should very much be an exception not the norm. But CI is there to catch the unexpected exceptions (otherwise if anything can be, and is, tested locally there is no need to run CI at all!).

You seem to have a particular problem with this but are also falling under items 2 and 3 above. IMHO when moving to using SQLFluff you should only do so if i) the dialect is supported (or another dialect ca be proxied with good enough support) and ii) you've run your entire code through SQLFluff to clean it up and then made one commit with that. Expecting that to happen in a pre-commit for the first time is asking for trouble IMHO. Then in future, and especially over time, SQLFluff pre-commit should act as a safe guard for small misses rather than large-scale rewrites of the SQL.

Regarding what to do if fix is found to not settle – I proposed in #2134 (comment) that the file would not be touched at all, and an error would be raised. However, maybe a best effort strategy could be more useful in case of sqlfluff: keep the changes from the first round (or even the 2nd) and raise an error. That way user could immediately see if there are still helpful format fixes without having to manually run fix with some kind of force flag (this also helps keeping the CLI simpler). So, I would vote for keeping the formatting changes, if possible. The important change in behavior would be to raise en error though. I'm repeating myself a bit, but then user could 1. commit the format changes and/or apply some manual fixes 2. mark the problematic file/section as ignored. 3. After that fix would not change anything and it would not raise an error either.

Potentially we should do this, I agree. However as long as this remains exceptional (and I'm the one repeating myself here now😁), I don't think it's the highest priority we have. I would much prefer the changes @jpy-git has suggest that should make this less likely, even if it doesn't prevent this ever happening again (by either raising the error as you suggest, or by rolling back the change as per the black model). If your option IS that important to you then happy to review any PR you can make.

@tunetheweb
Copy link
Member

However, maybe a best effort strategy could be more useful in case of sqlfluff: keep the changes from the first round (or even the 2nd) and raise an error. That way user could immediately see if there are still helpful format fixes without having to manually run fix with some kind of force flag (this also helps keeping the CLI simpler). So, I would vote for keeping the formatting changes, if possible. The important change in behavior would be to raise en error though.

I just tested your SQL and it keeps some fixes, AND raises an error, AND ALSO exits with a non-zero status code (indicating an error):

% sqlfluff fix -f test.sql
==== finding fixable violations ====
== [test.sql] FAIL                                                                                                                                                                                                                                                             
L:   3 | P:  20 | L010 | Keywords must be consistently upper case.
... LOTS and LOTS of other errors...
L:  69 | P:   1 | L022 | Blank line expected but not found after CTE closing
                       | bracket.
==== fixing violations ====
68 fixable linting violations found
FORCE MODE: Attempting fixes...
Persisting Changes...
== [test.sql] PASS
Done. Please check your files to confirm.
  [16 unfixable linting violations found]
% echo $?
1

So does SQLFluff already behave like you would want it to?

@barrywhart
Copy link
Member

I haven't read the entire discussion above, but want to mention there may be an easy partial fix. Each lint rule can specify _works_on_unparsable = False so it won't run on files that fail to parse. Perhaps this setting should be applied to one or more of the rules involved here? (Note: This setting is in the Python rule class, not the user .sqlfluff file.)

@juhoautio
Copy link
Contributor Author

So does SQLFluff already behave like you would want it to?

Oh, yes, indeed it does! Sorry! I totally missed the non-zero exit code before.

Just wanted to quickly respond to that.

I'll try to do this next:

  1. mark the problematic file/section as ignored.

P.S. If it's worth mentioning, about "raising an error" I was confused by this being the last line printed, more or less indicating a successful outcome:

All Finished 📜 🎉!

@jpy-git
Copy link
Contributor

jpy-git commented Dec 18, 2021

I haven't read the entire discussion above, but want to mention there may be an easy partial fix. Each lint rule can specify _works_on_unparsable = False so it won't run on files that fail to parse. Perhaps this setting should be applied to one or more of the rules involved here? (Note: This setting is in the Python rule class, not the user .sqlfluff file.)

@barrywhart will add this to L048 👍

@tunetheweb
Copy link
Member

@barrywhart will add this to L048 👍

I tried it and doesn’t help here 😞

I think there’s something wrong with our parse error detection. See #2118 as another example. It used to work so not sure why it’s not working now.

@juhoautio that might scupper your next step.

@juhoautio
Copy link
Contributor Author

Adding these ignore markers on the problematic line fixed the fundamental issue for me:

event_date <= current_date - INTERVAL '2' day --noqa: PRS,L048

Disabling PRS wasn't enough to avoid an error because there was this:

L:  60 | P:  47 | L048 | Missing whitespace after '2'

I think that was the actual rule that caused fix to modify that line, not the parsing error (correct me if wrong). But parsing error would contribute to non-zero exit code so that's why that needs to be disabled as well.

Now the output is not changing any more on further execution of fix 🥳

So from my perspective the "big picture" looks totally ok – thanks a lot for all the support!

In this case I had to run fix twice though, which is not ideal. Looking forward to also making the formatter more deterministic to fix the line too long issue #2134 (comment) so that fix is more deterministic with supported syntax.

@jpy-git
Copy link
Contributor

jpy-git commented Dec 18, 2021

Thanks @juhoautio, some good discussion and definitely some things for us to take away and investigate from this 👍

@juhoautio
Copy link
Contributor Author

Sorry, but I need to still come back to the expectation of raising en error (exit code != 0) if deterministic formatting is not guaranteed. That is not always the case currently.

Based on #2134 (comment) I expect that after the parse tree bug fix it will be guaranteed and that sounds great :)

But I want to also say that that maybe it's an acceptable tradeoff that in some rare cases formatting may change on a subsequent run even though the previous run didn't raise an error, if adding stronger guarantees would basically mean that fix takes 2x time to run every time. But if there are already eg. 10 iterations I presume adding 1 more as a sanity check would only make it 10% slower. But maybe even that is not needed – I don't know the design well enough.

Below are the steps to show once more that currently it's possible for formatting changes to happen on the 2nd run even though previous fix's exit code was 0 (based on #2134 (comment)).

$ cat minimal.sql
SELECT
    abs(round(a.metricx-b.metricx)) as col_c_rel_diff,
    abs((round(a.metricx-b.metricx)/a.metricx)*100) as metric_x_rel_diff
FROM foo_bar_report a
LEFT JOIN xxx_yyy_report b
ON a.event_date = b.event_date;

1st execution:

$ sqlfluff fix minimal.sql -f
==== finding fixable violations ====
== [minimal.sql] FAIL
L:   2 | P:  24 | L006 | Missing whitespace before -
L:   2 | P:  24 | L006 | Missing whitespace after -
L:   2 | P:  37 | L010 | Keywords must be consistently upper case.
L:   3 | P:  25 | L006 | Missing whitespace before -
L:   3 | P:  25 | L006 | Missing whitespace after -
L:   3 | P:  36 | L006 | Missing whitespace before /
L:   3 | P:  36 | L006 | Missing whitespace after /
L:   3 | P:  47 | L006 | Missing whitespace before *
L:   3 | P:  47 | L006 | Missing whitespace after *
L:   3 | P:  53 | L010 | Keywords must be consistently upper case.
L:   4 | P:  21 | L011 | Implicit/explicit aliasing of table.
L:   4 | P:  21 | L031 | Avoid aliases in from clauses and join conditions.
L:   5 | P:  26 | L011 | Implicit/explicit aliasing of table.
L:   5 | P:  26 | L031 | Avoid aliases in from clauses and join conditions.
L:   6 | P:   1 | L003 | Indent expected and not found compared to line #5
==== fixing violations ====
15 fixable linting violations found
FORCE MODE: Attempting fixes...
Persisting Changes...
== [minimal.sql] PASS
Done. Please check your files to confirm.
$ echo $?
0
$ cat minimal.sql
SELECT
    abs(round(foo_bar_report.metricx - xxx_yyy_report.metricx)) AS col_c_rel_diff,
    abs((round(foo_bar_report.metricx - xxx_yyy_report.metricx) / foo_bar_report.metricx) * 100) AS metric_x_rel_diff
FROM foo_bar_report
LEFT JOIN xxx_yyy_report
    ON foo_bar_report.event_date = xxx_yyy_report.event_date;

So, it didn't raise an error.

2nd execution:

$ sqlfluff fix minimal.sql -f
==== finding fixable violations ====
== [minimal.sql] FAIL
L:   2 | P:  83 | L016 | Line is too long
L:   3 | P: 118 | L016 | Line is too long
==== fixing violations ====
2 fixable linting violations found
FORCE MODE: Attempting fixes...
Persisting Changes...
== [minimal.sql] PASS
Done. Please check your files to confirm.
$ echo $?
0
$ cat minimal.sql
SELECT
    abs(
        round(foo_bar_report.metricx - xxx_yyy_report.metricx)
    ) AS col_c_rel_diff,
    abs(
        (
            round(
                foo_bar_report.metricx - xxx_yyy_report.metricx
            ) / foo_bar_report.metricx
        ) * 100
    ) AS metric_x_rel_diff
FROM foo_bar_report
LEFT JOIN xxx_yyy_report
    ON foo_bar_report.event_date = xxx_yyy_report.event_date;

Changed, again (no error raised this time either, but this time that's expected).

@jpy-git
Copy link
Contributor

jpy-git commented Dec 18, 2021

This is the parse tree issue mentioned several times, if you run the sqlfluff command a second time it has no concept of what has previously been done to the query, just reparses and tries to fix. No amount of extra iterations in the first fix run will change the fact that some rules aren't giving a correct parse tree. Because of the differing parse trees both commands see themselves as being correct.

Let's leave this before we start discussing in circles.

@barrywhart
Copy link
Member

Suffice it to say, there are known issues, and we'll address them as we have the time. It's a volunteer project, still quite new. We want perfection, but it'll take time. 💪

@barrywhart
Copy link
Member

It's been set to True for as long as I remember. @alanmcruickshank may know why. My guess is, in the early days of the project, the emphasis was on trying to be useful, but now the project is more popular, we're starting to shift towards safety. (I agree with this shift.)

Thanks for the info about how the flag works -- I didn't know it only applies to specific unparseable areas of the code

@jpy-git
Copy link
Contributor

jpy-git commented Dec 21, 2021

@barrywhart @alanmcruickshank I've raised a PR #2161 to set _works_on_unparsable = False by default. That should fix the majority of situations.

It still wouldn't fix the situation I mentioned here but to fix that I would actually suggest a stricter approach to handling parsing/templating errors when applying fixes.
Consider black on this messy function:
image

  • If the Python is parseable it applies fixes:
    image
  • If the Python is unparseable it will not apply any fixes (even if they don't touch the bad section) to be safe:
    image

Given that SQLFluff is being used in production environments now, I would propose we take the same strict approach. If we detect a parse error or a templating error anywhere in the file we should not apply any fixes as it is impossible for us to guarantee that a rule will not corrupt the code when there is unparseable somewhere in it.

(I'm fine with allowing an attempt at linting the code using the current logic since it's not destructive if you prefer, although would personally prefer a similar strict approach to what I've suggested to fix)

Doing this would remove a lot of these nonsense issues we get where people mis-report lint/fix issues that are actually just due to not parsing. And also we shouldn't be allowing inline parsing/templating errors as ignoring them doesn't suddenly mean there's not a serious issue with the parse tree for rule evaluation. Most importantly we don't want to corrupting people's code (if they run fix it's not like they'll get any warning that their code is unparsable and could get corrupted).

Actually implementing the above should be simple but would be good to get a consensus agreement from us?

@jpy-git
Copy link
Contributor

jpy-git commented Dec 21, 2021

mypy is similar:

  • raises type errors for parseable:
    image
  • only raises syntax (parse) error for unparseable:
    image

@jpy-git
Copy link
Contributor

jpy-git commented Dec 21, 2021

and again for flake8:

  • parseable raises lint errors:
    image
  • unparseable only raises a syntax error:
    image

@jpy-git
Copy link
Contributor

jpy-git commented Dec 21, 2021

Think those set a pretty strong precedent for adopting the strict approach for both sqlfluff lint and sqlfluff fix

@tunetheweb
Copy link
Member

tunetheweb commented Dec 21, 2021

I'm not as convinced @jpy-git 😁 Especially for lint.

I just worry that unparsable sections are far too common for SQLFluff at the moment and this would exclude a lot of lintable (and potentially fixable) errors in files because of one line of syntax that we haven't added yet.

I'd be willing to bet all those examples don't get daily issues on unparsable, like we do.

But on fix I could be more convinced as that is potentially dangerous to fix (though even there there are some that might be safe to fix as discussed in #2161 (comment), but probably not enough value in them to merit this and more value in firm line).

We should also really look at #2118 to allow unparsable sections to be ignored. Or how would that work in this stricter world? Would they be unlintable and unfixable too? Cause that would be a real restriction and cut people's SQL out of the project completely until releases are fixed?

@barrywhart
Copy link
Member

I'm okay allowing fixes to unparsable code, but would prefer that the default behavior would not do that. I'd rather people see and complain about dialect issues than to have their SQL corrupted. In other words, "fail safe" is the default behavior.

My rationale is that the more popular the project becomes, the larger the percentage of less technical users becomes, e.g. data scientists, analysts, mathematicians. If a new, less technical user has an initial bad experience, they're more likely to abandon using SQLFluff, and it hurts the project's reputation out in the larger world.

@mcannamela
Copy link
Contributor

As a user, +1 to @barrywhart 's sentiment

This tool is excellent and it will save me much more time if it just explodes when it can't parse 2 out of 20 files rather than send me backtracking through my commit history to find the place where it made an unsafe change, especially since it could leave me with valid but wrong SQL. I can easily address failures I know about by hand, and shunt the time-saved into fixing the parsing issue.

Especially since part of the allure of tools like dbt (whose templating is what motivated me to seek out something like sqlfluff) is that less experienced devs who "only know SQL" can contribute, it has to be safety first for me.

btw while so many maintainers are in one place, thank you all!

@jpy-git
Copy link
Contributor

jpy-git commented Dec 21, 2021

@barrywhart So just to be clear, so we can make this actionable:

  • You agree with me that we should raise an error for any parsing/templating errors in a file and not apply fixing in the style of black/mypy/flake8?
  • If we want to aim for less technical users, then why add an option to fix unparsable code? Users will find a way to enable it and start creating issues that we'll still have to support, and risk corrupting their code. Better to just stick to the "safe fail" as that's well defined behaviour.
  • Agree that the reputation damage of someone corrupting their files is much worse than someone being annoyed that we haven't added all dialects yet (especially since we're still in beta so the latter is to be expected).
  • What is your opinion on the same strict parsing/templating conditions applied to linting? This would allow us to better direct issues to improving the dialects rather than leading people to false conclusions and wasting time.

I just worry that unparsable sections are far too common for SQLFluff at the moment and this would exclude a lot of lintable (and potentially fixable) errors in files because of one line of syntax that we haven't added yet.

@tunetheweb I think we have pretty good coverage of dialects for a lot of common statements (don't forget we're likely biased on thinking there's bad coverage since we see all the submitted issues, a lot of the issues that get submitted aren't for code that the majority of analysts use commonly 😄).

We should also really look at #2118 to allow unparsable sections to be ignored. Or how would that work in this stricter world? Would they be unlintable and unfixable too? Cause that would be a real restriction and cut people's SQL out of the project completely until releases are fixed?

inline templating/parsing ignores would be removed in the strict world, we shouldn't be doing anything with this code. Just because someone ignores a parsing/templating error doesn't mean that there aren't serious issues with the parse tree that can and will lead to code corruption. Nonsense begets nonsense.

@jpy-git
Copy link
Contributor

jpy-git commented Dec 21, 2021

yamllint:
parseable: raises lints
image
unparseable: only syntax error
image

@juhoautio
Copy link
Contributor Author

Coming back to the point about how black ensures "deterministic changes or nothing" (#2134 (comment)), I happened to come across the source. It's pretty readable in the source code itself:
https://github.com/psf/black/blob/7c94ed61a55f8ae0c60737cbc6cfee3b5066ce11/src/black/__init__.py#L1348-L1361

As an example of hitting that in real life, here's one resolved issue: psf/black#1629.

So, even though black already has a deterministic way of knowing the exact AST it can sometimes fail to settle regarding the formatting on the first pass and there is explicit protection against that on the top level. I presume sqlfluff could suffer the same in some corner cases even if there is nothing wrong with the parsing. Or then sqlfluff is doing something better than black on the formatting layer :)

Based on the previous comments I understood that sqlfluff fix does not have that level of determinism as a strong guarantee although it is a soft goal. Also in the discussion here it seems that the main worry is about producing invalid result. The topic of settling on the formatting doesn't seem to be seen as important for sqlfluff maybe.

But as a user I do think it's really nice to have that top level guarantee of being deterministic – but maybe it would cost just a bit too much in case of sqlfluff which is already on the slow side if executed against any substantial amount of sql. But it would be interesting to hear how expensive you think it would be to add that level of protection: would it mean 2x runtime or something less.

@tunetheweb
Copy link
Member

So it looks like black gets everything in one pass. So a if a second pass produces a different result then it's an error. Nice and simple.

SQLFluff takes a slightly different approach (and not saying this is the right or wrong one btw), as discussed above, where certain rules deliberately do not do the full fix (e.g. just slot a new word in without worrying about whitespace, as another rule will fix that for it). So some rules depend on other rules, often in a subsequent pass. This keeps the logic of each rule simpler, but at the cost of not being able to do it in one pass - which is both costly in execution times, and also makes this particular ask here more tricky.

I think there is also a question is to whether we want this to be deterministic or not. I mean it seems obvious that yes you would, but as I say I’ve had instances where running it two (or even three) times fixed more and more errors until it finally settled. Yes that was not ideal, and a bit confusing and painful, but when first onboarding SQLFluff I much preferred that, than bombing out with no changes because it wasn’t deterministic after one run. The OP also changed their mind and decided they’d rather keep the fixes than throw them away, which appears to be the black model. So it’s not quite as clear cut as it initially seems.

Now currently, because of above design, we already run a max of 10 iterations per execution. Should that be 20 (or 30? or even 40?) to capture the above scenarios? That would make it considerably slower, particularly for the non-deterministic queries which would go all the way to the end.

So doing one more run (to 11 iterations) and not persisting that one might not add much run time, and might allow bombing out at the end if that shows its non-deterministic but to what avail? If it doesn’t work with 10 then why will it with 11? And some (like me! And also the original OP) would prefer to get the fixes it can and then investigate why others are not settling. Others may prefer getting nothing and just bombing out.

Not to mention we have on occasion seen some bugs rather mean the parse tree gets messed up and future iterations in the current execution won’t fix this, but a fresh execution will. Not ideal and shouldn’t happen but another reason why subsequent executions can ultimately lead to a deterministic answer, even if not in that first execution of 10 runs.

So, I don’t think we have a clear answer here of what the ask is? But keen to hear what all your interpretations of the ask are and maybe there is more consensus than I realise. I get that being deterministic is the ask, but to understand what that means need to understand what to do in the edge cases where 1) it’s not deterministic at all and 2) where it is but requires several runs to get there: when do we persist the changes we can make, and when do we give up trying further?

@juhoautio
Copy link
Contributor Author

I think there is also a question is to whether we want this to be deterministic or not. I mean it seems obvious that yes you

Just want to clarify that I wasn't necessarily saying that. I don't think I'm educated enough to form an actual opinion, at least at the current stage of SQLFluff. Maybe it depends on the level of coverage for dialects and so on. But sure, in an ideal situation eventually I would warmly welcome fully deterministic behavior :)

So doing one more run (to 11 iterations) and not persisting that one might not add much run time, and might allow bombing out at the end if that shows its non-deterministic but to what avail? If it doesn’t work with 10 then why will it with 11?

Right, so with just 10% additional cost to the execution time, SQLFluff could quite easily do the same that black does: if the formatting still changes between rounds 10 & 11, it could raise an error that a deterministic result cannot be reached. User could then act upon that to fix it way or another (or mark the file ignored) to not hit the issue later on CI. Again, I'm not saying, at least strongly, that it should be done, but just pointing out it as an opportunity. Even if fix needs to fail at that point, it could be possible to keep the format changes done so far and allow user to for example commit those (if happy with it), then mark the file as ignored & report the issue to SQLFluff.

@barrywhart
Copy link
Member

You agree with me that we should raise an error for any parsing/templating errors in a file and not apply fixing in the style of black/mypy/flake8?

Yes.

If we want to aim for less technical users, then why add an option to fix unparsable code? Users will find a way to enable it and start creating issues that we'll still have to support, and risk corrupting their code. Better to just stick to the "safe fail" as that's well defined behaviour.

I'm okay without this. I was mainly making the point that, if this capability is valuable, we could make it optional rather than removing it completely. At this time, I'm +1 for removing it completely but open to discussion.

What is your opinion on the same strict parsing/templating conditions applied to linting? This would allow us to better direct issues to improving the dialects rather than leading people to false conclusions and wasting time.

Same, I think.

@barrywhart
Copy link
Member

About the "10 vs 11" discussion, that value is configurable, and you can make it as high as you want. We've had a few rules which were dumb, like they'd move some code one line at a time, and if it needed to move 20 lines, that would require 20 passes.

I found and updated one such rule. There may be others. I would say that as a long-term goal, we try and avoid creating rules like that, and fix them when we find them. In other words, we allow combinations of rules to require multiple passes, but any single fix should be completed in one pass. We don't have a way to enforce that for now, so let's call that a soft, long-term goal.

@barrywhart
Copy link
Member

@jpy-git: Much earlier in the discussion, you asked about the behavior of L006, L031, and L016 on a particular query. Should we create a separate issue for that? This discussion bounced around a lot, and I couldn't quite follow it all without 🤯.

@barrywhart
Copy link
Member

@jpy-git: Would it be reasonable to rename this issue as "'sqlfluff fix' should not make changes if the file has parsing or templating errors"?

@tunetheweb tunetheweb added the core Issues relating to core design or architecture of SQLFluff label Jan 26, 2022
@barrywhart
Copy link
Member

The discussion on this issue covered a lot of areas, but it seems we settled on the above, sqlfluff fix should not make changes if there are parsing or templating errors. Marking this as fixed by #2546.

@juhoautio
Copy link
Contributor Author

Thank you for the fix of not formatting at all in case of parsing errors! That helps a lot.

However, the original issue still exists: result of fix is non-deterministic. The "minimum example" in #2134 (comment) still happens: it takes 2 calls of sqlfluff fix -f to settle on the final format. Is it possible to reopen this issue, please?

I didn't spot an answer to this question by @jpy-git yet:

L006 and L031 are working exactly as they should, so the main question is why does L016 not just get applied at the next iteration of the fix loop?

That question sounds to me like it is not expected for sqlfluff to not settle in this case. Could there be even some fundamental issue that is causing some cases like this to be missed? Or should this particular issue at least be fixed somehow?

I would still like to see at least this issue fixed if possible, even if sqlfluff won't have general guarantee of being deterministic. My hope would still be that the format is final or then there is no change at all and sqlfluff fails instead. But I understood based on this discussion that project is not going to have that as a strict goal.

@tunetheweb
Copy link
Member

Added a suggestion here that might make less far less of an issue: #1304 (comment)

@barrywhart
Copy link
Member

@juhoautio: Feel free to create new issues for the specific questions you raised. As maintainers, we look at the discussion on issues and make a decision what to do, then close the issue. When the discussion meanders, we can't track all the threads of discussion and keep the issue open until all of them are addressed. It's better to have one issue per issue. 🙂 You didn't do anything wrong, but this is how we work. We are a team of part-time volunteers, figuring out the right way for SQLFluff to work one day, one issue at a time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core Issues relating to core design or architecture of SQLFluff
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants