Lint SQL with SQLFluff #321
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #321 +/- ##
==========================================
+ Coverage 87.33% 87.34% +0.01%
==========================================
Files 185 185
Lines 2384 2387 +3
Branches 321 314 -7
==========================================
+ Hits 2082 2085 +3
- Misses 276 302 +26
+ Partials 26 0 -26
@jasonaowen The following comments are prior to trying SQLFluff. I am hesitant to introduce a Python dependency in a Node.js project, but I am open to it for several reasons:
My other hesitation is the net value to the project. Is SQLFluff going to make a big difference? I know it will improve the SQL scripts aesthetically, probably will prevent a couple bugs, but considering the cost of bringing in this external dependency, my thought prior to actually running SQLFluff is "it is probably not worth the hassle." In general, though, static analysis is great. So I am not really against it. My next step is to try SQLFluff locally and see what it does. |
Already I have a bad User Experience when following the https://docs.sqlfluff.com/en/stable/gettingstarted.html#gettingstartedref verbatim (I have Python 3.9.2 and pip 20.3.4 installed). First, this seemed to work OK:
I see the ERROR but
I should be able to resolve this in a few minutes, but the "out of the box" experience, especially for a non-Python developer, might leave something to be desired. Update: yes, searching online reveals that this is a common error, the workaround is
|
It seems to work:
$ sqlfluff lint src/database/queries/baseFields/selectAll.sql
== [src/database/queries/baseFields/selectAll.sql] FAIL
L: 4 | P: 14 | CP01 | Keywords must be consistently upper case.
| [capitalisation.keywords]
L: 5 | P: 13 | CP01 | Keywords must be consistently upper case.
| [capitalisation.keywords]
L: 6 | P: 14 | CP01 | Keywords must be consistently upper case.
| [capitalisation.keywords]
All Finished 📜 🎉!
$ cat src/database/queries/baseFields/selectAll.sql
SELECT
id,
label,
short_code as "shortCode",
data_type as "dataType",
created_at as "createdAt"
FROM base_fields;
There is work to be done to replace labels (as noted in the original post), and that would need to happen before I can get anything other than `sqlfluff.core.errors.SQLTemplaterError: Failure in placeholder templating: 'label'. Have you configured your variables?` for more complex queries.
I see a few rules that hint that SQLFluff can find actual bugs, but it primarily seems to be about whitespace and capitalization. If it's really mostly about linting, I lean toward "it is not worth the cost to me." I don't find the inconsistent capitalization distracting. If anything, I find the convention "when we work with SQL now we pretend it's 1980 and use CAPS TO SHOUT KEYWORDS" distracting. Admittedly, this last point is more of a rule argument rather than an argument against SQLFluff. I am also curious if any real bugs have been found with SQLFluff. If you can provide an example or two of bugs prevented, I think that would lean me toward using SQLFluff.
Taking a step back, I detect two things in this PR.
1. Distracting inconsistencies in the SQL code.
2. A tool that can reduce (1) now and long-term.
With regard to (1), while I am personally not as distracted with the inconsistencies in the SQL code I acknowledge that it can distract people and I am willing to accept some rules around formatting if it helps. But I'm not convinced (yet) that SQLFluff is the means to that end. At least I'm not convinced enough to be the one to enthusiastically integrate SQLFluff. If someone else does the work and is willing to provide SQLFluff support when issues come up, I have no objection. Carry on. Likewise if any other member of the team wants it, I also don't object to using it.
Thanks for reviewing, @bickelj!
I believe the answer is yes. Any time we can avoid arguing about lintable errors in pull requests is worthwhile; we've already spent some amount of time on that, which is what prompted this PR.
I believe the primary purpose of linters is not to find or prevent bugs directly, but to make bugs more easily found. Inconsistent formatting is a distraction, and standardizing the way we write code - any code - makes it easier for a human reviewer to notice problems. Anything we can do to help our pattern-matching brains better see both patterns and exceptions is a force multiplier. We've since agreed to use a consensus process (which I still need to write up). Under that framework, I'll interpret your comments as having no blocking concerns, @bickelj. I'll put getting this into shape on my agenda, although it might be a bit before it's ready, as it's not (yet) the highest priority. I do want to get it in before we start adding any major new SQL code! |
@jasonaowen Yes, my interpretation is we agreed to "aim for consensus", and I am willing to use SQLFluff if someone wants to add it. |
Just to add my voice: I really dislike adding a python dependency to this project -- and the upkeep tasks / more complex dev setup that demands -- but not enough to block, because I totally grok the value of linting, and I recognize there simply isn't a node native solution (grumble grumble) |
We discussed this further today, and @slifty suggested the possibility of using SQLFluff via docker rather than a locally managed Python installation. |
One of the great advantages to using tinypg and postgres-migrations is the ability to separate our SQL queries and migrations into SQL files. We've already been benefiting from the editor support that approach gains us!
Another benefit is the ability to use static analysis to improve our SQL coding practices. Start using SQLFluff[1], a SQL linter that works with PostgreSQL-flavored SQL.
WIP because it still needs:
- dev docs
- CI
- configuration
- to fix existing violations
- to ignore violations in existing migrations, since those can't be changed
- team consent
[1] https://docs.sqlfluff.com/en/stable/index.html
"object" is a non-reserved keyword in SQL, and does not need to be quoted in this context. Remove the unnecessary quotes as part of preparing to add linting to SQL.
SQL distinguishes between reserved and non-reserved key words. According to the standard, reserved key words are the only real key words; they are never allowed as identifiers. Non-reserved key words only have a special meaning in particular contexts and can be used as identifiers in other contexts.
https://www.postgresql.org/docs/current/sql-keywords-appendix.html
https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-RF06
There should be exactly one newline at the end of a text file. Fix the SQL queries that had an extra newline. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-layout.end_of_file
Because it is not strictly necessary for query files loaded by TinyPG, we have a mix of trailing semicolons and no semicolons. Settle on requiring terminating semicolons, in order to be consistent both with the initialization SQL (which must have semicolons) and with TypeScript. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-convention.terminator
We have been using both forms of selecting which rows to return. SQLFluff is currently unable to parse the `OFFSET..FETCH` syntax, so convert them to the PostgreSQL convention of `LIMIT..OFFSET`.
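As a rough illustration of that conversion (a sketch against a hypothetical `example_items` table, not one of this project's queries), the two statements below select the same window of rows in PostgreSQL:

```sql
-- Hypothetical table, used only for illustration.
CREATE TEMPORARY TABLE example_items (id integer PRIMARY KEY);
INSERT INTO example_items (id)
SELECT generate_series(1, 50);

-- SQL-standard form (the one SQLFluff could not parse at the time):
-- skip 20 rows, then return the next 10.
SELECT id
FROM example_items
ORDER BY id
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY;

-- PostgreSQL-conventional form selecting the same rows.
SELECT id
FROM example_items
ORDER BY id
LIMIT 10 OFFSET 20;
```

Both forms need the `ORDER BY` for the page boundaries to be deterministic.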
Be consistent in making SQL keywords be all uppercase. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-capitalisation.keywords
It turns out that `EXCLUDED` is not a keyword, and therefore should not be capitalized. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-capitalisation.identifiers
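For illustration, a minimal upsert sketch (the `example_settings` table and its columns are hypothetical, not from this repository) with `excluded` written in lowercase like the other identifiers in the statement:

```sql
-- Hypothetical table, used only for illustration.
CREATE TEMPORARY TABLE example_settings (
    name text PRIMARY KEY,
    value text NOT NULL
);

-- "excluded" names the row proposed for insertion; PostgreSQL treats it
-- as a special table name rather than a keyword, so it stays lowercase.
INSERT INTO example_settings (name, value)
VALUES ('theme', 'dark')
ON CONFLICT (name) DO UPDATE
    SET value = excluded.value;
```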
Use the `FROM <table> AS <alias>` form consistently. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-aliasing.table
It is not necessary to qualify column names in single table queries. SQLFluff is happy if either all of the references are qualified, or none of them, but not some mix. Update the single query in violation of this rule to not use qualifiers. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-references.consistent
When writing the `ON` clause of a `JOIN`, it is clearer to refer to earlier / leftmost tables first. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-structure.join_condition_order
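A small sketch of the preferred ordering, assuming two hypothetical tables (`example_authors` and `example_books`) that are not part of this project's schema:

```sql
-- Hypothetical tables, used only for illustration.
CREATE TEMPORARY TABLE example_authors (
    id integer PRIMARY KEY,
    name text NOT NULL
);
CREATE TEMPORARY TABLE example_books (
    id integer PRIMARY KEY,
    author_id integer NOT NULL REFERENCES example_authors (id),
    title text NOT NULL
);

-- The earlier (leftmost) table, example_authors, appears on the left side
-- of the ON condition; writing "b.author_id = a.id" would violate the rule.
SELECT
    a.name,
    b.title
FROM example_authors AS a
INNER JOIN example_books AS b
    ON a.id = b.author_id;
```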
SQLFluff wants for function names to be either consistently uppercase or lowercase, including both standard functions and user defined functions. Lowercase the few violations of that rule. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-capitalisation.function
Also move the comment above the `UPDATE` statement, so that it is more obviously indented correctly. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-layout.long_lines
There should be no space between a function name and the opening parenthesis. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-layout.functions
There should be no spaces after the opening parenthesis or before the closing parenthesis. https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-layout.spacing
One of the great advantages to using tinypg and postgres-migrations is the ability to separate our SQL queries and migrations into SQL files. We've already been benefiting from the editor support that approach gains us!
Another benefit is the ability to use static analysis to improve our SQL coding practices. I propose that we should start using SQLFluff, a SQL linter that works with PostgreSQL-flavored SQL. This is the best tool I've found to do so.
In the spirit of recent conversations, I wanted to open this as a draft PR to make sure the dev team is aligned on this. There's still work to be done to make this ready to merge, and I won't do that unless we agree that it's a good idea!
I don't think SQL is the strongest language of anyone on this team, so I think we can all learn from the rules this linter offers - I know in the past linters have been one of the main ways I've leveled up in other languages. In addition, I find it's always useful to have some automated enforcement of style guide stuff, such as capitalization, indenting, and so on; those are distracting during code reviews, and robots can find - and fix - such problems better than humans.
However, there are a few drawbacks to using SQLFluff. I think they're worthwhile tradeoffs, but I wanted to surface them up front:
- `:parameter` in the SQLFluff configuration; there are a few demonstrated in the initial `.sqlfluff` configuration file, but we'd have to add the rest, and continue to add more as we add queries
- `OFFSET / FETCH`, so we'd have to go back to the PostgreSQL-flavored `LIMIT / OFFSET`
You can play with this by following the getting started instructions. I have the start of a configuration file included in this PR, so check out the branch and try running it:
sqlfluff lint src/database/queries/proposals/selectWithPagination.sql
or so.
If this feels like a worthwhile addition, here's what still needs to be done to be ready to merge: