Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint SQL with SQLFluff #321

Draft
wants to merge 26 commits into
base: main
Choose a base branch
from
Draft

Lint SQL with SQLFluff #321

wants to merge 26 commits into from

Conversation

jasonaowen
Copy link
Contributor

@jasonaowen jasonaowen commented May 1, 2023

One of the great advantages to using tinypg and postgres-migrations is the ability to separate our SQL queries and migrations into SQL files. We've already been benefiting from the editor support that approach gains us!

Another benefit is the ability to use static analysis to improve our SQL coding practices. I propose that we should start using SQLFluff, a SQL linter that works with PostgreSQL-flavored SQL. This is the best tool I've found to do so.

In the spirit of recent conversations, I wanted to open this as a draft PR to make sure the dev team is aligned on this. There's still work to be done to make this ready to merge, and I won't do that unless we agree that it's a good idea!

I don't think SQL is the strongest language of anyone on this team, so I think we can all learn from the rules this linter offers - I know in the past linters have been one of the main ways I've leveled up in other languages. In addition, I find it's always useful to have some automated enforcement of style guide stuff, such as capitalization, indenting, and so on; those are distracting during code reviews, and robots can find - and fix - such problems better than humans.

However, there are a few drawbacks to using SQLFluff. I think they're worthwhile tradeoffs, but I wanted to surface them up front:

  • we'll need to add placeholder values for each :parameter in the SQLFluff configuration; there are a few demonstrated in the initial .sqlfluff configuration file, but we'd have to add the rest, and continue to add more as we add queries
  • SQLFluff is written in Python; it's unfortunate to add a dependency from another language
  • not all the syntax we're using is supported; for example, it is not yet able to parse SQL2011 OFFSET / FETCH, so we'd have to go back to the PostgreSQL-flavored LIMIT / OFFSET

You can play with this by following the getting started instructions. I have the start of a configuration file included in this PR, so check out the branch and try running it: sqlfluff lint src/database/queries/proposals/selectWithPagination.sql or so.

If this feels like a worthwhile addition, here's what still needs to be done to be ready to merge:

  • write dev docs on how to install & run
  • run as part of CI
  • make sure we have it configured the way we want
    • add placeholder values for all the template parameters
    • argue about rules
  • fix existing violations
  • ignore violations in existing migrations, since those can't be changed

@codecov
Copy link

codecov bot commented May 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.34%. Comparing base (c25c1a0) to head (c3e6b0f).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #321      +/-   ##
==========================================
+ Coverage   87.33%   87.34%   +0.01%     
==========================================
  Files         185      185              
  Lines        2384     2387       +3     
  Branches      321      314       -7     
==========================================
+ Hits         2082     2085       +3     
- Misses        276      302      +26     
+ Partials       26        0      -26     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@bickelj
Copy link
Contributor

bickelj commented Aug 24, 2023

@jasonaowen The following comments are prior to trying SQLFluff.

I am hesitant to introduce a Python dependency in a Node.js project, but I am open to it for several reasons:

  1. It appears to be a mature, actively-developed, and widely-used tool.
  2. The dependency is used at development and build time, not in a test or production environment.
  3. I'm OK with a little Python wrangling and I expect others on the team are too, assuming there is no good substitute in TypeScript/JavaScript.

My other hesitation is the net value to the project. Is SQLFluff going to make a big difference? I know it will improve the SQL scripts aesthetically, probably will prevent a couple bugs, but considering the cost of bringing in this external dependency, my thought prior to actually running SQLFluff is "it is probably not worth the hassle."

In general, though, static analysis is great. So I am not really against it. My next step is to try SQLFluff locally and see what it does.

@bickelj
Copy link
Contributor

bickelj commented Aug 24, 2023

Already I have a bad User Experience when following the https://docs.sqlfluff.com/en/stable/gettingstarted.html#gettingstartedref verbatim (I have Python 3.9.2 and pip 20.3.4 installed).

First, this seemed to work OK:

$ pip install sqlfluff
Collecting sqlfluff
  Downloading sqlfluff-2.3.0-py3-none-any.whl (701 kB)
     |████████████████████████████████| 701 kB 1.4 MB/s 
Collecting tblib
  Downloading tblib-2.0.0-py3-none-any.whl (11 kB)
Requirement already satisfied: Jinja2 in /usr/lib/python3/dist-packages (from sqlfluff) (2.11.3)
Collecting regex
  Downloading regex-2023.8.8-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (771 kB)
     |████████████████████████████████| 771 kB 75 kB/s 
Requirement already satisfied: typing-extensions in /home/user/.local/lib/python3.9/site-packages (from sqlfluff) (4.7.1)
Collecting click
  Downloading click-8.1.7-py3-none-any.whl (97 kB)
     |████████████████████████████████| 97 kB 252 kB/s 
Requirement already satisfied: pyyaml>=5.1 in /usr/lib/python3/dist-packages (from sqlfluff) (5.3.1)
Requirement already satisfied: tqdm in /home/user/.local/lib/python3.9/site-packages (from sqlfluff) (4.65.0)
Requirement already satisfied: pytest in /usr/lib/python3/dist-packages (from sqlfluff) (6.0.2)
Collecting pathspec
  Downloading pathspec-0.11.2-py3-none-any.whl (29 kB)
Requirement already satisfied: toml in /usr/lib/python3/dist-packages (from sqlfluff) (0.10.1)
Requirement already satisfied: appdirs in /usr/lib/python3/dist-packages (from sqlfluff) (1.4.4)
Requirement already satisfied: chardet in /usr/lib/python3/dist-packages (from sqlfluff) (4.0.0)
Collecting colorama>=0.3
  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Collecting diff-cover>=2.5.0
  Downloading diff_cover-7.7.0-py3-none-any.whl (51 kB)
     |████████████████████████████████| 51 kB 1.8 MB/s 
Collecting Pygments<3.0.0,>=2.9.0
  Downloading Pygments-2.16.1-py3-none-any.whl (1.2 MB)
     |████████████████████████████████| 1.2 MB 283 kB/s 
Collecting pluggy<2,>=0.13.1
  Using cached pluggy-1.2.0-py3-none-any.whl (17 kB)
Installing collected packages: Pygments, pluggy, tblib, regex, pathspec, diff-cover, colorama, click, sqlfluff
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyhanko 0.19.0 requires qrcode>=6.1, which is not installed.
Successfully installed Pygments-2.16.1 click-8.1.7 colorama-0.4.6 diff-cover-7.7.0 pathspec-0.11.2 pluggy-1.2.0 regex-2023.8.8 sqlfluff-2.3.0 tblib-2.0.0

I see the ERROR but pip also said it successfully installed sqlfluff so I tried it:

$ sqlfluff version
Traceback (most recent call last):
  File "/home/user/.local/bin/sqlfluff", line 5, in <module>
    from sqlfluff.cli.commands import cli
  File "/home/user/.local/lib/python3.9/site-packages/sqlfluff/__init__.py", line 6, in <module>
    from sqlfluff.api import lint, fix, parse, list_rules, list_dialects
  File "/home/user/.local/lib/python3.9/site-packages/sqlfluff/api/__init__.py", line 4, in <module>
    from sqlfluff.api.simple import lint, fix, parse, APIParsingError
  File "/home/user/.local/lib/python3.9/site-packages/sqlfluff/api/simple.py", line 4, in <module>
    from sqlfluff.core import (
  File "/home/user/.local/lib/python3.9/site-packages/sqlfluff/core/__init__.py", line 9, in <module>
    from sqlfluff.core.linter import Linter
  File "/home/user/.local/lib/python3.9/site-packages/sqlfluff/core/linter/__init__.py", line 3, in <module>
    from sqlfluff.core.linter.common import (
  File "/home/user/.local/lib/python3.9/site-packages/sqlfluff/core/linter/common.py", line 13, in <module>
    from sqlfluff.core.templaters import TemplatedFile
  File "/home/user/.local/lib/python3.9/site-packages/sqlfluff/core/templaters/__init__.py", line 8, in <module>
    from sqlfluff.core.templaters.jinja import JinjaTemplater
  File "/home/user/.local/lib/python3.9/site-packages/sqlfluff/core/templaters/jinja.py", line 10, in <module>
    import jinja2.nodes
  File "/usr/lib/python3/dist-packages/jinja2/__init__.py", line 12, in <module>
    from .environment import Environment
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 25, in <module>
    from .defaults import BLOCK_END_STRING
  File "/usr/lib/python3/dist-packages/jinja2/defaults.py", line 3, in <module>
    from .filters import FILTERS as DEFAULT_FILTERS  # noqa: F401
  File "/usr/lib/python3/dist-packages/jinja2/filters.py", line 13, in <module>
    from markupsafe import soft_unicode
ImportError: cannot import name 'soft_unicode' from 'markupsafe' (/home/user/.local/lib/python3.9/site-packages/markupsafe/__init__.py)

I should be able resolve this in a few minutes, but the "out of the box" experience, especially for a non-Python developer, might leave something to be desired. Update: yes, searching online reveals that this is a common error, the workaround is pip install markupsafe==2.0.1 which after running lets this happen:

sqlfluff version
2.3.0

Copy link
Contributor

@bickelj bickelj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to work:

$ sqlfluff lint src/database/queries/baseFields/selectAll.sql 
== [src/database/queries/baseFields/selectAll.sql] FAIL                                     
L:   4 | P:  14 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
L:   5 | P:  13 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
L:   6 | P:  14 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
All Finished 📜 🎉!
$ cat src/database/queries/baseFields/selectAll.sql 
SELECT
  id,
  label,
  short_code as "shortCode",
  data_type as "dataType",
  created_at as "createdAt"
FROM base_fields;

There is work to be done to replace labels (as noted in the original post) and that would need to happen before I can get anything other than sqlfluff.core.errors.SQLTemplaterError: Failure in placeholder templating: 'label'. Have you configured your variables? for more complex queries.

I see a few rules that hint that SQLFluff can find actual bugs, but it primarily seems to be about whitespace and capitalization. If it's really mostly about linting, I lean toward "it is not worth the cost to me." I don't find the inconsistent capitalization distracting. If anything, I find the convention "when we work with SQL now we pretend it's 1980 and use CAPS TO SHOUT KEYWORDS" distracting. Admittedly, this last point is more of a rule argument rather than an argument against SQLFluff. I am also curious if any real bugs have been found with SQLFluff. If you can provide an example or two of bugs prevented, I think that would lean me toward using SQLFluff.

Taking a step back, I detect two things in this PR.

  1. Distracting inconsistencies in the SQL code.
  2. A tool that can reduce (1) now and long-term.

With regard to (1), while I am personally not as distracted with the inconsistencies in the SQL code I acknowledge that it can distract people and I am willing to accept some rules around formatting if it helps. But I'm not convinced (yet) that SQLFluff is the means to that end. At least I'm not convinced enough to be the one to enthusiastically integrate SQLFluff. If someone else does the work and is willing to provide SQLFluff support when issues come up, I have no objection. Carry on. Likewise if any other member of the team wants it, I also don't object to using it.

@jasonaowen
Copy link
Contributor Author

Thanks for reviewing, @bickelj!

Is SQLFluff going to make a big difference?

I believe the answer is yes. Any time we can avoid arguing about lintable errors in pull requests is worthwhile; we've already spent some amount of time on that, which is what prompted this PR.

I am also curious if any real bugs have been found with SQLFluff.

I believe the primary purpose of linters is not to find or prevent bugs directly, but to make bugs more easily found. Inconsistent formatting is a distraction, and standardizing the way we write code - any code - makes it easier for a human reviewer to notice problems. Anything we can do to help our pattern-matching brains better see both patterns and exceptions is a force multiplier.

We've since agreed to use a consensus process (which I still need to write up). Under that framework, I'll interpret your comments as having no blocking concerns, @bickelj.

I'll put getting this into shape on my agenda, although it might be a bit before it's ready, as it's not (yet) the highest priority. I do want to get it in before we start adding any major new SQL code!

@bickelj
Copy link
Contributor

bickelj commented Sep 12, 2023

@jasonaowen Yes, my interpretation is we agreed to "aim for consensus", and I am willing to use SQLFluff if someone wants to add it.

@slifty
Copy link
Member

slifty commented Sep 12, 2023

Just to add my voice: I really dislike adding a python dependency to this project -- and the upkeep tasks / more complex dev setup that demands -- but not enough to block, because I totally grok the value of linting, and I recognize there simply isn't a node native solution (grumble grumble)

@jasonaowen
Copy link
Contributor Author

We discussed this further today, and @slifty suggested the possibility of using SQLFluff via docker rather than a locally managed Python installation.

jasonaowen and others added 19 commits November 22, 2024 14:38
One of the great advantages to using tinypg and postgres-migrations is
the ability to separate our SQL queries and migrations into SQL files.
We've already been benefiting from the editor support that approach
gains us!

Another benefit is the ability to use static analysis to improve our SQL
coding practices. Start using SQLFluff[1], a SQL linter that works with
PostgreSQL-flavored SQL.

WIP because it still needs:
- dev docs
- CI
- configuration
- to fix existing violations
- to ignore violations in existing migrations, since those can't be
  changed
- team consent

[1] https://docs.sqlfluff.com/en/stable/index.html
"object" is a non-reserved keyword in SQL, and does not need to be
quoted in this context. Remove the unneccessary quotes as part of
preparing to add linting to SQL.

    SQL distinguishes between reserved and non-reserved key words.
    According to the standard, reserved key words are the only real key
    words; they are never allowed as identifiers. Non-reserved key words
    only have a special meaning in particular contexts and can be used
    as identifiers in other contexts.

https://www.postgresql.org/docs/current/sql-keywords-appendix.html
https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-RF06
There should be exactly one newline at the end of a text file.
Fix the SQL queries that had an extra newline.

https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-layout.end_of_file
Because it is not strictly necessary for query files loaded by TinyPG,
we have a mix of trailing semicolons and no semicolons. Settle on
requiring terminating semicolons, in order to be consistent both with
the initialization SQL (which must have semicolons) and with TypeScript.

https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-convention.terminator
We have been using both forms of selecting which rows to return.
SQLFluff is currently unable to parse the `OFFSET..FETCH` syntax, so
convert them to the PostgreSQL convention of `LIMIT..OFFSET`.
It turns out that `EXCLUDED` is not a keyword, and therefore should not
be capitalized.

https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-capitalisation.identifiers
It is not necessary to qualify column names in single table queries.
SQLFluff is happy if either all of the references are qualified, or none
of them, but not some mix. Update the single query in violation of this
rule to not use qualifiers.

https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-references.consistent`
When writing the `ON` clause of a `JOIN`, it is clearer to refer to
earlier / leftmost tables first.

https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-structure.join_condition_order
SQLFluff wants for function names to be either consistently uppercase or
lowercase, including both standard functions and user defined functions.
Lowercase the few violations of that rule.

https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-capitalisation.function
Also move the comment above the `UPDATE` statement, so that it is more
obviously indented correctly.

https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-layout.long_lines
There should be no space between a function name and the opening
parenthesis.

https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-layout.functions
There should be no spaces after the opening parenthesis or before the
closing parenthesis.

https://docs.sqlfluff.com/en/stable/reference/rules.html#rule-layout.spacing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants