Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding eugene lint command #37

Closed
kaaveland opened this issue May 9, 2024 · 7 comments
Closed

Consider adding eugene lint command #37

kaaveland opened this issue May 9, 2024 · 7 comments

Comments

@kaaveland
Copy link
Owner

kaaveland commented May 9, 2024

With #36 we're introducing a dependency to pg_query.rs, which gives us the full postgres SQL parser. It would be interesting to see if we can write some lint rules that do not depend on running the postgres server for some of the hints we currently provide. Here's a list of things we may be able to warn about using the parse tree from pg_query:

  • Adding constraints that aren't NOT VALID:
  • Altering a column to NOT NULL
  • Creating indexes without CONCURRENTLY
  • Type changes
  • Introducing a JSON column

It may also be that we can do some manual inspection of the parse tree to figure out that some statements take dangerous locks, then warn about that. However, many of these will give us false positives that we can't statically verify now, so we probably want to support some sort of comment syntax to let eugene ignore the statement, eg.

-- We already have a check (col is not null)
-- eugene: ignore alter_to_not_null
alter table documents set col not null;
@kaaveland
Copy link
Owner Author

kaaveland commented May 9, 2024

I think we can probably do all of the hints we have right now, if we accept some false positives and we make some of the hints optional (eg. we can look for SET [LOCAL] lock_timout, we can figure out that ALTER TABLE takes AccessExclusiveLock, that CREATE INDEX takes ShareLock, and that some statements ran after and we can let the user ignore f. ex the lock timeout thing via parameters or configuration. I think it's worth taking a stab at this. It would make it really easy to run the hints in a CI/CD pipeline, or a commit-hook, which may make eugene much easier to start using.

@kaaveland
Copy link
Owner Author

Been taking a stab at working with the parse trees from pg_query, and I think it's very clunky. The grammar is huge and because of the structure, there's so much if let going on. I think the best way forward here is to translate the parse tree to a smaller grammar that contains only the things we care about, then make rules for that.

@kaaveland
Copy link
Owner Author

Here's a small example for accessing the type of an alter table ... alter column .. set type kind of statement, which is incomplete:

fn alters_table_lints(node: &pg_query::NodeRef) -> Vec<String> {
    if let pg_query::NodeRef::AlterTableStmt(stmt) = node {
        let next = &stmt.cmds;
        next.iter()
            .filter_map(|child| {
                let node_ref = child.node.as_ref()?;
                is_alters_type_cmd(&node_ref.to_ref())
            })
            .collect()
    } else {
        Vec::new()
    }
}

fn is_alters_type_cmd(node: &pg_query::NodeRef) -> Option<String> {
    if let pg_query::NodeRef::AlterTableCmd(cmd) = node {
        let def_node = cmd.def.as_ref()?.node.as_ref()?;
        if let pg_query::NodeRef::ColumnDef(def) = def_node.to_ref() {
            if def.type_name.is_some() {
                Some(cmd.name.to_string())
            } else { None }
        } else {
            None
        }
    } else {
        None
    }
}

@kaaveland
Copy link
Owner Author

This is huge, and perhaps not the right place to start: https://www.postgresql.org/docs/current/sql-altertable.html

I have a 130 line match now, and only barely touching the surface of alter table ... with column focus.

@kaaveland
Copy link
Owner Author

Like, what goes on the end of set default or add constraint check can be anything and it's really a lot of work to do anything like what eugene trace can do here.

@kaaveland
Copy link
Owner Author

kaaveland commented May 11, 2024

The mvp here is to:

  • Discover from context which objects in the script that are visible to other transactions
  • Emit a lint warning if we lock those without lock timeout
  • The nonconcurrent create index lint
  • Setting a column to not null
  • Changing the type of a column
  • Adding a json column

@kaaveland kaaveland mentioned this issue May 12, 2024
@kaaveland
Copy link
Owner Author

Currently looks feasible to add useful stuff here, although it will be more prone to false positives and there's stuff that we're bound to miss that eugene trace can pick up. Because of the difference in information that we receive here, I'm not sure we should be reusing the Hint struct and the associated information, although I think it is probably a good idea to reuse the identifiers and names/headlines of the lints/hints as much as we can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant