Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: migrate wdl-gauntlet to the new parser implementation. #76

Merged
merged 6 commits into from
Jun 13, 2024

Conversation

peterhuene
Copy link
Collaborator

This commit removes the use of the existing parser implementation in favor of the new implementation in wdl-gauntlet.

As a result, Arena.toml and Gauntlet.toml have been refreshed; in the case of Arena.toml, there's a ~10X increase in logged diagnostics due to the new parser implementation successfully parsing files that the existing parser implementation incorrectly parsed with errors.

Also added some wall-clock timing for the time spent analyzing a source file and an aggregate duration displayed at the end of gauntlet's output. From the before and after comparisons, the new parser implementation appears to be at least 20X faster in analyzing files, likely due to using logos for lexing and the new parser implementation does not backtrack like pest does.

Also added configuration for ignoring specific files so that task-templates.wdl can be excluded as it isn't expected to be valid WDL (it has non-WDL placeholders in it to serve as a template).

Once this merges and we're in consensus, removal of the existing parser implementation can begin.

Before submitting this PR, please make sure:

  • You have added a few sentences describing the PR here.
  • You have added yourself or the appropriate individual as the assignee.
  • You have added at least one relevant code reviewer to the PR.
  • Your code builds clean without any errors or warnings.
  • You have added tests (when appropriate).
  • You have updated the README or other documentation to account for these
    changes (when appropriate).
  • You have added an entry to the relevant CHANGELOG.md (see
    "keep a changelog" for more information).
  • Your commit messages follow the conventional commit style.

This commit removes the use of the existing parser implementation in favor of
the new implementation in `wdl-gauntlet`.

As a result, `Arena.toml` and `Gauntlet.toml` have been refreshed; in the case
of `Arena.toml`, there's a ~10X increase in logged diagnostics due to the new
parser implementation successfully parsing files that the existing parser
implementation incorrectly parsed with errors.

Also added some wall-clock timing for the time spent analyzing a source file
and an aggregate duration displayed at the end of gauntlet's output. From the
before and after comparisons, the new parser implementation appears to be at
least 20X faster in analyzing files, likely due to using `logos` for lexing and
the new parser implementation does not backtrack like `pest` does.

Also added configuration for ignoring specific files so that
`task-templates.wdl` can be excluded as it isn't expected to be valid WDL (it
has non-WDL placeholders in it to serve as a template).

Once this merges and we're in consensus, removal of the existing parser
implementation can begin.
@peterhuene peterhuene self-assigned this Jun 11, 2024
Gauntlet.toml Outdated Show resolved Hide resolved
Arena.toml Show resolved Hide resolved
The `ENCODE-DCC/chip-seq-pipeline2` repo has too many lint rule violations to
be useful to track in `Arena.toml`.

This also skips over logging diagnostics for files that fail to parse when
arena mode is turned on; this removes the duplicated diagnostics from
`Arena.toml` and `Gauntlet.toml`.
Arena.toml Outdated Show resolved Hide resolved
Arena.toml Show resolved Hide resolved
Arena.toml Outdated Show resolved Hide resolved
This commit removes the suffix `[rule: <name>]` from lint diagnostics in favor
of just using the "code" in the codespan diagnostic, which will instead use:

```
<severity>[<code>]: message
```
@peterhuene peterhuene requested a review from adthrasher June 12, 2024 20:30
@a-frantz
Copy link
Member

Was kicking tires. Noticed a regression in the PreambleComments rule. Alerting repo was aws-samples/amazon-omics-tutorials. They use our concept of preamble comments but after the version statement. This results in one diagnostic per line of a long block. One of the changes included in #61 was to report the entire span of problematic comments as one concern (as it was in the old parser).

Can we update the PreambleComments rule to report a multiline span? That should cut down the number of diagnostics coming out of that AWS repo, making them a candidate to add to Arena.toml (maybe, I haven't done a wc -l to see how many new ones they would add)

@a-frantz
Copy link
Member

Another very similar case found in this files preamble: https://github.com/biowdl/tasks/blob/develop/deconstructsigs.wdl

We should similarly be reporting a multiline span instead of one per line

@peterhuene
Copy link
Collaborator Author

@a-frantz good catch. That should be easily fixed. Do you want me to include the fix in this PR?

@a-frantz
Copy link
Member

Sure, if you say it's an easy one!

@peterhuene
Copy link
Collaborator Author

Question:

# BAD
# BAD



# BAD

Is that one diagnostic about the incorrect comments or two? Put another way, are the comments consecutive if they have only a single newline in separating whitespace or any number of blank lines inbetween?

@a-frantz
Copy link
Member

Question:

# BAD
# BAD



# BAD

Is that one diagnostic about the incorrect comments or two? Put another way, are the comments consecutive if they have only a single newline in separating whitespace or any number of blank lines inbetween?

IMO one per "chunk" of trivia. Does that make sense?

@a-frantz
Copy link
Member

Last comment I'll make 😅
Can we add getwilds/ww-vc-trio and getwilds/ww-star-deseq2 to Arena? Both are very small repos and add a limited number of diagnostics. Gives us a little more to work with without being overwhelming. And ww-vs-trio has one of the preamble cases I pointed out, so a good test case.

This fixes the `PreambleComments` rule to emit a single diagnostic for any
number of consecutive (excluding whitespace) comments.
@peterhuene
Copy link
Collaborator Author

@a-frantz pushed up both changes.

Copy link
Member

@a-frantz a-frantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work!

@peterhuene peterhuene merged commit 3555a5e into stjude-rust-labs:main Jun 13, 2024
7 checks passed
@peterhuene peterhuene deleted the gauntlet branch June 13, 2024 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants