feat: migrate `wdl-gauntlet` to the new parser implementation. #76

peterhuene · 2024-06-11T23:50:09Z

This commit replaces the existing parser implementation with the new implementation in `wdl-gauntlet`.

As a result, `Arena.toml` and `Gauntlet.toml` have been refreshed. In the case of `Arena.toml`, there's a ~10X increase in logged diagnostics because the new parser implementation successfully parses files that the existing implementation incorrectly failed to parse.

Also added wall-clock timing for the time spent analyzing each source file, plus an aggregate duration displayed at the end of gauntlet's output. Based on before-and-after comparisons, the new parser implementation appears to be at least 20X faster at analyzing files, likely because it uses `logos` for lexing and does not backtrack the way `pest` does.
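A minimal sketch of per-file wall-clock timing with an aggregate total, using only `std::time::Instant`. The names `time_all`, `speedup`, and the `analyze` callback are hypothetical and are not gauntlet's actual code.

```rust
use std::time::{Duration, Instant};

/// Runs `analyze` on each source, recording the wall-clock time of each call.
/// Returns the per-file durations and their sum (the aggregate shown at the end).
/// `analyze` is a hypothetical stand-in for gauntlet's per-file analysis.
fn time_all<T, F>(sources: &[T], mut analyze: F) -> (Vec<Duration>, Duration)
where
    F: FnMut(&T),
{
    let mut per_file = Vec::with_capacity(sources.len());
    let mut total = Duration::ZERO;
    for source in sources {
        let start = Instant::now();
        analyze(source);
        let elapsed = start.elapsed();
        total += elapsed;
        per_file.push(elapsed);
    }
    (per_file, total)
}

/// Ratio of the old total to the new total; 20.0 means the new run
/// took one twentieth of the old run's time.
fn speedup(old: Duration, new: Duration) -> f64 {
    old.as_secs_f64() / new.as_secs_f64()
}
```

Comparing the aggregate from a run with each parser via `speedup` is one way to arrive at a figure like "at least 20X"; wall-clock numbers include I/O and machine noise, so they are indicative rather than a benchmark.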

Also added configuration for ignoring specific files, so that `task-templates.wdl` can be excluded; it isn't expected to be valid WDL, since it contains non-WDL placeholders so that it can serve as a template.
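A rough sketch of how an ignore list could filter files before analysis. The `IgnoreList` type and its methods are hypothetical, and exact path matching is an assumption; the PR does not show how the real configuration is structured or matched.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical ignore list: paths that should never be analyzed,
/// such as a template file that is not valid WDL.
struct IgnoreList {
    paths: HashSet<PathBuf>,
}

impl IgnoreList {
    fn new<I, P>(paths: I) -> Self
    where
        I: IntoIterator<Item = P>,
        P: Into<PathBuf>,
    {
        Self {
            paths: paths.into_iter().map(|p| p.into()).collect(),
        }
    }

    /// Exact-match lookup; `PathBuf: Borrow<Path>` lets us query with `&Path`.
    fn is_ignored(&self, path: &Path) -> bool {
        self.paths.contains(path)
    }

    /// Keeps only the files that are not on the ignore list.
    fn keep_analyzable<'a>(&self, files: &'a [PathBuf]) -> Vec<&'a PathBuf> {
        files.iter().filter(|f| !self.is_ignored(f)).collect()
    }
}
```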

Once this merges and we reach consensus, removal of the existing parser implementation can begin.

Before submitting this PR, please make sure:

You have added a few sentences describing the PR here.
You have added yourself or the appropriate individual as the assignee.
You have added at least one relevant code reviewer to the PR.
Your code builds clean without any errors or warnings.
You have added tests (when appropriate).
You have updated the README or other documentation to account for these changes (when appropriate).
You have added an entry to the relevant CHANGELOG.md (see "keep a changelog" for more information).
Your commit messages follow the conventional commit style.

The `ENCODE-DCC/chip-seq-pipeline2` repo has too many lint rule violations to be useful to track in `Arena.toml`. This commit also skips logging diagnostics for files that fail to parse when arena mode is turned on, which removes the diagnostics that were duplicated between `Arena.toml` and `Gauntlet.toml`.
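The arena-mode skip can be sketched as a simple filter on each file's outcome. This assumes a simplified model where diagnostics are plain strings; `Outcome` and `diagnostics_to_log` are hypothetical names, not the gauntlet's real types.

```rust
/// Hypothetical outcome of analyzing one file.
enum Outcome {
    /// The file parsed; these are its lint/validation diagnostics.
    Parsed(Vec<String>),
    /// The file failed to parse; these are its parse errors.
    ParseFailed(Vec<String>),
}

/// Returns the diagnostics to log for `outcome`. In arena mode, parse
/// failures are skipped so the same errors are not also recorded in the arena.
fn diagnostics_to_log(outcome: &Outcome, arena_mode: bool) -> &[String] {
    match outcome {
        Outcome::Parsed(diagnostics) => diagnostics.as_slice(),
        Outcome::ParseFailed(_) if arena_mode => &[],
        Outcome::ParseFailed(errors) => errors.as_slice(),
    }
}
```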

a-frantz · 2024-06-12T21:04:51Z

Was kicking the tires and noticed a regression in the `PreambleComments` rule. The repo that surfaced it was `aws-samples/amazon-omics-tutorials`. They use our concept of preamble comments, but after the version statement, which results in one diagnostic per line of a long comment block. One of the changes included in #61 was to report the entire span of problematic comments as one concern (as it was with the old parser).

Can we update the `PreambleComments` rule to report a multiline span? That should cut down the number of diagnostics coming out of that AWS repo, making it a candidate to add to `Arena.toml` (maybe; I haven't done a `wc -l` to see how many new ones it would add).

a-frantz · 2024-06-12T21:14:11Z

Another very similar case is in this file's preamble: https://github.com/biowdl/tasks/blob/develop/deconstructsigs.wdl

We should similarly be reporting a multiline span instead of one diagnostic per line.

peterhuene · 2024-06-12T21:26:19Z

@a-frantz good catch. That should be easily fixed. Do you want me to include the fix in this PR?

a-frantz · 2024-06-12T21:30:51Z

Sure, if you say it's an easy one!

peterhuene · 2024-06-12T21:36:32Z

Question:

```
# BAD
# BAD



# BAD
```

Is that one diagnostic about the incorrect comments or two? Put another way, are comments consecutive only when separated by a single newline, or also when any number of blank lines lie in between?

a-frantz · 2024-06-12T21:44:05Z

IMO one per "chunk" of trivia. Does that make sense?
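The "one diagnostic per chunk of trivia" behavior can be sketched by merging comment spans until a non-trivia token appears. This is a simplified illustration: the `Kind` enum and `comment_runs` function are hypothetical, and the real parser's token and trivia types differ.

```rust
use std::ops::Range;

/// Simplified, hypothetical token kinds.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Comment,
    Whitespace,
    Other,
}

/// Merges each run of comments (whitespace, including blank lines, may sit
/// in between) into one byte span, so a rule can emit one diagnostic per run.
fn comment_runs(tokens: &[(Kind, Range<usize>)]) -> Vec<Range<usize>> {
    let mut runs = Vec::new();
    let mut current: Option<Range<usize>> = None;
    for (kind, span) in tokens {
        match kind {
            Kind::Comment => {
                current = Some(match current.take() {
                    // Extend the open run to cover this comment too.
                    Some(run) => run.start..span.end,
                    // Start a new run at this comment.
                    None => span.clone(),
                });
            }
            // Whitespace does not end a run.
            Kind::Whitespace => {}
            // Any other token closes the open run, if there is one.
            Kind::Other => runs.extend(current.take()),
        }
    }
    runs.extend(current);
    runs
}
```

Under this model, the three `# BAD` lines in the question form a single span, because only whitespace separates them.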

a-frantz · 2024-06-12T22:05:34Z

Last comment I'll make 😅
Can we add `getwilds/ww-vc-trio` and `getwilds/ww-star-deseq2` to Arena? Both are very small repos and add a limited number of diagnostics. That gives us a little more to work with without being overwhelming. And `ww-vc-trio` has one of the preamble cases I pointed out, so it's a good test case.

peterhuene · 2024-06-12T22:49:48Z

@a-frantz pushed up both changes.

a-frantz

Awesome work!

peterhuene requested review from adthrasher and a-frantz June 11, 2024 23:50

peterhuene self-assigned this Jun 11, 2024

a-frantz reviewed Jun 12, 2024

Gauntlet.toml (outdated, resolved)

a-frantz reviewed Jun 12, 2024

Arena.toml (resolved)

peterhuene mentioned this pull request Jun 12, 2024

New parser implementation should accept numbers for the default placeholder option. #77 (open)

peterhuene requested a review from a-frantz June 12, 2024 17:47

peterhuene commented Jun 12, 2024

Arena.toml (outdated, resolved)

fix: remove redundant relative paths from diagnostic messages in gauntlet.

803bc43

adthrasher reviewed Jun 12, 2024

Arena.toml (resolved)

Arena.toml (outdated, resolved)

fix: use code in codespan diagnostic for rule name.

aef0fad

This commit removes the suffix `[rule: <name>]` from lint diagnostics in favor of just using the "code" in the codespan diagnostic, which will instead use:

```
<severity>[<code>]: message
```
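The header change can be illustrated with plain string formatting. This is an approximation: the exact layout of the old suffix is assumed from the description above, and `old_header`/`new_header` are hypothetical helpers. With `codespan-reporting` itself, the code is attached via `Diagnostic::with_code` and the library renders the header.

```rust
/// Approximate old style: rule name appended to the message as a suffix.
fn old_header(severity: &str, message: &str, rule: &str) -> String {
    format!("{}: {} [rule: {}]", severity, message, rule)
}

/// New style: rule name carried as the diagnostic's code,
/// shown in brackets right after the severity.
fn new_header(severity: &str, code: &str, message: &str) -> String {
    format!("{}[{}]: {}", severity, code, message)
}
```

Moving the rule name into the code keeps messages free of bookkeeping text and puts the identifier in a fixed position, which makes the output easier to grep.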

peterhuene requested a review from adthrasher June 12, 2024 20:30

peterhuene added 2 commits June 12, 2024 15:46

fix: emit only one diagnostic for consecutive comment tokens.

5422302

This fixes the `PreambleComments` rule to emit a single diagnostic for any number of consecutive (excluding whitespace) comments.

chore: add getwilds/ww-vc-trio and getwilds/ww-star-deseq2 to the arena.

51cc206

a-frantz approved these changes Jun 13, 2024

peterhuene merged commit 3555a5e into stjude-rust-labs:main Jun 13, 2024
7 checks passed

peterhuene deleted the gauntlet branch June 13, 2024 16:08
