Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(test): Break up integration tests per language #859

Merged
merged 10 commits into from
Feb 11, 2025

Conversation

mkatychev
Copy link
Contributor

@mkatychev mkatychev commented Jan 30, 2025

chore(test): Break up integration tests per language

Related Issues: #519, #539

Description

  • add a unit test per language in the format mod test_fmt and mod test_coverage test modules with a macro generating the test functions
    lang_test!(
    "bash",
    "css",
    "json",
    "nickel",
    "ocaml",
    "ocaml_interface",
    "ocamllex",
    "openscad",
    "rust",
    "toml",
    "tree_sitter_query",
    fmt_input
    );
  • add openscad to coverage tests and modified openscad test to have 100% coverage

Checklist

Checklist before merging:

  • CHANGELOG.md updated
  • README.md up-to-date

@mkatychev
Copy link
Contributor Author

While trying to solve some failing formatting tests for #845 I noticed some unusual slowness from the test cases. Once tests were broken up, it seems that the OCaml tests perform significantly worse than other languages and #519 may still be a problem (ocaml iface test is commented out for this PR, so ignore test failure):

$ cargo nextest run
   Compiling topiary-cli v0.6.0 (/Users/mkatychev/Documents/rust/topiary/topiary-cli)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.90s
    Starting 32 tests across 3 binaries (run ID: d0702afd-a92f-45ff-b1cf-1e3266a708c9, nextest profile: default)
        PASS [   0.618s] topiary-cli::cli-tester test_cfg
        PASS [   0.805s] topiary-cli::cli-tester test_fmt_stdin
        PASS [   0.815s] topiary-cli::cli-tester test_fmt_invalid
        PASS [   0.820s] topiary-cli::cli-tester test_fmt_dir
        PASS [   0.851s] topiary-cli::cli-tester test_fmt_stdin_query_fallback
        PASS [   0.856s] topiary-cli::cli-tester test_fmt_stdin_query
        PASS [   0.908s] topiary-cli::cli-tester test_fmt_files_query_fallback
        PASS [   0.921s] topiary-cli::cli-tester test_fmt_files
        PASS [   0.316s] topiary-cli::cli-tester test_vis
        PASS [   0.109s] topiary-cli::sample-tester test_coverage::coverage_input_json
        PASS [   0.308s] topiary-cli::cli-tester test_vis_invalid
        PASS [   0.179s] topiary-cli::sample-tester test_coverage::coverage_input_ocamllex
        PASS [   1.112s] topiary-cli::sample-tester test_coverage::coverage_input_toml
        PASS [   2.171s] topiary-cli::sample-tester test_coverage::coverage_input_css
        PASS [   0.986s] topiary-cli::sample-tester test_coverage::coverage_input_tree_sitter_query
        PASS [   2.153s] topiary-cli::sample-tester test_coverage::coverage_input_openscad
        PASS [   0.129s] topiary-cli::sample-tester test_fmt::fmt_input_json
        PASS [   2.593s] topiary-cli::sample-tester formatted_query_tester
        PASS [   0.619s] topiary-cli::sample-tester test_fmt::fmt_input_bash
        PASS [   1.022s] topiary-cli::sample-tester test_fmt::fmt_input_css
        PASS [   0.103s] topiary-cli::sample-tester test_fmt::fmt_input_ocamllex
        PASS [   0.197s] topiary-cli::sample-tester test_fmt::fmt_input_openscad
        PASS [   0.135s] topiary-cli::sample-tester test_fmt::fmt_input_toml
        PASS [   1.356s] topiary-cli::sample-tester test_fmt::fmt_input_nickel
        PASS [   0.355s] topiary-cli::sample-tester test_fmt::fmt_input_tree_sitter_query
        PASS [   2.349s] topiary-cli::sample-tester test_fmt::fmt_input_ocaml
        PASS [   2.314s] topiary-cli::sample-tester test_fmt::fmt_queries
        PASS [   5.225s] topiary-cli::sample-tester test_fmt::fmt_input_ocaml_interface
        PASS [   8.159s] topiary-cli::sample-tester test_coverage::coverage_input_nickel
        PASS [  26.421s] topiary-cli::sample-tester test_coverage::coverage_input_bash
        SLOW [> 60.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml
        SLOW [> 60.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface
        SLOW [>120.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml
        SLOW [>120.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface
        PASS [ 147.559s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml
        SLOW [>180.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface
        SLOW [>240.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface
        FAIL [ 261.056s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface

--- STDOUT:              topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface ---

running 1 test
test test_coverage::coverage_input_ocaml_interface has been running for over 60 seconds
test test_coverage::coverage_input_ocaml_interface ... FAILED

failures:

failures:
    test_coverage::coverage_input_ocaml_interface

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 21 filtered out; finished in 261.04s

The test_coverage::coverage_input_ocaml_interface test, if uncommented, fails coverage but runs very slowly:

$ time topiary coverage -v tests/samples/input/ocaml_interface.mli
Query coverage: 32.89%

@mkatychev mkatychev force-pushed the chore/break-up-lang-tests branch from 3311e6b to 283e501 Compare February 2, 2025 18:40
@mkatychev
Copy link
Contributor Author

Going off of the tree-sitter tips for performance, the tree-sitter-ocaml base grammar has about ten times the underperformant example's state counts:

On 0.20.5, we're at 2243 for STATE_COUNT and 666 for LARGE_STATE_COUNT

with the ts-ocaml state cont being about 10x and large state count a 5x:

$ cat src/parser.c | rg "#define.*STATE"
#define STATE_COUNT 22540
#define LARGE_STATE_COUNT 3324

with operators and expressions having a rather large state count:

$ tree-sitter generate --report-states-for-rule -
parenthesized_operator                  1633
infix_expression                        1296
object_expression                       1196
product_expression                      932

@Xophmeister
Copy link
Member

I've had a quick look through this and, in principle, I really like the idea of using a macro to abstract this 👍

Before reviewing this more thoroughly, I have some initial thoughts:

  1. AIUI, your macro should give almost identical code to the original; at least semantically. The test functions will be hoisted to the module level, for each feature, but they're not doing anything that the tests weren't previously doing. As such, why does it run slower? Or, maybe your point is that it's not running slower overall, but now we can see that OCaml is running particularly slowly (whereas before, it was still running slowly, but we were blind to this)? Could you please clarify.
  2. The Windows build is failing for some reason.

@mkatychev
Copy link
Contributor Author

mkatychev commented Feb 3, 2025

that it's not running slower overall, but now we can see that OCaml is running particularly slowly

@Xophmeister that's my conclusion, if you see the post above, ocaml AST state count is about 10x that of the rust repo. I think the definitions in the ocaml grammar.js is much more complex than any other languages around. I played around with the repo and only managed to slim the definitions by a fraction:

$ cat ocaml/grammars/ocaml/src/parser.c | rg "#define.*STATE"
#define STATE_COUNT 19687
#define LARGE_STATE_COUNT 3128

Here's one of the recent main commit test durations (228s):
https://github.com/tweag/topiary/actions/runs/13117520812/job/36595334995#step:8:835
V.S. one from this PR (202s):
https://github.com/tweag/topiary/actions/runs/13101475641/job/36550251190#step:8:761

The Windows build is failing for some reason.

Yes that's very odd, not sure what to make of it, seems to be some sort of linker error. Hopefully it wasn't cargo update that modified the Cargo.lock.

EDIT: I wonder if it has something to do with the lazy fetching of the dylib grammar since there may be multiple topiary calls going on at once, that would explain the ephemeral ocaml test failures (they sometimes pass).

@mkatychev
Copy link
Contributor Author

mkatychev commented Feb 3, 2025

Continued from #859 (comment):
I'm not familiar with Ocaml as a language but I suspect there are many places for optimization that can be done, part of the problem (I suspect) is that the *_expression nodes in the Ocaml repo are encountering a tricky O(n^2) complexity.

A good illustration of this is to call tree-sitter generate --report-states-for-rule infix_expression. it's a bit hard to see what the smoking gun is at the moment but one of my guesses is that the identifier aliases are overly permuted/separated: https://github.com/tree-sitter/tree-sitter-ocaml/blob/57644edfbba0edb38ac17dba2add4c243fa3539b/grammars/ocaml/grammar.js#L1926-L1947 this is in contrast to tree-sitter-rust where the identifier node is heavily used/reused.

@Xophmeister
Copy link
Member

Thanks for this analysis 🙏 Bear in mind that we are currently pinned to a version of tree-sitter-ocaml from about a year ago; these accidental complexity problems may have been (partially) resolved in the meantime. (Or, when you were performing this analysis, did you pull the current HEAD?) (cc @314eter)

It's also worth pointing out that Topiary largely is a consumer of the grammar. If we can patch this then we of course should, but the Topiary team doesn't have that kind of capacity at the moment. Of course, just to reiterate, that does not in any way diminish the usefulness of your analysis.

@Xophmeister
Copy link
Member

Xophmeister commented Feb 4, 2025

The Windows build is failing for some reason.

Yes that's very odd, not sure what to make of it, seems to be some sort of linker error. Hopefully it wasn't cargo update that modified the Cargo.lock.

EDIT: I wonder if it has something to do with the lazy fetching of the dylib grammar since there may be multiple topiary calls going on at once, that would explain the ephemeral ocaml test failures (they sometimes pass).

I don't see anything in the changes to Cargo.{toml,lock} that are suggestive of changing the grammar build tooling. As such, I think your race condition theory is a good one. To test this out, maybe it would be worth running topiary prefetch before the test suite in the build-windows job.

I don't know much anything about how Windows temporary files persist on GitHub runners, but perhaps it would be as easy as adding cargo run --all-features -- prefetch1 as a step in the appropriate place in .github/workflows/ci.yml. It's worth a try, I think 🙂

Footnotes

  1. The parallel feature, which is enabled by default and --all-features shouldn't matter, because the theorised race is when building the same grammar near-simultaneously, rather than all grammars once but in parallel.

CHANGELOG.md Outdated Show resolved Hide resolved
@mkatychev
Copy link
Contributor Author

Going off @Xophmeister's suggestion of prefetching the grammars, seems the CI is now failing at that step, not sure how my changes could have impacted topiary prefetch behaviour 🤔.

@mkatychev
Copy link
Contributor Author

mkatychev commented Feb 4, 2025

It's also worth pointing out that Topiary largely is a consumer of the grammar. If we can patch this then we of course should, but the Topiary team doesn't have that kind of capacity at the moment. Of course, just to reiterate, that does not in any way diminish the usefulness of your analysis.

My takeaway from the ocaml performance is that as more grammars are added to the repo, whatever grammars topiary leverages means that their performance issues carry over on top of actual topiary specific logic.

For new languages, I don't have any immediate solutions to this issue other than restricting additions based on an acceptable limit in duration of coverate/perf tests.

In ocaml's case the recourse is limited to raising/fixing the issues upstream, dropping support (I'm not advocating for this), or downgrading it to a community grammar (so it's not in the hot path of testing).

Long term perhaps moving community topiary-queries into a separate repo/CI flow that always uses a precomplied topiary binary would help the long feedback loop when there's no rust code changes.

@Xophmeister
Copy link
Member

Going off @Xophmeister's suggestion of prefetching the grammars, seems the CI is now failing at that step, not sure how my changes could have impacted topiary prefetch behaviour 🤔.

So this is a particularly weird error:

C:\Users\RUNNER~1\AppData\Local\Temp\.tmp2VQMTF\toml\src\parser.c : fatal error C1083: Cannot open compiler generated file: 'C:\Users\RUNNER~1\AppData\Local\Temp\.tmp2VQMTF\rust\parser.obj': Permission denied

Building the TOML grammar failed because it's trying to link with the Rust build artefact, which has already been deleted! 🤷

I agree that I cannot see how your changes are causing this problem. Yet, other post-v0.6.0 PRs are not suffering from it. While I don't think it's your responsibility to fix, I'm reluctant to merge this PR until this Windows problem is resolved.

That being said, I'm now strongly of the opinion that a race condition is the problem. Indeed, @nbacquey put in this code in cli-tester.rs to serialise dynamic grammar building in tests:

// We need to prefetch JSON and TOML grammars before running the tests, on pain of race condition:
// If multiple calls to Topiary are made in parallel and the grammar is missing, they will all try
// to fetch and build it, thus creating an empty .so file while g++ is running. If another instance
// of topiary starts at this moment, it will mistake the empty .so file for an already built grammar,
// and try to run with it, resulting in an error. See https://github.com/tweag/topiary/issues/767
static INIT: Once = Once::new();
pub fn initialize() {
INIT.call_once(|| {
#[cfg(feature = "json")]
Command::cargo_bin("topiary")
.expect("Unable to build Topiary")
.arg("fmt")
.arg("--language")
.arg("json")
.write_stdin("")
.assert()
.success();
#[cfg(feature = "toml")]
Command::cargo_bin("topiary")
.expect("Unable to build Topiary")
.arg("fmt")
.arg("--language")
.arg("toml")
.write_stdin("")
.assert()
.success();
});
}

Could you revert the CI change I suggested and try implementing something similar in your sample testing macros? (With my apologies for the wild goose chase!)

(Meanwhile, I will try to look into a potential problem with parallel builds on Windows.)


It's also worth pointing out that Topiary largely is a consumer of the grammar. If we can patch this then we of course should, but the Topiary team doesn't have that kind of capacity at the moment. Of course, just to reiterate, that does not in any way diminish the usefulness of your analysis.

My takeaway from the ocaml performance is that as more grammars are added to the repo, whatever grammars topiary leverages means that their performance issues carry over on top of actual topiary specific logic.

For new languages, I don't have any immediate solutions to this issue other than restricting additions based on an acceptable limit in duration of coverate/perf tests.

In ocaml's case the recourse is limited to raising/fixing the issues upstream, dropping support (I'm not advocating for this), or downgrading it to a community grammar (so it's not in the hot path of testing).

Long term perhaps moving community topiary-queries into a separate repo/CI flow that always uses a precomplied topiary binary would help the long feedback loop when there's no rust code changes.

We are considering separating queries into their own repositories, so they're not coupled to Topiary releases. This is still very much in the "thinking about" stage -- not even "planning" -- but it's definitely on our radar 👍

I think your point adds weight to this: The original thought was just simply for decoupling's sake. However, unburdening the tests from those dependencies is going to be worthwhile. Of course, that inversion will require us to rethink how we test in the absence of a real grammar.

@mkatychev
Copy link
Contributor Author

I'm reluctant to merge this PR until this Windows problem is resolved.

I'm definitely against merging this PR if it breaks CI, I'll take a look at the cli-tester.rs block.

@Xophmeister
Copy link
Member

I'm definitely against merging this PR if it breaks CI, I'll take a look at the cli-tester.rs block.

Thanks, @mkatychev 🙏

(Meanwhile, I will try to look into a potential problem with parallel builds on Windows.)

Serialising the builds on Windows seems to fix the problem (see #869). Hopefully doing something similar in your macros will do the same and we can merge 🤞

@Xophmeister
Copy link
Member

@mkatychev #869 has now merged; apologies for the delay.

@mkatychev
Copy link
Contributor Author

@Xophmeister merging it seems to have fixed the issue 🥳

Copy link
Member

@Xophmeister Xophmeister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great; thank you 🙏

topiary-cli/tests/sample-tester.rs Outdated Show resolved Hide resolved
@mkatychev mkatychev mentioned this pull request Feb 11, 2025
2 tasks
@Xophmeister Xophmeister merged commit 2af1310 into tweag:main Feb 11, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants