chore(test): Break up integration tests per language #859

mkatychev · 2025-01-30T21:46:52Z

chore(test): Break up integration tests per language

Related Issues: #519, #539

Description

add a unit test per language in the format mod test_fmt and mod test_coverage test modules with a macro generating the test functions

topiary/topiary-cli/tests/sample-tester.rs

Lines 79 to 92 in 68e5844

    
           lang_test!( 
        
               "bash", 
        
               "css", 
        
               "json", 
        
               "nickel", 
        
               "ocaml", 
        
               "ocaml_interface", 
        
               "ocamllex", 
        
               "openscad", 
        
               "rust", 
        
               "toml", 
        
               "tree_sitter_query", 
        
               fmt_input 
        
           );

add openscad to coverage tests and modified openscad test to have 100% coverage

Checklist

Checklist before merging:

CHANGELOG.md updated
README.md up-to-date

mkatychev · 2025-01-30T21:54:47Z

While trying to solve some failing formatting tests for #845 I noticed some unusual slowness from the test cases. Once tests were broken up, it seems that the OCaml tests perform significantly worse than other languages and #519 may still be a problem (ocaml iface test is commented out for this PR, so ignore test failure):

$ cargo nextest run
   Compiling topiary-cli v0.6.0 (/Users/mkatychev/Documents/rust/topiary/topiary-cli)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.90s
    Starting 32 tests across 3 binaries (run ID: d0702afd-a92f-45ff-b1cf-1e3266a708c9, nextest profile: default)
        PASS [   0.618s] topiary-cli::cli-tester test_cfg
        PASS [   0.805s] topiary-cli::cli-tester test_fmt_stdin
        PASS [   0.815s] topiary-cli::cli-tester test_fmt_invalid
        PASS [   0.820s] topiary-cli::cli-tester test_fmt_dir
        PASS [   0.851s] topiary-cli::cli-tester test_fmt_stdin_query_fallback
        PASS [   0.856s] topiary-cli::cli-tester test_fmt_stdin_query
        PASS [   0.908s] topiary-cli::cli-tester test_fmt_files_query_fallback
        PASS [   0.921s] topiary-cli::cli-tester test_fmt_files
        PASS [   0.316s] topiary-cli::cli-tester test_vis
        PASS [   0.109s] topiary-cli::sample-tester test_coverage::coverage_input_json
        PASS [   0.308s] topiary-cli::cli-tester test_vis_invalid
        PASS [   0.179s] topiary-cli::sample-tester test_coverage::coverage_input_ocamllex
        PASS [   1.112s] topiary-cli::sample-tester test_coverage::coverage_input_toml
        PASS [   2.171s] topiary-cli::sample-tester test_coverage::coverage_input_css
        PASS [   0.986s] topiary-cli::sample-tester test_coverage::coverage_input_tree_sitter_query
        PASS [   2.153s] topiary-cli::sample-tester test_coverage::coverage_input_openscad
        PASS [   0.129s] topiary-cli::sample-tester test_fmt::fmt_input_json
        PASS [   2.593s] topiary-cli::sample-tester formatted_query_tester
        PASS [   0.619s] topiary-cli::sample-tester test_fmt::fmt_input_bash
        PASS [   1.022s] topiary-cli::sample-tester test_fmt::fmt_input_css
        PASS [   0.103s] topiary-cli::sample-tester test_fmt::fmt_input_ocamllex
        PASS [   0.197s] topiary-cli::sample-tester test_fmt::fmt_input_openscad
        PASS [   0.135s] topiary-cli::sample-tester test_fmt::fmt_input_toml
        PASS [   1.356s] topiary-cli::sample-tester test_fmt::fmt_input_nickel
        PASS [   0.355s] topiary-cli::sample-tester test_fmt::fmt_input_tree_sitter_query
        PASS [   2.349s] topiary-cli::sample-tester test_fmt::fmt_input_ocaml
        PASS [   2.314s] topiary-cli::sample-tester test_fmt::fmt_queries
        PASS [   5.225s] topiary-cli::sample-tester test_fmt::fmt_input_ocaml_interface
        PASS [   8.159s] topiary-cli::sample-tester test_coverage::coverage_input_nickel
        PASS [  26.421s] topiary-cli::sample-tester test_coverage::coverage_input_bash
        SLOW [> 60.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml
        SLOW [> 60.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface
        SLOW [>120.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml
        SLOW [>120.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface
        PASS [ 147.559s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml
        SLOW [>180.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface
        SLOW [>240.000s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface
        FAIL [ 261.056s] topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface

--- STDOUT:              topiary-cli::sample-tester test_coverage::coverage_input_ocaml_interface ---

running 1 test
test test_coverage::coverage_input_ocaml_interface has been running for over 60 seconds
test test_coverage::coverage_input_ocaml_interface ... FAILED

failures:

failures:
    test_coverage::coverage_input_ocaml_interface

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 21 filtered out; finished in 261.04s

The test_coverage::coverage_input_ocaml_interface test, if uncommented, fails coverage but runs very slowly:

$ time topiary coverage -v tests/samples/input/ocaml_interface.mli
Query coverage: 32.89%

mkatychev · 2025-02-02T19:32:38Z

Going off of the tree-sitter tips for performance, the tree-sitter-ocaml base grammar has about ten times the underperformant example's state counts:

On 0.20.5, we're at 2243 for STATE_COUNT and 666 for LARGE_STATE_COUNT

with the ts-ocaml state cont being about 10x and large state count a 5x:

$ cat src/parser.c | rg "#define.*STATE"
#define STATE_COUNT 22540
#define LARGE_STATE_COUNT 3324

with operators and expressions having a rather large state count:

$ tree-sitter generate --report-states-for-rule -
parenthesized_operator                  1633
infix_expression                        1296
object_expression                       1196
product_expression                      932

Xophmeister · 2025-02-03T17:35:16Z

I've had a quick look through this and, in principle, I really like the idea of using a macro to abstract this 👍

Before reviewing this more thoroughly, I have some initial thoughts:

AIUI, your macro should give almost identical code to the original; at least semantically. The test functions will be hoisted to the module level, for each feature, but they're not doing anything that the tests weren't previously doing. As such, why does it run slower? Or, maybe your point is that it's not running slower overall, but now we can see that OCaml is running particularly slowly (whereas before, it was still running slowly, but we were blind to this)? Could you please clarify.
The Windows build is failing for some reason.

mkatychev · 2025-02-03T22:54:57Z

that it's not running slower overall, but now we can see that OCaml is running particularly slowly

@Xophmeister that's my conclusion, if you see the post above, ocaml AST state count is about 10x that of the rust repo. I think the definitions in the ocaml grammar.js is much more complex than any other languages around. I played around with the repo and only managed to slim the definitions by a fraction:

$ cat ocaml/grammars/ocaml/src/parser.c | rg "#define.*STATE"
#define STATE_COUNT 19687
#define LARGE_STATE_COUNT 3128

Here's one of the recent main commit test durations (228s):
https://github.com/tweag/topiary/actions/runs/13117520812/job/36595334995#step:8:835
V.S. one from this PR (202s):
https://github.com/tweag/topiary/actions/runs/13101475641/job/36550251190#step:8:761

The Windows build is failing for some reason.

Yes that's very odd, not sure what to make of it, seems to be some sort of linker error. Hopefully it wasn't cargo update that modified the Cargo.lock.

EDIT: I wonder if it has something to do with the lazy fetching of the dylib grammar since there may be multiple topiary calls going on at once, that would explain the ephemeral ocaml test failures (they sometimes pass).

mkatychev · 2025-02-03T23:18:18Z

Continued from #859 (comment):
I'm not familiar with Ocaml as a language but I suspect there are many places for optimization that can be done, part of the problem (I suspect) is that the *_expression nodes in the Ocaml repo are encountering a tricky O(n^2) complexity.

A good illustration of this is to call tree-sitter generate --report-states-for-rule infix_expression. it's a bit hard to see what the smoking gun is at the moment but one of my guesses is that the identifier aliases are overly permuted/separated: https://github.com/tree-sitter/tree-sitter-ocaml/blob/57644edfbba0edb38ac17dba2add4c243fa3539b/grammars/ocaml/grammar.js#L1926-L1947 this is in contrast to tree-sitter-rust where the identifier node is heavily used/reused.

Xophmeister · 2025-02-04T17:01:22Z

Thanks for this analysis 🙏 Bear in mind that we are currently pinned to a version of tree-sitter-ocaml from about a year ago; these accidental complexity problems may have been (partially) resolved in the meantime. (Or, when you were performing this analysis, did you pull the current HEAD?) (cc @314eter)

It's also worth pointing out that Topiary largely is a consumer of the grammar. If we can patch this then we of course should, but the Topiary team doesn't have that kind of capacity at the moment. Of course, just to reiterate, that does not in any way diminish the usefulness of your analysis.

Xophmeister · 2025-02-04T17:18:06Z

The Windows build is failing for some reason.

Yes that's very odd, not sure what to make of it, seems to be some sort of linker error. Hopefully it wasn't cargo update that modified the Cargo.lock.

EDIT: I wonder if it has something to do with the lazy fetching of the dylib grammar since there may be multiple topiary calls going on at once, that would explain the ephemeral ocaml test failures (they sometimes pass).

I don't see anything in the changes to Cargo.{toml,lock} that are suggestive of changing the grammar build tooling. As such, I think your race condition theory is a good one. To test this out, maybe it would be worth running topiary prefetch before the test suite in the build-windows job.

I don't know ~~much~~ anything about how Windows temporary files persist on GitHub runners, but perhaps it would be as easy as adding cargo run --all-features -- prefetch¹ as a step in the appropriate place in .github/workflows/ci.yml. It's worth a try, I think 🙂

The parallel feature, which is enabled by default and --all-features shouldn't matter, because the theorised race is when building the same grammar near-simultaneously, rather than all grammars once but in parallel. ↩

CHANGELOG.md

mkatychev · 2025-02-04T18:29:31Z

Going off @Xophmeister's suggestion of prefetching the grammars, seems the CI is now failing at that step, not sure how my changes could have impacted topiary prefetch behaviour 🤔.

mkatychev · 2025-02-04T18:51:38Z

It's also worth pointing out that Topiary largely is a consumer of the grammar. If we can patch this then we of course should, but the Topiary team doesn't have that kind of capacity at the moment. Of course, just to reiterate, that does not in any way diminish the usefulness of your analysis.

My takeaway from the ocaml performance is that as more grammars are added to the repo, whatever grammars topiary leverages means that their performance issues carry over on top of actual topiary specific logic.

For new languages, I don't have any immediate solutions to this issue other than restricting additions based on an acceptable limit in duration of coverate/perf tests.

In ocaml's case the recourse is limited to raising/fixing the issues upstream, dropping support (I'm not advocating for this), or downgrading it to a community grammar (so it's not in the hot path of testing).

Long term perhaps moving community topiary-queries into a separate repo/CI flow that always uses a precomplied topiary binary would help the long feedback loop when there's no rust code changes.

Xophmeister · 2025-02-05T12:20:01Z

Going off @Xophmeister's suggestion of prefetching the grammars, seems the CI is now failing at that step, not sure how my changes could have impacted topiary prefetch behaviour 🤔.

So this is a particularly weird error:

C:\Users\RUNNER~1\AppData\Local\Temp\.tmp2VQMTF\toml\src\parser.c : fatal error C1083: Cannot open compiler generated file: 'C:\Users\RUNNER~1\AppData\Local\Temp\.tmp2VQMTF\rust\parser.obj': Permission denied

Building the TOML grammar failed because it's trying to link with the Rust build artefact, which has already been deleted! 🤷

I agree that I cannot see how your changes are causing this problem. Yet, other post-v0.6.0 PRs are not suffering from it. While I don't think it's your responsibility to fix, I'm reluctant to merge this PR until this Windows problem is resolved.

That being said, I'm now strongly of the opinion that a race condition is the problem. Indeed, @nbacquey put in this code in cli-tester.rs to serialise dynamic grammar building in tests:

topiary/topiary-cli/tests/cli-tester.rs

Lines 20 to 47 in df1cfc3

    
           // We need to prefetch JSON and TOML grammars before running the tests, on pain of race condition: 
        
           // If multiple calls to Topiary are made in parallel and the grammar is missing, they will all try 
        
           // to fetch and build it, thus creating an empty .so file while g++ is running. If another instance 
        
           // of topiary starts at this moment, it will mistake the empty .so file for an already built grammar, 
        
           // and try to run with it, resulting in an error. See https://github.com/tweag/topiary/issues/767 
        
           static INIT: Once = Once::new(); 
        
           pub fn initialize() { 
        
               INIT.call_once(|| { 
        
                   #[cfg(feature = "json")] 
        
                   Command::cargo_bin("topiary") 
        
                       .expect("Unable to build Topiary") 
        
                       .arg("fmt") 
        
                       .arg("--language") 
        
                       .arg("json") 
        
                       .write_stdin("") 
        
                       .assert() 
        
                       .success(); 
        
                   #[cfg(feature = "toml")] 
        
                   Command::cargo_bin("topiary") 
        
                       .expect("Unable to build Topiary") 
        
                       .arg("fmt") 
        
                       .arg("--language") 
        
                       .arg("toml") 
        
                       .write_stdin("") 
        
                       .assert() 
        
                       .success(); 
        
               }); 
        
           }

Could you revert the CI change I suggested and try implementing something similar in your sample testing macros? (With my apologies for the wild goose chase!)

(Meanwhile, I will try to look into a potential problem with parallel builds on Windows.)

It's also worth pointing out that Topiary largely is a consumer of the grammar. If we can patch this then we of course should, but the Topiary team doesn't have that kind of capacity at the moment. Of course, just to reiterate, that does not in any way diminish the usefulness of your analysis.

My takeaway from the ocaml performance is that as more grammars are added to the repo, whatever grammars topiary leverages means that their performance issues carry over on top of actual topiary specific logic.

For new languages, I don't have any immediate solutions to this issue other than restricting additions based on an acceptable limit in duration of coverate/perf tests.

In ocaml's case the recourse is limited to raising/fixing the issues upstream, dropping support (I'm not advocating for this), or downgrading it to a community grammar (so it's not in the hot path of testing).

Long term perhaps moving community topiary-queries into a separate repo/CI flow that always uses a precomplied topiary binary would help the long feedback loop when there's no rust code changes.

We are considering separating queries into their own repositories, so they're not coupled to Topiary releases. This is still very much in the "thinking about" stage -- not even "planning" -- but it's definitely on our radar 👍

I think your point adds weight to this: The original thought was just simply for decoupling's sake. However, unburdening the tests from those dependencies is going to be worthwhile. Of course, that inversion will require us to rethink how we test in the absence of a real grammar.

mkatychev · 2025-02-05T18:17:10Z

I'm reluctant to merge this PR until this Windows problem is resolved.

I'm definitely against merging this PR if it breaks CI, I'll take a look at the cli-tester.rs block.

Xophmeister · 2025-02-05T18:21:38Z

I'm definitely against merging this PR if it breaks CI, I'll take a look at the cli-tester.rs block.

Thanks, @mkatychev 🙏

(Meanwhile, I will try to look into a potential problem with parallel builds on Windows.)

Serialising the builds on Windows seems to fix the problem (see #869). Hopefully doing something similar in your macros will do the same and we can merge 🤞

Xophmeister · 2025-02-10T10:45:16Z

@mkatychev #869 has now merged; apologies for the delay.

mkatychev · 2025-02-11T00:45:16Z

@Xophmeister merging it seems to have fixed the issue 🥳

Xophmeister

Looks great; thank you 🙏

topiary-cli/tests/sample-tester.rs

mkatychev added 2 commits January 30, 2025 15:41

break up tests per language

68e5844

updated CHANGELOG

2ff159a

add back changelog teplate comment

283e501

mkatychev force-pushed the chore/break-up-lang-tests branch from 3311e6b to 283e501 Compare February 2, 2025 18:40

mkatychev added 2 commits February 3, 2025 16:56

Merge branch 'main' into chore/break-up-lang-tests

9c3706c

revert lock changes

1b43392

Xophmeister reviewed Feb 4, 2025

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

mkatychev added 2 commits February 4, 2025 11:57

added prefetch step to windows CI test

00ff70f

Merge branch 'main' into chore/break-up-lang-tests

946cc75

Xophmeister mentioned this pull request Feb 5, 2025

Potential dynamic build problem on Windows #868

Open

mkatychev mentioned this pull request Feb 5, 2025

fix: Add coverage check and more samples for OpenSCAD #867

Merged

2 tasks

Merge branch 'main' into chore/break-up-lang-tests

fff8d4d

Xophmeister approved these changes Feb 11, 2025

View reviewed changes

topiary-cli/tests/sample-tester.rs Outdated Show resolved Hide resolved

mkatychev added 2 commits February 11, 2025 12:24

Merge branch 'main' into chore/break-up-lang-tests

55497e2

updated name

e8cb03e

mkatychev mentioned this pull request Feb 11, 2025

Add mold linker to CI #871

Open

2 tasks

Xophmeister merged commit 2af1310 into tweag:main Feb 11, 2025
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(test): Break up integration tests per language #859

chore(test): Break up integration tests per language #859

mkatychev commented Jan 30, 2025 •

edited

Loading

mkatychev commented Jan 30, 2025

mkatychev commented Feb 2, 2025

Xophmeister commented Feb 3, 2025

mkatychev commented Feb 3, 2025 •

edited

Loading

mkatychev commented Feb 3, 2025 •

edited

Loading

Xophmeister commented Feb 4, 2025

Xophmeister commented Feb 4, 2025 •

edited

Loading

mkatychev commented Feb 4, 2025

mkatychev commented Feb 4, 2025 •

edited

Loading

Xophmeister commented Feb 5, 2025

mkatychev commented Feb 5, 2025

Xophmeister commented Feb 5, 2025

Xophmeister commented Feb 10, 2025

mkatychev commented Feb 11, 2025

Xophmeister left a comment

	lang_test!(
	"bash",
	"css",
	"json",
	"nickel",
	"ocaml",
	"ocaml_interface",
	"ocamllex",
	"openscad",
	"rust",
	"toml",
	"tree_sitter_query",
	fmt_input
	);

chore(test): Break up integration tests per language #859

chore(test): Break up integration tests per language #859

Conversation

mkatychev commented Jan 30, 2025 • edited Loading

chore(test): Break up integration tests per language

Description

Checklist

mkatychev commented Jan 30, 2025

mkatychev commented Feb 2, 2025

Xophmeister commented Feb 3, 2025

mkatychev commented Feb 3, 2025 • edited Loading

mkatychev commented Feb 3, 2025 • edited Loading

Xophmeister commented Feb 4, 2025

Xophmeister commented Feb 4, 2025 • edited Loading

Footnotes

mkatychev commented Feb 4, 2025

mkatychev commented Feb 4, 2025 • edited Loading

Xophmeister commented Feb 5, 2025

mkatychev commented Feb 5, 2025

Xophmeister commented Feb 5, 2025

Xophmeister commented Feb 10, 2025

mkatychev commented Feb 11, 2025

Xophmeister left a comment

Choose a reason for hiding this comment

mkatychev commented Jan 30, 2025 •

edited

Loading

mkatychev commented Feb 3, 2025 •

edited

Loading

mkatychev commented Feb 3, 2025 •

edited

Loading

Xophmeister commented Feb 4, 2025 •

edited

Loading

mkatychev commented Feb 4, 2025 •

edited

Loading