Help wanted: ideas for improving the test run time #37

Closed · jmcnamara opened this issue May 7, 2023 · 10 comments

@jmcnamara (Owner) commented May 7, 2023

Help wanted

One of the goals of rust_xlsxwriter is fidelity with the xlsx file format generated by Excel. This is achieved using integration tests that take files created in Excel 2007 and compare them, file by file and element by element, with files created using rust_xlsxwriter.

Here is a typical test file, the associated xlsx file and the test runner code.
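
(For illustration, a minimal sketch of what one of these tests might look like. The file names and the compare_xlsx_files helper are hypothetical stand-ins for the real test runner, which unzips both workbooks and diffs them member by member:)

use rust_xlsxwriter::{Workbook, XlsxError};

#[test]
fn bootstrap01_demo() -> Result<(), XlsxError> {
    // Create the workbook with rust_xlsxwriter.
    let mut workbook = Workbook::new();
    let worksheet = workbook.add_worksheet();
    worksheet.write_string(0, 0, "Hello")?;
    workbook.save("bootstrap01_out.xlsx")?;

    // Compare it, file by file and element by element, against the
    // reference file created in Excel (hypothetical helper).
    compare_xlsx_files("tests/input/bootstrap01.xlsx", "bootstrap01_out.xlsx");
    Ok(())
}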

This approach has a number of advantages from a maintenance point of view:

  • It allows incremental test-driven development of Excel features.
  • It allows bug reports to be replicated quickly in Excel and compared with rust_xlsxwriter.
  • It avoids subjective arguments about whether rust_xlsxwriter or some other third-party Excel-reading software is correct in its implementation/interpretation of the XLSX file specification, since it uses Excel as the standard.

For the end user, output files that are effectively identical to files produced by Excel mean the maximum possible interoperability with Excel and with applications that read XLSX files.

The test suite contains an individual test file for each xlsx file (although there is sometimes more than one test against the same input file). Each of these test files is compiled into, and run as, its own crate, which makes the test suite slow. For usability reasons I don't want to test more than one xlsx file per file/crate (apart from, perhaps, the grouping scheme outlined below).
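
(For context: Cargo compiles and links every top-level file under tests/ as its own crate, so the current layout effectively means one compile-and-link cycle per test file. File names here are illustrative:)

tests/bootstrap01.rs    <- built and linked as a separate test crate
tests/bootstrap02.rs    <- another separate test crate
...                     <- ~540 files, ~540 compile-and-link cycles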

There are currently ~540 test files and it takes 8+ minutes to run on a 3.2 GHz 6-Core Intel Core i7 with 32GB of fast RAM:

$ time cargo test

real	8m36.340s
user	30m34.062s
sys	9m0.802s

In the GitHub Actions CI this is currently taking around 18 minutes.

There will eventually be around 800 test files so the runtime will be ~50% longer.

nextest is a bit faster, but not significantly so. These timings also don't include the doc tests:

$ time cargo nextest run

real	7m45.029s
user	26m44.624s
sys	6m59.271s

A few months ago, when the test suite took around 4 minutes, I tried to consolidate the tests into one crate using the advice in the article Delete Cargo Integration Tests. This was significantly faster, by around 5-10x, but it didn't allow me to run individual tests (I'm 99% sure). I tried to replicate it again, to redo the performance testing and verify whether individual tests can be run, but failed for reasons related to test refactoring in the meantime.
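
(For reference, the consolidation the article describes boils down to one entry point that pulls each former test file in as a module, so the whole suite compiles and links as a single crate. A sketch, with illustrative file names:)

// tests/integration/main.rs -- the only top-level test file Cargo sees.
mod bootstrap01; // was tests/bootstrap01.rs
mod bootstrap02; // was tests/bootstrap02.rs
// ...one `mod` line per former test file.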

For comparison, the Python test suite runs 1600 integration and unit tests in 18 seconds, the Perl test suite takes around 3 minutes, and the C test suite takes 5 minutes.

Anyway, to the help wanted: if anyone has ideas on how the test runtime might be improved, or if you can get the "Delete Cargo Integration Tests" approach above working again for comparison, let me know. I might be able to come up with a hybrid approach where tests under development or debug live in their own crates and are moved back into an overall test crate/folder afterwards.

@jmcnamara added the help wanted label and removed the question label May 7, 2023
@jmcnamara (Owner)

Closing since there probably isn't any fix for this. But if anyone comes up with anything, I'd be happy to hear it.

@jmcnamara reopened this Jun 21, 2023
@jmcnamara closed this as not planned Jul 5, 2023
@adriandelgado (Contributor) commented Jul 24, 2023

Hi, I've always thought the time to test this crate is massive, and I almost gave up the last time I contributed.
My recommendation is to use the insta crate instead of pretty_assertions. It's specifically designed for this sort of testing (snapshot testing). Look at this quote from its description:

Snapshot tests are particularly useful if your reference values are very large or change often.

I haven't tried to implement this to check whether it is indeed faster; I'm just speculating.
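
(For illustration, a minimal sketch of what a snapshot test with insta might look like. generate_worksheet_xml is a hypothetical stand-in for whatever produces the output under test:)

#[test]
fn worksheet_xml_snapshot() {
    // insta records the reference output in a snapshot file and
    // diffs the new output against it on later runs.
    let xml = generate_worksheet_xml();
    insta::assert_snapshot!(xml);
}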

@adriandelgado (Contributor)

You can put all the tests in a single crate and use insta's CLI to check specific snapshots.
See: https://insta.rs/docs/cli/#review

@adriandelgado (Contributor)

@jmcnamara I think this is a very important issue to solve to attract more contributors in the future.

@jmcnamara (Owner) commented Jul 25, 2023

@adriandelgado I agree it is important to fix this issue. It is starting to have a direct effect on my development time.

I’m offline for a few days so I’ll try your suggestions when I get home.

Ultimately I think the issue is the amount of time required to compile, link and run several hundred test crates. The only way around that is to group them all into one sub-folder crate.

As I said in the initial post above, I did that several months ago and it was significantly faster. However, it limited my ability to run individual tests, so I reverted the change. I tried to get it working again around the time of posting this issue but couldn't get it to compile due to some changes made to the test suite in the meantime. If you think you could get it working based on the outline in the article linked above, that would be helpful. If not, I'll get around to it soon myself.

@jmcnamara reopened this Jul 25, 2023
@jmcnamara (Owner)

I'm working on this now and I think I have a reasonable fix.

jmcnamara added a commit that referenced this issue Jul 27, 2023
Refactored integration tests into single crate for increased performance. Tests now run ~300% faster.

Issue #37
@jmcnamara (Owner)

My guess about time wasted on compiling and linking was right. I moved all the tests into a sub-directory/crate and the test execution went from around 700 sec to 2 sec!

I now have the issue of not being able to run individual tests (useful when working on new features), but I'll figure out a way around that.
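
(One possible workaround: cargo's test-name filter still works within a single test binary. Assuming the consolidated crate is named integration, something like

cargo test --test integration bootstrap01

should run only the tests whose names match bootstrap01.)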

Note that cargo test compiles the example files as well, and that can take a bit longer. I'm looking for a way to turn that off by default and run it only explicitly, but I haven't found one yet. In the meantime you can run just the integration tests like this:

cargo test --test '*'

Or the integration and lib tests (slightly slower, but not by much):

cargo test --tests

Also, I turned off the doctests by default since they are slow (~60 secs). You can run those with:

cargo test --doc

Try it out and if you spot any other improvements let me know.

@jmcnamara (Owner) commented Jul 28, 2023

It doesn't currently seem to be possible to run cargo test without building the examples (see this question), but the two more verbose options above are workable.

Closing since the single-crate solution solved the problem this issue was opened for.

@arifd commented Aug 14, 2024

This just happened, which may be of interest.

@jmcnamara (Owner)

> This just happened, which may be of interest.

Thanks for the heads up. That does indeed look promising.
