Block bootstrap: verify performance & accuracy against legacy `xbootstrap` #819

nikeethr · 2025-02-12T03:57:22Z

We should probably verify that no side-effects were introduced (and any other unknowns) during the port by comparing against the legacy xbootstrap in a real-world analysis.

@nicholasloveday Greatly appreciate your effort in this. As mentioned, I'm happy to defer any functional changes and refactoring to other pull requests. My focus in this review is to make sure we establish a reasonable baseline algorithm to build upon. I'll leave the scope of the tutorial and documentation to others.

I think we've addressed most of the comments. I also think it may be good to do a comparison against the archived xbootstrap library, even as one-off code for some real-world inputs, applied to a few different metrics, and post the results here. Ideally this would be done by a subject matter expert. I'm hoping that if there's existing code already used to do this, it can be swiftly tailored to use scores' bootstrap and do a comparison.

I recommend a table like this, using the same random seed for xbootstrap and the current implementation. Note that seeds may not be totally reliable with parallelisation. If that's the case, there are some tips at the end of #818 for running dask in single-threaded debug mode, to verify accuracy initially before repeating with a worker pool.
(I don't think we need an extensive number of examples, just enough to cover 2-3 common but different scores, with 2-3 data sizes ranging from small to large, ideally chunked on disk so that we know for sure it triggers the ufunc: 4 to 9 experiments in total at most. Even 1 or 2 such experiments is better than none.)

| Metric | Bootstrap args (iterations, dims etc.) | Data source [description (size)] | Xbootstrap [result (time taken)] | Scores [result (time taken)] |
|---|---|---|---|---|
| MSE | (iterations=100, dims=["time", "x", "y"]) | ACCESS-* 2024-01->2024-02 (5gig) | ...mean/variance along preserved dimensions may be sufficient, if there are too many values - or alternatively add an "error" column and/or use (?)xr.approx_equal (1.2s) | ... (1.3s) |
| ... | ... | ... | ... | ... |
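
For illustration only, here is a minimal sketch of how one row of such a table could be produced. It is a stand-in, not the xbootstrap or scores API: `circular_block_bootstrap_mse` and `compare` are hypothetical names, and the toy bootstrap uses only the Python standard library on a 1-D list of errors. In a real comparison, the two callables passed to `compare` would wrap the legacy and the new implementation respectively, and their outputs would first be reduced (e.g. mean/variance along preserved dimensions) to comparable values.

```python
import random
import statistics
import time


def circular_block_bootstrap_mse(errors, block_size, iterations, seed):
    """Toy stand-in (hypothetical): resample errors in circular blocks
    and return one MSE value per bootstrap iteration."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    n = len(errors)
    n_blocks = -(-n // block_size)  # ceiling division
    results = []
    for _ in range(iterations):
        sample = []
        for _ in range(n_blocks):
            start = rng.randrange(n)
            # Wrap around the end of the series (circular blocks).
            sample.extend(errors[(start + k) % n] for k in range(block_size))
        sample = sample[:n]  # trim to the original length
        results.append(sum(e * e for e in sample) / n)
    return results


def compare(legacy_fn, new_fn, **kwargs):
    """Run both callables with identical arguments and collect summary
    statistics, wall-clock time, and the largest elementwise difference."""
    row = {}
    outputs = {}
    for name, fn in (("legacy", legacy_fn), ("new", new_fn)):
        start = time.perf_counter()
        out = fn(**kwargs)
        elapsed = time.perf_counter() - start
        outputs[name] = out
        row[name] = {
            "mean": statistics.fmean(out),
            "variance": statistics.pvariance(out),
            "seconds": elapsed,
        }
    row["max_abs_error"] = max(
        abs(a - b) for a, b in zip(outputs["legacy"], outputs["new"])
    )
    return row


# Synthetic input data; a real experiment would load forecast/observation data.
data_rng = random.Random(0)
errors = [data_rng.gauss(0.0, 1.0) for _ in range(500)]

# Both sides use the same stand-in here, so with a shared seed the
# "error" column must be exactly zero.
row = compare(
    circular_block_bootstrap_mse,
    circular_block_bootstrap_mse,
    errors=errors,
    block_size=10,
    iterations=100,
    seed=42,
)
print(row)
```

Passing the implementations in as callables keeps the harness independent of either library's signature, so only thin wrapper functions would need to change between experiments.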

This is a recommendation only. There is always a trade-off between holding up this PR too long and getting it fully verified, so I leave it to @tennlee whether he would like this done before merging or whether it can wait until later.

Originally posted by @nikeethr in #418 (comment)

tennlee · 2025-02-12T04:44:37Z

Duplicate of #821

nikeethr changed the title ~~Circular bootstrap: verify performance & accuracy against legacy xbootstrap~~ Block bootstrap: verify performance & accuracy against legacy xbootstrap Feb 12, 2025

nikeethr mentioned this issue Feb 12, 2025

Add block bootstrapping #418

Merged

10 tasks

tennlee marked this as a duplicate of #821 Feb 12, 2025

tennlee closed this as completed Feb 12, 2025
