Block bootstrap: verify performance & accuracy against legacy xbootstrap #819

Closed
nikeethr opened this issue Feb 12, 2025 · 1 comment

Comments

@nikeethr
Collaborator

nikeethr commented Feb 12, 2025

We should probably verify that no side-effects were introduced (and any other unknowns) during the port by comparing against the legacy xbootstrap in a real-world analysis.

@nicholasloveday Greatly appreciate your effort in this. As mentioned, I'm happy to defer any functional changes and refactoring to other pull requests. My focus in this review is to make sure we establish a reasonable baseline algorithm to build upon. I'll leave the scope of the tutorial and documentation to others.

I think we've addressed most of the comments. I also think it would be good to do a comparison against the archived xbootstrap library, even as a one-off script run on some real-world inputs and a few different metrics, with the results posted here. Ideally this would be done by a subject matter expert. I'm hoping that if existing code is already used for this, it can be swiftly tailored to use scores's bootstrap and do the comparison.

I recommend a table like the one below, using the same random seed for xbootstrap and the current implementation. Note that seeds may not be totally reliable with parallelisation. If that's the case, there are some tips at the end of #818 for running dask in single-threaded debug mode, to verify accuracy initially before repeating with a worker pool.
(I don't think we need an extensive number of examples, just enough to cover 2-3 common but different scores, with 2-3 data sizes from small to large, ideally chunked on disk so that we know for sure it triggers the ufunc. That's 4 to 9 experiments at most, though even 1 or 2 such experiments are better than none.)

| Metric | Bootstrap args (iterations, dims etc.) | Data source [description (size)] | Xbootstrap [result (time taken)] | Scores [result (time taken)] |
| --- | --- | --- | --- | --- |
| MSE | (iterations=100, dims=["time", "x", "y"]) | ACCESS-* 2024-01->2024-02 (5gig) | ...mean/variance along preserved dimensions may be sufficient, if there are too many values - or alternatively add a "error" column and/or use (?)xr.approx_equal (1.2s) | ... (1.3s) |
| ... | ... | ... | ... | ... |
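
A rough sketch of how one row of such a table could be produced is below. It is pure Python and deliberately does not call scores or xbootstrap: `toy_block_bootstrap_mse`, `timed` and `compare` are hypothetical names invented for illustration, and the toy circular block bootstrap only stands in for the two real implementations. The point is the shape of the experiment: same inputs, same seed, single-threaded, then record agreement, an error column and the time taken.

```python
import math
import random
import time
from statistics import fmean


def toy_block_bootstrap_mse(errors, block_size, iterations, seed):
    """Toy circular block bootstrap of MSE (a stand-in, NOT scores or xbootstrap)."""
    rng = random.Random(seed)  # dedicated generator: the seed fully determines the draws
    n = len(errors)
    n_blocks = -(-n // block_size)  # ceiling division
    estimates = []
    for _iteration in range(iterations):
        sample = []
        for _block in range(n_blocks):
            start = rng.randrange(n)
            # Wrap around the end of the series (the "circular" part).
            sample.extend(errors[(start + k) % n] for k in range(block_size))
        sample = sample[:n]  # trim back to the original length
        estimates.append(fmean(e * e for e in sample))
    return estimates


def timed(func, **kwargs):
    """Call func(**kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(**kwargs)
    return result, time.perf_counter() - start


def compare(metric, legacy_func, new_func, rel_tol=1e-9, **kwargs):
    """Run both implementations on identical arguments and summarise one table row."""
    legacy, legacy_seconds = timed(legacy_func, **kwargs)
    new, new_seconds = timed(new_func, **kwargs)
    pairs = list(zip(legacy, new))
    return {
        "metric": metric,
        "match": len(legacy) == len(new)
        and all(math.isclose(a, b, rel_tol=rel_tol) for a, b in pairs),
        "max_abs_error": max((abs(a - b) for a, b in pairs), default=0.0),
        "legacy_seconds": legacy_seconds,
        "new_seconds": new_seconds,
    }


# Made-up forecast errors; a real comparison would use real-world data.
errors = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5, -0.1, 0.3]
row = compare(
    "MSE",
    toy_block_bootstrap_mse,  # would be the legacy implementation
    toy_block_bootstrap_mse,  # would be the new implementation
    errors=errors,
    block_size=3,
    iterations=100,
    seed=42,
)
print(row)
```

In a real run the two callables would wrap the legacy and current bootstrap, and the comparison would likely be done on xarray objects rather than lists. The tolerance `rel_tol` is an arbitrary choice here and may need loosening if the two implementations consume random numbers in a different order.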

This is a recommendation only - there is always a trade-off between holding up this PR too much and getting it fully verified - so I leave it to @tennlee if he would like this done before merging or if it can wait till later.

Originally posted by @nikeethr in #418 (comment)

@nikeethr nikeethr changed the title Circular bootstrap: verify performance & accuracy against legacy xbootstrap Block bootstrap: verify performance & accuracy against legacy xbootstrap Feb 12, 2025
@nikeethr nikeethr mentioned this issue Feb 12, 2025
10 tasks
@tennlee
Collaborator

tennlee commented Feb 12, 2025

Duplicate of #821

@tennlee tennlee marked this as a duplicate of #821 Feb 12, 2025
@tennlee tennlee closed this as completed Feb 12, 2025