
Add test suite #186

Open · wants to merge 1 commit into base: master
Conversation

Schultzer
Contributor

@Schultzer Schultzer commented Jun 21, 2019

Hi guys.

I created a test suite. When running it in dev mode, it spotted overflows in the cos, fma, sin, sincos, and tan functions.

Let me know what you think.

I have not run the formatter yet.

@Schultzer Schultzer force-pushed the add-test-suite branch 2 times, most recently from 1b0c539 to 5e9bae1 Compare June 21, 2019 00:52
@alexcrichton
Member

Nice! I'm all for adding more tests, especially if they're just executed with simple cargo test commands :)

@gnzlbg
Contributor

gnzlbg commented Jun 21, 2019

Looks good to me in general.

I wonder whether we could auto-generate at least some of these tests, e.g., via a proc macro like:

#[libm_test(type = "trigonometric", include = "pi/2", ulp_max_error = "4.0")]
fn sind(x: f64) -> f64; 
#[libm_test(include(x = "2"))]
fn powf(x: f32, y: f32) -> f32; 

Many of these tests appear to target corner-case inputs, e.g., ±INFINITY, ±0, sqrt(...), pi / n, 2^n, e^n, etc., and their immediate neighborhoods. Depending on the type of function, some other regions are also worth testing. For binary inputs we can compute the cartesian product, and for unary f32 functions we can actually test all inputs. We could generate tests like this and then use ULPs to compare the results against, e.g., the system libm, requiring 0.5 ULPs by default and allowing the user to pass in a higher number as part of the proc macro, to allow for slight deviations in, e.g., transcendental functions.
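The ULP comparison described here can be done by mapping float bit patterns onto a monotonic integer scale where adjacent floats differ by exactly 1. A minimal sketch (`ulp_distance` is a hypothetical helper, not an existing libm-test API):

```rust
// Sketch: ULP distance between two f64 values, for comparing a libm
// implementation against a reference. Hypothetical helper, not part of
// any existing crate.
fn ulp_distance(a: f64, b: f64) -> u64 {
    // Map bits onto a scale where adjacent floats differ by exactly 1
    // and ordering matches numeric ordering (NaNs not handled here).
    fn key(x: f64) -> u64 {
        let bits = x.to_bits();
        if bits >> 63 == 1 {
            !bits // negative: flip all bits
        } else {
            bits | (1 << 63) // positive: set the sign bit
        }
    }
    key(a).abs_diff(key(b))
}

fn main() {
    let x = 1.0_f64;
    let next = f64::from_bits(x.to_bits() + 1); // next representable value
    assert_eq!(ulp_distance(x, x), 0);
    assert_eq!(ulp_distance(x, next), 1);
    // -0.0 and +0.0 are adjacent on this scale.
    assert_eq!(ulp_distance(-0.0, 0.0), 1);
    println!("ok");
}
```

A harness would then assert `ulp_distance(ours, reference) <= max_ulps` per function.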

For the manual tests, it might be interesting to consider moving the test inputs and results to some sort of external file that's loaded by the test suite. That way, in case of a modification that introduces differences, one might just need to run something like --bless to update the test suite, as opposed to having to go through all of these by hand.
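An external file of inputs and expected outputs doesn't necessarily require serde; a whitespace-separated line format parses with the standard library alone. A sketch (the vector values and tolerance are illustrative, and a real suite would read from disk rather than a string constant):

```rust
// Sketch: data-driven test vectors in a trivial "input expected" line
// format, parsed without any external dependencies.
const SIN_VECTORS: &str = "\
0.0 0.0
1.5707963267948966 1.0
3.141592653589793 0.00000000000000012246467991473532";

fn main() {
    for line in SIN_VECTORS.lines() {
        let mut parts = line.split_whitespace();
        let x: f64 = parts.next().unwrap().parse().unwrap();
        let expected: f64 = parts.next().unwrap().parse().unwrap();
        let got = x.sin(); // stand-in for the implementation under test
        // Small absolute tolerance for the sketch; a real harness
        // would compare in ULPs instead.
        assert!((got - expected).abs() <= 1e-15, "sin({}) = {}", x, got);
    }
    println!("ok");
}
```

A `--bless` mode would simply rewrite the second column from the current outputs.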

@Schultzer I think adding these manual tests is good for now, but do you think this approach might be worth exploring in the future?

@Schultzer
Contributor Author

@gnzlbg do you know of any projects that are doing something similar?
I've done this once before with my BLAS implementation, where I used JSON-Fortran to generate the Fortran results into a JSON file and then had serde deserialize them. But I feel like including serde isn't really an option here.

I like your idea with the proc macro. I'm gonna look into that.

@gnzlbg
Contributor

gnzlbg commented Jun 21, 2019

do you know of any projects that are doing something similar?

Kind of. stdsimd auto-generates tests using a proc macro.

but I feel like including serde isn't really an option here.

Why do you think that? AFAICT these dependencies only need to be dev-dependencies.

@Schultzer
Contributor Author

My thinking was that one of the goals is to include this in core. But if there is no limit on dev tools, then I'll refactor this draft.

@alexcrichton
Member

FWIW we only include the library source in libcore, not the test dependencies. It's fine to pull in whatever is needed here for testing this crate.

@Schultzer Schultzer force-pushed the add-test-suite branch 3 times, most recently from e9e3c5f to 78c28e5 Compare June 24, 2019 23:18
@Schultzer
Contributor Author

ping @alexcrichton and @gnzlbg: I've done the first part, moved the tests out, and implemented a proc-macro for the nearest function.
This was the easiest way I could find to make it more DRY.

I'm planning to implement one for each function family: trigonometric, exponential, power, hyperbolic, error, gamma, manipulation, etc.

Please let me know what you guys think about this approach.

@alexcrichton
Member

Hm, I'm not really sure I fully understand this. Why are tests generated via a macro rather than being written out as before? Is there enough shared between each set of tests that it's worth having a macro?

@gnzlbg
Contributor

gnzlbg commented Jun 25, 2019 via email

@alexcrichton
Member

While that's true, I think the randomized testing we already do covers a good portion of that. I'm also not sure why this requires extraction into a full-blown procedural macro vs. just writing test cases by hand.

@gnzlbg
Contributor

gnzlbg commented Jun 25, 2019

I'm also not sure why it requires extraction into a full-blown procedural macro vs just writing test cases by hand.

This doesn't require a proc macro, but I do think that hardcoding all inputs and outputs is a bad idea, except for super special cases.

With a good testing crate, one can write this code easily by hand, e.g.,

fn sind(x: f64) -> f64;

#[cfg(test)]
mod tests {
  #[test] fn sind_zero() { libm_test::trigonometric(|x| sind(x), Around(0_f64), Ulp(4)); }
  #[test] fn sind_pi_half() { libm_test::trigonometric(|x| sind(x), Around(PI / 2.), Ulp(4)); }
  #[test] fn sind_infinity() { ... }
  #[test] fn sind_denormals() { ... }
...
  #[test] fn sind_special() { libm_test::assert_system(|x| sind(x), Exact(MySpecialVal), Ulp(2)) }
}

With the proc macro we can just generate all the boilerplate tests, and the code can focus on the special ones instead, e.g.,

#[libm_test(type = "trigonometric", ulp = "4")]
fn sind(x: f64) -> f64;

#[test] fn sind_special() { libm_test::check_output(|x| sind(x), Input(MySpecialVal), Ulp(2)) }

@alexcrichton
Member

I primarily just want to make sure that tests are as readable and easy to modify as possible; I think that's the priority, rather than ensuring we have deduplicated testing and such. The cost of a macro for something like this feels pretty high, especially if the macro has a lot of complexity.

@Schultzer Schultzer force-pushed the add-test-suite branch 12 times, most recently from a352ede to b709039 Compare July 1, 2019 04:24
@Schultzer Schultzer force-pushed the add-test-suite branch 2 times, most recently from 7d2f781 to ecfec13 Compare July 1, 2019 21:02
@Schultzer
Contributor Author

This has been refactored, and includes the validation suite.

closes #150

Contributor

@gnzlbg gnzlbg left a comment


@alexcrichton this LGTM.

@Schultzer
Contributor Author

@alexcrichton let me know if you want me to squash the commits.

Member

@alexcrichton alexcrichton left a comment


Hm this looks like it deletes the existing test-generation harness? Why was that deleted?

@Schultzer
Contributor Author

It's been moved to the libm-test crate and refactored to include validation for x86_64, plus cases for NaNs and infinities. As mentioned above, we plan to include precision testing via MPFR, so keeping it all together seems like the right way to go.

@Schultzer Schultzer force-pushed the add-test-suite branch 4 times, most recently from 4415fd1 to bd02d4a Compare July 2, 2019 02:03
@burrbull
Contributor

burrbull commented Jul 2, 2019

When you say MPFR, do you mean rug?

@alexcrichton
Member

It's been moved to libm-test crate

Hm ok I see that but it seems to use sort of odd-looking macros now and has to be exhaustively tested? Previously it would automatically test new functions and they'd conform to a known set of signatures. I'd prefer to avoid new macro soup and duplication if possible.

@Schultzer
Contributor Author

Schultzer commented Jul 2, 2019

Hm ok I see that but it seems to use sort of odd-looking macros now and has to be exhaustively tested?

The only part we are exhaustively testing is unary f32 functions on x86_64. The exhaustive tests did catch some bugs.

it would automatically test new functions

That's one reason why I liked the proc-macro idea: it would make it explicit what we are testing, which it also is now.

I'd prefer to avoid new macro soup and duplication if possible

Which duplication are you referring to?

I would argue that it's easier to change the tests and understand what gets tested now than before.
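Exhaustive testing of a unary f32 function is feasible because there are only 2^32 bit patterns to enumerate. A sketch of the idea (using `floor` as a stand-in for both the implementation under test and the reference, and striding over the input space so the example runs quickly):

```rust
// Sketch: exhaustive-style testing of a unary f32 function by
// enumerating bit patterns. A real run would cover all 2^32 inputs;
// this sketch strides by 2^16 for speed.
fn checked_floorf(x: f32) -> f32 {
    x.floor() // stand-in for a libm implementation under test
}

fn main() {
    let mut checked = 0u64;
    for bits in (0..=u32::MAX).step_by(1 << 16) {
        let x = f32::from_bits(bits);
        let got = checked_floorf(x);
        let expected = x.floor(); // stand-in reference (e.g., system libm)
        // NaN never compares equal to itself, so compare bit patterns.
        assert_eq!(got.to_bits(), expected.to_bits());
        checked += 1;
    }
    println!("checked {} inputs", checked);
}
```

Covering every pattern this way also exercises NaNs, infinities, and subnormals without listing them by hand.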

@Schultzer
Contributor Author

@burrbull It could be rug or another crate; there are quite a few.

@Schultzer
Contributor Author

I plan to add some validation for f64 too; this could probably catch issues like #165.

@Schultzer Schultzer force-pushed the add-test-suite branch 3 times, most recently from 15c9e0e to 2d242bc Compare July 2, 2019 04:32
@gnzlbg
Contributor

gnzlbg commented Jul 2, 2019

Previously it would automatically test new functions and they'd conform to a known set of signatures.

Yes. That was the main advantage of the proc-macro approach. Now new functions need to be manually added to the test suite. That's a one-line change, but it needs to be done manually.

@Schultzer Schultzer force-pushed the add-test-suite branch 2 times, most recently from 9ded3ac to fa89234 Compare July 2, 2019 17:42
Signed-off-by: Benjamin Schultzer <benjamin@schultzer.com>
@Schultzer
Contributor Author

Schultzer commented Jul 2, 2019

@alexcrichton would you prefer to just write these tests out in each function?

This has to do with libcore itself being built with panic=unwind, but why not just link to std and have it provide these?

Does that mean that when we call 0f32.acos() it makes a call to extern "C" { pub fn acosf(x: f32) -> f32; } ?

@burrbull
Contributor

burrbull commented Jul 3, 2019

I made an initial commit to support testing sleef against rug::Float, if someone is interested:
burrbull/sleef-rs@289e46e

Results:

Max ULP = 0.703125 at -4.944768947135878e307, Average ULP = 0.25104109375	test f64::u10::test_sin ... ok

Max ULP = 1.53125 at -1.7301104562983707e308, Average ULP = 0.329521328125	test f64::u35::test_cos ... ok

I think something like this can be done for libm.
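The reported max/average ULP errors come from comparing against a higher-precision reference. This can be sketched without rug by using f64 as the reference for an f32 function (a sketch only; `ulp_error` is a hypothetical helper, and the real suite would use an arbitrary-precision library):

```rust
// Sketch: measure the ULP error of an f32 function against an f64
// reference, tracking the maximum over sampled inputs.
fn ulp_error(got: f32, reference: f64) -> f64 {
    // Size of one f32 ULP at the reference value.
    let nearest = reference as f32;
    let next = f32::from_bits(nearest.to_bits().wrapping_add(1));
    let ulp = (next as f64 - nearest as f64).abs();
    ((got as f64) - reference).abs() / ulp
}

fn main() {
    let mut max_ulp = 0.0f64;
    // Sample some inputs; a real harness would sweep far more.
    for i in 0..1000 {
        let x = i as f32 * 0.01;
        let got = x.sin();                // function under test
        let reference = (x as f64).sin(); // higher-precision reference
        let err = ulp_error(got, reference);
        if err > max_ulp {
            max_ulp = err;
        }
    }
    println!("max ULP = {}", max_ulp);
    // System sinf is typically well under 1 ULP on these inputs.
    assert!(max_ulp <= 2.0);
}
```

An average-ULP figure like burrbull's would just accumulate `err` and divide by the sample count.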

@alexcrichton
Member

Sorry, I really do not have a ton of time to maintain this crate, and this PR alone already has over 40 (!) comments for what I was hoping would be very simple: adding some more tests. This is a very important crate to rust-lang/rust, and arbitrary rewrites of infrastructure need to be carefully done rather than simply proposed off the cuff.

I see testing this crate as falling into a few major categories:

  • One is handwritten tests for each function. For example regression tests due to bugs that have been fixed with specific inputs/outputs.
  • Tests against a reference implementation (like musl). This is what the build script covers today. We need to guarantee that this is present for all functions that are defined in musl.
  • Eventually tests like fuzz tests or some sort of more random/exhaustive checking could be performed, or something like that.

I don't know why the reference tests are being rewritten in this PR. I don't know why you can't already just write normal unit tests for each function. I'm not really sure why there's a big rewrite happening here vs. simply adding new tests in the appropriate modules.
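The random/exhaustive category in the list above can be sketched with a tiny PRNG and no external crates (a hypothetical harness, not this crate's actual one; `sqrt` stands in for both sides of the comparison):

```rust
// Sketch: randomized input testing using a minimal xorshift32 PRNG,
// feeding random bit patterns to a function under test and comparing
// bit-exactly against a reference (which also handles NaN inputs).
fn xorshift32(state: &mut u32) -> u32 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    x
}

fn main() {
    let mut state = 0x1234_5678u32; // fixed seed for reproducibility
    for _ in 0..10_000 {
        let bits = xorshift32(&mut state);
        let x = f32::from_bits(bits);
        let got = x.sqrt();      // stand-in for the implementation under test
        let expected = x.sqrt(); // stand-in reference implementation
        // Bit-pattern comparison so NaN results don't spuriously fail.
        assert_eq!(got.to_bits(), expected.to_bits());
    }
    println!("ok");
}
```

Seeding deterministically keeps failures reproducible, which matters more here than statistical quality of the generator.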
