-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
generate floats using lexographically ordered encoding #49
base: main
Are you sure you want to change the base?
generate floats using lexographically ordered encoding #49
Conversation
This uses the same encoding as `hypothesis` to give floats better shrinking properties.
A good starting point for tests may be to port some of checks from Hypothesis regarding the properties of the encoding. |
This just ports the tests that check the ordering properties of the encoding and fixes the bugs that were found by the tests.
30533a7
to
e58ac23
Compare
I've implemented the new encoding of floating point numbers and ported the tests from hypothesis for the ordering properties of the encoding. This means that that the shrinking to smaller I think this is probably ready for at least a preliminary review. |
It seems the Julia 1.8 test timed out when trying to fetch the package list? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, thanks for the PR! I think it's mostly good, just some things around docstrings & preventing erroring paths.
I'm not 100% sure about putting everything in a dedicated submodule (I did mention that I wanted to refactor it, but I was referring to all of data.jl
, not just the floating point parts), but with it we might as well go all the way and put the floating point utilities from util.jl
here as well. They're currently unused anyway.
Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com>
Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com>
Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com>
Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com>
I've also come across HypothesisWorks/hypothesis#816 and HypothesisWorks/hypothesis#469, which may be helpful in terms of adding provenance/tests to this! If the issues mentioned there hit Supposition.jl too, it's possible that we'll need our own version of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, thanks! I've added some minor wording changes so we don't unintentionally imply an operation that doesn't happen, but otherwise this looks good to go! I'll play around a bit with some properties and see how they shrink with this, but after that I'll probably tag a release.
Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com>
Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com>
The test was previously failing because NaNs were being generated with the signbit set, but lex_to_float only generates floats with an unset signbit. Refactored the tests to test for all "positive" NaNs, separately from other positive float.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that should do the trick! No need to worry about doing manual bitshifts either, the "fancy" version compiles away completely:
julia> function foo(f)
_, expo, frac = tear(f)
assemble(Float64, zero(UInt), expo, frac)
end
foo (generic function with 1 method)
julia> @code_llvm debuginfo=:none dump_module=false foo(1.0)
; Function Signature: foo(Float64)
define double @julia_foo_3453(double %"f::Float64") #0 {
top:
%bitcast_coercion = bitcast double %"f::Float64" to i64
%0 = and i64 %bitcast_coercion, 9223372036854775807
%bitcast_coercion7 = bitcast i64 %0 to double
ret double %bitcast_coercion7
}
Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com>
Alright, I experimented with this a bit and found this when trying to check the generated docs: julia> rand_goal = rand()
0.93768949423463
julia> @check verbose=true function israndgoal(f=Data.Floats{Float64}())
# negative absolute distance, because we want to _minimize_ the distance
target!(-abs(rand_goal - f))
event!(f)
f != rand_goal
end
Test passed
Context: israndgoal
Score: -0.48696306540036444
Events:
UNLABELED_EVENT_0
0.45072642883426556 This should fail though, as the searched-for target is actually found on In essence, it seems like the changed IR resulted in the naive hill-climbing currently used with |
That's unfortunate. I think I need to dig into the testcase code to better understand what's going on before I can say anything substantial. I'll also look into how hypothesis does float shrinking a bit to see if there's any other options there. Another option would be to add the new float encoding as a feature flag since it seems there are cases where the new encoding is more beneficial and some cases where the old encoding is still better. Since, it may be still some time before this PR can be merged, would it be possible to cut a new release with the floating point clamping while this is WIP? Thanks for all your feedback and for taking on the task of bringing proptests back to julia. |
As far as I understand it, they're doing a kind of more limited version of what I would ultimately like to do for #25. The shrinking itself shouldn't be an issue here though - it's the targeting that's unaware of the structure, so it never ends up finding a counterexample to shrink. That gives me an idea though - if we move the fancy encoding from generation to the shrinking logic, we should be able to circumvent this entirely 🤔 E.g. by first shrinking the mantissa until the number is an integer, then the exponent.. I'll have to think about that. This would keep the
Well, it's not like the old encoding is fundamentally broken; the counterexamples just shrink a bit differently by default, and don't shrink to "pretty" integers first.
Yeah, definitely - I just wanted to let you know that the new approach does have some downsides too. It's a bit unfortunate, but that's how this goes sometimes :/ Still, thank you very much for putting in the effort! |
Utilizing the new encoding in the shrinking step, while keeping the simple encoding for the generation and target steps seems like a good possibility. AFAICT right now though there isn't any place in the code that simultaneously knows that shrinking is occurring and that a particular choice represents a float. I guess maybe #25 needs to be implemented first? |
Well.. sort of 😅 Lines 93 to 107 in 1bb7363
Nevertheless, if you want to give it a try, throwing the "inverted encoding" into |
I gave quick try to using the lexical encoding in shrink float, but because it is applied to all choices it screws up shrinking of integers. I did this for a quick and dirty test function shrink_float(ts::TestState, attempt::Attempt)
new = Attempt(copy(attempt.choices), attempt.generation, attempt.max_generation)
for idx in eachindex(new.choices)
old = new.choices[idx]
old_float = reinterpret(Float64, old)
if isnan(old_float)
n_val = copysign(Inf, old_float)
new.choices[idx] = reinterpret(UInt64, n_val)
if consider(ts, new)
return Some(new)
end
end
lex = FloatEncoding.float_to_lex(old_float)
res = bin_search_down(0, lex, n -> begin
new_float = FloatEncoding.lex_to_float(Float64, n)
new.choices[idx] = reinterpret(UInt64, copysign(old_float, new_float))
consider(ts, new)
end)
new.choices[idx] = @something res Some(new.choices[idx])
end
if new.choices == attempt.choices
return nothing
else
return Some(new)
end
end while reverting Supposition.jl/test/runtests.jl Line 121 in 1bb7363
Supposition.jl/test/runtests.jl Line 129 in 1bb7363
Supposition.jl/test/runtests.jl Line 147 in 1bb7363
because the code now |
Haah.. that's unfortunate :( Guess this really does need #25 ... Thanks for giving it a try though, much appreciated! I'll probably cut a release with the current state of |
PR Description
This implements an encoding of floating point numbers which is more amenable to nice shrinking as discussed in #22.
The basic idea is to reinterpret unsigned integers in a manner somewhat similar to the IEE754 encoding, but modified such that smaller
UInt
s correspond to "simpler" floats.The encoding is a port of the one used in Hypothesis
PR Checklist
Draft
!)