Moved functions from Random to Gen #238

Closed
wants to merge 17 commits into from

@ghost ghost commented Nov 10, 2020

Breaking change, but alluded to in #177

I wasn't able to get the exact type mentioned in that issue, and settled for the type found in this request.

type Gen<'a> = Gen of (Seed -> Size -> Tree<'a>)
// Needs to be this..?
type Gen<'a> = Gen of (Seed -> Size -> Tree<Option<'a>>)

Feel free to close if this is too big of a change.

@ghost
Author

ghost commented Nov 11, 2020

There is a test failure because of the implementation of Gen.sampleTree. Any tips on what the implementation should look like are welcome.

@ghost ghost marked this pull request as draft November 11, 2020 21:12
@TysonMN
Member

TysonMN commented Nov 11, 2020

There are many tests failing. I analyze just one of them here.

The change from this code to this code does not preserve behavior. Random.replicate splits the seed so that each of the count values is generated from a different seed. Your code generates each of the count values from the same seed, so the same value is generated every time. That is why this test (in all its parameterizations) is failing.
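
To illustrate the difference, here is a minimal sketch using toy stand-ins. The `Seed`, `Seed.split`, and `Random<'a>` definitions below are assumptions made up for this example (the split rule in particular is not Hedgehog's), and `replicateSameSeed`, `replicateSplit`, and `showSeed` are hypothetical names.

```fsharp
// Toy stand-ins for illustration only; these are not Hedgehog's real types.
type Seed = Seed of uint64

module Seed =
    // Hypothetical split rule: two different child seeds from one parent.
    let split (Seed s) = Seed (2UL * s + 1UL), Seed (2UL * s + 2UL)

// A toy "random" computation: seed in, value out.
type Random<'a> = Seed -> 'a

// Buggy shape: every element is generated from the same seed.
let replicateSameSeed (count: int) (r: Random<'a>) : Random<'a list> =
    fun seed -> List.init (max 0 count) (fun _ -> r seed)

// Seed-splitting shape: each element is generated from its own seed.
let replicateSplit (count: int) (r: Random<'a>) : Random<'a list> =
    fun seed0 ->
        let rec loop seed k acc =
            if k <= 0 then List.rev acc
            else
                let seed1, seed2 = Seed.split seed
                loop seed2 (k - 1) (r seed1 :: acc)
        loop seed0 count []

// A generator that simply exposes the seed it was run with.
let showSeed : Random<uint64> = fun (Seed s) -> s

let same = replicateSameSeed 3 showSeed (Seed 1UL)   // [1UL; 1UL; 1UL]
let split = replicateSplit 3 showSeed (Seed 1UL)     // [3UL; 9UL; 21UL]
```

With the same seed reused, all three elements are identical; with splitting, each element sees a different seed.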

@ghost
Author

ghost commented Nov 11, 2020

@TysonMN good catch, I used a local mutable (I know, ew) variable to split the seed for each iteration. I also had to add CompiledName attributes, but it all appears to be in order now, pending verification.

@ghost ghost marked this pull request as ready for review November 11, 2020 23:28
@moodmosaic
Member

Thank you, @adam-becker. Thank you, also, @TysonMN, for reviewing this in #238 (comment) as I haven't looked into the changes yet.


Question: Shall we get to this after merging the recheck feature in? Then, the plan is to make a new release, and then review this.

@ghost
Author

ghost commented Nov 12, 2020

I also have a question. For this code base are we supposed to use Gen.unsafeRun when creating functions inside the Gen module?

@moodmosaic moodmosaic marked this pull request as draft December 1, 2020 11:39
@ghost
Author

ghost commented Dec 3, 2020

@moodmosaic @TysonMN The branch for this PR has been rebased; it should be ready for review again.

@ghost ghost marked this pull request as ready for review December 8, 2020 22:58
@TysonMN
Member

TysonMN commented Jan 3, 2021

Now that 0.9.0 is released, is it time to resume the review of this PR?

@moodmosaic
Member

Now that 0.9.0 is released, is it time to resume the review of this PR?

Yes. And perhaps we can pack this and #247 into the same (major, v10.x) release.

@ghost
Author

ghost commented Jan 11, 2021

Rebased and ready for review again.

Member

@moodmosaic moodmosaic left a comment

👍

@moodmosaic
Member

I just did a first round of review (I've only scanned the changes) and left a few nitpicks. Another round of review (looking at the actual implementation of the moved functions) is warranted. We should be good to go then.

I've also run Script.fsx, all the stuff in there worked as usual.

Member

@moodmosaic moodmosaic left a comment

Looks great. I left a comment about filter (the diff there isn't very clear; note that I'm reviewing this from the airport right now, as I'm about to board a plane).

let g' = resize (2 * k + n) g
bind g' <| fun x ->
    if p x then
        constant (Some x)
Member

It looks like the actual filtering part was removed. (?)

Author

The filter part is still there, it's this part I believe (267-270):

if p x then
    constant (Some x)
else
    tryN (k + 1) (n - 1)

Member

I agree with @moodmosaic. Most of the changes in these lines are from renames or from merging modules. But where did the call to Tree.filter go?

If not calling Tree.filter is a bug, then we should make a test for it.

Member

I agree with @TysonMN, it'd be nice with a test.

Author

@ghost ghost Jan 15, 2021

@moodmosaic @TysonMN I see what happened here. There were two tryFilter* functions: tryFilterRandom and tryFilter. When Random was removed, the tryFilterRandom function was removed with it. The two are similar enough that the diff shows them as one function that changed, which isn't the case.

I believe we still need the functionality that tryFilterRandom exposed? Not sure what we want to name the function though.

|> Tree.filter (atLeast (Range.lowerBound size range))
})
bind (integral range) <| fun n ->
    replicate n g
Member

Will this implementation behave the same as the old/current one?

Author

@ghost ghost Jan 14, 2021

I'm not sure if it behaves the same, but I think it will behave as I would expect 😅 Basically just generate a number from the range, and generate that many elements as a list. As long as replicate has that behavior, and doesn't just duplicate the same element n times.

Author

@moodmosaic I will add tests for this just to be sure, but I'm pretty confident as is.

Member

I agree with @adam-becker. It is hard to know if the implementations are exactly the same, but the new implementation is "the right" one (because it doesn't do a deep dive into implementation details).

However, as I pointed out in other comments, a stack overflow bug was introduced in replicate.

Comment on lines 56 to 62
let replicate (times : int) (g : Gen<'a>) : Gen<'a list> =
    let rec loop n xs =
        if n <= 0 then
            constant xs
        else
            bind g (fun x -> loop (n - 1) (x :: xs))
    loop times []
Member

@TysonMN TysonMN Jan 15, 2021

...while this recursion lacks tail calls. The stack will overflow if times is sufficiently large as in this code.

let n = 10000
Gen.constant 1
|> Gen.replicate n
|> Gen.sample 0 1
|> ignore

This function nearly implements ListGen.sequence from PR #260. Implementing this function in terms of that one would only be slightly inefficient because of the additional call to List.rev. Even so, I suggest we first (correctly!...see below) implement traverse and sequence, and then implement this replicate in terms of sequence.

Furthermore, I now realize that traverse in PR #260 also lacks tail calls. (I wrongly assumed that return! "magically" avoided this problem.)
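
As a rough sketch of what "replicate in terms of sequence" could look like, the example below uses a toy state-threading generator. `ToyGen`, `ToyGen.run`, `ToyGen.sequence`, `ToyGen.replicate`, and `counter` are all hypothetical names invented here; this is not Hedgehog's `Gen` or the `ListGen.sequence` from PR #260, and it has no seeds, sizes, or shrink trees. Because the toy runs each generator directly instead of going through `bind`, it sidesteps the stack question rather than answering it for the real type.

```fsharp
// Toy generator for illustration: threads an int state instead of Seed/Size/Tree.
type ToyGen<'a> = ToyGen of (int -> 'a * int)

module ToyGen =
    let run state (ToyGen f) = f state

    // Runs each generator in order, threading the state through a fold.
    // The accumulator is built in reverse, hence the extra List.rev.
    let sequence (gens: ToyGen<'a> list) : ToyGen<'a list> =
        ToyGen (fun state0 ->
            let folder (acc, state) g =
                let x, state' = run state g
                (x :: acc, state')
            let xs, stateN = List.fold folder ([], state0) gens
            (List.rev xs, stateN))

    // replicate expressed through sequence.
    let replicate (times: int) (g: ToyGen<'a>) : ToyGen<'a list> =
        sequence (List.replicate (max 0 times) g)

// Returns the current state and increments it.
let counter = ToyGen (fun s -> (s, s + 1))

let firstThree = ToyGen.run 0 (ToyGen.replicate 3 counter)   // ([0; 1; 2], 3)
```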

Author

@ghost ghost Jan 15, 2021

The tricky bit with making this tail recursive is that you're multiple functors deep. List<Gen<'a>> where Gen<'a> returns a Tree<'a> for all of the shrinks. The Haskell version implements this with the replicateM function, but I think Haskell's laziness makes this work. Aside from that, this will be tricky and require some thought.

Member

Probably the simplest thing to do is keep the implementation like it was before. We can keep this refactor in mind and come back to it later.

Member

Actually, I am not sure it is possible to completely avoid stack overflows.

Recall that the type 'a -> 'b is a (covariant) functor in 'b. The map function for this functor is function composition (which is either >> or <<, with only the order of inputs changing). Our Random<'c> type is a (covariant) functor in 'c because it is just a wrapper around the type 'a -> 'b -> 'c, which is also a (covariant) functor in 'c. We can simplify things by uncurrying to get back to a function of the form 'a -> 'b.

I don't know a way to implement map for the (covariant) functor 'a -> 'b that won't overflow the stack. It might be impossible.

Specifically, the following test fails.

[<Fact>]
let ``Does function composition overflow the stack? Answer: Yes`` () =
  let n = 100000
  let f =
    id
    |> List.replicate n
    |> List.fold (>>) id
  f ()

With this in mind, I think we should be more willing to allow implementations that can overflow the stack like the one this comment thread is about. Changing the code to put more pressure on the stack will cause it to overflow sooner, but I currently think it is impossible to completely avoid in this case.

In conclusion, I suggest that we no longer hold up this PR because of this recursion that lacks tail calls. Perhaps we can put a comment in the code to remind us of that lack of tail calls.
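
For contrast, here is a small sketch of a case where the overflow can be avoided. It does not contradict the point above: it only works when the steps are kept as data (a list of functions that all share one type) and applied with a loop, rather than being pre-composed into a single closure, so it is not a general map for 'a -> 'b. `applyAll` is a hypothetical helper, not something in Hedgehog.

```fsharp
// Hypothetical helper: apply a list of same-typed steps in order.
// List.fold is a loop, so stack depth does not grow with the number of steps.
let applyAll (steps: ('a -> 'a) list) (input: 'a) : 'a =
    List.fold (fun acc step -> step acc) input steps

let n = 100000
let result = applyAll (List.replicate n (fun x -> x + 1)) 0   // 100000
```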

Comment on lines 53 to 62
let replicate (times : int) (r : Random<'a>) : Random<List<'a>> =
    Random <| fun seed0 size ->
        let rec loop seed k acc =
            if k <= 0 then
                acc
            else
                let seed1, seed2 = Seed.split seed
                let x = unsafeRun seed1 size r
                loop seed2 (k - 1) (x :: acc)
        loop seed0 times []
Member

@TysonMN TysonMN Jan 15, 2021

This recursion uses tail calls, while...

@TysonMN TysonMN mentioned this pull request Jan 15, 2021
@moodmosaic moodmosaic mentioned this pull request Jan 19, 2021
@moodmosaic
Member

@TysonMN happy to merge this if you think it's in a state that can be merged. I haven't followed after #238 (comment) because I'm still (partially) out and about.

@TysonMN
Member

TysonMN commented Jan 23, 2021

@TysonMN happy to merge this if you think it's in a state that can be merged.

I would prefer to merge PR #266 first. I am completely convinced that removing Random is an improvement.

@ghost ghost added this to the 0.11.0 milestone Jan 31, 2021
@TysonMN
Member

TysonMN commented Feb 1, 2021

I am completely convinced that removing Random is an improvement.

Whoa! I meant to say that I am NOT completely convinced that removing Random is an improvement.

My current feeling is that we should not do this. Random is complicated enough, and it is only going to get more complicated as we fix stack overflow bugs like #289 and #177.

I think the separation between Gen and Random was/is not so good. I think this led to the idea that things would be simplified by combining them. Instead, my guess is that things would get better if the separation were increased.

My current thinking is that Gen<'a> should essentially be Random<Tree<'a>> and the Gen module should contain mostly trivial functions that delegate to the Random and Tree modules.
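
A minimal sketch of that layering, under simplifying assumptions: the `Tree`, `Random`, and `Gen` types below are toy versions written for this example (eager tree, integer seed, no size parameter), not Hedgehog's definitions. The point is only the shape of `Gen.map`, which delegates to `Random.map` and `Tree.map` without looking inside either.

```fsharp
// Toy rose tree (eager; a real shrink tree would be lazy).
type Tree<'a> = Node of 'a * Tree<'a> list

module Tree =
    let rec map f (Node (x, children)) =
        Node (f x, List.map (map f) children)

// Toy random computation: an int seed in, a value out.
type Random<'a> = Random of (int -> 'a)

module Random =
    let run seed (Random r) = r seed
    let map f (Random r) = Random (r >> f)

// Gen is just the two layers nested.
type Gen<'a> = Gen of Random<Tree<'a>>

module Gen =
    let toRandom (Gen r) = r

    // Delegates to the two underlying modules.
    let map (f: 'a -> 'b) (g: Gen<'a>) : Gen<'b> =
        g |> toRandom |> Random.map (Tree.map f) |> Gen

// A generator whose root is the seed, with a single shrink to 0.
let sample = Gen (Random (fun seed -> Node (seed, [Node (0, [])])))

let doubled =
    sample |> Gen.map (fun x -> x * 2) |> Gen.toRandom |> Random.run 5
// Node (10, [Node (0, [])])
```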

I haven't focused directly on the separation between Gen and Random. Instead, I have been thinking about this indirectly while working on adding features like ListGen.traverse and fixes for bugs like issue #289.

@ghost
Author

ghost commented Feb 1, 2021

I am completely convinced that removing Random is an improvement.

Whoa! I meant to say that I am NOT completely convinced that removing Random is an improvement.

Can you expand on why? I see the little bits you mentioned later, but these seem more like an instinct than a firm judgment.

My current feeling is that we should not do this. Random is complicated enough, and it is only going to get more complicated as we fix stack overflow bugs like #289 and #177.

This is why I think removing this module/type will make things simpler going forward.

I think the separation between Gen and Random was/is not so good. I think this led to the idea that things would be simplified by combining them. Instead, my guess is that things would get better if the separation were increased.

Not sure how much more they could be separated, they are distinct types/modules now.

My current thinking is that Gen<'a> should essentially be Random<Tree<'a>> and the Gen module should contain mostly trivial functions that delegate to the Random and Tree modules.

I think Gen<'a> should be the central type of this project. It seems like there are two currently, and that leads to confusion/bloat/needless complexity.

@TysonMN
Member

TysonMN commented Feb 2, 2021

I think the separation between Gen and Random was/is not so good. I think this led to the idea that things would be simplified by combining them. Instead, my guess is that things would get better if the separation were increased.

Not sure how much more they could be separated, they are distinct types/modules now.

The best example is probably Gen.bindRandom. It duplicates code from Random.bind.

In my study of functional programming, I found a common way to implement bind for nested monads C<_> and D<_>, which is

module CD =
  let bind (f: 'a -> C<D<'b>>) : C<D<'a>> -> C<D<'b>> =
    f
    |> DC.traverse
    |> C.bind
    >> C.map D.flatten
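
To make the pattern concrete, here is a sketch that instantiates it with two standard types instead of Random and Tree: C is Option (outer) and D is list (inner). The modules `ListOption` and `OptionList` and the helper `flatten` are names chosen for this example only.

```fsharp
// DC.traverse for D = list, C = Option: succeeds only if f succeeds on every element.
module ListOption =
    let traverse (f: 'a -> 'b option) (xs: 'a list) : 'b list option =
        let folder acc x =
            match acc, f x with
            | Some ys, Some y -> Some (y :: ys)
            | _ -> None
        xs |> List.fold folder (Some []) |> Option.map List.rev

// CD.bind for the nested type Option<list<_>>.
module OptionList =
    // D.flatten for lists.
    let flatten (xss: 'b list list) : 'b list = List.concat xss

    let bind (f: 'a -> 'b list option) : 'a list option -> 'b list option =
        (f |> ListOption.traverse |> Option.bind) >> Option.map flatten

let expanded =
    Some [1; 2] |> OptionList.bind (fun x -> Some [x; x * 10])
// Some [1; 10; 2; 20]

let failed =
    Some [1; 2] |> OptionList.bind (fun x -> if x > 1 then None else Some [x])
// None
```

Note that `OptionList.bind` never pattern-matches on the nested value itself; it only uses `traverse`, `Option.bind`, `Option.map`, and `flatten`.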

I think Gen<'a> should be the central type of this project. It seems like there are two currently, and that leads to confusion/bloat/needless complexity.

I think that is because of duplicate code like in the above example. If only Random contained code specific to Random, and Gen implemented everything for the nested monadic type Gen<_> without assuming the implementation details of those monadic types, then I don't think it would feel like there are two central types.

My current feeling is that we should not do this. Random is complicated enough, and it is only going to get more complicated as we fix stack overflow bugs like #289 and #177.

This is why I think removing this module/type will make things simpler going forward.

It will be easier to fix bugs specific to Random<_> when only working with Random<_> than when working with Random<Tree<_>>.

Whoa! I meant to say that I am NOT completely convinced that removing Random is an improvement.

Can you expand on why? I see the little bits you mentioned later, but these seem more like an instinct than a firm judgment.

Random<_> and Tree<_> are separate monads, so they should be implemented separately.

@TysonMN
Member

TysonMN commented Feb 2, 2021

This PR makes more functions look like Gen.bindRandom. I think we should make more functions look like Random.bind, Tree.bind, and the bind for nested monadic types that I gave in the previous comment.

@TysonMN
Member

TysonMN commented Feb 2, 2021

Oh, by flatten, I mean the function that some call join, such as here.

let join (xss : Tree<Tree<'a>>) : Tree<'a> =
    bind id xss

@ghost
Author

ghost commented Sep 6, 2021

I've redone this branch entirely, Random is entirely gone. I also went through and streamlined Gen functions that were using Random. This extra layer of indirection is completely unnecessary, and a ton of extra allocations were removed. This should also help with Hedgehog's efficiency. I'm sure I missed a few opportunities to eliminate common sub-expressions, but I think this is ready for review again.

EDIT: Also, lots of functions are now tail-recursive. Including: Gen.list, Gen.filter and Gen.tryFilter

/cc @moodmosaic @TysonMN

@TysonMN
Member

TysonMN commented Sep 6, 2021

I am still not convinced that this is a good idea. I will review the current branch and see if I come to a different conclusion.

@ghost ghost removed this from the 0.11.0 milestone Sep 11, 2021
@ghost
Author

ghost commented Sep 11, 2021

Closing this for now. We can re-evaluate using the new contributing process.

@ghost ghost closed this Sep 11, 2021
@ghost ghost deleted the remove-random branch September 21, 2021 16:15