Add UniformRange.isInRange function #78

Merged
merged 4 commits into from
Jan 21, 2021

Conversation

Bodigrim
Contributor

@Bodigrim Bodigrim commented Jul 1, 2020

Following our earlier discussions, here is my proposal for a new isInRange function that describes rigorously what exactly a UniformRange instance means by "range". This will allow us to define lawful UniformRange instances for tuples and complex numbers.

Contributor

@lehins lehins left a comment


I can't tell whether these laws are sufficient, but they look OK.

For example, I can't seem to derive this isInRange (lo, hi) hi == True

But as an approach to tackling the problem with ranges, this seems a perfectly viable solution.

@lehins
Contributor

lehins commented Jul 1, 2020

@Shimuuar I think you'd be the perfect guy to try and challenge this approach, I know how much you love the UniformRange class ;)

@Bodigrim
Contributor Author

Bodigrim commented Jul 1, 2020

For example, I can't seem to derive this isInRange (lo, hi) hi == True

By symmetry and by inclusivity: isInRange (lo, hi) hi = isInRange (hi, lo) hi = True.
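
The derivation can be spot-checked with a minimal sketch. Here `inRangeOrd` is a hypothetical `Ord`-based stand-in for `isInRange`, not the code from this PR, and the predicate names are made up for illustration.

```haskell
-- Hypothetical Ord-based stand-in for isInRange (not the PR's code).
inRangeOrd :: Ord a => (a, a) -> a -> Bool
inRangeOrd (a, b) x = min a b <= x && x <= max a b

-- Law: ranges are symmetric.
lawSymmetry :: Ord a => (a, a) -> a -> Bool
lawSymmetry (lo, hi) x = inRangeOrd (lo, hi) x == inRangeOrd (hi, lo) x

-- Law: the first endpoint is included.
lawIncludesFirst :: Ord a => (a, a) -> Bool
lawIncludesFirst (lo, hi) = inRangeOrd (lo, hi) lo

-- Derived fact: the second endpoint is included as well, because
--   isInRange (lo, hi) hi == isInRange (hi, lo) hi   (symmetry)
--                         == True                    (inclusion for (hi, lo))
includesSecond :: Ord a => (a, a) -> Bool
includesSecond (lo, hi) = inRangeOrd (lo, hi) hi
```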

@lehins
Contributor

lehins commented Jul 1, 2020

Duhh 😄 🤦

Thanks ;)

@Shimuuar
Contributor

Shimuuar commented Jul 2, 2020

I really like the idea!

But I don't understand what the 3rd law states.

@Bodigrim
Contributor Author

Bodigrim commented Jul 2, 2020

@Shimuuar I hope it is more clear now.

@Shimuuar
Contributor

Shimuuar commented Jul 2, 2020

Yes. Much clearer now. Thanks!

Right now I'm trying to invent pathological instantiations of these laws for the reals. In this case brackets are seriously overloaded, so I will use <a,b> for the set corresponding to the parameter of uniformRM. For clarity I'll assume that a < b. The obvious instantiation, the one we actually want, is:

<a,b> = [a, b]

But the following one satisfies all the laws despite being clearly pathological:

<a,b> = [a - 1, b + 1]

It could be ruled out by adding another law: <a,a> = {a}, or equivalently ∀x ≠ a : isInRange (a,a) x = False. It cannot, however, rule out

<a,b> = [ a - (b - a), b + (b - a) ]

On the other hand, this is just a reparametrization of the good instance.

@Shimuuar
Contributor

Shimuuar commented Jul 2, 2020

Even worse. This:

<a,b> = [ (a+b)/2  - f((b-a)/2), (a+b)/2 + f((b-a)/2)] 

will work for any f such that f(x) >= x for x > 0

@Shimuuar
Contributor

Shimuuar commented Jul 2, 2020

Please disregard all the "bad" examples above. They actually fail the 3rd law. I guess it's getting late.

@Bodigrim
Contributor Author

Bodigrim commented Jul 2, 2020

I think you are right in the sense that the proposed laws describe what is in range, but do not describe what is not. For example, isInRange _ _ = True is a valid function. Let's add

isInRange (lo, lo) x == (x == lo)
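
A small sketch of why the extra law matters: the constant function passes symmetry and inclusion but fails the law for trivial ranges. The names `alwaysIn`, `symmetric`, `inclusive` and `trivialRange` are hypothetical helpers, not part of the library.

```haskell
-- A degenerate candidate: everything is in every range.
alwaysIn :: (Int, Int) -> Int -> Bool
alwaysIn _ _ = True

-- Laws stated for an arbitrary candidate function f.
symmetric, inclusive, trivialRange
  :: ((Int, Int) -> Int -> Bool) -> (Int, Int) -> Int -> Bool
symmetric f (lo, hi) x = f (lo, hi) x == f (hi, lo) x
inclusive f (lo, hi) _ = f (lo, hi) lo
-- The law proposed above: isInRange (lo, lo) x == (x == lo).
trivialRange f (lo, _) x = f (lo, lo) x == (x == lo)
```

Evaluating `trivialRange alwaysIn (0, 0) 1` yields `False`, so `alwaysIn` is rejected once the new law is added.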

@Shimuuar
Contributor

Shimuuar commented Jul 3, 2020

Now I've found a truly pathological case: <a,b> = {a,b}. In other words, include only the endpoints.

@Bodigrim
Contributor Author

Bodigrim commented Jul 3, 2020

Nitpick: for Bounded types <a,b> = {a,b} will still be unlawful.
In general, I agree that this one is pathological, but if someone fancies it, why not? In this case uniformRM just chooses one of two values, which might be useful in some circumstances.

I made the last law about trivial ranges stricter: now it basically enforces that lo and hi are on the boundary of <lo, hi>.

@Shimuuar
Contributor

Shimuuar commented Jul 3, 2020

@Bodigrim what I'm trying to do is to understand the meaning of these laws. Math is a scary and wonderful thing. You put identity, associativity, and inverse in, and you get group theory out. So the question is: what sort of structures do these laws generate?

I've got a few more weird constructions and will post them tomorrow after I flesh them out.

@Shimuuar
Contributor

Shimuuar commented Jul 4, 2020

TL;DR it seems that there's no shortage of weird constructions.

Here I'll consider only the integers Z and their finite subsets [0..N-1].

  • I'll call ranges <a,b> = [a,b] standard ranges.
  • In the diagrams, x is an endpoint of the interval <a,b>, that is a or b, and o is
    an interior point, that is c ∈ <a,b> with c ≠ a, c ≠ b.

Thinning

One approach to constructing more ranges is to drop some elements from the standard ranges. The extreme version is to leave only the endpoints, since they are required by law 2). But there are more possibilities. For example, it's possible to include only every other element in the range.

x
xx
x x
x  x
x o x
x    x
x o o x
...

In a similar way it's possible to leave out every third element, and so on.
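
One reading of the "every other element" diagram can be sketched as follows. The rule used here (an interior point is kept when it sits at an even offset from both endpoints) is an assumption chosen to reproduce the rows above; `thinned` and `render` are hypothetical names.

```haskell
-- Thinned ranges over Int: the endpoints, plus interior points at an even
-- offset from both endpoints (an assumed reading of the diagram).
thinned :: (Int, Int) -> Int -> Bool
thinned (a, b) x =
  x == lo || x == hi || (lo < x && x < hi && even (x - lo) && even (hi - x))
  where
    lo = min a b
    hi = max a b

-- Render <a,b> over [0 .. n - 1] in the notation of the diagrams:
-- 'x' for endpoints, 'o' for interior members, ' ' for non-members.
render :: Int -> (Int, Int) -> String
render n (a, b) = [mark x | x <- [0 .. n - 1]]
  where
    mark x
      | x == a || x == b = 'x'
      | thinned (a, b) x = 'o'
      | otherwise = ' '
```

For instance, `render 5 (0, 4)` gives `"x o x"` and `render 4 (0, 3)` gives `"x  x"`, matching the corresponding rows.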

Fattening

The opposite approach is to include more elements in the standard ranges. The extreme variant is to make <a,b> = Z. I guess there are less extreme variants as well.

Permutations

For finite sets it's possible to simply apply a permutation to the set and obtain a new set of ranges. Take for example the set [0..2]. Below are the standard and the permuted intervals:

x
 x
  x
xx   x x  x x
 xx   xx  xx
xox  xxo  oxx

I don't yet understand how this approach applies to integers and reals.

@Bodigrim
Contributor Author

Bodigrim commented Jul 4, 2020

"Thinning" is almost fine, except the law for Bounded.

"Fattening" violates the 4th law.

I'm not sure what you mean by permutations here, could you provide a snippet? But it seems equivalent to making a newtype with compare (Foo x) (Foo y) = compare (permutation x) (permutation y). So yes, it is a totally valid range for the corresponding total order.
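
The newtype idea might look like the sketch below. The particular `permutation` (swapping 1 and 2 on the set {0, 1, 2}) and the `Ord`-based `inRangeOrd` are assumptions made for illustration only.

```haskell
-- A hypothetical permutation of {0, 1, 2}: swap 1 and 2.
permutation :: Int -> Int
permutation 1 = 2
permutation 2 = 1
permutation x = x

newtype Foo = Foo Int deriving (Eq, Show)

-- Compare through the permutation, as suggested above.
instance Ord Foo where
  compare (Foo x) (Foo y) = compare (permutation x) (permutation y)

-- Ord-based range membership (a stand-in, not the library function).
inRangeOrd :: Ord a => (a, a) -> a -> Bool
inRangeOrd (a, b) x = min a b <= x && x <= max a b
```

Under this order `Foo 0 < Foo 2 < Foo 1`, so `Foo 2` lies in the range between `Foo 0` and `Foo 1`, while `Foo 1` is outside the range between `Foo 0` and `Foo 2`.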

@Shimuuar
Contributor

Shimuuar commented Jul 4, 2020

Yes, permutations are equivalent to changing the order of values. It seems to be unavoidable unless an order is brought in. But I think that would be overspecifying things.

@Shimuuar
Contributor

Laws

First, let me recapitulate the laws:

  1. Symmetry: <a,b> = <b,a>. Another way to formulate this law is to work with
    unordered pairs.

  2. Bounds are inclusive: a ∈ <a,b> and <a,a> = {a}

  3. Transitivity: c ∈ <a,b> ⇒ <a,c> ⊂ <a,b>

  4. No overflow: c ∈ <a,b> & c ≠ a & c ≠ b ⇒ a ∉ <c,b> & b ∉ <a,c>

Finite sets

To start, let's work with some finite set A. Ranges are a function from the set of
unordered pairs of A's elements to the set of all subsets of A. Let's start by
considering only the first 3 laws. <a,a> = {a} defines the ranges for pairs with equal
elements, so we can concentrate on pairs of unequal elements.

There's a useful change of perspective. Ranges form a partial order with the ⊂
relation as comparison. But law 3 means that if we are given some partial order
on unordered pairs, we can reconstruct the sets corresponding to the pairs. The procedure
is simple: c ∈ <a,b> ⇔ <a,c> ⊂ <a,b>. Not all orders satisfy the axioms though.

Now we can easily see two extremes. The thinnest order: <a,b> = {a,b}, where no two
ranges are comparable, and the fattest one: <a,b> = A.

4th law

Now what is the effect of the fourth law? It states that ranges sharing an endpoint are not
equal. It doesn't, however, prevent <a,b> = <c,d> if all endpoints are
different. See for example the following construction:

1234
xxoo
x x 
x  x
 xx 
 x x
ooxx

or the same in a different notation:

<1,2> = {1,2,3,4}
<3,4> = {1,2,3,4}
<a,b> = {a,b}

It looks like the law doesn't quite protect from "range overflow", whatever that
means. It's possible to strengthen it by requiring injectivity, that is,
different pairs yield different intervals.
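
The construction can be checked by brute force. The sketch below encodes sets as sorted lists and takes law 4 in the form recalled later in this thread (c ∈ <a,b>, c ≠ a ⇒ a ∉ <c,b>); all names are hypothetical.

```haskell
import Data.List (sort)

type Point = Int

universe :: [Point]
universe = [1 .. 4]

-- The construction above: <1,2> = <3,4> = {1,2,3,4}, otherwise <a,b> = {a,b}.
range :: (Point, Point) -> [Point]
range (a, b)
  | sort [a, b] `elem` [[1, 2], [3, 4]] = universe
  | a == b = [a]
  | otherwise = sort [a, b]

subsetOf :: [Point] -> [Point] -> Bool
subsetOf xs ys = all (`elem` ys) xs

pairs :: [(Point, Point)]
pairs = [(a, b) | a <- universe, b <- universe]

law1, law2, law3, law4 :: Bool
law1 = and [range (a, b) == range (b, a) | (a, b) <- pairs]
law2 = and [a `elem` range (a, b) | (a, b) <- pairs]
    && and [range (a, a) == [a] | a <- universe]
law3 = and [range (a, c) `subsetOf` range (a, b) | (a, b) <- pairs, c <- range (a, b)]
law4 = and [a `notElem` range (c, b) | (a, b) <- pairs, c <- range (a, b), c /= a]

-- Injectivity on unordered pairs: different pairs yield different ranges.
injective :: Bool
injective =
  and [ range p /= range q
      | p@(a, b) <- pairs, q@(c, d) <- pairs, a <= b, c <= d, p /= q ]
```

All four laws evaluate to `True` while `injective` is `False`, because the pairs (1,2) and (3,4) share the same range.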

@Bodigrim
Contributor Author

It seems that your analysis ignores the 5th rule, which would require <1,4> = {1,2,3,4} and prohibit both <1,2> = {1,2,3,4} and <3,4> = {1,2,3,4}.

@Shimuuar
Contributor

This one?

isInRange (minBound, maxBound) x == True

I do. It seems very ad hoc to me. It doesn't generalize to infinite sets (integers), and it does nothing if the data type is not Bounded. It's also very easy to sidestep. Let's add elements 0 & 5 to the construction above:

<0,5> = {0,1,2,3,4,5}
<0,x> = {0,x}
<x,5> = {x,5}
<1,2> = {1,2,3,4}
<3,4> = {1,2,3,4}
<a,b> = {a,b}

@Bodigrim
Contributor Author

I like the idea of ranges being injective, but I struggle to find a way to express it in a constructive, property-testable fashion, without "there exists..." clause.

@Shimuuar
Contributor

I think it's fine to leave some laws to on-paper verification, especially since one has to really go out of one's way to get pathological constructions. It's not the sort of thing one can construct accidentally.

P.S. I should get back to constructing possible partial orders. Their number grows only modestly, so it's possible to explore further than 4-element sets.

@Bodigrim
Contributor Author

@Shimuuar Convinced, I've added the injectivity law and removed a law for Bounded. Rebased and squashed.

@Shimuuar
Contributor

@Bodigrim sorry for being slow.

Law 3 (transitivity)

I think it should be strengthened. Currently it's c ∈ <a,b> ⇒ <a,c> ⊂ <a,b>. I propose to strengthen it to c,d ∈ <a,b> ⇒ <c,d> ⊂ <a,b>. The meaning is very straightforward: it's not possible to get outside of the interval. Any interval made from points of <a,b> is its subset. Note that the current form is not sufficient, since we can't say anything about <c,d>. Here is a counterexample:

<a,b> = {a,b,c,d}
<a,c> = {a,c}
<b,c> = {b,c}
<a,d> = {a,d}
<b,d> = {b,d}
<c,d> = {c,d,e}
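
The counterexample can be verified mechanically. In this sketch the ranges of pairs not listed above (those involving e, and <x,x>) are assumed to be {x,e} and {x} respectively; the type `P` and the function names are hypothetical.

```haskell
import Data.List (sort)

data P = A | B | C | D | E deriving (Eq, Ord, Show, Enum, Bounded)

univ :: [P]
univ = [minBound .. maxBound]

-- Ranges of the counterexample, as sorted lists.
range :: (P, P) -> [P]
range (x, y) = case sort [x, y] of
  [A, B] -> [A, B, C, D]
  [C, D] -> [C, D, E]
  [p, q] | p == q -> [p]   -- assumed: <x,x> = {x}
  ps -> ps                 -- every other pair: just the endpoints

subsetOf :: [P] -> [P] -> Bool
subsetOf xs ys = all (`elem` ys) xs

-- Current law 3: c ∈ <a,b> ⇒ <a,c> ⊂ <a,b>.
weakLaw3 :: Bool
weakLaw3 =
  and [range (a, c) `subsetOf` range (a, b) | a <- univ, b <- univ, c <- range (a, b)]

-- Proposed law: c,d ∈ <a,b> ⇒ <c,d> ⊂ <a,b>.
strongLaw3 :: Bool
strongLaw3 =
  and [ range (c, d) `subsetOf` range (a, b)
      | a <- univ, b <- univ, c <- range (a, b), d <- range (a, b) ]
```

Here `weakLaw3` holds but `strongLaw3` does not, since <c,d> contains e, which lies outside <a,b>.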

Former law 4.

AFAIR it said that interval endpoints are literal endpoints: an interval constructed from an interior point no longer contains the endpoint it replaced: c∈<a,b>, c≠a ⇒ a∉<c,b>.

There's a curious consequence of these laws. Take any 3 points a, b, c; we can build 3 intervals from them: <a,b>, <b,c>, <a,c>. There are only two possibilities: either two intervals are subsets of the third and not subsets of each other, or all three intervals are incomparable.

If we take ordinary intervals for bounded types (say Int), then for every 3 points the first option is realized. I suspect that if for every 3 points there's an interval that contains the other two, the system of intervals is isomorphic to the standard intervals. But that's something that should be checked.

@Shimuuar
Contributor

@Bodigrim I spent some time thinking about the matter and came to the following set of laws:

  1. Symmetry: <a,b> = <b,a>
  2. Endpoints are part of interval: a,b ∈ <a,b>
  3. When endpoints coincide there's nothing else: <a,a> = {a}
  4. No escape: ∀c,d ∈ <a,b> : <c,d> ⊂ <a,b>.
  5. Endpoints are endpoints: ∀c,d ∈ <a,b> : a∈<c,d> ⇒ a=c || a=d. I think it's a good way to say that the endpoints are in some sense on the border of the interval without invoking order.
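
As an illustration, the five laws can be written as a brute-force checker over a finite universe. This is only a sketch: `Range`, `lawsHold`, `standard` and `endpointsOnly` are hypothetical names, not part of the library.

```haskell
-- A candidate notion of range: membership of a point in <a,b>.
type Range a = (a, a) -> a -> Bool

-- Results of checking laws 1..5, in order, over a finite universe.
lawsHold :: Eq a => [a] -> Range a -> [Bool]
lawsHold univ r = [law1, law2, law3, law4, law5]
  where
    members a b = [x | x <- univ, r (a, b) x]
    -- all (a, b, c, d) with c, d ∈ <a,b>
    quads = [(a, b, c, d) | a <- univ, b <- univ, c <- members a b, d <- members a b]
    law1 = and [r (a, b) x == r (b, a) x | a <- univ, b <- univ, x <- univ]
    law2 = and [r (a, b) a && r (a, b) b | a <- univ, b <- univ]
    law3 = and [r (a, a) x == (x == a) | a <- univ, x <- univ]
    law4 = and [r (a, b) x | (a, b, c, d) <- quads, x <- members c d]
    law5 = and [a == c || a == d | (a, _, c, d) <- quads, r (c, d) a]

-- Standard ranges <a,b> = [min a b, max a b].
standard :: Range Int
standard (a, b) x = min a b <= x && x <= max a b

-- The "endpoints only" ranges discussed earlier in the thread.
endpointsOnly :: Range Int
endpointsOnly (a, b) x = x == a || x == b
```

On the universe `[0 .. 4]` both `standard` and `endpointsOnly` pass all five checks, which is consistent with the earlier observation that the endpoints-only construction is odd but not excluded.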

The notable difference from the first version is that instead of picking a single point from the interval, they pick two. It turns out it's not possible to go from ∀c∈<a,b> : <a,c>⊂<a,b> to ∀c,d ∈ <a,b> : <c,d> ⊂ <a,b>, and I think it's worthwhile to strengthen the law a bit. Same for 5.

Another point: these laws do not imply injectivity. The common variant of (elementwise) ranges for tuples satisfies these laws, and those are not unique: <(1,10), (2,20)> = <(2,10), (1,20)>.
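
The non-uniqueness is easy to see in a sketch of elementwise ranges. `inRect` and `members` are hypothetical helpers over a small grid, not the library's tuple instance.

```haskell
-- Elementwise ("rectangular") ranges for pairs of Int.
inRect :: ((Int, Int), (Int, Int)) -> (Int, Int) -> Bool
inRect ((a1, a2), (b1, b2)) (x1, x2) = between a1 b1 x1 && between a2 b2 x2
  where
    between a b x = min a b <= x && x <= max a b

-- All points of a small grid that fall into the given range.
members :: ((Int, Int), (Int, Int)) -> [(Int, Int)]
members r = [p | p <- grid, inRect r p]
  where
    grid = [(x, y) | x <- [0 .. 3], y <- [0 .. 25]]
```

Both `members ((1,10), (2,20))` and `members ((2,10), (1,20))` enumerate the same rectangle, because each component only sees the unordered pair of its bounds.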

All in all, I think this set of laws is quite sensible. It doesn't seem possible to specify things more concretely without using order in some way. They do allow elementwise tuple instances, but I don't think it's necessary to disallow them.

On a more practical note, I'm slightly worried about the default implementation of isInRange. It's implemented in terms of a type class which is only tangentially related to UniformRange. Maybe default implementation in terms of Generic for both methods should be added? If it only works for single-constructor data types with 0 or 1 fields, it will at least cover newtypes.

@Bodigrim
Contributor Author

Bodigrim commented Dec 7, 2020

@Shimuuar sorry for being super late.

Maybe default implementation in terms of Generic for both methods should be added?

Sure, but let's do this in a separate PR. I'll revert db65185 with appropriate changes for product types and add generic derivation for isInRange.

Endpoints are endpoints: ∀c,d ∈ <a,b> : a∈<c,d> ⇒ a=c || a=d.

This does not hold for tuples (with rectangular ranges). E. g., take a=(0,0), b=(1,1), c=(0,1), d=(1,0).

Otherwise looks good, updated. I think four rules are good enough to go.

@Shimuuar
Contributor

This does not hold for tuples (with rectangular ranges)

You're right. Then the law could be changed to a weaker version (AFAIR already proposed):

∀c,d ∈ <a,b> : b∈<a,c> ⇒ b=c
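
Both observations can be checked on a small grid. In this sketch `inRect` models rectangular tuple ranges; `law5` is the two-point version quoted above and `weakLaw` is the weaker one-point version. All names are hypothetical.

```haskell
type P2 = (Int, Int)

-- Rectangular (elementwise) ranges for pairs.
inRect :: (P2, P2) -> P2 -> Bool
inRect ((a1, a2), (b1, b2)) (x1, x2) = between a1 b1 x1 && between a2 b2 x2
  where
    between a b x = min a b <= x && x <= max a b

grid :: [P2]
grid = [(x, y) | x <- [0 .. 2], y <- [0 .. 2]]

-- Two-point version: c,d ∈ <a,b> and a ∈ <c,d> ⇒ a = c || a = d.
law5 :: P2 -> P2 -> P2 -> P2 -> Bool
law5 a b c d =
  not (inRect (a, b) c && inRect (a, b) d && inRect (c, d) a) || a == c || a == d

-- Weaker version: c ∈ <a,b> and b ∈ <a,c> ⇒ b = c.
weakLaw :: P2 -> P2 -> P2 -> Bool
weakLaw a b c = not (inRect (a, b) c && inRect (a, c) b) || b == c
```

With a=(0,0), b=(1,1), c=(0,1), d=(1,0) the two-point law fails, while `weakLaw` holds for every triple drawn from `grid`.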

Other than this, I think the PR is good to go.

@Bodigrim
Contributor Author

CI failure is unrelated to this PR, but still an interesting one: it uses nightly-2021-01-17, but somehow ends up building smallcheck-1.2.0 instead of smallcheck-1.2.1. CC @lehins

@lehins
Contributor

lehins commented Jan 18, 2021

@Bodigrim smallcheck-1.2.0 is specified in stack.yaml. It's an easy fix, I'll submit a PR in a sec.

@Bodigrim
Contributor Author

@curiousleo @idontgetoutmuch unless you have any comments, I'll merge this soon.

@idontgetoutmuch
Member

LGTM

Contributor

@curiousleo curiousleo left a comment


This is very nice. Great work everyone for constructively criticising this towards an elegant and succinct definition!

src/System/Random/Internal.hs Outdated Show resolved Hide resolved
Co-authored-by: Leonhard Markert <curiousleo@users.noreply.github.com>
@Bodigrim Bodigrim merged commit 46a15af into haskell:master Jan 21, 2021
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jan 29, 2025
# 1.3.0

* Improve floating point value generation and avoid degenerate cases: [#172](haskell/random#172)
* Add `Uniform` instance for `Maybe` and `Either`: [#167](haskell/random#167)
* Add `Seed`, `SeedGen`, `seedSize`, `seedSizeProxy`, `mkSeed` and `unSeed`:
  [#162](haskell/random#162)
* Add `mkSeedFromByteString`, `unSeedToByteString`, `withSeed`, `withSeedM`, `withSeedFile`,
  `seedGenTypeName`, `nonEmptyToSeed`, `nonEmptyFromSeed`, `withSeedM`, `withSeedMutableGen` and `withSeedMutableGen_`
* Add `SplitGen` and `splitGen`: [#160](haskell/random#160)
* Add `unifromShuffleList` and `unifromShuffleListM`: [#140](haskell/random#140)
* Add `uniformWordR`: [#140](haskell/random#140)
* Add `mkStdGen64`: [#155](haskell/random#155)
* Add `uniformListRM`, `uniformList`, `uniformListR`, `uniforms` and `uniformRs`:
  [#154](haskell/random#154)
* Add compatibility with recently added `ByteArray` to `base`:
  [#153](haskell/random#153)
  * Switch to using `ByteArray` for type class implementation instead of
    `ShortByteString`
  * Add `unsafeUniformFillMutableByteArray` to `RandomGen` and a helper function
    `defaultUnsafeUniformFillMutableByteArray` that makes implementation
    for most instances easier.
  * Add `uniformByteArray`, `uniformByteString` and `uniformFillMutableByteArray`
  * Deprecate `genByteString` in favor of `uniformByteString`
  * Add `uniformByteArrayM` to `StatefulGen`
  * Add `uniformByteStringM` and `uniformShortByteStringM`
  * Deprecate `System.Random.Stateful.uniformShortByteString` in favor of `uniformShortByteStringM` for
    consistent naming and a future plan of removing it from `StatefulGen`
    type class
  * Add a pure `System.Random.uniformShortByteString` generating function.
  * Deprecate `genShortByteString` in favor of `System.Random.uniformShortByteString`
  * Expose a helper function `fillByteArrayST`, that can be used for
    defining implementation for `uniformByteArrayM`
  * Deprecate `genShortByteStringST` and `genShortByteStringIO` in favor of `fillByteArrayST`
* Improve `FrozenGen` interface: [#149](haskell/random#149)
  * Move `thawGen` from `FreezeGen` into the new `ThawGen` type class. Fixes an issue with
    an unlawful instance of `StateGen` for `FreezeGen`.
  * Add `modifyGen` and `overwriteGen` to the `FrozenGen` type class
  * Switch `splitGenM` to use `SplitGen` and `FrozenGen` instead of deprecated `RandomGenM`
  * Add `splitMutableGenM`
  * Switch `randomM` and `randomRM` to use `FrozenGen` instead of `RandomGenM`
  * Deprecate `RandomGenM` in favor of a more powerful `FrozenGen`
* Add `isInRangeOrd` and `isInRangeEnum` that can be used for implementing `isInRange`:
  [#148](haskell/random#148)
* Add `isInRange` to `UniformRange`: [#78](haskell/random#78)
* Add default implementation for `uniformRM` using `Generics`:
  [#92](haskell/random#92)