Random API proposal #132

Merged
merged 8 commits into from
Jul 23, 2018
253 changes: 253 additions & 0 deletions proposals/stdlib/random.md
@@ -0,0 +1,253 @@
# Multiplatform random number generators

* **Type**: Standard Library API proposal
* **Author**: Ilya Gorbunov
* **Contributors**: Roman Elizarov, Vsevolod Tolstopyatov, Pavel Punegov
* **Status**: Submitted
* **Prototype**: In progress
* **Related issues**: [KT-17261](https://youtrack.jetbrains.com/issue/KT-17261)
* **Discussion**: [KEEP-131](https://github.com/Kotlin/KEEP/issues/131)


## Summary

Introduce an API in the Standard Library to:

- generate pseudo-random numbers conveniently and efficiently without much ceremony,
- generate reproducible pseudo-random number sequences,
- allow using custom pseudo-random number algorithms or even true sources of randomness
with the same API.

## Similar API review

* Java: [`java.util.Random`](https://docs.oracle.com/javase/6/docs/api/java/util/Random.html),
[`java.util.concurrent.ThreadLocalRandom`](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadLocalRandom.html)
* Scala: [`scala.util.Random`](https://www.scala-lang.org/api/current/scala/util/Random.html)
* Apache commons: [RandomGenerator](http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/random/RandomGenerator.html)
* [Apache RNG](https://commons.apache.org/proper/commons-rng/userguide/rng.html):
[UniformRandomProvider](https://commons.apache.org/proper/commons-rng/commons-rng-client-api/apidocs/org/apache/commons/rng/UniformRandomProvider.html)

## Motivation and use cases

While there's no problem with generating random numbers in Kotlin/JVM (aside from `ThreadLocalRandom`
not being available prior to JDK 7), there's no API yet to do that in Kotlin for other targets,
and, more importantly, there's no way to do that consistently in multiplatform code.

The use cases for a random API are well established; here are a few:

* Sampling a random element from a collection
* Collection shuffling
* Random game content and level generation
* Simulations requiring a source of random numbers
* Using randomness in tests to cover more code execution paths

## Alternatives

The current alternatives are either to find platform libraries and wrap them in a common multiplatform API,
or to implement random generators by hand.

## Placement

This API shall be placed into the Kotlin Standard Library. Regarding the package the following options were considered:

- Use the existing `kotlin` package which is imported by default.
- Use the existing `kotlin.math` package
- Create new `kotlin.random` package and place API there.

We decided on the last option, as it is more discoverable in the docs: all the related API will be shown close together.
It is also more future-proof with regard to possible API additions.

A downside of a new package is that it will require new imports, especially if there are extensions.
A class named `Random` from that package can be confused with `java.util.Random` in the completion list.
It should be investigated whether we could make that package imported by default in Kotlin 1.3.

## Reference implementation

The initial implementation is [available](https://github.com/JetBrains/kotlin/compare/4d51d13~1...f7337cc) in the 1.3-M1 branch.

## Dependencies

What are the dependencies of the proposed API:

* JDK: `java.util.Random` in JDK 6 and 7, `ThreadLocalRandom` in JDK 8+
* JS: the `Math.random` function as the source of the random seed

## API details

The following code snippet summarizes the public API introduced by this proposal.

```kotlin
abstract class Random {

    abstract fun nextBits(bitCount: Int): Int

    open fun nextInt(): Int
    open fun nextInt(bound: Int): Int
    open fun nextInt(origin: Int, bound: Int): Int
    open fun nextInt(range: IntRange): Int

    open fun nextLong(): Long
    open fun nextLong(bound: Long): Long
    open fun nextLong(origin: Long, bound: Long): Long
Contributor

@voddan voddan Jul 22, 2018

I find the parameter names origin and bound very non-intuitive. I kinda guess they mean min and max, but I still have no idea if they are inclusive or exclusive. I don't think we should copy obscure parameter names from JDK.

IMHO the most intuitive naming would be the same as for Kotlin ranges: start and endInclusive , or a variation on this like diapasonStart and diapasonEndInclusive

Member Author

bound is an exclusive end, something that shouldn't be crossed. This might look unclear in the API summary, but I hope that the method documentation will clarify that. If it won't, we can reconsider the parameter names.

Contributor

Ok, then it's diapasonStart and diapasonEndExclusive. You see what I mean?

Member

origin is really a bit confusing. It may give an impression that this parameter is maybe somehow used as the source of randomness instead of the internal state of the class, or that the returned value is tied to this parameter value (more than to bound) in some other sense.

Member Author

I've recorded the question about parameter names.

Ok, then it's diapasonStart and diapasonEndExclusive

While we have other places in the library, where we specify half-open range with two parameters, we haven't used exclusive word in identifiers yet: usually it's just startIndex/endIndex

Member Author

We have settled on from and until parameter names.

    open fun nextLong(range: LongRange): Long

    open fun nextBoolean(): Boolean

    open fun nextDouble(): Double
    open fun nextDouble(bound: Double): Double
    open fun nextDouble(origin: Double, bound: Double): Double
Contributor

What was the reason to have 3 overloaded methods instead of one with default parameters like fun nextDouble(origin: Double = 0.0, bound: Double = Double.MAX_VALUE)?

Member Author

  • nextDouble() returns value from 0.0 and up to 1.0
  • overloads with less parameters allow more efficient implementations (less parameter checking, less edge cases)

Contributor

@voddan voddan Jul 23, 2018

I see. Then 3 questions:

  1. How much more efficient is nextDouble() than a method with default parameters?
  2. Is that efficiency worth having a separate nextDouble()?
  3. Is that efficiency still worth having a relatively rare nextDouble(bound: Double)?

Note that 2) and 3) combined effectively bloat the API two fold.

Member Author

I don't have the precise measurements, but all of the following operations definitely add up. nextDouble without parameters doesn't have to:

  • check if the optional parameters are specified and substitute them with their default values
  • check if those parameters are valid
  • calculate the range bound - origin
  • check if it is finite
  • multiply the random [0, 1) by the range and add the origin
  • check if the resulting value is equal to bound.

    open fun nextFloat(): Float

    open fun nextBytes(array: ByteArray, fromIndex: Int = 0, toIndex: Int = array.size): ByteArray
    open fun nextBytes(array: ByteArray): ByteArray
    open fun nextBytes(size: Int): ByteArray

    // The default random number generator
    // Safe to use concurrently in JVM, and to transfer between workers in Native
    companion object : Random() {
        // overrides all methods delegating them to some platform-specific RNG implementation
    }
}

// constructor-like functions to get a reproducible RNG implementation with the specified seed
fun Random(seed: Int): Random
fun Random(seed: Long): Random

// JVM-only: functions to wrap java.util.Random in kotlin.random.Random and vice-versa
fun java.util.Random.asKotlinRandom(): Random

Shouldn't it be .toKotlinRandom() to keep consistency with .to*() functions provided by stdlib for variable types already?

Contributor

No, because "to" would suggest that you create a new random instance which is no longer dependent on the original. This only wraps the java random instance.
This is consistent with the naming. One other example of this would be List<T>.asReversed.

I personally don't understand why we need Random.asJavaRandom() and java.util.Random.asKotlinRandom() on the Random object. Can anyone explain it to me?

Contributor

Imagine you use 2 libraries, one written in Java and the other in Kotlin. Both need a random object and you want to use the same one, so you have to pass it from one library to the other. The problem is that the Java library expects an instance of type java.util.Random and the Kotlin library expects an instance of type kotlin.random.Random, and those 2 are not the same. That's why you need those 2 functions: they create wrappers around the objects to "cast" one type to the other.

@vmichalak vmichalak Jul 10, 2018

Thanks for your explanation, it's clearer to me now. But if the implementation of ThreadLocalRandom changes in Java tomorrow, won't these methods become a problem (because of the multiplatform compatibility constraint)?

Member Author

There's no multiplatform compatibility constraint for an arbitrary implementation of Random, only for that returned by Random(seed) function.

fun Random.asJavaRandom(): java.util.Random
```
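
The steps that the review discussion lists for the ranged `nextDouble` overload can be sketched roughly as follows. This is only an illustration of why the parameterless overload is cheaper, not the stdlib implementation: `nextDoubleInRange` is a hypothetical helper name, and the sketch assumes that `nextDouble()` without arguments returns a value in `[0, 1)`.

```kotlin
import kotlin.math.nextDown
import kotlin.random.Random

// Hypothetical helper (not part of the proposal) mirroring the checks
// a ranged overload has to perform on every call.
fun nextDoubleInRange(random: Random, origin: Double, bound: Double): Double {
    // 1. Validate the parameters (this also rejects NaN arguments).
    require(origin < bound) { "bound must be greater than origin" }
    // 2. Compute the size of the range and make sure it is finite.
    val size = bound - origin
    require(size.isFinite()) { "the range is too large" }
    // 3. Scale the base value from [0, 1) by the range and add the origin.
    val value = origin + random.nextDouble() * size
    // 4. Rounding may yield exactly `bound`; step down to keep the end exclusive.
    return if (value >= bound) bound.nextDown() else value
}
```

None of these steps is needed when the caller only wants a value in `[0, 1)`, which is the argument made for keeping a separate parameterless overload.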

### Default random number generator

The default RNG implementation can be obtained from the `Random` class companion object.
In fact that object is just an implementation of `Random`, so one can call its methods like

```kotlin
Random.nextInt(10)
```

Two alternative ways of obtaining the default RNG were considered:

- Having a top-level property like `random` or `defaultRandom`
    - That was rejected because of the high possibility of shadowing that property with a more local member

- Having a property in the `Random` companion object, so that its methods are called like
    ```kotlin
    Random.default.nextInt(10)
    ```
    - That was rejected as too verbose, similar to `ThreadLocalRandom.current().nextInt()` calls

The default implementation can be used when neither the particular RNG algorithm
nor the state of the generator matters.
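
As a small sketch of such "don't care" usage, the helpers below call the default generator through the `Random` companion object. The helper names `rollDie` and `happensWithProbability` are made up for illustration and are not part of the proposal.

```kotlin
import kotlin.random.Random

// Roll a six-sided die with the default generator.
// The second argument is an exclusive upper bound, so the result is in 1..6.
fun rollDie(): Int = Random.nextInt(1, 7)

// Return true with (approximately) the given probability.
// Relies on nextDouble() returning a value in [0, 1).
fun happensWithProbability(p: Double): Boolean = Random.nextDouble() < p
```

Neither helper needs to know which algorithm backs the default generator or how it was seeded.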

**Platform specifics**

The default implementation may differ between platforms:

- On the JVM it's a wrapper around `ThreadLocal<Random>` or `ThreadLocalRandom.current()`
- In JS it's a repeatable implementation (see below) seeded with a random number

**Serialization**

In JVM the default implementation is serialized like a singleton, so after the deserialization it should point
to the same companion object instance.


### Repeatable pseudo-random number generator

Two constructor-like functions are provided to obtain a repeatable sequence generator,
seeded with the specified number:

```kotlin
val random = Random(seed)
...
random.nextInt(10)
```

The `seed` can be either an `Int` or a `Long`.

Two random number generators obtained from the same seed produce the same sequence of numbers.
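
A minimal sketch of that guarantee, using a hypothetical helper `firstValues` that creates a fresh seeded generator on each call:

```kotlin
import kotlin.random.Random

// Hypothetical helper: draws `count` values below 100 from a generator
// freshly created from `seed`.
fun firstValues(seed: Int, count: Int): List<Int> {
    val random = Random(seed)
    return List(count) { random.nextInt(100) }
}
```

Calling `firstValues(42, 10)` twice is expected to return equal lists, because each call starts from the same seed.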

**Platform specifics**

- All platforms use the same implementation, so a sequence produced from the same seed is the same on all platforms.
- On the JVM the generator obtained this way is not thread-safe and should not be used concurrently.
- In Native the generator is not freezable (it stops working once frozen) and should not be shared between workers.

**Serialization**

On the JVM the repeatable generator is serialized by saving its state, so after deserialization it continues producing numbers
from the state it had before serialization.

**Implementation requirements**

There are plenty of pseudo-random generator implementations to choose from.
We decided to provide one that is good enough given the following constraints:

- the implementation should be simple enough to be easily verified
- it shouldn't require the 64-bit `Long` type, as that may have performance implications in Kotlin/JS, where longs are not supported natively
- it shouldn't require many operations or complex operations to generate the next number
- it shouldn't have a lot of state to maintain
- it should perform well in randomness tests
- the period of the generator should be long

Based on these constraints, we have chosen the XORWOW algorithm (George Marsaglia. Xorshift RNGs. Journal of Statistical Software, 8(14), 2003. Available at http://www.jstatsoft.org/v08/i14/paper).
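
To show how the algorithm fits the constraints above, here is a sketch of a single xorwow step using only 32-bit `Int` arithmetic. It follows the listing in Marsaglia's paper as recalled here and is not necessarily identical to the stdlib implementation. The class name `XorWowSketch` is made up, and the default state values are illustrative initial values that should be checked against the paper before any serious use.

```kotlin
// Sketch of the xorwow generator: a 5-word xorshift combined with a
// simple additive counter (a Weyl sequence). Only Int operations are used.
class XorWowSketch(
    // Illustrative initial state; the xorshift words must not all be zero.
    private var x: Int = 123456789,
    private var y: Int = 362436069,
    private var z: Int = 521288629,
    private var w: Int = 88675123,
    private var v: Int = 5783321,
    private var d: Int = 6615241
) {
    fun nextInt(): Int {
        // Xorshift part: mix the oldest word and shift the state window.
        val t = x xor (x ushr 2)
        x = y; y = z; z = w; w = v
        v = (v xor (v shl 4)) xor (t xor (t shl 1))
        // Weyl part: advance the counter by an odd constant (wraps on overflow).
        d += 362437
        return d + v
    }
}
```

The state is six `Int` words and each step is a handful of shifts, xors and additions, which is what makes the algorithm attractive for Kotlin/JS.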

### Custom implementations

All methods in the `Random` abstract class have some reasonable implementations except `nextBits`,
which is the abstract one, so to implement a custom random number generator only that one has to be overridden.

However, if a custom generator can provide a more efficient implementation of the other methods,
it can override them too, since they are all open.
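
As a sketch of how little a custom generator needs, the class below overrides only `nextBits`. The name `LcgRandom` is hypothetical, and the linear congruential step is used purely because it is short; it is not a recommendation of that algorithm's quality.

```kotlin
import kotlin.random.Random

// Hypothetical custom generator: a plain 32-bit linear congruential generator.
class LcgRandom(seed: Int) : Random() {
    private var state = seed

    override fun nextBits(bitCount: Int): Int {
        // Advance the state; Int arithmetic wraps around on overflow.
        state = state * 1664525 + 1013904223
        // Return the upper `bitCount` bits. A shift by 32 would be a no-op
        // for Int, so the zero-bit case is handled separately.
        return if (bitCount == 0) 0 else state ushr (32 - bitCount)
    }
}
```

All the other methods, such as `nextInt(10)` or `nextDouble()`, then work through their default implementations.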

On JVM there's an extension function `asKotlinRandom()` to wrap any `java.util.Random` implementation into `kotlin.random.Random`.
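
A JVM-only sketch of that wrapping, assuming Kotlin code that accepts `kotlin.random.Random` while the caller already holds a `java.util.Random`. The function names `rollWith` and `demoInterop` are made up for illustration.

```kotlin
import kotlin.random.Random
import kotlin.random.asJavaRandom
import kotlin.random.asKotlinRandom

// Kotlin-side code written against the multiplatform type.
fun rollWith(random: Random): Int = random.nextInt(1, 7)

fun demoInterop(): Int {
    val javaRandom = java.util.Random(42L)
    // Wrap rather than copy: the wrapper draws from the same underlying generator.
    val kotlinView: Random = javaRandom.asKotlinRandom()
    return rollWith(kotlinView)
}

// The opposite direction, for Java APIs that expect java.util.Random.
fun javaViewOf(random: Random): java.util.Random = random.asJavaRandom()
```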

### Collection shuffling

It becomes possible to provide extensions for collection shuffling with the specified source of randomness in the common standard library:

```kotlin
fun <T> MutableList<T>.shuffle(random: Random): Unit
fun <T> Iterable<T>.shuffled(random: Random): List<T>
```

The existing `shuffle()` and `shuffled()` can be reimplemented by delegating to `shuffle(Random)` and `shuffled(Random)` respectively.
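
A short sketch of why passing the source of randomness matters: with a seeded generator the shuffle becomes reproducible, which is handy in tests. The helper name `shuffledDeck` is hypothetical.

```kotlin
import kotlin.random.Random

// Hypothetical helper: the same seed is expected to give the same permutation.
fun shuffledDeck(seed: Int): List<Int> = (1..10).shuffled(Random(seed))

// In-place variant on a mutable list.
fun shuffleInPlace(items: MutableList<Int>, seed: Int) {
    items.shuffle(Random(seed))
}
```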

## What has to be done

- [ ] Make the implementations serializable
- [ ] Provide unsigned counterparts like `nextUInt`, `nextULong`, `nextUBytes` as extensions

## Unresolved questions

**Fixed implementation of seeded generator**

What are the guarantees about the implementation of the seeded generator?
Should we fix its implementation and never change it in the future?

- An option here is to state that `Random(seed)` returns an unspecified repeatable RNG implementation
  that can be changed later, for example in some 1.M Kotlin release, and one can obtain a fixed repeatable RNG with
  an additional enum parameter:

      Random(seed, RandomImplementation.XORWOW)

**`Random` identifier overloading**

In the current naming scheme the name `Random` denotes the abstract class, its companion, and two constructor-like functions.
This makes it difficult to refer to the correct overload of the name in the documentation.

**Do we need `nextBits` method?**

Instead of `nextBits(n)` one can use `nextInt()`, shifting the result right by `32 - n` bits (with an unsigned shift, so that the result is non-negative for `n < 32`).
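
A sketch of that replacement as an extension function. The name `upperBits` is hypothetical; the shift is unsigned (`ushr`), and the zero-bit case is special-cased because shifting an `Int` by 32 leaves it unchanged.

```kotlin
import kotlin.random.Random

// Hypothetical replacement for nextBits(n) built on top of nextInt().
fun Random.upperBits(bitCount: Int): Int {
    require(bitCount in 0..32) { "bitCount must be in 0..32" }
    return if (bitCount == 0) 0 else nextInt() ushr (32 - bitCount)
}
```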

## Future advancements

* Extensions on `List`/`Array`/`CharSequence` to select a random element from the collection.

* Extensions or members to generate sequences, like `ints()`, `ints(bound)`, `ints(range)` etc.

We haven't found the use cases justifying introduction of these functions yet.
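
For illustration only, both ideas can be sketched in user code on top of the proposed API. The names `randomElement` and `ints` are placeholders and not proposed signatures.

```kotlin
import kotlin.random.Random

// Hypothetical extension: pick a random element, using the default generator
// unless another source of randomness is supplied.
fun <T> List<T>.randomElement(random: Random = Random): T {
    require(isNotEmpty()) { "List is empty." }
    return this[random.nextInt(size)]
}

// Hypothetical extension: an endless sequence of values in 0 until bound.
// The sequence returned by generateSequence can be iterated only once.
fun Random.ints(bound: Int): Sequence<Int> = generateSequence { nextInt(bound) }
```

A caller would write something like `names.randomElement()` or `Random(1).ints(10).take(5).toList()`.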