
Add a random_benchmark() method #240

Closed
ChrisCummins opened this issue Apr 30, 2021 · 0 comments · Fixed by #247

🚀 Feature

The v0.1.8 release removed random benchmark selection from CompilerGym environments when no benchmark is specified, so users who want random selection must now roll their own implementation. We should provide a simple Dataset.random_benchmark() method to cover this common case.

Motivation

Randomly sampling from env.dataset.benchmark_uris() is not always easy as the generator may be infinite. For some datasets, e.g. Csmith, it is trivial to select random benchmarks by generating random numbers within the range of numeric seed values, but this is not obvious and the user shouldn't have to figure this out for the simple case of uniform random selection.

Pitch

Extend the dataset classes with a random_benchmark() method:

from typing import Optional

import numpy as np

class Dataset:
    ...
    def random_benchmark(self, random_state: Optional[np.random.Generator] = None):
        """Select a benchmark uniformly at random."""
        raise NotImplementedError

class Datasets:
    ...
    def random_benchmark(self, random_state: Optional[np.random.Generator] = None):
        """Select a dataset uniformly at random, then select one of its benchmarks uniformly at random."""
        ...

This method can be implemented by subclasses to efficiently select a benchmark using the provided RNG.
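As a rough sketch of how a seed-keyed dataset (such as Csmith) might implement this, assuming an illustrative `num_seeds` attribute and a `benchmark()` lookup that returns a URI string (neither is the real CompilerGym API):

```python
from typing import Optional

import numpy as np


class SeedDataset:
    """Illustrative dataset whose benchmarks are keyed by seeds 0..num_seeds-1."""

    def __init__(self, name: str, num_seeds: int):
        self.name = name
        self.num_seeds = num_seeds

    def benchmark(self, seed: int) -> str:
        # Stand-in for the real benchmark lookup; returns a URI string here.
        return f"benchmark://{self.name}/{seed}"

    def random_benchmark(self, random_state: Optional[np.random.Generator] = None) -> str:
        """Select a benchmark uniformly at random using the provided RNG."""
        random_state = random_state or np.random.default_rng()
        return self.benchmark(int(random_state.integers(self.num_seeds)))
```

Passing a seeded generator, e.g. `random_benchmark(np.random.default_rng(0))`, makes the selection reproducible across runs.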

Alternatives

We don't provide any randomness methods, and instead require users to first enumerate a finite set of benchmark URIs and then sample from it. This has the advantage of making users think explicitly about the random distribution they wish to use. The downside is that rolling your own random selection is more complex, and most users probably just want uniform selection anyway.
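The roll-your-own alternative looks something like this sketch, where the generator and the 50-URI cutoff are illustrative stand-ins for `env.datasets.benchmark_uris()` and a user-chosen limit:

```python
import itertools
import random


def benchmark_uris():
    """Stand-in for the possibly-infinite benchmark_uris() generator."""
    i = 0
    while True:
        yield f"benchmark://example-v0/{i}"
        i += 1


# The user must first truncate the generator to a finite set...
uris = list(itertools.islice(benchmark_uris(), 50))
# ...and only then can they sample from it uniformly.
choice = random.choice(uris)
```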

@ChrisCummins ChrisCummins added the Enhancement New feature or request label Apr 30, 2021
@ChrisCummins ChrisCummins added this to the v0.1.9 milestone Apr 30, 2021
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this issue May 4, 2021
The v0.1.8 release removed the random benchmark selection from
CompilerGym environments when no benchmark was specified. If the user
wishes for random benchmark selection, they were required to roll
their own implementation. Randomly sampling from
env.dataset.benchmark_uris() is not always easy as the generator may
be infinite. For some datasets, e.g. Csmith, it is trivial to select
random benchmarks by generating random numbers within the range of
numeric seed values, but this is not obvious and the user shouldn't
have to figure this out for the simple case of uniform random
selection.

This adds a `random_benchmark()` method to the `Dataset` class which
allows uniform random benchmark selection, and a `random_benchmark()`
method to the `Datasets` class for sampling across datasets.

Issue facebookresearch#240.
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this issue May 4, 2021
@ChrisCummins ChrisCummins self-assigned this Jul 13, 2021
bwasti pushed a commit to bwasti/CompilerGym that referenced this issue Aug 3, 2021