
Add a random_benchmark() method #240

Closed
ChrisCummins opened this issue Apr 30, 2021 · 0 comments · Fixed by #247

🚀 Feature

The v0.1.8 release removed random benchmark selection from CompilerGym environments when no benchmark is specified, so users who want random selection must now roll their own implementation. We should provide a simple Dataset.random_benchmark() method to cover this common case.

Motivation

Randomly sampling from env.dataset.benchmark_uris() is not always easy as the generator may be infinite. For some datasets, e.g. Csmith, it is trivial to select random benchmarks by generating random numbers within the range of numeric seed values, but this is not obvious and the user shouldn't have to figure this out for the simple case of uniform random selection.

Pitch

Extend the dataset classes with a random_benchmark() method:

from typing import Optional

import numpy as np

class Dataset:
    ...
    def random_benchmark(self, random_state: Optional[np.random.Generator] = None):
        """Select a benchmark uniformly at random."""
        raise NotImplementedError

class Datasets:
    ...
    def random_benchmark(self, random_state: Optional[np.random.Generator] = None):
        """Select a dataset uniformly at random, then select one of its benchmarks uniformly at random."""
        ...

This method can be implemented by subclasses to efficiently select a benchmark using the provided RNG.
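As a rough sketch of how a seed-keyed dataset (such as Csmith) might implement this, assuming an illustrative `num_seeds` attribute and a `benchmark()` lookup that returns a URI string (neither is the real CompilerGym API):

```python
from typing import Optional

import numpy as np


class SeedDataset:
    """Illustrative dataset whose benchmarks are keyed by seeds 0..num_seeds-1."""

    def __init__(self, name: str, num_seeds: int):
        self.name = name
        self.num_seeds = num_seeds

    def benchmark(self, seed: int) -> str:
        # Stand-in for the real benchmark lookup; returns a URI string here.
        return f"benchmark://{self.name}/{seed}"

    def random_benchmark(self, random_state: Optional[np.random.Generator] = None) -> str:
        """Select a benchmark uniformly at random using the provided RNG."""
        random_state = random_state or np.random.default_rng()
        return self.benchmark(int(random_state.integers(self.num_seeds)))
```

Passing a seeded generator, e.g. `random_benchmark(np.random.default_rng(0))`, makes the selection reproducible across runs.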

Alternatives

We don't provide any randomness methods, and instead require users to first enumerate a finite set of benchmark URIs and then sample from it. This has the advantage of making users think explicitly about the random distribution they wish to use. The downside is that rolling your own random selection is more complex, and most users probably just want uniform selection anyway.
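The roll-your-own alternative looks something like this sketch, where the generator and the 50-URI cutoff are illustrative stand-ins for `env.datasets.benchmark_uris()` and a user-chosen limit:

```python
import itertools
import random


def benchmark_uris():
    """Stand-in for the possibly-infinite benchmark_uris() generator."""
    i = 0
    while True:
        yield f"benchmark://example-v0/{i}"
        i += 1


# The user must first truncate the generator to a finite set...
uris = list(itertools.islice(benchmark_uris(), 50))
# ...and only then can they sample from it uniformly.
choice = random.choice(uris)
```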

@ChrisCummins ChrisCummins added the Enhancement New feature or request label Apr 30, 2021
@ChrisCummins ChrisCummins added this to the v0.1.9 milestone Apr 30, 2021
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this issue May 4, 2021
The v0.1.8 release removed the random benchmark selection from
CompilerGym environments when no benchmark was specified. If the user
wishes for random benchmark selection, they were required to roll
their own implementation. Randomly sampling from
env.dataset.benchmark_uris() is not always easy as the generator may
be infinite. For some datasets, e.g. Csmith, it is trivial to select
random benchmarks by generating random numbers within the range of
numeric seed values, but this is not obvious and the user shouldn't
have to figure this out for the simple case of uniform random
selection.

This adds a `random_benchmark()` method to the `Dataset` class which
allows uniform random benchmark selection, and a `random_benchmark()`
method to the `Datasets` class for sampling across datasets.

Issue facebookresearch#240.
ChrisCummins added a commit to ChrisCummins/CompilerGym that referenced this issue May 4, 2021
@ChrisCummins ChrisCummins self-assigned this Jul 13, 2021
bwasti pushed a commit to bwasti/CompilerGym that referenced this issue Aug 3, 2021