-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm] Add the AnghaBench dataset. #210
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
facebook-github-bot
added
the
CLA Signed
This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
label
Apr 23, 2021
ChrisCummins
force-pushed
the
llvm-datasets-4
branch
2 times, most recently
from
April 26, 2021 21:38
598e7a2
to
4e57d83
Compare
ChrisCummins
force-pushed
the
llvm-datasets-5
branch
2 times, most recently
from
April 26, 2021 21:42
c43019f
to
c6de6fb
Compare
JD-ETH
approved these changes
Apr 27, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, some questions I had.
This dataset will be really nice for training!
ChrisCummins
force-pushed
the
llvm-datasets-4
branch
from
April 27, 2021 07:22
4e57d83
to
26b6001
Compare
hughleat
approved these changes
Apr 27, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ChrisCummins
force-pushed
the
llvm-datasets-4
branch
from
April 27, 2021 07:51
26b6001
to
c2f6ab7
Compare
ChrisCummins
force-pushed
the
llvm-datasets-5
branch
from
April 27, 2021 08:15
c6de6fb
to
31c2b3f
Compare
ChrisCummins
force-pushed
the
llvm-datasets-4
branch
from
April 27, 2021 08:16
c2f6ab7
to
a7e80c1
Compare
ChrisCummins
force-pushed
the
llvm-datasets-5
branch
from
April 27, 2021 08:17
31c2b3f
to
9c15186
Compare
ChrisCummins
force-pushed
the
llvm-datasets-4
branch
from
April 27, 2021 09:48
a7e80c1
to
0989103
Compare
The dataset is from: da Silva, Anderson Faustino, Bruno Conde Kind, José Wesley de Souza Magalhaes, Jerônimo Nunes Rocha, Breno Campos Ferreira Guimaraes, and Fernando Magno Quinão Pereira. "ANGHABENCH: A Suite with One Million Compilable C Benchmarks for Code-Size Reduction." In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 378-390. IEEE, 2021. Issue #45.
ChrisCummins
force-pushed
the
llvm-datasets-5
branch
from
April 27, 2021 09:49
9c15186
to
554eed1
Compare
ChrisCummins
added a commit
that referenced
this pull request
Apr 30, 2021
This release introduces some significant changes to the way that benchmarks are managed, introducing a new dataset API. This enabled us to add support for millions of new benchmarks and a more efficient implementation for the LLVM environment, but this will require some migrating of old code to the new interfaces (see “Migration Checklist” below). Some of the key changes of this release are: - [Core API change] We have added a Python Benchmark class (#190). The env.benchmark attribute is now an instance of this class rather than a string (#222). - [Core behavior change] Environments will no longer select benchmarks randomly. Now env.reset() will now always select the last-used benchmark, unless the benchmark argument is provided or env.benchmark has been set. If no benchmark is specified, a default is used. - [API deprecations] We have added a new Dataset class hierarchy (#191, #192). All datasets are now available without needing to be downloaded first, and a new Datasets class can be used to iterate over them (#200). We have deprecated the old dataset management operations, the compiler_gym.bin.datasets script, and removed the --dataset and --ls_benchmark flags from the command line tools. - [RPC interface change] The StartSession RPC endpoint now accepts a list of initial observations to compute. This removes the need for an immediate call to Step, reducing environment reset time by 15-21% (#189). - [LLVM] We have added several new datasets of benchmarks, including the Csmith and llvm-stress program generators (#207), a dataset of OpenCL kernels (#208), and a dataset of compilable C functions (#210). See the docs for an overview. - CompilerEnv now takes an optional Logger instance at construction time for fine-grained control over logging output (#187). - [LLVM] The ModuleID and source_filename of LLVM-IR modules are now anonymized to prevent unintentional overfitting to benchmarks by name (#171). - [docs] We have added a Feature Stability section to the documentation (#196). - Numerous bug fixes and improvements. Please use this checklist when updating code for the previous CompilerGym release: - Review code that accesses the env.benchmark property and update to env.benchmark.uri if a string name is required. Setting this attribute by string (env.benchmark = "benchmark://a-v0/b") and comparison to string types (env.benchmark == "benchmark://a-v0/b") still work. - Review code that calls env.reset() without first setting a benchmark. Previously, calling env.reset() would select a random benchmark. Now, env.reset() always selects the last used benchmark, or a predetermined default if none is specified. - Review code that relies on env.benchmark being None to select benchmarks randomly. Now, env.benchmark is always set to the previously used benchmark, or a predetermined default benchmark if none has been specified. Setting env.benchmark = None will raise an error. Select a benchmark randomly by sampling from the env.datasets.benchmark_uris() iterator. - Remove calls to env.require_dataset() and related operations. These are no longer required. - Remove accesses to env.benchmarks. An iterator over available benchmark URIs is now available at env.datasets.benchmark_uris(), but the list of URIs cannot be relied on to be fully enumerable (the LLVM environments have over 2^32 URIs). - Review code that accesses env.observation_space and update to env.observation_space_spec where necessary (#228). - Update compiler service implementations to support the updated RPC interface by removing the deprecated GetBenchmarks RPC endpoint and replacing it with Dataset classes. See the example service for details. - [LLVM] Update references to the poj104-v0 dataset to poj104-v1. - [LLVM] Update references to the cBench-v1 dataset to cbench-v1.
bwasti
pushed a commit
to bwasti/CompilerGym
that referenced
this pull request
Aug 3, 2021
This release introduces some significant changes to the way that benchmarks are managed, introducing a new dataset API. This enabled us to add support for millions of new benchmarks and a more efficient implementation for the LLVM environment, but this will require some migrating of old code to the new interfaces (see “Migration Checklist” below). Some of the key changes of this release are: - [Core API change] We have added a Python Benchmark class (facebookresearch#190). The env.benchmark attribute is now an instance of this class rather than a string (facebookresearch#222). - [Core behavior change] Environments will no longer select benchmarks randomly. Now env.reset() will now always select the last-used benchmark, unless the benchmark argument is provided or env.benchmark has been set. If no benchmark is specified, a default is used. - [API deprecations] We have added a new Dataset class hierarchy (facebookresearch#191, facebookresearch#192). All datasets are now available without needing to be downloaded first, and a new Datasets class can be used to iterate over them (facebookresearch#200). We have deprecated the old dataset management operations, the compiler_gym.bin.datasets script, and removed the --dataset and --ls_benchmark flags from the command line tools. - [RPC interface change] The StartSession RPC endpoint now accepts a list of initial observations to compute. This removes the need for an immediate call to Step, reducing environment reset time by 15-21% (facebookresearch#189). - [LLVM] We have added several new datasets of benchmarks, including the Csmith and llvm-stress program generators (facebookresearch#207), a dataset of OpenCL kernels (facebookresearch#208), and a dataset of compilable C functions (facebookresearch#210). See the docs for an overview. - CompilerEnv now takes an optional Logger instance at construction time for fine-grained control over logging output (facebookresearch#187). - [LLVM] The ModuleID and source_filename of LLVM-IR modules are now anonymized to prevent unintentional overfitting to benchmarks by name (facebookresearch#171). - [docs] We have added a Feature Stability section to the documentation (facebookresearch#196). - Numerous bug fixes and improvements. Please use this checklist when updating code for the previous CompilerGym release: - Review code that accesses the env.benchmark property and update to env.benchmark.uri if a string name is required. Setting this attribute by string (env.benchmark = "benchmark://a-v0/b") and comparison to string types (env.benchmark == "benchmark://a-v0/b") still work. - Review code that calls env.reset() without first setting a benchmark. Previously, calling env.reset() would select a random benchmark. Now, env.reset() always selects the last used benchmark, or a predetermined default if none is specified. - Review code that relies on env.benchmark being None to select benchmarks randomly. Now, env.benchmark is always set to the previously used benchmark, or a predetermined default benchmark if none has been specified. Setting env.benchmark = None will raise an error. Select a benchmark randomly by sampling from the env.datasets.benchmark_uris() iterator. - Remove calls to env.require_dataset() and related operations. These are no longer required. - Remove accesses to env.benchmarks. An iterator over available benchmark URIs is now available at env.datasets.benchmark_uris(), but the list of URIs cannot be relied on to be fully enumerable (the LLVM environments have over 2^32 URIs). - Review code that accesses env.observation_space and update to env.observation_space_spec where necessary (facebookresearch#228). - Update compiler service implementations to support the updated RPC interface by removing the deprecated GetBenchmarks RPC endpoint and replacing it with Dataset classes. See the example service for details. - [LLVM] Update references to the poj104-v0 dataset to poj104-v1. - [LLVM] Update references to the cBench-v1 dataset to cbench-v1.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
CLA Signed
This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The dataset is from:
Issue #45.