Review README #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open: wants to merge 1 commit into `main`
24 changes: 14 additions & 10 deletions README.md
converted to text-only format.

We closely follow guidance from [HumanEval](https://github.com/openai/human-eval/tree/master).

Make sure to use Python 3.7 or later:
```bash
$ conda create -n codex python=3.7
$ conda activate codex
```

Install [ICARUS Verilog](https://github.com/steveicarus/iverilog):
```bash
$ git clone https://github.com/steveicarus/iverilog.git && cd iverilog \
&& git checkout 01441687235135d1c12eeef920f75d97995da333 \
&& sh ./autoconf.sh && ./configure && make -j4\
which already has the ICARUS Verilog simulator pre-installed. Even with the Docker container,
you would still need to complete the following step.

Check out and install this repository:
```bash
$ git clone https://github.com/NVlabs/verilog-eval
$ pip install -e verilog-eval
```

## Usage

**This program makes system calls to `iverilog` and `vvp` to simulate
untrusted model-generated code. Users are strongly
encouraged not to do so outside of a robust security sandbox. The [execution
call](https://github.com/NVlabs/verilog-eval/blob/main/verilog_eval/execution.py#L79-L112)
disclaimer before running code in a potentially unsafe manner. See the comment in
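As an illustration only (this is not the repository's `execution.py` logic), invoking `iverilog` and `vvp` with a bounded runtime might look like the sketch below; the file names and the returned status strings are hypothetical:

```python
import shutil
import subprocess

def simulate(design_file: str, testbench_file: str, timeout_s: int = 30) -> str:
    """Compile a design with iverilog and simulate it with vvp,
    enforcing a timeout on each system call. Illustrative only."""
    if shutil.which("iverilog") is None or shutil.which("vvp") is None:
        return "iverilog-not-found"
    try:
        # Compile design and testbench into a simulation binary.
        compiled = subprocess.run(
            ["iverilog", "-o", "sim.out", design_file, testbench_file],
            capture_output=True, timeout=timeout_s)
        if compiled.returncode != 0:
            return "compile-error"
        # Run the simulation and report a coarse pass/fail status.
        ran = subprocess.run(["vvp", "sim.out"],
                             capture_output=True, timeout=timeout_s)
        return "passed" if ran.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        return "timed out"

status = simulate("design.v", "tb.v")  # hypothetical file names
```

Again, run anything like this only inside a robust security sandbox, since the simulated code is untrusted model output.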
After following the above instructions to enable execution, generate samples
and save them in the following JSON Lines (jsonl) format, where each sample is
formatted into a single line like so:
```json
{"task_id": "Corresponding VerilogEval task ID", "completion": "Completion only without the prompt"}
```
We provide examples under `data/example` to illustrate the format and help with debugging.
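For instance, a samples file in this format could be produced like so; the `task_id` values and completions below are purely illustrative, and real task IDs must match those in the problem file:

```python
import json

# Hypothetical samples: two candidate completions for one task.
samples = [
    {"task_id": "example_task", "completion": "  assign out = a & b;\nendmodule"},
    {"task_id": "example_task", "completion": "  assign out = a | b;\nendmodule"},
]

with open("samples.jsonl", "w") as f:
    for sample in samples:
        # json.dumps escapes embedded newlines, so each sample stays on one line.
        f.write(json.dumps(sample) + "\n")
```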

To evaluate the samples, run
```bash
$ evaluate_functional_correctness samples.jsonl --problem_file data/VerilogEval_Human.jsonl
```
```
Reading samples...
3120it [00:00, 16077.44it/s]
Running test suites...
```
out", or "failed".
As a quick sanity check, the example samples should yield 0.5 pass@1. The results can be
verified against the provided output
in `data/example/ExampleSolution.jsonl_reference.jsonl`.
```bash
$ evaluate_functional_correctness data/example/ExampleSolution.jsonl --problem_file=data/example/ExampleEval.jsonl
```
```
Reading samples...
6it [00:00, 221.60it/s]
Running example suites...
```
Because there is no unbiased way of estimating pass@k when there are fewer
samples than k, the script does not evaluate pass@k for these cases. To
evaluate with other k values, pass `--k=<comma-separated-values-here>`. For
other options, see
```bash
$ evaluate_functional_correctness --help
```
However, we recommend that you use the default values for the rest.
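The unbiased pass@k estimator referred to above (introduced with HumanEval) can be sketched in Python as follows; `pass_at_k` here is a stand-alone illustration, not the repository's own implementation, and `math.comb` requires Python 3.8+:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: the probability that at least one of k samples
    drawn without replacement from n generated samples is correct,
    given that c of the n samples pass. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 2 samples per task and 1 passing, pass@1 = 0.5 — the figure the
# provided example samples are expected to produce.
print(pass_at_k(2, 1, 1))  # → 0.5
```

This also shows why pass@k is undefined for n < k: there is no way to draw k distinct samples from fewer than k, which is why the script skips those cases.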

## Issues
Problem descriptions in `descriptions/VerilogDescription_Machine.jsonl` are machine
generated, and we cannot guarantee the absence of ambiguity and errors. We do not plan
to maintain description correctness.

Functional correctness is evaluated by comparing simulation outputs using