Merge branch 'stanford-crfm:main' into DecodingTrust
danielz02 authored Jan 7, 2024
2 parents 88b3e50 + 2a112cb commit 2a4bdc4
Showing 347 changed files with 15,818 additions and 12,294 deletions.
33 changes: 19 additions & 14 deletions .github/workflows/frontend.yml
@@ -5,12 +5,12 @@ on:
branches:
- '*'
paths:
- 'src/helm-frontend/**'
- 'helm-frontend/**'
pull_request:
branches:
- '*'
paths:
- 'src/helm-frontend/**'
- 'helm-frontend/**'

jobs:
test:
@@ -22,18 +22,20 @@ jobs:
uses: actions/setup-node@v3
with:
node-version: '18'
- name: Install Yarn
run: npm install --global yarn
- name: Install dependencies
working-directory: ./src/helm-frontend
run: npm ci
working-directory: ./helm-frontend
run: yarn install
- name: Run lint
working-directory: ./src/helm-frontend
run: npm run lint
working-directory: ./helm-frontend
run: yarn lint
- name: Run check format
working-directory: ./src/helm-frontend
run: npm run format:check
working-directory: ./helm-frontend
run: yarn format:check
- name: Run tests
working-directory: ./src/helm-frontend
run: npm run test
working-directory: ./helm-frontend
run: yarn test

build:
runs-on: ubuntu-latest
@@ -53,12 +55,15 @@ jobs:
uses: actions/setup-node@v3
with:
node-version: '18'
- name: Install Yarn
working-directory: ./helm-frontend
run: npm install --global yarn
- name: Install dependencies
working-directory: ./src/helm-frontend
run: npm ci
working-directory: ./helm-frontend
run: yarn install
- name: Build app
working-directory: ./src/helm-frontend
run: npm run build
working-directory: ./helm-frontend
run: yarn build
- name: Upload artifact
uses: actions/upload-pages-artifact@v2
with:
4 changes: 2 additions & 2 deletions .github/workflows/test.yml
@@ -3,10 +3,10 @@ on:
push:
branches: [ main ]
paths-ignore:
- 'src/helm-frontend/**'
- 'helm-frontend/**'
pull_request:
paths-ignore:
- 'src/helm-frontend/**'
- 'helm-frontend/**'

jobs:
install:
72 changes: 71 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,75 @@

## [Upcoming]

## [v0.4.0] - 2023-12-20

### Models

- Added Google PaLM 2 (#2087, #2111, #2139)
- Added Anthropic Claude 2.1 and Claude Instant 1.2 (#2095, #2123)
- Added Writer Palmyra-X v2 and v3 (#2104)
- Added OpenAI GPT-4 Turbo preview (#2092)
- Added 01.AI Yi (#2009)
- Added Mistral AI Mixtral-8x7B (#2130)
- Fixed race condition with "Already borrowed" error for Hugging Face tokenizers (#2088, #2091, #2116)
- Support configuration precision and quantization in HuggingFaceClient (#1912)
- Support LanguageModelingAdapter for HuggingFaceClient (#1964)

### Scenarios

- Added VizWiz Scenario (#1983)
- Added LegalBench scenario (#2129)
- Refactored CommonSenseScenario into HellaSwagScenario, OpenBookQA, SiqaScenario, and PiqaScenario (#2117, #2118, #2119)
- Added run specs configuration for HELM Lite (#2009)
- Changed the default metric in GSM8K to check exact match of the final number in the response (#2130)

### Framework

- Added tutorial for computing the leaderboard rank of a model using the method from "Efficient Benchmarking (of Language Models)" (#1968, #1986, #1985)
- Refactored ModelMetadata, ModelDeployment and Tokenizer, and moved configuration to YAML files (#1903, #1994)
- Fixed a bug regarding writing `runs_to_run_suites.json` when using `helm-release` with `--release` (#2012)
- Made pymongo an optional dependency (#1882)
- Made SlurmRunner retry some failed Slurm requests (#2077)
- Shortened cache retry time (#2081)
- Added retrying to AutoTokenizer (#2090)
- Added support for user configuration of model deployments and tokenizer configurations (#1996, #2142)
- Added support for passing in an arbitrary schema file to `helm-summarize` (#2075)
- Changed the prompt format for some instruction following models (#2130)
- Added py.typed to package type information (#2169)

### Frontend

- Made visual improvements and bugfixes for the new React frontend (#1947, #2000, #2005, #2018)
- Changed front page on React frontend to display a mini leaderboard (#2113, #2128)
- Added a dropdown menu for switching between different HELM results websites (#1947)
- Added a dropdown menu for switching between different versions (#2135)

### Evaluation Results

- Launched new React frontend
- [HELM Classic v0.4.0](https://crfm.stanford.edu/helm/classic/v0.4.0/)
- Added evaluation results for Mistral
- [HELM Lite v1.0.0](https://crfm.stanford.edu/helm/lite/v1.0.0/)
- Launched new [HELM Lite leaderboard](https://crfm.stanford.edu/2023/12/19/helm-lite.html) with 30 models and 10 scenarios

### Contributors

Thank you to the following contributors for your work on this HELM release!

- @brianwgoldman
- @dlwh
- @farzaank
- @JosselinSomervilleRoberts
- @krh26
- @neelguha
- @percyliang
- @perlitz
- @pettter
- @ruixin31
- @teetone
- @yifanmai
- @yotamp

## [v0.3.0] - 2023-11-01

### Models
@@ -236,7 +305,8 @@ Thank you to the following contributors for your contributions to this HELM rele

- Initial release

[upcoming]: https://github.com/stanford-crfm/helm/compare/v0.3.0...HEAD
[upcoming]: https://github.com/stanford-crfm/helm/compare/v0.4.0...HEAD
[v0.4.0]: https://github.com/stanford-crfm/helm/releases/tag/v0.4.0
[v0.3.0]: https://github.com/stanford-crfm/helm/releases/tag/v0.3.0
[v0.2.4]: https://github.com/stanford-crfm/helm/releases/tag/v0.2.4
[v0.2.3]: https://github.com/stanford-crfm/helm/releases/tag/v0.2.3
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,3 +1,4 @@
recursive-include src/helm/ py.typed
recursive-include src/helm/proxy/clients/ *.sp
recursive-include src/helm/benchmark/ *.json
recursive-include src/helm/benchmark/static/ *.css *.html *.js *.png *.yaml
33 changes: 33 additions & 0 deletions README.md
@@ -49,3 +49,36 @@ The directory structure for this repo is as follows
└── helm-frontend # New React Front-end
```

# Holistic Evaluation of Text-To-Image Models

<img src="https://github.com/stanford-crfm/helm/raw/heim/src/helm/benchmark/static/heim/images/heim-logo.png" alt="" width="800"/>

Significant effort has recently been made in developing text-to-image generation models, which take textual prompts as
input and generate images. As these models are widely used in real-world applications, there is an urgent need to
comprehensively understand their capabilities and risks. However, existing evaluations primarily focus on image-text
alignment and image quality. To address this limitation, we introduce a new benchmark,
**Holistic Evaluation of Text-To-Image Models (HEIM)**.

We identify 12 different aspects that are important in real-world model deployment, including:

- image-text alignment
- image quality
- aesthetics
- originality
- reasoning
- knowledge
- bias
- toxicity
- fairness
- robustness
- multilinguality
- efficiency

By curating scenarios encompassing these aspects, we evaluate state-of-the-art text-to-image models using this benchmark.
Unlike previous evaluations that focused on alignment and quality, HEIM significantly improves coverage by evaluating all
models across all aspects. Our results reveal that no single model excels in all aspects, with different models
demonstrating strengths in different aspects.

This repository contains the code used to produce the [results on the website](https://crfm.stanford.edu/heim/latest/)
and [paper](https://arxiv.org/abs/2311.04287).
8 changes: 8 additions & 0 deletions docs/code.md
@@ -152,3 +152,11 @@ multiple perturbations and applying it onto a single instance.
4. Add a new class `<Name of tokenizer>WindowService` in file `<Name of tokenizer>_window_service.py`.
Follow what we did for `GPTJWindowService`.
5. Import the new `WindowService` and map the model(s) to it in `WindowServiceFactory`.


## HEIM (text-to-image evaluation)

The overall code structure is the same as HELM's.

When adding new scenarios and metrics for image generation, place the Python files under the `image_generation` package
(e.g., `src/helm/benchmark/scenarios/image_generation`).
16 changes: 16 additions & 0 deletions docs/heim.md
@@ -0,0 +1,16 @@
# HEIM Quick Start (text-to-image evaluation)

To run HEIM, follow these steps:

1. Create a run specs configuration file. For example, to evaluate
[Stable Diffusion v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) against the
[MS-COCO scenario](https://github.com/stanford-crfm/heim/blob/main/src/helm/benchmark/scenarios/image_generation/mscoco_scenario.py), run:
```
echo 'entries: [{description: "mscoco:model=huggingface/stable-diffusion-v1-4", priority: 1}]' > run_specs.conf
```
2. Run the benchmark with a fixed number of instances (e.g., 10 instances):
`helm-run --conf-paths run_specs.conf --suite heim_v1 --max-eval-instances 10`

Examples of run specs configuration files can be found [here](https://github.com/stanford-crfm/helm/tree/main/src/helm/benchmark/presentation).
We used [this configuration file](https://github.com/stanford-crfm/helm/blob/main/src/helm/benchmark/presentation/run_specs_heim.conf)
to produce the results in the paper.
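
The two steps above can also be scripted. The following is a minimal sketch, assuming `helm-run` is on the `PATH` after installation. The helper names `format_run_specs`, `write_run_specs`, and `build_helm_run_command` are hypothetical and not part of HELM; the flags are copied from the command in step 2.

```python
import shlex
from pathlib import Path
from typing import List


def format_run_specs(description: str, priority: int = 1) -> str:
    """Return the contents of a one-entry run specs configuration file.

    The format mirrors the `echo` command in step 1.
    """
    return f'entries: [{{description: "{description}", priority: {priority}}}]\n'


def write_run_specs(path: Path, description: str, priority: int = 1) -> None:
    """Write the one-entry configuration to `path` (hypothetical helper)."""
    path.write_text(format_run_specs(description, priority))


def build_helm_run_command(conf_path: str, suite: str, max_eval_instances: int) -> List[str]:
    """Build the argument list for the `helm-run` invocation in step 2."""
    return [
        "helm-run",
        "--conf-paths", conf_path,
        "--suite", suite,
        "--max-eval-instances", str(max_eval_instances),
    ]


if __name__ == "__main__":
    # Print the configuration and the command instead of running them,
    # so the sketch has no side effects.
    print(format_run_specs("mscoco:model=huggingface/stable-diffusion-v1-4"), end="")
    print(shlex.join(build_helm_run_command("run_specs.conf", "heim_v1", 10)))
```

To actually launch the run, the argument list could be passed to `subprocess.run(..., check=True)` after calling `write_run_specs`.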
4 changes: 4 additions & 0 deletions docs/index.md
@@ -17,3 +17,7 @@ To add new models and scenarios, refer to the Developer Guide's chapters:

- [Developer Setup](developer_setup.md)
- [Code Structure](code.md)


We also support evaluating text-to-image models as introduced in **Holistic Evaluation of Text-to-Image Models (HEIM)**
([paper](https://arxiv.org/abs/2311.04287), [website](https://crfm.stanford.edu/heim/latest)).
16 changes: 16 additions & 0 deletions docs/installation.md
@@ -34,3 +34,19 @@ Within this virtual environment, run:
```
pip install crfm-helm
```

### For HEIM (text-to-image evaluation)

To install the additional dependencies to run HEIM, run:

```
pip install "crfm-helm[heim]"
```

Some models (e.g., DALLE-mini/mega) and metrics (`DetectionMetric`) require extra dependencies that are
not available on PyPI. To install these dependencies, download and run the
[extra install script](https://github.com/stanford-crfm/helm/blob/main/install-heim-extras.sh):

```
bash install-heim-extras.sh
```
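
After running the script, it can be useful to confirm that the extra packages are importable. The following is a rough sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the import names in `HEIM_EXTRA_MODULES` are an assumption based on the packages the install script mentions; adjust them to the models and metrics you actually use.

```python
import importlib.util
from typing import Iterable, List

# Assumed top-level import names for the extras installed by the script.
HEIM_EXTRA_MODULES = ["torch", "torchvision", "jax", "apex", "detectron2"]


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(HEIM_EXTRA_MODULES)
    if missing:
        print("Missing HEIM extras:", ", ".join(missing))
    else:
        print("All listed HEIM extras are importable.")
```

The check only looks up each module without importing it, so it does not verify that CUDA or a GPU is available.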
8 changes: 7 additions & 1 deletion docs/models.md
@@ -1,3 +1,9 @@
# Models

Please visit the models [page](https://crfm.stanford.edu/helm/latest/?models) of HELM's website for a list of available models and their descriptions.
Please visit the models [page](https://crfm.stanford.edu/helm/latest/?models) of HELM's website
for a list of available models and their descriptions.


## HEIM (text-to-image evaluation)

Please visit the [models page](https://crfm.stanford.edu/heim/latest/?models) of the HEIM results website.
7 changes: 6 additions & 1 deletion docs/quick_start.md
@@ -18,4 +18,9 @@ helm-server

Then go to http://localhost:8000/ in your browser.

**Next steps:** click [here](get_helm_rank.md) to find out how to to run the full benchmark and get your model's leaderboard rank.

## Next steps

Click [here](get_helm_rank.md) to find out how to run the full benchmark and get your model's leaderboard rank.

For the quick start page for HEIM, visit [here](heim.md).
File renamed without changes.
@@ -26,8 +26,8 @@ const colorNames = [
"warning",
"error",
];
// eslint-disable-next-line @typescript-eslint/no-unsafe-member-access
const theme = daisyuiColors["[data-theme=business]"] as Record<string, string>;

const theme = daisyuiColors["[data-theme=business]"];

interface Props {
groupsTables: GroupsTable[];
File renamed without changes.
@@ -15,6 +15,6 @@ export default defineConfig({
environment: "jsdom",
},
build: {
outDir: `${__dirname}/../helm/benchmark/static_build`,
outDir: `${__dirname}/../src/helm/benchmark/static_build`,
},
});
File renamed without changes.
28 changes: 28 additions & 0 deletions install-heim-extras.sh
@@ -0,0 +1,28 @@
#!/bin/bash

# Extra dependencies for HEIM when evaluating the following:
# Models: craiyon/dalle-mini, craiyon/dalle-mega, thudm/cogview2
# Scenarios: detection with the `DetectionMetric`

# This script fails when any of its commands fail.
set -e

# For DALLE-mini/mega, install the following dependencies.
# On Mac OS, skip installing pytorch with CUDA because CUDA is not supported
if [[ $OSTYPE != 'darwin'* ]]; then
# Manually install pytorch to avoid pip getting killed: https://stackoverflow.com/a/54329850
pip install --no-cache-dir --find-links https://download.pytorch.org/whl/torch_stable.html torch==1.12.1+cu113 torchvision==0.13.1+cu113

# DALLE mini requires jax install
pip install jax==0.3.25 jaxlib==0.3.25+cuda11.cudnn805 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
fi

# For CogView2, manually install apex and Image-Local-Attention. NOTE: need to run this on a GPU machine
echo "Installing CogView2 dependencies..."
pip install localAttention@git+https://github.com/Sleepychord/Image-Local-Attention.git@43fee310cb1c6f64fb0ed77404ba3b01fa586026
pip install --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" apex@git+https://github.com/michiyasunaga/apex.git@9395ba2aab3c05e0e36ef0b7fe48d42de9f10bcf

# For Detectron2. Following https://detectron2.readthedocs.io/en/latest/tutorials/install.html
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

echo "Done."
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -35,6 +35,7 @@ nav:
- 'User Guide':
- 'installation.md'
- 'quick_start.md'
- 'heim.md'
- 'get_helm_rank.md'
- 'tutorial.md'
- 'benchmark.md'
3 changes: 2 additions & 1 deletion requirements.txt
@@ -50,7 +50,7 @@ google-api-core==2.10.1
google-api-python-client==2.64.0
google-auth==2.12.0
google-auth-httplib2==0.1.0
google-cloud-aiplatform==1.36.4
google-cloud-aiplatform==1.38.1
googleapis-common-protos==1.56.4
greenlet==1.1.3
gunicorn==20.1.0
@@ -88,6 +88,7 @@ nodeenv==1.7.0
numba==0.56.4
numpy==1.23.3
openai==0.27.8
opencv-python==4.8.1.78
openpyxl==3.0.10
outcome==1.2.0
packaging==21.3