CLEVA Schema #1864

Merged 229 commits on Sep 30, 2023
Changes from 228 commits
Commits
b063368
Add CLEVA text classification task
lyy1994 Aug 17, 2023
6b3658a
Add CLEVA opinion mining task
lyy1994 Aug 18, 2023
96b7b15
Add Pinyin Transliteration Task and Two Related Subtasks
Jianqiao-Zhao Aug 18, 2023
0b5c4b1
Amend requirements-freeze.txt
Jianqiao-Zhao Aug 18, 2023
1544260
1. Add chinese_bleu_1 Metric in BasicMetric
Jianqiao-Zhao Aug 18, 2023
5e50966
Amend Pinyin Transliteration max_tokens
Jianqiao-Zhao Aug 18, 2023
145f914
Fix Typo and Add Examples for CLEVAPinyinTransliterationScenario
Jianqiao-Zhao Aug 18, 2023
64141ac
Add Classical Chinese Understanding Scenario
Jianqiao-Zhao Aug 18, 2023
d45fc97
Add Sentiment Analysis Scenario
Jianqiao-Zhao Aug 18, 2023
0a770e8
Add Instruction Following Scenario
Jianqiao-Zhao Aug 18, 2023
9835f38
Merge pull request #1 from lyy1994/pinyin
lyy1994 Aug 18, 2023
bb7c088
Add CLEVA robustness perturbations
lyy1994 Aug 18, 2023
3ee04ba
Fix Bug & Clean up
Jianqiao-Zhao Aug 18, 2023
cebb101
Put name in to CLEVAScenario to avoid creating unnecessary directory
Jianqiao-Zhao Aug 18, 2023
db4c953
Merge pull request #5 from lyy1994/scenarios/instruction_following
lyy1994 Aug 18, 2023
0a00421
Merge branch 'stanford-crfm:main' into main
lyy1994 Aug 18, 2023
872e810
Add Disinformation Scenario
Jianqiao-Zhao Aug 18, 2023
b1c0104
Merge pull request #4 from lyy1994/perturbations/robustness
Jianqiao-Zhao Aug 19, 2023
26b0e86
Load Prompt Setting From File
Jianqiao-Zhao Aug 19, 2023
c1bb31c
Fix Python style
lyy1994 Aug 19, 2023
ac61947
Update style
lyy1994 Aug 19, 2023
25a53b0
Fix checking
lyy1994 Aug 19, 2023
d778690
Merge pull request #8 from lyy1994/hotfix/style
Jianqiao-Zhao Aug 20, 2023
12d8329
Add Gender Perturbation
lyy1994 Aug 20, 2023
4716053
1. Merge branch 'main' to pass the auto checks
Jianqiao-Zhao Aug 20, 2023
ac6754a
Fix checking
Jianqiao-Zhao Aug 20, 2023
fa8bbbf
Fix checking
Jianqiao-Zhao Aug 20, 2023
43e9815
Fix checking
Jianqiao-Zhao Aug 20, 2023
46f0a18
Merge pull request #7 from lyy1994/load_prompt_templates
lyy1994 Aug 20, 2023
b759579
Initialize Translation Scenario
Jianqiao-Zhao Aug 20, 2023
a98aa4f
Implement ClevaMachineTranslationMetric to support Chinese and variab…
Jianqiao-Zhao Aug 20, 2023
6d57bbd
Add Chinese person name perturbation
lyy1994 Aug 21, 2023
e68edf0
Add Simplified to Traditional perturbation
lyy1994 Aug 21, 2023
b7e70c9
Add Mandarin to Cantonese perturbation
lyy1994 Aug 21, 2023
3372a2a
Add CLEVA data augmentation setup
lyy1994 Aug 21, 2023
7ab1bf6
Update type annotation
lyy1994 Aug 21, 2023
387fd26
1. Cleva -> CLEVA in translation metric
Jianqiao-Zhao Aug 21, 2023
7047cc1
Merge pull request #9 from lyy1994/scenarios/translation
lyy1994 Aug 21, 2023
854555b
Merge branch 'main' into perturbations/fairness
Jianqiao-Zhao Aug 21, 2023
1c45137
Merge pull request #10 from lyy1994/perturbations/fairness
zd11024 Aug 21, 2023
2f30a44
Add Intent Understanding Scenario
Jianqiao-Zhao Aug 21, 2023
6cfaa79
Replace hardcoded file names
lyy1994 Aug 21, 2023
04ca882
Add CLEVA harms metrics
lyy1994 Aug 21, 2023
5c13686
Rename
lyy1994 Aug 21, 2023
66442f8
Minor Debug
Jianqiao-Zhao Aug 21, 2023
d086460
Minor Debug
Jianqiao-Zhao Aug 21, 2023
9ae9b64
Merge pull request #12 from lyy1994/scenarios/intent_understanding
Jianqiao-Zhao Aug 21, 2023
c179d32
Add Coreference Resolution Scenario
Jianqiao-Zhao Aug 21, 2023
78e97fc
Minor Debug
Jianqiao-Zhao Aug 21, 2023
0025efc
Add Dialog Generation Scenario
zd11024 Aug 21, 2023
0d14c42
Add Subject Knowledge Scenario
zd11024 Aug 21, 2023
94acb9a
Merge branch 'main' into add_dialog_scenario
zd11024 Aug 21, 2023
699a9a3
Merge pull request #11 from lyy1994/metrics/harms
zd11024 Aug 22, 2023
ec95b0d
Remove extra knowledge instances
zd11024 Aug 22, 2023
b830464
Overwrite get_instances for Dialog Scenario
zd11024 Aug 22, 2023
69222d4
Truncate labels for Subject Knowledge
zd11024 Aug 22, 2023
1e8547a
Merge branch 'main' into add_dialog_scenario
zd11024 Aug 22, 2023
cd801aa
Merge pull request #14 from lyy1994/scenarios/dialogue_generation
lyy1994 Aug 22, 2023
babcc84
Add CLEVA cultural knowledge scenario
lyy1994 Aug 22, 2023
331ea7c
Fix style
lyy1994 Aug 22, 2023
565422d
Add Reading Comprehension Scenario
Jianqiao-Zhao Aug 22, 2023
8763099
Add summarization and closed-book QA scenarios
zd11024 Aug 22, 2023
e1aa830
Merge pull request #15 from lyy1994/scenarios/cultural_knowledge
Jianqiao-Zhao Aug 22, 2023
be29c69
Minor Improvements After Review
Jianqiao-Zhao Aug 22, 2023
634f48d
Merge branch 'main' into scenarios/coreference_resolution
Jianqiao-Zhao Aug 22, 2023
19f5ef1
Merge pull request #13 from lyy1994/scenarios/coreference_resolution
Jianqiao-Zhao Aug 22, 2023
0d82bb7
Merge branch 'main' into scenarios/reading_comprehension
Jianqiao-Zhao Aug 22, 2023
3620568
Minor Debug
Jianqiao-Zhao Aug 22, 2023
4e824ea
Merge pull request #16 from lyy1994/scenarios/reading_comprehension
lyy1994 Aug 22, 2023
226d16e
add toxicity_detection & paraphrase_generation
HenryHZY Aug 22, 2023
66c15fc
Merge branch 'main' into scenarios/toxicity_detection
HenryHZY Aug 23, 2023
6813fab
Reformat files
HenryHZY Aug 23, 2023
445ab6d
Add annotations for Subject Knowledge datasets
zd11024 Aug 22, 2023
f04dbb3
Add chinese_rouge_2 metric
zd11024 Aug 23, 2023
9e6b991
Merge branch 'main' into scenarios/closed_book_qa
zd11024 Aug 23, 2023
ad00afb
Merge branch 'stanford-crfm:main' into main
lyy1994 Aug 23, 2023
b6ce87b
Merge pull request #17 from lyy1994/scenarios/closed_book_qa
lyy1994 Aug 23, 2023
f4c8858
Update class CLEVAToxicityDetectionScenario
HenryHZY Aug 23, 2023
22d4e3f
Add Paraphrase Identification Scenario
Jianqiao-Zhao Aug 23, 2023
d7f27fc
Minor Debug
Jianqiao-Zhao Aug 23, 2023
5064e7b
Merge branch 'main' into scenarios/paraphrase_identification
Jianqiao-Zhao Aug 23, 2023
327df88
Merge pull request #20 from lyy1994/scenarios/paraphrase_identification
lyy1994 Aug 23, 2023
34f9489
Fix class CLEVAParaphraseGenerationMetric
HenryHZY Aug 23, 2023
126c9ad
minor update for paraphrase generation
HenryHZY Aug 23, 2023
3102ba9
Add mathematic calculation and inductive reasoning
zd11024 Aug 23, 2023
0f20c57
minor update for paraphrase generation
HenryHZY Aug 23, 2023
29dd0e3
Merge pull request #18 from lyy1994/scenarios/toxicity_detection
lyy1994 Aug 23, 2023
3c00bfd
Add Bias Scenario and Related Subtasks
Jianqiao-Zhao Aug 23, 2023
87a2b98
Implement Population_F1 for Multiple Choice Adapter
Jianqiao-Zhao Aug 23, 2023
a64737f
Improve Classification Logic
Jianqiao-Zhao Aug 23, 2023
3b6aac0
Cleanup
Jianqiao-Zhao Aug 23, 2023
e2aa3f9
Add reasoning primitive scenario
zd11024 Aug 23, 2023
5b8fc01
Merge branch 'main' into scenarios/reasoning
zd11024 Aug 23, 2023
43b3c2c
Update Subtask Names
Jianqiao-Zhao Aug 23, 2023
151267e
Merge pull request #21 from lyy1994/scenarios/bias
lyy1994 Aug 23, 2023
37f7257
Add CLEVA reasoning scenarios
lyy1994 Aug 23, 2023
8db3ac1
Add Deductive Reasoning Scenario and One Related Subtask
Jianqiao-Zhao Aug 23, 2023
bd05af6
Merge branch 'stanford-crfm:main' into main
lyy1994 Aug 23, 2023
347e2a9
Minor Improvements for Review
Jianqiao-Zhao Aug 23, 2023
919a431
Add CLEVA copyright scenario
lyy1994 Aug 23, 2023
99f395c
Fix typos
lyy1994 Aug 23, 2023
0abced5
Merge pull request #23 from lyy1994/deductive_reasoning
lyy1994 Aug 23, 2023
c7f2c85
Merge branch 'main' into scenarios/commonsense_reasoning
Jianqiao-Zhao Aug 23, 2023
82e1b38
Merge pull request #22 from lyy1994/scenarios/commonsense_reasoning
Jianqiao-Zhao Aug 23, 2023
b29f99b
Add Chinese copyright metric
lyy1994 Aug 23, 2023
f97883a
Merge branch 'main' into scenarios/copyright
lyy1994 Aug 23, 2023
cdaff79
Update Chinese tokenizer default value
lyy1994 Aug 24, 2023
bc1cd7c
Merge branch 'stanford-crfm:main' into main
lyy1994 Aug 24, 2023
42add25
Remove label truncation
zd11024 Aug 24, 2023
6f046d7
Merge branch 'main' into scenarios/reasoning
zd11024 Aug 24, 2023
ae9b0ad
Add reasoning tag
zd11024 Aug 24, 2023
2b91f75
Merge pull request #25 from lyy1994/scenarios/reasoning
zd11024 Aug 24, 2023
940811a
Merge pull request #24 from lyy1994/scenarios/copyright
Jianqiao-Zhao Aug 24, 2023
1935bfc
Add Data-to-Text Generation Scenario
Jianqiao-Zhao Aug 25, 2023
a2d609f
Add CLEVA to HELM prompt conversion
lyy1994 Aug 25, 2023
7c347b6
Fetch inference parameters online
lyy1994 Aug 25, 2023
d01eca9
Reformat
lyy1994 Aug 25, 2023
c696945
Add Mathematical Reasoning Scenario and Related Metrics
Jianqiao-Zhao Aug 25, 2023
c510861
Minor Debug
Jianqiao-Zhao Aug 25, 2023
c2ce65b
Add Language Modeling Scenario
Jianqiao-Zhao Aug 25, 2023
12719f6
Minor Debug
Jianqiao-Zhao Aug 25, 2023
7b46000
Minor Adjustments after Review Discussion
Jianqiao-Zhao Aug 26, 2023
e926f8d
Merge pull request #27 from lyy1994/scenarios/d2t_maths_language
lyy1994 Aug 26, 2023
f769fd0
Merge pull request #26 from lyy1994/adaptation
lyy1994 Aug 26, 2023
1a4368a
Fix bugs
lyy1994 Aug 26, 2023
e66c15e
Add CLEVA paraphrase generation multiple prompts
lyy1994 Aug 27, 2023
d36ed3b
Fix typo
lyy1994 Aug 27, 2023
2ec2a97
Merge pull request #28 from lyy1994/prompts/paraphrase
Jianqiao-Zhao Aug 27, 2023
0dbe02d
Adjust prompts for all multiple-choice scenarios
Jianqiao-Zhao Aug 28, 2023
d5e5d2a
1. Adjust all the generation scenarios to use CLEVA prompts
Jianqiao-Zhao Aug 29, 2023
d21cc1a
Minor Improvements After Review Discussion
Jianqiao-Zhao Aug 30, 2023
4247a59
Minor Debug
Jianqiao-Zhao Aug 30, 2023
0224f60
Adapt CLEVACodeSynthesisScenario to HELM Style
Jianqiao-Zhao Aug 30, 2023
05d4635
1. Fix Mathematical Reasoning Scenario Prompt Issue in Test Split Cau…
Jianqiao-Zhao Aug 30, 2023
fb99b8f
Make CLEVA dependency optional
lyy1994 Aug 31, 2023
19179fe
Handle optional dependency error
lyy1994 Aug 31, 2023
60a3188
Handle optional dependency
lyy1994 Aug 31, 2023
105be1d
1. Fix Mathematical Reasoning Testing Labels
Jianqiao-Zhao Aug 31, 2023
9649dcc
Merge pull request #30 from lyy1994/dependency
Jianqiao-Zhao Aug 31, 2023
c57ec27
Update process_instance in Language Modeling and Paraphrase Generatio…
Jianqiao-Zhao Aug 31, 2023
dfbb30e
Merge pull request #29 from lyy1994/adjust_prompts
lyy1994 Aug 31, 2023
581e00a
Make CLEVAScenario an abstract class
lyy1994 Aug 31, 2023
a56755b
Minor debug
lyy1994 Aug 31, 2023
fbfa44e
Merge pull request #31 from lyy1994/abstract
Jianqiao-Zhao Aug 31, 2023
0a341dd
Add financial_question subtask for CLEVAParaphraseIdentificationScenario
Jianqiao-Zhao Aug 31, 2023
43e4d54
Add KeyphraseExtractionScenario and other subtasks
Jianqiao-Zhao Aug 31, 2023
771e0cc
Fix task-subtask conflicts
lyy1994 Aug 31, 2023
c8f647d
Fix dialogue processing
zd11024 Aug 31, 2023
adeee53
Merge branch 'scenarios/dialogue' into increase_data
zd11024 Aug 31, 2023
f81d01f
Add More data
Jianqiao-Zhao Aug 31, 2023
43a605a
Save working progress
Jianqiao-Zhao Aug 31, 2023
59fe8d0
Merge remote-tracking branch 'origin/increase_data' into increase_data
Jianqiao-Zhao Aug 31, 2023
5b59ccb
Increase to Full-Scale Data
Jianqiao-Zhao Aug 31, 2023
3972aec
Minor debug
lyy1994 Aug 31, 2023
f57c944
Merge pull request #32 from lyy1994/increase_data
lyy1994 Aug 31, 2023
6599f07
Fix converter bug
zd11024 Sep 2, 2023
20e53cb
Merge pull request #33 from lyy1994/hotfix/converter
lyy1994 Sep 2, 2023
3a0e7a2
Fix bugs
lyy1994 Sep 2, 2023
e38a265
Merge pull request #34 from lyy1994/hotfix/minor
Jianqiao-Zhao Sep 3, 2023
5509035
Update asset links
lyy1994 Sep 5, 2023
7ffbd6b
Update data link
lyy1994 Sep 6, 2023
4988330
Merge pull request #35 from lyy1994/update_link
lyy1994 Sep 6, 2023
fa092f6
Optimize Some Names
Jianqiao-Zhao Sep 8, 2023
fa3904d
Add Type Annotations to Perturbation Functions
Jianqiao-Zhao Sep 9, 2023
f2e5ce0
Make mypy Happy
Jianqiao-Zhao Sep 9, 2023
82f6a20
Refactor get_words_with_similar_pinyin in ChineseTyposPerturbation
Jianqiao-Zhao Sep 9, 2023
e350a89
Include rare_char_prob, consider_tone, and word_level_perturb in Chin…
Jianqiao-Zhao Sep 9, 2023
64a0f15
Add One Missing Type Annotation
Jianqiao-Zhao Sep 9, 2023
f6f82d4
Simplify data structure
lyy1994 Sep 10, 2023
52481d9
Merge pull request #36 from lyy1994/minor_improvement
Jianqiao-Zhao Sep 10, 2023
c22396d
1. Optimize ChineseTokenizer structure in cleva_harms_metric.py and b…
Jianqiao-Zhao Sep 12, 2023
6e2b142
Merge pull request #37 from lyy1994/minor_improvement
lyy1994 Sep 12, 2023
572c72d
Simplify code structure; Increase readability; Refine comments and me…
Jianqiao-Zhao Sep 13, 2023
806f034
Add CLEVA RunSpecs
lyy1994 Sep 13, 2023
c16b911
Fix Type Annotations
Jianqiao-Zhao Sep 13, 2023
00d3b07
Merge remote-tracking branch 'origin/scenario_improvement' into scena…
Jianqiao-Zhao Sep 13, 2023
b080817
Fix typos
Jianqiao-Zhao Sep 13, 2023
1979ddc
Add more assertions to ensure correct type
Jianqiao-Zhao Sep 13, 2023
350e520
Continue add type annotations
Jianqiao-Zhao Sep 13, 2023
5f24185
Fix type annotations
lyy1994 Sep 13, 2023
92f38af
Merge pull request #38 from lyy1994/scenario_improvement
lyy1994 Sep 13, 2023
acbc061
Add more comments
lyy1994 Sep 14, 2023
cbd4197
Add one more example
lyy1994 Sep 14, 2023
3c737c1
Replace examples
lyy1994 Sep 14, 2023
3ad2db6
Revise examples
lyy1994 Sep 14, 2023
05e7358
Merge pull request #39 from lyy1994/template_improvement
lyy1994 Sep 14, 2023
858be52
Fix incorrect assert
lyy1994 Sep 16, 2023
e4fd565
Update from `full_functionality_text` to `text`
lyy1994 Sep 16, 2023
997dcf0
Minor debug
lyy1994 Sep 16, 2023
f7c0576
Merge pull request #40 from lyy1994/debug
Jianqiao-Zhao Sep 16, 2023
ea34b43
Minor update for clearer presentation
lyy1994 Sep 19, 2023
2e562f6
Minor debug
Jianqiao-Zhao Sep 19, 2023
5877b19
Rephrase a comment
Jianqiao-Zhao Sep 19, 2023
e4d08a0
Merge pull request #41 from lyy1994/minor_improvement
lyy1994 Sep 19, 2023
f2d01e2
Resolve upstream conflicts
lyy1994 Sep 20, 2023
4cd0f8d
Merge pull request #42 from lyy1994/conflicts
lyy1994 Sep 20, 2023
7da19db
Minor fix
lyy1994 Sep 20, 2023
e87e1c8
Merge branch 'main' into main
lyy1994 Sep 20, 2023
13f280d
Merge branch 'stanford-crfm:main' into main
lyy1994 Sep 21, 2023
dd50a9d
Merge branch 'stanford-crfm:main' into main
lyy1994 Sep 22, 2023
2ec19cb
Add CLEVA accuracy metric descriptions
lyy1994 Sep 22, 2023
9cfc54f
Minor update
lyy1994 Sep 22, 2023
f7d7643
Update metric groups
lyy1994 Sep 22, 2023
125c639
Update taxonomy & metric_groups
lyy1994 Sep 25, 2023
43fb081
Reorganize
lyy1994 Sep 25, 2023
ec2d535
Update CLEVA harms, others and a few scenarios
lyy1994 Sep 25, 2023
95f3de4
Update CLEVA language, knowledge, reasoning schema
lyy1994 Sep 25, 2023
1f5c56f
Correct some fields
lyy1994 Sep 25, 2023
309a7e9
Complete All Schema in schema.yaml
Jianqiao-Zhao Sep 25, 2023
9b30bf2
Refine schema.yaml
Jianqiao-Zhao Sep 26, 2023
ac5f025
Merge pull request #43 from lyy1994/schema
Jianqiao-Zhao Sep 26, 2023
d8e68cb
Update All Download Links
Jianqiao-Zhao Sep 26, 2023
a3270ae
Minor Debug
Jianqiao-Zhao Sep 26, 2023
ec1aed2
Minor Debug
Jianqiao-Zhao Sep 26, 2023
1b33806
Add Type Annotation
Jianqiao-Zhao Sep 26, 2023
5c33af0
Update One File Name
Jianqiao-Zhao Sep 26, 2023
da63e61
Merge pull request #44 from lyy1994/download_link
lyy1994 Sep 26, 2023
3d53210
Update schema.yaml
lyy1994 Sep 27, 2023
008160c
Update run_specs.py
lyy1994 Sep 27, 2023
fd3a2e8
Correct an error
lyy1994 Sep 27, 2023
06308af
Merge pull request #45 from lyy1994/schema
lyy1994 Sep 27, 2023
ed42663
Merge branch 'stanford-crfm:main' into main
lyy1994 Sep 27, 2023
2281fd5
Fix most comments
lyy1994 Sep 28, 2023
4b318ee
Reformat
lyy1994 Sep 28, 2023
e57ef22
Add data collection time
lyy1994 Sep 28, 2023
b913851
Merge pull request #46 from lyy1994/fix_schema
Jianqiao-Zhao Sep 28, 2023
521bfdb
Fix Typos: math_world_problem -> math_word_problem
Jianqiao-Zhao Sep 28, 2023
5e0ddd5
Merge pull request #47 from lyy1994/fix_typos
Jianqiao-Zhao Sep 28, 2023
52d2c69
Update metric name
lyy1994 Sep 30, 2023
33 changes: 22 additions & 11 deletions src/helm/benchmark/augmentations/cleva_perturbation.py
@@ -43,7 +43,15 @@ class Description(PerturbationDescription):
name: str = "chinese_typos"

# For downloading resources
- ASSET_URL = "https://drive.google.com/uc?id=1p5mldLpKxI-63H8YEruGJghtD1dZJI8k"
+ ASSET_URL = "http://39.108.215.175/assets/butter_finger"
Collaborator:

In general, it is better to use a domain name here because (1) it allows HTTPS and (2) an allocated IP address is easier to lose than a domain name. Is there any plan to change the URL back to an HTTPS domain name?

For instance, these assets could be hosted in a GitHub repository.

Contributor:

Because of the Great Firewall, Google Drive is not accessible in China. For this reason, and to ensure our project's long-term stability, we have rented a cloud server on a two-year contract, so this IP address and the related service should remain stable for the foreseeable future.

In the meantime, we are applying for a domain and an SSL certificate. We estimate the process should be completed within 2 weeks. Immediately after, we will update the URLs to use the new HTTPS domain name.

Please let me know how you feel about this plan. We truly value your suggestions. Thank you!

Collaborator:

Thanks for sharing your plans. Sounds good to me.
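
A minimal sketch of the migration discussed in this thread, not code from this PR. It assumes a hypothetical helper `build_asset_url` and a hypothetical check `uses_https`, and the domain `assets.example.org` is a placeholder rather than the real future host. The idea is that keeping the host in a single base constant makes the later move from the IP address to an HTTPS domain a one-line change, and a scheme check can flag plain-HTTP bases in the meantime.

```python
from typing import List
from urllib.parse import urlparse

# Base URL as it appears in this PR; only a subset of the file names is listed here.
ASSET_URL: str = "http://39.108.215.175/assets/butter_finger"
FILE_NAMES: List[str] = ["pinyin_to_char.json", "toneless_pinyin_to_char.json"]


def build_asset_url(base_url: str, filename: str) -> str:
    """Join a base URL and a file name with exactly one slash (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/{filename}"


def uses_https(url: str) -> bool:
    """Return True when the URL scheme is HTTPS (hypothetical check)."""
    return urlparse(url).scheme == "https"


# Per-file URLs derived from the single base constant.
urls: List[str] = [build_asset_url(ASSET_URL, name) for name in FILE_NAMES]

# Placeholder for the planned HTTPS host; the real domain is not known here.
FUTURE_ASSET_URL: str = "https://assets.example.org/butter_finger"
```

With this layout, swapping `ASSET_URL` for the HTTPS base would update every derived URL without touching the file list.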

+ FILE_NAMES: List[str] = [
+     "pinyin_to_char.json",
+     "toneless_pinyin_to_char.json",
+     "pinyin_to_common_char.json",
+     "toneless_pinyin_to_common_char.json",
+     "pinyin_to_word.json",
+     "toneless_pinyin_to_word.json",
+ ]

def __init__(
self,
@@ -62,8 +70,11 @@ def __init__(

# Ensure all necessary data are downloaded
output_dir = os.path.join("benchmark_output", "perturbations", self.name)
- ensure_directory_exists(os.path.dirname(output_dir))
- ensure_file_downloaded(source_url=self.ASSET_URL, target_path=output_dir, unpack=True, unpack_type="unzip")
+ ensure_directory_exists(output_dir)
+ for filename in self.FILE_NAMES:
+     target_path = os.path.join(output_dir, filename)
+     SOURCE_URL: str = f"{self.ASSET_URL}/{filename}"
+     ensure_file_downloaded(source_url=SOURCE_URL, target_path=target_path)

# Load the data for the perturbation
with open(
@@ -285,7 +296,7 @@ class Description(PerturbationDescription):
name: str = "chinese_synonym"

# For downloading resources
- SOURCE_URI: str = "https://drive.google.com/uc?id=1gXyZjoUw6yRjrsrh9ERzB_gxVluMTvij"
+ SOURCE_URL: str = "http://39.108.215.175/assets/synonyms.json"

def __init__(self, prob: float, trial_num: int = 10):
# Assign parameters to instance variables
@@ -294,7 +305,7 @@ def __init__(self, prob: float, trial_num: int = 10):

target_dir = os.path.join("benchmark_output", "perturbations", self.name, "synonyms.json")
ensure_directory_exists(os.path.dirname(target_dir))
- ensure_file_downloaded(source_url=self.SOURCE_URI, target_path=target_dir)
+ ensure_file_downloaded(source_url=self.SOURCE_URL, target_path=target_dir)
with open(os.path.join(target_dir)) as f:
self.synonym_dict: Dict[str, List[str]] = json.load(f)

@@ -377,7 +388,7 @@ class ChineseGenderPerturbation(Perturbation):
MODES = [GENDER_TERM, GENDER_PRONOUN]

""" Resources """
- SOURCE_URI: str = "https://drive.google.com/uc?id=1tJ5GLKboQrpzzBYTnFxeRuCOBxYhjFLp"
+ SOURCE_URL: str = "http://39.108.215.175/assets/gender_term.txt"

@dataclass(frozen=True)
class Description(PerturbationDescription):
@@ -424,7 +435,7 @@ class must be one of the genders in it. If not, it must be

target_path = os.path.join("benchmark_output", "perturbations", self.name, "gender_term.txt")
ensure_directory_exists(os.path.dirname(target_path))
- ensure_file_downloaded(source_url=self.SOURCE_URI, target_path=target_path)
+ ensure_file_downloaded(source_url=self.SOURCE_URL, target_path=target_path)
with open(target_path) as fin:
for line in fin.readlines():
splits: List[str] = line.strip("\n").split(" ")
@@ -480,7 +491,7 @@ class ChinesePersonNamePerturbation(Perturbation):
should_perturb_references: bool = True

""" Resources """
- SOURCE_URI: str = "https://drive.google.com/uc?id=1nKnfsxREkScrNOyhqiFxP5F1SjRgk6r8"
+ SOURCE_URL: str = "http://39.108.215.175/assets/chinese_name_gender.json"
OUTPUT_PATH = os.path.join("benchmark_output", "perturbations", name)

""" Gender categories """
@@ -545,7 +556,7 @@ def __init__(

target_path = os.path.join("benchmark_output", "perturbations", self.name, "chinese_name_gender.json")
ensure_directory_exists(os.path.dirname(target_path))
- ensure_file_downloaded(source_url=self.SOURCE_URI, target_path=target_path)
+ ensure_file_downloaded(source_url=self.SOURCE_URL, target_path=target_path)
with open(os.path.join(target_path), "r", encoding="utf-8") as f:
self.gender2name: Dict[str, List[str]] = json.load(f)
del self.gender2name["unknown"]
@@ -715,7 +726,7 @@ class MandarinToCantonesePerturbation(Perturbation):
should_perturb_references: bool = True

""" Resources """
- SOURCE_URI: str = "https://drive.google.com/uc?id=1vljbwq0hTm7W1tz74gjPnONWJ6kSEwK2"
+ SOURCE_URL: str = "http://39.108.215.175/assets/conversion.json"

@property
def description(self) -> PerturbationDescription:
@@ -733,7 +744,7 @@ def __init__(

target_path = os.path.join("benchmark_output", "perturbations", self.name, "conversion.json")
ensure_directory_exists(os.path.dirname(target_path))
- ensure_file_downloaded(source_url=self.SOURCE_URI, target_path=target_path)
+ ensure_file_downloaded(source_url=self.SOURCE_URL, target_path=target_path)
with open(target_path) as fin:
self.phrase_table = json.load(fin)

8 changes: 2 additions & 6 deletions src/helm/benchmark/metrics/classification_metrics.py
@@ -98,10 +98,6 @@ def evaluate_instances(self, request_states: List[RequestState]) -> List[Stat]:
y_pred.append(pred)

return [
- Stat(MetricName("multiple_choice_classification_macro_f1")).add(
-     f1_score(y_pred=y_pred, y_true=y_true, average="macro")
- ),
- Stat(MetricName("multiple_choice_classification_micro_f1")).add(
-     f1_score(y_pred=y_pred, y_true=y_true, average="micro")
- ),
+ Stat(MetricName("classification_macro_f1")).add(f1_score(y_pred=y_pred, y_true=y_true, average="macro")),
+ Stat(MetricName("classification_micro_f1")).add(f1_score(y_pred=y_pred, y_true=y_true, average="micro")),
]
25 changes: 13 additions & 12 deletions src/helm/benchmark/metrics/cleva_harms_metrics.py
@@ -36,15 +36,16 @@ class CLEVABiasMetric(BiasMetric):
name: str = "chinese_bias"

""" Chinese resources. """
- FILE_TO_URL: Dict[str, str] = {
-     "asian_word_list.txt": "https://drive.google.com/uc?id=1Enm7x1nj5o5DMXdSD3WFqr47F09QgFaM",
-     "hispanic_word_list.txt": "https://drive.google.com/uc?id=1-JAUBxA0IGiJ0j9nL6xJ8wCP9o-Gh8sS",
-     "white_word_list.txt": "https://drive.google.com/uc?id=1jbKhE4stKs8VyYmvQjx4aXq_m6oSccHp",
-     "male_word_list.txt": "https://drive.google.com/uc?id=1xWEsIYzXvQvOlpcID_zYBqfItIrVDIam",
-     "female_word_list.txt": "https://drive.google.com/uc?id=1Q5e18NfMCqPxdy7mBIteMqPHrJmOs7s_",
-     "profession_word_list.txt": "https://drive.google.com/uc?id=1baBsev6ippugLwUCQ8lHnLaOSBDstsj3",
-     "adjective_word_list.txt": "https://drive.google.com/uc?id=1s-jgE6OW-btc921GX9Aos0EhIbYwmBAT",
- }
+ ASSET_URL = "http://39.108.215.175/assets"
+ FILE_NAMES: List[str] = [
+     "asian_word_list.txt",
+     "hispanic_word_list.txt",
+     "white_word_list.txt",
+     "male_word_list.txt",
+     "female_word_list.txt",
+     "profession_word_list.txt",
+     "adjective_word_list.txt",
+ ]

def __repr__(self):
return (
@@ -71,9 +72,9 @@ def __init__(self, mode: str, demographic_category: str, target_category: Option
# Ensure all necessary data are downloaded
self.output_dir = os.path.join("benchmark_output", "metrics", self.name)
ensure_directory_exists(self.output_dir)
- for FILENAME, URL in self.FILE_TO_URL.items():
-     target_path = os.path.join(self.output_dir, FILENAME)
-     ensure_file_downloaded(source_url=URL, target_path=target_path)
+ for filename in self.FILE_NAMES:
+     target_path = os.path.join(self.output_dir, filename)
+     ensure_file_downloaded(source_url=f"{self.ASSET_URL}/{filename}", target_path=target_path)

# Overwrite inherited mappings
self.build_mappings()
6 changes: 3 additions & 3 deletions src/helm/benchmark/presentation/run_specs_cleva_v1.conf
@@ -217,9 +217,9 @@ entries: [
{description: "cleva:model=text,task=commonsense_reasoning,subtask=textual_entailment,prompt_id=5,version=v1,data_augmentation=cleva", priority: 1}
{description: "cleva:model=full_functionality_text,task=commonsense_reasoning,subtask=commonsense_question_answering,prompt_id=0,version=v1,data_augmentation=cleva", priority: 1}

- {description: "cleva:model=text,task=mathematical_reasoning,subtask=math_world_problem,prompt_id=0,version=v1,data_augmentation=cleva", priority: 1}
- {description: "cleva:model=text,task=mathematical_reasoning,subtask=math_world_problem,prompt_id=1,version=v1,data_augmentation=cleva", priority: 1}
- {description: "cleva:model=text,task=mathematical_reasoning,subtask=math_world_problem,prompt_id=2,version=v1,data_augmentation=cleva", priority: 1}
+ {description: "cleva:model=text,task=mathematical_reasoning,subtask=math_word_problem,prompt_id=0,version=v1,data_augmentation=cleva", priority: 1}
+ {description: "cleva:model=text,task=mathematical_reasoning,subtask=math_word_problem,prompt_id=1,version=v1,data_augmentation=cleva", priority: 1}
+ {description: "cleva:model=text,task=mathematical_reasoning,subtask=math_word_problem,prompt_id=2,version=v1,data_augmentation=cleva", priority: 1}

{description: "cleva:model=text,task=inductive_reasoning,subtask=add,prompt_id=0,version=v1,data_augmentation=cleva", priority: 1}
{description: "cleva:model=text,task=inductive_reasoning,subtask=add,prompt_id=1,version=v1,data_augmentation=cleva", priority: 1}
6 changes: 3 additions & 3 deletions src/helm/benchmark/run_specs.py
@@ -749,7 +749,7 @@ def get_cleva_generative_task_metric_spec(task: str, subtask: Optional[str], **k
"pinyin_transliteration:zh2pinyin": partial(get_basic_metric_specs, ["chinese_bleu_1"]),
"dialogue_generation:task_oriented": partial(get_basic_metric_specs, ["chinese_bleu_1"]),
"data_to_text_generation": partial(get_basic_metric_specs, ["chinese_bleu_1"]),
- "mathematical_reasoning:math_world_problem": partial(get_basic_metric_specs, ["cleva_math_result_match"]),
+ "mathematical_reasoning:math_word_problem": partial(get_basic_metric_specs, ["cleva_math_result_match"]),
}

key: str = task
@@ -2380,7 +2380,7 @@ def get_anthropic_hh_rlhf_spec(num_respondents: int, subset: str) -> RunSpec:
def get_cleva_spec(task: str, version: str, subtask: str = None, prompt_id: int = 0) -> RunSpec:
from .scenarios.cleva_scenario import CLEVAScenario # noqa

- CLEVAScenario.download_dataset()
+ CLEVAScenario.download_dataset(task, version)

_, prompt_setting = CLEVAScenario.get_prompt_setting(task, subtask, version, prompt_id)
inference_parameters = CLEVAScenario.load_inference_parameters(task, subtask, version, prompt_id)
@@ -2430,7 +2430,7 @@ def get_cleva_spec(task: str, version: str, subtask: str = None, prompt_id: int
output_suffix=prompt_setting.output_suffix,
max_train_instances=inference_parameters.get("max_train_instances", 5),
num_outputs=inference_parameters.get("num_outputs", 5),
- max_tokens=inference_parameters.get("max_tokens", 5),
+ max_tokens=inference_parameters.get("max_tokens", 1),
temperature=inference_parameters.get("temperature", 0.0),
stop_sequences=inference_parameters.get("stop_sequences", ["\n"]),
sample_train=inference_parameters.get("sample_train", True),
13 changes: 7 additions & 6 deletions src/helm/benchmark/scenarios/cleva_scenario.py
@@ -16,7 +16,7 @@
from .code_scenario import CodeReference, CodeInstance


- CLEVA_DATA_URL = "https://drive.google.com/uc?id=1uteSvq2dOgsmutOOwEziQd_d9i5Ypan6&confirm=t"
+ CLEVA_DATA_URL = "http://39.108.215.175/data"
CLEVA_DATA_PATH = "benchmark_output/scenarios/cleva"


@@ -410,10 +410,11 @@ def task(self) -> str:
pass

@classmethod
- def download_dataset(cls):
-     target_dir = os.path.join(CLEVA_DATA_PATH, "data")
-     ensure_directory_exists(CLEVA_DATA_PATH)
-     ensure_file_downloaded(source_url=CLEVA_DATA_URL, target_path=target_dir, unpack=True, unpack_type="untar")
+ def download_dataset(cls, task: str, version: str):
+     source_url: str = CLEVA_DATA_URL + f"/{version}/{task}.zip"
+     target_dir: str = os.path.join(CLEVA_DATA_PATH, "data", version)
+     ensure_directory_exists(target_dir)
+     ensure_file_downloaded(source_url=source_url, target_path=os.path.join(target_dir, task), unpack=True)

def load_dataset(self) -> Dict[str, List[Dict[str, Any]]]:
data_dir: str = os.path.join(CLEVA_DATA_PATH, "data", self.version, self.task)
@@ -1483,7 +1484,7 @@ class CLEVAMathematicalReasoningScenario(CLEVAScenario):
For example, we use "所以答案是(只给出数字即可)" (English: Thus, the answer is:) before the answer,
and remove line breaks within the answer.

- An example of the math_world_problem subtask is:
+ An example of the math_word_problem subtask is:
回答以下数学问题

问题:甲数是168,乙数是甲数的4倍,乙数=?请一步一步给出推理过程。
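
Returning to the `download_dataset(task, version)` change earlier in this file: the sketch below shows how the per-task archive URL and the unpack target are composed, and that the target matches the directory `load_dataset` reads from. `dataset_locations` is a hypothetical helper written for illustration only; the two constants are copied from the diff, and nothing here performs a download.

```python
import os
from typing import Tuple

# Constants as they appear in the diff for cleva_scenario.py.
CLEVA_DATA_URL: str = "http://39.108.215.175/data"
CLEVA_DATA_PATH: str = "benchmark_output/scenarios/cleva"


def dataset_locations(task: str, version: str) -> Tuple[str, str]:
    """Return (source_url, target_path) for one task archive (hypothetical helper)."""
    # One zip archive per task, grouped by dataset version on the server.
    source_url = f"{CLEVA_DATA_URL}/{version}/{task}.zip"
    # Same layout that load_dataset joins: <data path>/data/<version>/<task>.
    target_path = os.path.join(CLEVA_DATA_PATH, "data", version, task)
    return source_url, target_path


source_url, target_path = dataset_locations("mathematical_reasoning", "v1")
```

Because each task is fetched separately, a run for one task no longer needs the full dataset archive that the old single-URL download pulled.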