Reproducibility Issues (Bias Score and Accuracy) #12

rajneesh407 · 2024-09-12T10:55:16Z

I have been working on reproducing the results using the code from your repository and implementing it in Python. I successfully converted the provided R code into Python, and the outputs from both versions match. However, the results from your repository code do not align with the figures presented in your research paper.

Code used: [BBQ_calculate_bias_score.R] (https://github.com/nyu-mll/BBQ/blob/main/analysis_scripts/BBQ_calculate_bias_score.R)
Research paper link: [QA Bias Benchmark] (https://github.com/nyu-mll/BBQ/blob/main/QA_bias_benchmark.pdf)

Here is the output I obtained using the R code from your repository:

Comparing Dberta V3 Base (For Disambiguous):
Comparison With Paper :
- Age : Match
- Disability : Match
- Gender Identity : 13.9 (R-Code) instead of 15 (paper)
- Gender Identity Names : 12.7 instead of 14
- Nationality : Match
- Physical Appearance : 41.8 instead of 41
- Race and Ethnicity : 4.7 instead of 4.6
- Race and Ethnicity Names : Match
- Religion : Match
- Sexual Orientation : Match
- SES : Match

Same patter can be seen across other models also.

Any help/clarifications would be appreciated here. Thanks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reproducibility Issues (Bias Score and Accuracy) #12

Reproducibility Issues (Bias Score and Accuracy) #12

rajneesh407 commented Sep 12, 2024

Reproducibility Issues (Bias Score and Accuracy) #12

Reproducibility Issues (Bias Score and Accuracy) #12

Comments

rajneesh407 commented Sep 12, 2024