calibration standards for French prompt sets #818

rogthefrog · 2025-01-25T09:18:06Z

Generated calibration standards using the new official fr_fr prompt sets
Added calibration standards for the demo prompt set

github-actions · 2025-01-25T09:18:17Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

wpietri · 2025-01-25T13:28:27Z

src/modelbench/standards.json

-            "safe_hazard-1.0-sxc-fr_fr-practice": 0.52,
-            "safe_hazard-1.0-vcr-fr_fr-practice": 0.68
-        }
+            "safe_hazard-1.0-cse-fr_fr-practice_fr_fr": 0.72,


Seems a little odd to me that the practice calibration all have only two significant digits. And, come to think of it, that the demo have three. Shouldn't it be the other way around?

Also, for consistency with other uids, should we really have the duplicate fr_fr in these UIDs? I would expect them to be more like safe_hazard-1.0-dfm-fr_fr-practice.

We support multiple locales per prompt set. So a (locale, prompt-set-type) pair isn't sufficient to uniquely identify a prompt set.

Re. the significant digits, I did have the labels backwards.

I'll rerun the calibration with the demo set and update once they're done.

What's an example of having multiple locales in a prompt set? I was thinking they'd be separate.

And thanks for the update. FYI, there's no need to calibrate specifically for the demo set; the conclusion from Kurt in today's meeting was that the demo set was too small for statistical comfort, so we should just use the calibration for the set they're drawn from. @dhosterman was going to add code for that.

What's an example of having multiple locales in a prompt set? I was thinking they'd be separate.

Currently, the prompt set files contain one locale, but the code and associated design comments indicate there may be more than one locale per prompt set file.

modelbench/src/modelgauge/tests/safe_v1.py

Line 58 in 6046471

- There many be multiple personas and locales in one file.

If we want to get rid of that support, I'm all for it, because it would simplify things a lot.

calibration standards for French prompt sets

b50a551

rogthefrog requested a review from a team as a code owner January 25, 2025 09:18

rogthefrog temporarily deployed to Scheduled Testing January 25, 2025 09:18 — with GitHub Actions Inactive

rogthefrog requested review from wpietri, bollacker, dhosterman and bkorycki January 25, 2025 09:18

rogthefrog mentioned this pull request Jan 25, 2025

French calibration - full prompt set #810

Open

wpietri requested changes Jan 25, 2025

View reviewed changes

fix labels

4569954

rogthefrog temporarily deployed to Scheduled Testing January 27, 2025 20:39 — with GitHub Actions Inactive

fix labels

d52d0fb

rogthefrog temporarily deployed to Scheduled Testing January 27, 2025 21:18 — with GitHub Actions Inactive

rogthefrog added 3 commits January 27, 2025 14:24

use practice standards for demo, for the time being

760ee19

French has landed

c914f79

add official French prompt set and demo English prompt set

998b007

rogthefrog had a problem deploying to Scheduled Testing January 27, 2025 22:32 — with GitHub Actions Failure

deal with the official vs heldback prompt set designation

787bdca

rogthefrog temporarily deployed to Scheduled Testing January 27, 2025 22:56 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

calibration standards for French prompt sets #818

calibration standards for French prompt sets #818

rogthefrog commented Jan 25, 2025 •

edited

Loading

github-actions bot commented Jan 25, 2025 •

edited

Loading

wpietri Jan 25, 2025

rogthefrog Jan 27, 2025

rogthefrog Jan 27, 2025

wpietri Jan 27, 2025

rogthefrog Jan 27, 2025 •

edited

Loading

calibration standards for French prompt sets #818

Are you sure you want to change the base?

calibration standards for French prompt sets #818

Conversation

rogthefrog commented Jan 25, 2025 • edited Loading

github-actions bot commented Jan 25, 2025 • edited Loading

wpietri Jan 25, 2025

Choose a reason for hiding this comment

rogthefrog Jan 27, 2025

Choose a reason for hiding this comment

rogthefrog Jan 27, 2025

Choose a reason for hiding this comment

wpietri Jan 27, 2025

Choose a reason for hiding this comment

rogthefrog Jan 27, 2025 • edited Loading

Choose a reason for hiding this comment

rogthefrog commented Jan 25, 2025 •

edited

Loading

github-actions bot commented Jan 25, 2025 •

edited

Loading

rogthefrog Jan 27, 2025 •

edited

Loading