[master] Add new BERT calibration dataset #1171
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
LGTM. @psyhtest Please review.
@nvitramble Looks ok! If you would like this change to be included in Inference_v2.1, please also make a PR to the branch
ResNet50 is the only benchmark with two calibration datasets. And it's only a historical accident: NVIDIA and Intel proposed their calibration datasets at the same time, so the grudging consensus was to allow submitters to use either.
This may well be the case. But why not simply convert the old calibration examples into the new format?
As discussed in this morning's WG, there is no 1:1 mapping from (question, context) pairs in dev-v1.1.json to features, since contexts get split when len(context + question) > max_seq_len. And to follow up from the WG: is the current calibration dataset used by any submitters? My understanding is that it is unused; for int8, submitters would use the QAT'ed models (which include quantization scales) here and here.
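To illustrate the splitting behaviour described above, here is a minimal sketch of a sliding-window featurizer. It is not the reference implementation: the function name `split_into_features`, the defaults `max_seq_len=384` and `doc_stride=128`, and the allowance of three special tokens are assumptions made for illustration only.

```python
def split_into_features(question_tokens, context_tokens,
                        max_seq_len=384, doc_stride=128):
    """Return (start, length) context spans, one per feature.

    Hypothetical helper: the defaults and the 3-token allowance for
    [CLS] and two [SEP] markers are assumptions, not values taken from
    the benchmark code.
    """
    max_context = max_seq_len - len(question_tokens) - 3
    if max_context <= 0:
        raise ValueError("question leaves no room for context tokens")

    spans = []
    start = 0
    while start < len(context_tokens):
        # Take as much context as fits in the remaining budget.
        length = min(max_context, len(context_tokens) - start)
        spans.append((start, length))
        if start + length == len(context_tokens):
            break
        # Slide the window forward; consecutive spans overlap.
        start += min(length, doc_stride)
    return spans


question = ["q"] * 10
short_context = ["c"] * 100   # fits in a single feature
long_context = ["c"] * 500    # exceeds the budget and is split

short_spans = split_into_features(question, short_context)
long_spans = split_into_features(question, long_context)
print(len(short_spans), len(long_spans))
```

Under these assumptions one (question, context) pair can yield several features, so an index into the feature list does not identify a single entry of dev-v1.1.json.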
To explain the motivation more, the current calibration dataset assumes a deterministic mapping from dev-v1.1.json question+context pairs to features. Different implementations of the featurizing function may not maintain the same ordering (e.g. if using Hugging Face). It is more robust to define the calibration examples in terms of unique ids, since that makes no assumptions about the featurizer (e.g. for ImageNet the calibration set is defined in terms of image file names, for LibriSpeech the calibration set is defined in terms of wav file names, etc.).
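As a rough sketch of what an id-based definition allows, the snippet below picks calibration examples out of a SQuAD v1.1 style dictionary by their qas ids, before any featurizer runs. The function name `select_calibration_examples` and the tiny in-memory dataset are hypothetical; only the `data` / `paragraphs` / `qas` / `id` nesting follows the SQuAD v1.1 JSON layout.

```python
def select_calibration_examples(squad, calibration_ids):
    """Collect the (id, question, context) records whose qas id is listed.

    Hypothetical helper: `squad` is a dict in the SQuAD v1.1 layout and
    `calibration_ids` is any iterable of qas id strings.
    """
    wanted = set(calibration_ids)
    selected = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if qa["id"] in wanted:
                    selected.append({
                        "id": qa["id"],
                        "question": qa["question"],
                        "context": paragraph["context"],
                    })
    return selected


# Made-up stand-in for dev-v1.1.json, for illustration only.
toy_squad = {
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "Calibration picks a subset of examples.",
            "qas": [
                {"id": "id-a", "question": "What does calibration pick?"},
                {"id": "id-b", "question": "What is picked?"},
            ],
        }],
    }]
}

chosen = select_calibration_examples(toy_squad, ["id-b"])
print([example["id"] for example in chosen])
```

Because the lookup is keyed on ids, the selected set stays the same however a given featurizer orders or splits its outputs.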
Either one of the calibration sets is allowed.
@nvitramble Should we cherry-pick this to r2.1 branch?
Nvm, it's already done in #1172
The purpose of this change is to:
The new calibration examples are randomly selected from dev-v1.1.json. Defining the examples in terms of qas ids can be more convenient for some PTQ workflows. Other benchmarks (e.g. ResNet50) already provide multiple calibration datasets (see here).