Dear authors,
I have been trying to reproduce the results in your paper, and I noticed a problem with the evaluation on the Entity Inferences dataset.
To calculate the probability of each label for a sample, the probe sentence is forwarded only once, with the first label appended. The logits from that single pass are then reused to calculate the probabilities of all the other labels. This breaks down when the tokenized lengths of the labels differ, because the logits no longer line up with the label tokens.
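To make the length issue concrete, here is a minimal, self-contained sketch (the probe sentence and labels below are made up for illustration and are not from the dataset). Labels that tokenize to different numbers of BPE tokens produce input sequences, and therefore logits, of different lengths:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2').eval()

probe = 'The new restaurant on Main Street is'    # hypothetical probe sentence
labels = [' cheap', ' extraordinarily overpriced']  # different BPE lengths

with torch.no_grad():
    for label in labels:
        ids = tok(probe + label, return_tensors='pt').input_ids
        logits = model(input_ids=ids).logits
        print(f'{label!r}: {ids.size(1)} tokens -> logits shape {tuple(logits.shape)}')

# The two sequences have different lengths, so logits computed for the
# first label cannot be used to score the second label's tokens.
```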
For example, in edit_func.py the logits of the raw model are computed once, at line 320:

```python
pre_edit_logits = model_raw(**ex).logits
```

where `ex` is the probe sentence paired with the first label, created at line 312. Then, at lines 347-369 of the same file, the probabilities of all labels for that sample are calculated from these same logits:
```python
with torch.no_grad():
    n_probe_labels = batch_pre['edit_inner'][0]['labels']['input_ids'].size(0)
    pre_edit_dict = []
    for i in range(n_probe_labels):
        if dataset_name == 'ecbd':
            pass  # (ecbd-specific code elided)
        else:
            pre_label = batch_pre['edit_inner'][0]['labels']['input_ids'][i].unsqueeze(0)
            pre_edit_dict.append(get_log_probs(pre_edit_logits, pre_label, shift=True))
```
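For context, `get_log_probs` itself is not shown above; I assume it does something along the lines of the following sketch (my guess, not the repository's actual implementation), which makes it clear why the logits and the labels must come from the same forward pass:

```python
import torch.nn.functional as F

def get_log_probs_sketch(logits, labels, shift=True):
    # Assumed behavior: score the label tokens under the given logits.
    # With shift=True, the logits at position t predict the token at t+1,
    # so logits and labels must have matching sequence lengths and must
    # come from the same input sequence.
    if shift:
        logits = logits[:, :-1, :]
        labels = labels[:, 1:]
    log_probs = F.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum(dim=-1)
```

If `pre_edit_logits` comes from the pass over the first label's sequence while `pre_label` is a different label's (possibly longer or shorter) sequence, the gather above is misaligned or fails outright.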
I believe that for each label, `pre_edit_logits` should instead be obtained by forwarding the probe sentence with that specific label, as follows:
```python
with torch.no_grad():
    n_probe_labels = batch_pre['edit_inner'][0]['labels']['input_ids'].size(0)
    pre_edit_dict = []
    for i in range(n_probe_labels):
        # Forward the probe sentence with the i-th label so that the
        # logits are aligned with that label's tokenized length.
        ex = {}
        ex['input_ids'] = batch_pre['edit_inner'][0]['labels']['input_ids'][i].unsqueeze(0)
        ex['attention_mask'] = batch_pre['edit_inner'][0]['labels']['attention_mask'][i].unsqueeze(0)
        ex['labels'] = batch_pre['edit_inner'][0]['labels']['input_ids'][i].unsqueeze(0)
        pre_edit_logits = model_raw(**ex).logits
        pre_edit_dict.append(get_log_probs(pre_edit_logits, ex['labels'], shift=True))
```
This issue also exists in other functions in edit_func.py. You can verify the problem with the first sample in fake_person_implicit_attribute_dependent_adjective.json. With the version I provided, I get 34.71 pre-edit accuracy for gpt2-xl on the Entity Inferences dataset.
I might also be misunderstanding some parts, so please correct me if I am wrong.
Thanks in advance.
Hi! I was trying out this code and also noticed this issue. But I think it only arises when edit_method = "prepend_def", right? When I use edit_method = "ft_per_ex" (i.e., fine-tuning on one example at a time), the labels seem to be handled correctly (line 191 in edit_func.py); in that case I also get 34.71% pre-edit accuracy for gpt2-xl.