Problem with Evaluation #4

Open

duygunuryldz opened this issue Apr 8, 2024 · 2 comments

duygunuryldz commented Apr 8, 2024

Dear authors,
I have been trying to reproduce the results in your paper and noticed a problem with the evaluation of the Entity Inferences dataset.
To calculate the probability of each label for a sample, the probe sentence is forwarded only once, with the first label appended. The logits from that single pass are then used to calculate the probabilities of all the other labels as well. This causes a problem when the labels tokenize to different lengths.
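
To make the failure mode concrete, here is a toy sketch (not the repository's code; the probe sentence and labels are made up) showing that the logits produced from the probe plus the first label cannot be reused to score a label whose tokenized length differs:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

probe = "Her favorite city is"            # made-up probe sentence
labels = [" Paris", " New York City"]     # made-up candidate labels

seqs = [tok(probe + lab, return_tensors="pt") for lab in labels]
print([s["input_ids"].shape[1] for s in seqs])   # different lengths, e.g. [5, 7]

with torch.no_grad():
    logits = model(**seqs[0]).logits             # forwarded once, with the first label only
print(logits.shape)                              # (1, len(probe + first label), vocab_size)

# These logits are conditioned on " Paris" and their sequence length is tied to it,
# so gathering the token log-probs of " New York City" from them either reads
# misaligned positions (when everything is padded to one length) or fails on shape.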

In edit_func.py, for example, the logits of the raw model are computed at line 320:

pre_edit_logits = model_raw(**ex).logits

where ex is the probe sentence with the first label, created at line 312:

ex = {}
ex['input_ids'] = batch_pre["edit_inner"][0]['labels']['input_ids'][0].unsqueeze(0)
ex['attention_mask'] = batch_pre["edit_inner"][0]['labels']['attention_mask'][0].unsqueeze(0)
ex['labels'] = batch_pre["edit_inner"][0]['labels']['input_ids'][0].unsqueeze(0)
Then, at lines 347-369 of the same file, the probabilities for all labels of that sample are calculated:

    with torch.no_grad():
        n_probe_labels = batch_pre['edit_inner'][0]['labels']['input_ids'].size(0)
        pre_edit_dict = []
        for i in range(n_probe_labels):
            if dataset_name == 'ecbd':
                #code
            else:
                pre_label = batch_pre["edit_inner"][0]['labels']['input_ids'][i].unsqueeze(0)
            pre_edit_dict.append(get_log_probs(pre_edit_logits, pre_label, shift=True))
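
For context, I assume get_log_probs with shift=True simply gathers the log-probabilities of the label tokens from the shifted logits, roughly as in the sketch below (an assumption about its behaviour, not the actual implementation):

import torch

def get_log_probs_sketch(logits, labels, shift=True):
    # Assumed behaviour: under teacher forcing, the token at position t is
    # scored with the logits at position t - 1 (hence the shift).
    if shift:
        logits = logits[:, :-1, :]
        labels = labels[:, 1:]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum(dim=-1)  # total log-probability of the label tokens

A helper like this has no way of knowing whether the logits were produced from the same sequence as the labels; since all labels of a sample sit in one (presumably padded) tensor, the shapes line up and the wrong scores are computed silently.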

I believe that, for each label, pre_edit_logits should be obtained by forwarding the probe sentence with that label, as follows:

with torch.no_grad():
    n_probe_labels = batch_pre['edit_inner'][0]['labels']['input_ids'].size(0)
    pre_edit_dict = []
    for i in range(n_probe_labels):
        ex = {}
        ex['input_ids'] = batch_pre["edit_inner"][0]['labels']['input_ids'][i].unsqueeze(0)
        ex['attention_mask'] = batch_pre["edit_inner"][0]['labels']['attention_mask'][i].unsqueeze(0)
        ex['labels'] = batch_pre["edit_inner"][0]['labels']['input_ids'][i].unsqueeze(0)

        pre_edit_logits = model_raw(**ex).logits
        pre_edit_dict.append(get_log_probs(pre_edit_logits, ex['labels'], shift=True))

This issue also exists in other functions in edit_func.py. You can verify the problem with the first sample in fake_person_implicit_attribute_dependent_adjective.json. With the version I provided above, I get 34.71% pre-edit accuracy for gpt2-xl on the Entity Inferences dataset.

I might also be misunderstanding some parts, so please correct me if I am wrong.
Thanks in advance.

joshinh commented Apr 24, 2024

Hi! I was trying out this code and also noticed this issue. But I think this is only an issue when we use edit_method = "prepend_def", right? When I try edit_method = "ft_per_ex" (i.e., fine-tuning one example at a time), I think this is handled correctly (line 191 in edit_func.py); I do get 34.71% pre accuracy in this case for gpt2-xl.

@EmilyGirl

Hello, do you have any code for generating text using the generate function?
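
For reference, a generic generation call with a Hugging Face causal LM such as gpt2-xl looks roughly like the sketch below (not code from this repository; the prompt is made up):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl").eval()

prompt = "The Eiffel Tower is located in"        # made-up prompt
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)

print(tok.decode(out[0], skip_special_tokens=True))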
