
[P0] Can't reproduce commonsense reasoning numbers #141

Open
dyahadila opened this issue Oct 29, 2024 · 5 comments

@dyahadila
Hello!

I am trying to reproduce the commonsense reasoning numbers on LLaMA with this run script: https://github.com/stanfordnlp/pyreft/tree/main/examples/loreft#commonsense-reasoning-tasks

But my eval_results.json after training is showing all zeros:

```json
{
    "eval/boolq": 0.0,
    "eval/piqa": 0.0,
    "eval/social_i_qa": 0.0,
    "eval/hellaswag": 0.0,
    "eval/winogrande": 0.0,
    "eval/ARC-Easy": 0.0,
    "eval/ARC-Challenge": 0.0,
    "eval/openbookqa": 0.0,
    "n_params": 2097408
}
```

I only reduced the batch size from 16 to 8; other than that, everything is identical to the run script. I checked my training log, and at some point the loss becomes NaN.

Any advice is appreciated!
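
(For anyone debugging the same failure: a quick way to locate where the loss first became NaN, assuming the training output was captured as a plain-text log; `train.log` is a placeholder path, not a file the script is known to write:)

```bash
# Find the first logged training step whose reported loss is NaN.
# train.log is a hypothetical path; point it at your captured log.
grep -n -i "'loss': nan" train.log | head -n 1
```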
@frankaging frankaging changed the title Can't reproduce commonsense reasoning numbers [P0] Can't reproduce commonsense reasoning numbers Oct 29, 2024
@frankaging frankaging self-assigned this Oct 29, 2024
@frankaging (Collaborator)

Hey @dyahadila, thanks for reporting this issue.

Could you share your run command and your loss curve? I have occasionally seen DiReFT fail to converge due to numerical instability when overfitting to short-generation tasks like commonsense reasoning. Could you also try another random seed, or reduce your learning rate a little and the number of epochs to 3, to see if that solves the issue?

@frankaging (Collaborator) commented Oct 29, 2024

BTW, we shared our commonsense reasoning logs (loss curves, final evals, stats) through wandb (link); you might find them helpful. But since there have been changes in this repo as well as in the Hugging Face transformers repo, this could be a new issue. If you can, also try lowering your transformers version to something like 4.44.0, which is closer to the version we used before.
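
(A minimal way to pin the version suggested above; whether 4.44.0 is exactly compatible with the current pyreft is an assumption:)

```bash
# Downgrade transformers to roughly the version the original runs used.
pip install "transformers==4.44.0"
```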

@dyahadila (Author)

I was using LoReFT, and this is my command:

```bash
python train.py -task commonsense \
    -data_dir dataset \
    -model yahma/llama-7b-hf \
    -seed 42 \
    -l all -r 8 -p f7+l7 -e 6 -lr 9e-4 \
    -type LoreftIntervention \
    -gradient_accumulation_steps 2 \
    -batch_size 8 \
    -eval_batch_size 4 \
    --dropout 0.00 \
    --test_split test \
    --use_normalized_template \
    --share_weights \
    --warmup_ratio 0.1 \
    --greedy_decoding
```

It is identical to https://github.com/stanfordnlp/pyreft/tree/main/examples/loreft#commonsense-reasoning-tasks, just with a smaller batch size.

I am now trying -e 6 -lr 1e-4 and -e 3 -lr 5e-4; will let you know how these two runs turn out!
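
(For reference, a full-command sketch of the second variant, reusing the flags above; the -e/-lr values simply follow the suggestion to cut epochs to 3 and lower the learning rate, and are not verified settings:)

```bash
# Same command as above, with epochs cut to 3 and a lower learning rate.
python train.py -task commonsense \
    -data_dir dataset \
    -model yahma/llama-7b-hf \
    -seed 42 \
    -l all -r 8 -p f7+l7 -e 3 -lr 5e-4 \
    -type LoreftIntervention \
    -gradient_accumulation_steps 2 \
    -batch_size 8 \
    -eval_batch_size 4 \
    --dropout 0.00 \
    --test_split test \
    --use_normalized_template \
    --share_weights \
    --warmup_ratio 0.1 \
    --greedy_decoding
```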

@frankaging (Collaborator)

And did you install pyreft using pip install git+https://github.com/stanfordnlp/pyreft.git?

@dyahadila (Author)

I used pip install pyreft.
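
(If the PyPI release is the culprit, switching to the source install frankaging asked about would look like this; uninstalling first is just an assumption about the cleanest way to replace the package:)

```bash
# Replace the PyPI release with the latest pyreft from GitHub,
# which may carry fixes not yet published to PyPI.
pip uninstall -y pyreft
pip install git+https://github.com/stanfordnlp/pyreft.git
```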
