
trouble with bert-based model #3

Open
CuongNN218 opened this issue Jul 6, 2023 · 2 comments


CuongNN218 commented Jul 6, 2023

Dear Dr. Arazd (@arazd),

Thanks for your great work. I'm trying to replicate your result in Table 1 for order 4 (5 tasks, bert-base-uncased model) in the CL setting (full data) from the main paper, using the following command:

```
python train_cl2.py --task_list ag yelp_review_full amazon yahoo dbpedia \
    --prefix_MLP residual_MLP2 --lr 1e-4 --num_epochs 40 \
    --freeze_weights 1 --freeze_except word_embeddings \
    --prompt_tuning 1 --prefix_len 20 --seq_len 450 --one_head 0 \
    --model_name bert-base-uncased --early_stopping 1 \
    --save_name BERT_order_4_run1 --save_dir ./results
```

However, when the Progressive Prompts model evaluates accuracy on all datasets, it throws the following error as soon as evaluation on the yahoo dataset starts:

```
/opt/conda/conda-bld/pytorch_1639180487213/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [60,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
```

This issue only appears during evaluation of the 4th task. I tried other settings, such as shorter task sequences (2 or 3 tasks) and removing the ResMLP, and those run normally. I also printed the input_ids, token_type_ids, and position_ids, but the 4th task's pattern looks similar to the previous tasks'.

The error stems from this line in your repo: https://github.com/arazd/ProgressivePrompts/blob/01572d6a73c0576b070ceee00dbe4f5bc278423f/BERT_codebase/model_utils.py#L576
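For reference, this CUDA assertion is the generic symptom of an embedding lookup receiving an index that is greater than or equal to the size of the embedding table. A minimal sketch (my own repro, not code from this repo) that triggers the same failure mode:

```python
# Minimal sketch (assumed setup, not ProgressivePrompts code): indexing an embedding
# table with an out-of-range id raises IndexError on CPU and, on CUDA, surfaces as the
# device-side assert "srcIndex < srcSelectDimSize failed" shown above.
import torch
import torch.nn as nn

position_embeddings = nn.Embedding(512, 768)    # bert-base-uncased has 512 positions
position_ids = torch.arange(530).unsqueeze(0)   # ids 512..529 are out of range

out = position_embeddings(position_ids)         # fails here
```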

Could you give me some insight into this problem? I would really appreciate it if you could help me fix it.

P.S.: After troubleshooting the source of the problem, I found that it only occurs when the task sequence contains more than 4 tasks and when evaluating on the full validation set.

CuongNN218 (Author) commented:

Hi,
After troubleshooting the problem further, I found that this issue stems from the position-embedding size of the bert-base-uncased model. In particular, if we set seq_len = 450, then after prepending soft prompts of 20 tokens per task from the 4 previous tasks, the input length becomes 450 + 20 * 4 = 530 > 512, and 512 is the maximum number of positions bert-base-uncased supports. Would you mind sharing the input sequence length you used to reproduce Table 1b? I haven't found it in the main paper. Thanks.
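A quick way to check this limit (a sketch assuming the standard Hugging Face transformers API, not code from this repo):

```python
# Sketch: compare the effective input length against bert-base-uncased's position limit.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")
max_positions = config.max_position_embeddings    # 512

seq_len, prefix_len, num_prev_tasks = 450, 20, 4  # values from the command above
total_len = seq_len + prefix_len * num_prev_tasks

print(total_len, ">", max_positions)              # 530 > 512 -> out-of-range position ids
```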


mingyang-wang26 commented Oct 13, 2023


I also ran into the same problem: the BERT model cannot handle the hyperparameters given in this repository (seq_len=450 with prompt length 20). I'm curious which hyperparameters the authors actually used to get the results in the paper.
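One possible workaround, purely my assumption and not a confirmed author setting, is to shrink seq_len so that the text tokens plus all prepended soft prompts still fit within the 512-position limit:

```python
# Hypothetical workaround (not the authors' reported hyperparameters): choose seq_len
# so that text tokens plus every prepended soft prompt fit within 512 positions.
max_positions = 512   # bert-base-uncased position-embedding limit
prefix_len = 20       # soft-prompt tokens per task
num_prompts = 4       # prompts prepended in the failing case above

seq_len = max_positions - prefix_len * num_prompts   # 512 - 80 = 432
print(seq_len)
```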
