it seems that adashield-a doesn't update defense prompts. #4

Open
payphone131 opened this issue Aug 8, 2024 · 6 comments

Comments

@payphone131

Hello, I just ran train_our_qr.sh and got some CSV files. I see that the CSV files record the initial and final scores of the queries for each scenario. I noticed that when an initial score is 10, the final score never becomes 1 or 5, which suggests AdaShield-A never turned a failing defense prompt into a successful one. Is this normal? I used LLaVA as the target model and Vicuna as the defense model.
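
The check described above can be sketched as follows. This is only a minimal illustration: the column names `init_score` and `final_score` and the sample rows are assumptions, not the actual headers or data written by train_our_qr.sh, so adjust them to match the real CSV files.

```python
import csv
import io


def count_unrepaired(csv_text, init_col="init_score", final_col="final_score",
                     fail_score=10):
    """Return (rows that start at fail_score, rows that also end at fail_score).

    The column names are hypothetical; pass the real header names if they differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    started_failed = 0
    still_failed = 0
    for row in reader:
        if int(float(row[init_col])) == fail_score:
            started_failed += 1
            if int(float(row[final_col])) == fail_score:
                still_failed += 1
    return started_failed, still_failed


# Made-up sample data, only to show the shape of the check.
sample = "query,init_score,final_score\nq1,10,10\nq2,1,1\nq3,10,10\nq4,5,5\n"
print(count_unrepaired(sample))
```

If the two numbers are equal for every scenario file, no failing defense prompt was ever repaired, which is the behavior reported here.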

@rain305f
Collaborator

rain305f commented Aug 9, 2024

It is not normal. Here are my result files. You can see that the replies obtained with the updated defense prompt are safe under manual judgment. One possible reason is a problem with the judge.
final_table.csv

@payphone131
Author

I found that the judge score is calculated by string matching in your code, and the function "load_judge(args)" in judges.py is never used. Do I have to modify the code to use an LLM to give judge scores?
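
For context, a string-matching judge of the kind mentioned above might look roughly like the sketch below. It is not the repository's code: the marker list, the function name `keyword_judge`, and the mapping to scores 1 and 10 are all assumptions made for illustration.

```python
# Hypothetical refusal markers; the real list in the repository may differ.
REFUSAL_MARKERS = ("i'm sorry", "i am sorry", "i cannot", "i can't", "as an ai")


def keyword_judge(response, markers=REFUSAL_MARKERS):
    """Return 1 (treated as safe) if any marker appears in the reply, else 10."""
    text = response.lower()
    return 1 if any(marker in text for marker in markers) else 10


print(keyword_judge("I am sorry, but this request is unsafe."))
print(keyword_judge("That request is dangerous, so let us talk about safety instead."))
```

The second reply would probably be judged safe by a human, yet it contains none of the markers and is scored 10. A mismatch like this is one way string matching and manual judgment can disagree.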

@rain305f
Collaborator

rain305f commented Aug 9, 2024

So are the replies obtained with the updated defense prompt safe under manual judgment, while the judge score calculated by string matching marks them as unsafe?

@payphone131
Author

The replies are not updated; they stay the same as they were at the beginning. By the way, I found that lines 121 and 122 in main_queryrelated.py re-initialize the prompt during the iterations. Should I comment out these two lines?
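
To illustrate the concern, here is a toy loop showing how resetting the prompt inside an iteration would discard every update. It uses hypothetical names and a stand-in update step; it is not the code in main_queryrelated.py, and whether lines 121 and 122 actually behave this way is unconfirmed.

```python
def refine(initial_prompt, n_iters, reset_each_iter):
    """Return the defense prompt used in each iteration of a toy refinement loop."""
    prompt = initial_prompt
    used = []
    for i in range(n_iters):
        if reset_each_iter:
            # Analogous to re-initializing the prompt inside the loop.
            prompt = initial_prompt
        used.append(prompt)  # prompt applied to the target model this round
        prompt = prompt + f" [rule {i}]"  # stand-in for the defender's update
    return used


print(refine("base", 3, reset_each_iter=True))
print(refine("base", 3, reset_each_iter=False))
```

With the reset enabled, every iteration uses the unchanged initial prompt, so the replies would never change. Without it, each iteration builds on the previous update.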

@jdg900

jdg900 commented Oct 4, 2024

Same here.. Did you resolve this issue? @payphone131

@payphone131
Author

> Same here.. Did you resolve this issue? @payphone131

no.
