It seems that AdaShield-A doesn't update defense prompts. #4
Comments
That is not normal. These are my results files. You can observe that the replies obtained with the updated defense prompt are safe by manual judgment. One possible reason is that there is a problem with the judge.
I found that the judge score is calculated by string matching in your code, and the function `load_judge(args)` in judges.py is never used. Do I have to modify the code to use an LLM to give judge scores?
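For context, a string-matching judge of the kind described above typically scans the target model's reply for known refusal phrases and assigns a fixed score either way. A minimal sketch, assuming a hypothetical phrase list and a 1 (safe) / 10 (unsafe) scoring convention; this is illustrative only, not the repo's actual code:

```python
# Hypothetical illustration of a string-matching judge: if the reply
# contains a known refusal phrase, the defense is treated as successful
# (score 1); otherwise it is treated as a jailbreak (score 10).
# The phrase list and scoring convention are assumptions, not the
# actual AdaShield implementation.
REFUSAL_PATTERNS = [
    "I'm sorry",
    "I cannot",
    "I can't assist",
    "As an AI",
]

def string_match_judge(reply: str) -> int:
    """Return 1 (refused/safe) or 10 (unsafe) via case-insensitive substring matching."""
    if any(pat.lower() in reply.lower() for pat in REFUSAL_PATTERNS):
        return 1
    return 10
```

The obvious weakness, and likely the source of the mismatch reported in this thread, is that a reply can be safe without containing any phrase on the list, in which case string matching scores it 10 even though a human (or LLM) judge would call it safe.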
So the replies obtained with the updated defense prompt are safe (by manual judgment), while the judge score calculated by string matching says they are unsafe?
The replies are not updated and remain what they were at the beginning. By the way, I found that lines 121 and 122 in main_queryrelated.py initialize the prompt during iterations; should I comment out these two lines?
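If those two lines do re-initialize the prompt at the top of every iteration, that would explain why the replies never change: each round's refinement is discarded before the next one starts. A toy illustration of the effect, with entirely hypothetical names (this is not the code from main_queryrelated.py):

```python
# Toy illustration: re-initializing inside the loop throws away the
# refinements accumulated in earlier iterations. All names are
# hypothetical stand-ins, not the actual AdaShield code.
INIT_PROMPT = "initial defense prompt"

def refine(prompt: str, step: int) -> str:
    # Stand-in for the defense model's refinement step.
    return prompt + f" [refined@{step}]"

def run(reinit_each_iter: bool, steps: int = 3) -> str:
    prompt = INIT_PROMPT
    for i in range(steps):
        if reinit_each_iter:      # analogous to initializing during iterations
            prompt = INIT_PROMPT  # earlier refinements are lost here
        prompt = refine(prompt, i)
    return prompt
```

With `reinit_each_iter=True` the final prompt carries only the last round's refinement; with `False` all three rounds accumulate, which is presumably the intended behavior.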
Same here. Did you resolve this issue? @payphone131
No.
Hello, I just ran train_our_qr.sh and got some CSV files. I found that the CSV files record the initial and final scores of the queries for each scenario. I noticed that if an initial score is 10, it never becomes 1 or 5 in the final score, which suggests AdaShield-A didn't turn an invalid defense prompt into a successful one. Is this normal? I used LLaVA as the target model and Vicuna as the defense model.