It seems that adashield-a doesn't update defense prompts #4

payphone131 · 2024-08-08T11:01:16Z

Hello, I just ran train_our_qr.sh and got some CSV files. I found that in these files you record the initial and final scores of the queries for each scenario. I noticed that when an initial score is 10, it never becomes 1 or 5 in the final score, which suggests that adashield-a never turned an ineffective defense prompt into a successful one. Is this normal? I used LLaVA as the target model and Vicuna as the defense model.
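One way to check this observation across every scenario is to tally how each query's score moved from the initial to the final column. This is a hypothetical sketch: the column names `initial_score` and `final_score` and the sample rows are placeholders, not the actual layout of the files written by train_our_qr.sh.

```python
import csv
import io
from collections import Counter

# Placeholder rows; the real CSV column names and scenario labels may differ.
SAMPLE = """scenario,query,initial_score,final_score
scenario_a,q1,10,10
scenario_a,q2,1,1
scenario_b,q3,10,10
"""

def score_transitions(csv_text: str) -> Counter:
    """Count (initial_score, final_score) pairs across all rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((int(row["initial_score"]), int(row["final_score"])) for row in reader)

transitions = score_transitions(SAMPLE)
# Queries that started at 10 (defense failed) and ended below 10.
fixed = sum(n for (init, final), n in transitions.items() if init == 10 and final < 10)
print(transitions)
print("initial 10 -> final < 10:", fixed)
```

To run it on a real file, read the file with `open(path, newline="")`, pass its text to `score_transitions`, and rename the columns to match.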

rain305f · 2024-08-09T08:01:09Z

It is not normal. This is my results file. You can see that the replies obtained with the updated defense prompt are safe under manual judgment. One possible reason is that there may be a problem with the judge.
final_table.csv

payphone131 · 2024-08-09T08:38:36Z

I found that the judge score is calculated by string matching in your code, and the function "load_judge(args)" in judges.py is never used. Do I have to modify the code to use an LLM to produce the judge scores?
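For illustration, a string-matching judge of the kind described here typically looks for refusal phrases in the reply. This sketch assumes a hypothetical keyword list and a binary mapping (refusal gives 1, anything else gives 10). It is not the repository's code and does not model the score of 5. It shows how a reply that a human would judge safe can still score 10 when it refuses without using a listed phrase.

```python
# Hypothetical refusal phrases; the repository's actual keyword list may differ.
REFUSAL_MARKERS = (
    "i'm sorry",
    "i am sorry",
    "i cannot",
    "i can't",
    "as an ai",
)

def string_match_score(reply: str) -> int:
    """Return 1 (defended) if the reply contains a refusal phrase, else 10 (jailbroken)."""
    text = reply.lower()
    return 1 if any(marker in text for marker in REFUSAL_MARKERS) else 10

print(string_match_score("I'm sorry, but I can't help with that."))
print(string_match_score("Sure, here are the steps: ..."))
# A safe refusal phrased differently is still scored as jailbroken.
print(string_match_score("This request is unsafe, so I will not describe it."))
```

Because the check is a plain substring match, differences in phrasing or punctuation (for example, curly apostrophes) can flip the score, whereas an LLM judge would assess the meaning of the reply.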

rain305f · 2024-08-09T15:41:34Z

So, under manual judgment, are the replies obtained with the updated defense prompt safe, while the score calculated by string matching still marks them as unsafe?

payphone131 · 2024-08-16T07:56:23Z

The replies are not updated; they stay exactly as they were at the beginning. By the way, I found that lines 121 and 122 in main_queryrelated.py initialize the prompt during the iterations. Should I comment out these two lines?
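If those two lines reset the prompt at the start of each iteration, every round would query the target model with the original defense prompt, which would be consistent with the unchanged replies. This is a hypothetical model of such a loop (`refine` stands in for the defense model's update and `run` for the training loop). It has not been checked against main_queryrelated.py.

```python
def refine(prompt: str) -> str:
    """Hypothetical stand-in for the defense model rewriting the prompt."""
    return prompt + " [refined]"

def run(initial_prompt: str, n_iters: int, reset_each_iter: bool) -> list[str]:
    """Return the prompt sent to the target model in each iteration."""
    prompt = initial_prompt
    evaluated = []
    for _ in range(n_iters):
        if reset_each_iter:
            # Mimics re-initializing the prompt inside the loop.
            prompt = initial_prompt
        evaluated.append(prompt)  # prompt actually used to get the target model's reply
        prompt = refine(prompt)   # update that the next iteration is supposed to use
    return evaluated

print(run("Be safe.", 3, reset_each_iter=True))
print(run("Be safe.", 3, reset_each_iter=False))
```

With `reset_each_iter=True`, every evaluated prompt equals the initial one, so neither the replies nor the scores can change. With it off, each round evaluates the previous round's refinement. Whether commenting out the real lines is safe depends on what else they initialize.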

jdg900 · 2024-10-04T09:04:08Z

Same here. Did you resolve this issue? @payphone131

payphone131 · 2024-10-17T02:30:17Z

> Same here. Did you resolve this issue? @payphone131

No.