I'm not sure whether there is a problem with the data processing or whether my metadata.jsonl file was created incorrectly. The input_ids fed to the Donut model contain the answer part. Is this normal?
Here are the input_ids for one sample:
tensor([[57527, 57529, 11604, 52743, 48941, 45383, 18528, 43095, 36477, 46385, 35647, 36209, 57524, 57526, 46481, 23485, 35815, 4768, 57523, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:2')
Decoding them with the tokenizer gives:
<s_docvqa> <s_question> ▁When ▁is ▁the ▁response ▁code ▁request ▁form ▁dat ed ? </s_question> <s_answer> ▁September ▁10 , ▁1996 </s_answer> </s> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>
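For reference, here is a minimal sketch that splits the ids above into a prompt part (everything up to and including `<s_answer>`) and an answer part. The id mapping (57526 = `<s_answer>`, 2 = `</s>`, 1 = `<pad>`) is read off the decoded output above, not looked up in the tokenizer. The `make_labels` helper is hypothetical. It assumes a common teacher-forcing setup in which the full sequence, answer included, is fed to the decoder, and the prompt and padding positions are masked out of the loss with `-100`. I have not confirmed that this is exactly what the Donut training code does.

```python
# Ids observed in the decoded tensor above (not looked up in the tokenizer).
PROMPT_END_ID = 57526  # <s_answer>
PAD_ID = 1             # <pad>
IGNORE_ID = -100       # assumption: the usual ignore index for cross-entropy loss


def split_prompt_and_target(input_ids):
    """Drop padding, then split into (prompt_ids, target_ids).

    The prompt runs up to and including <s_answer>; the target is the rest.
    """
    ids = [t for t in input_ids if t != PAD_ID]
    end = ids.index(PROMPT_END_ID) + 1
    return ids[:end], ids[end:]


def make_labels(input_ids):
    """Hypothetical label masking for teacher-forced decoder training.

    Prompt tokens and padding become IGNORE_ID, so only the answer tokens
    (plus </s_answer> and </s>) would contribute to the loss.
    """
    end = input_ids.index(PROMPT_END_ID) + 1
    return [IGNORE_ID if i < end or t == PAD_ID else t
            for i, t in enumerate(input_ids)]


# First 20 ids from the tensor above; the padding is shortened to 5 tokens.
input_ids = [57527, 57529, 11604, 52743, 48941, 45383, 18528, 43095, 36477,
             46385, 35647, 36209, 57524, 57526, 46481, 23485, 35815, 4768,
             57523, 2] + [PAD_ID] * 5

prompt, target = split_prompt_and_target(input_ids)
labels = make_labels(input_ids)
```

Under that assumption, seeing the answer inside `input_ids` during training is expected, and only at inference time would the decoder be given the prompt part alone.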
From my experience, and based on the code provided by the author, the prompt should look like this:
<s_docvqa> <s_question><question></s_question> <s_answer>
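The prompt format above can be sketched as a pair of string builders. This is a minimal illustration, not the author's code. The function names are hypothetical, and the training-sequence layout (prompt, then answer, then `</s_answer>` and `</s>`) is inferred from the decoded sample in this issue. The sketch shows that the inference prompt is a prefix of the full training sequence.

```python
TASK_START = "<s_docvqa>"


def build_prompt(question: str) -> str:
    """Inference prompt: the decoder is expected to continue after <s_answer>."""
    return f"{TASK_START}<s_question>{question}</s_question><s_answer>"


def build_training_sequence(question: str, answer: str, eos: str = "</s>") -> str:
    """Hypothetical full target sequence, laid out like the decoded sample:
    prompt + answer + closing answer tag + end-of-sequence token."""
    return build_prompt(question) + f"{answer}</s_answer>{eos}"


q = "When is the response code request form dated?"
prompt = build_prompt(q)
sequence = build_training_sequence(q, "September 10, 1996")
```

Because `sequence` starts with `prompt`, a batch containing the full sequence still includes the prompt exactly as the inference-time format describes it.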
Here is part of my metadata.jsonl file:
{"file_name": "sxxj0037_2.png", "ground_truth": "{\"gt_parses\": [{\"question\": \"How many points are there in modifications to readout instrumentation\", \"answer\": \"5.\"}]}"}
{"file_name": "tynx0037_1.png", "ground_truth": "{\"gt_parses\": [{\"question\": \"What is the first line of the address mentioned at the top?\", \"answer\": \"Reynolds Building\"}, {\"question\": \"What is the date mentioned?\", \"answer\": \"May 4, 2000\"}]}"}
{"file_name": "mtyj0226_1.png", "ground_truth": "{\"gt_parses\": [{\"question\": \"What is the word written in bold black in the first picture?\", \"answer\": \"Coke\"}]}"}
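One way to sanity-check metadata.jsonl lines like the ones above is to parse them and list every question/answer pair. Note that `ground_truth` is itself a JSON-encoded string, so it needs a second `json.loads`. The helper name `iter_qa_pairs` is hypothetical. The sketch yields every pair in `gt_parses`. I have not checked how the Donut training code handles records with several pairs (such as `tynx0037_1.png`), so it may select among them differently.

```python
import json


def iter_qa_pairs(jsonl_text):
    """Yield (file_name, question, answer) for every entry in gt_parses."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "ground_truth" is a JSON string inside JSON, so decode it again.
        gt = json.loads(record["ground_truth"])
        for parse in gt["gt_parses"]:
            yield record["file_name"], parse["question"], parse["answer"]


# Two lines copied from the metadata.jsonl excerpt above (raw strings keep \").
sample = "\n".join([
    r'{"file_name": "tynx0037_1.png", "ground_truth": "{\"gt_parses\": [{\"question\": \"What is the first line of the address mentioned at the top?\", \"answer\": \"Reynolds Building\"}, {\"question\": \"What is the date mentioned?\", \"answer\": \"May 4, 2000\"}]}"}',
    r'{"file_name": "mtyj0226_1.png", "ground_truth": "{\"gt_parses\": [{\"question\": \"What is the word written in bold black in the first picture?\", \"answer\": \"Coke\"}]}"}',
])

pairs = list(iter_qa_pairs(sample))
```

If every line parses this way and each pair has both a `question` and an `answer`, the file structure itself is probably not the cause of the answer appearing in `input_ids`.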