I found an example in the MedQA EN questions in dev.jsonl where one of the answer options has extra characters (a newline and a double quote) appended to it:
Answer choice E is Pulmonary embolism\n\"
{"question": "A 47-year-old man comes to the physician because of severe retrosternal chest pain and shortness of breath for 45 minutes. He has dyslipidemia, hypertension, and type 2 diabetes mellitus. Current medications include hydrochlorothiazide, lisinopril, metformin, and atorvastatin. He has smoked 1 pack of cigarettes daily for 20 years. He appears pale and diaphoretic. His temperature is 37°C (98.6°F), pulse is 115/min, and blood pressure is 140/70 mm Hg. Breath sounds are normal. The remainder of the examination shows no abnormalities. An ECG shows left ventricular hypertrophy with ST-segment elevation in leads I, aVL, and V1–V6. High-dose aspirin, clopidogrel, metoprolol, sublingual nitroglycerin, and unfractionated heparin are administered. As the patient awaits transport to the nearest emergency room, he collapses and becomes unresponsive. His pulse and blood pressure cannot be detected. Despite resuscitative efforts, the patient dies. Which of the following is the most likely cause of death in this patient?", "answer": "Ventricular fibrillation", "options": {"A": "Papillary muscle rupture", "B": "Left ventricular failure", "C": "Ventricular fibrillation", "D": "Septal wall rupture", "E": "Pulmonary embolism\n\""}, "meta_info": "step2&3", "answer_idx": "C"}
Hi, thanks for reporting this! These are most likely stray strings that were attached to the options in the webpages we parsed during data collection. An easy fix is to remove them by string matching, e.g., x.replace('"', '').replace('\n', '') (where x is the option string). Note that this removes every double quote and newline in the option, not only the trailing ones.
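For anyone who wants to apply this to a whole file, here is a minimal sketch of the same idea. It differs from the one-liner above in that it only strips trailing whitespace and double quotes, on the assumption that the stray characters are always appended at the end, as in the reported example. The function names (clean_option, clean_record, clean_jsonl_lines) are hypothetical helpers, not part of the MedQA code, and the record layout is taken from the example above.

```python
import json

# Characters assumed to be scraping debris when they appear at the END of an option.
TRAILING_JUNK = ' \t\r\n"'


def clean_option(text: str) -> str:
    """Remove trailing whitespace and double quotes from one option string."""
    return text.rstrip(TRAILING_JUNK)


def clean_record(record: dict) -> dict:
    """Return a copy of a MedQA-style record with every option cleaned.

    The input record is not modified.
    """
    cleaned = dict(record)
    cleaned["options"] = {
        key: clean_option(value) for key, value in record["options"].items()
    }
    return cleaned


def clean_jsonl_lines(lines):
    """Yield cleaned JSON strings, one per non-empty input line."""
    for line in lines:
        if line.strip():
            yield json.dumps(clean_record(json.loads(line)), ensure_ascii=False)


if __name__ == "__main__":
    sample = {
        "options": {"D": "Septal wall rupture", "E": 'Pulmonary embolism\n"'},
        "answer_idx": "C",
    }
    print(clean_record(sample)["options"])
```

To clean dev.jsonl, the lines of the open file can be passed to clean_jsonl_lines and the results written to a new file. One caveat: an option that legitimately ends with a closing double quote would lose it, so it may be worth diffing the output against the original before replacing the file.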