Script to download and restore annotated data #4856

Golovneva · 2022-11-01T19:18:03Z

Patch description
Updates ROSCOE with a script to download and restore the human-annotated data.
Also updates links for the public release.

Testing steps
% python projects/roscoe/roscoe_data/restore_annotated.py
...
Saved SEMEVAL dataset in ./projects/roscoe/roscoe_data/generated/semevalcommonsense.json
Saved GSM8K dataset in ./projects/roscoe/roscoe_data/generated/gsm8k.json

Note: The SEMEVAL dataset is not actually released yet, so it is commented out in the code.

moyapchen

Looks reasonable for the most part; a couple of comments on strings.

moyapchen · 2022-11-03T14:02:54Z

projects/roscoe/roscoe_data/download_annotated.sh

@@ -1,4 +1,38 @@
 #!/bin/bash


Ty for putting this together!

moyapchen · 2022-11-03T14:03:42Z

projects/roscoe/roscoe_data/restore_annotated.py

+                reasoning = reasonings[quu1["key"]]["reasoning"]
+                quu1["gpt-3"] = reasoning
+                # TODO: TMP, remove!
+                quu1["dataset"] = "semevalcommonsense_gpt3_expl"


Is this the right path?

lol, thank you for looking! This is not a path; it's a key-value pair in the output JSON that I added so the output would match your files exactly and I could easily run "diff". Will remove!
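
One way to avoid adding a temporary key just to make "diff" pass is to compare the records while ignoring that key. This is a minimal sketch, assuming both outputs can be loaded as lists of JSON objects; the helper name `records_match` and the sample records are hypothetical and not part of the PR:

```python
import json


def records_match(ours, theirs, ignore_keys=("dataset",)):
    """Return True if two lists of dict records are equal once ignore_keys are dropped."""

    def strip(record):
        # Drop the keys we do not want to influence the comparison.
        return {k: v for k, v in record.items() if k not in ignore_keys}

    return [strip(r) for r in ours] == [strip(r) for r in theirs]


# Hypothetical records standing in for the restored and reference outputs.
restored = json.loads('[{"key": "q1", "gpt-3": "some reasoning"}]')
reference = json.loads(
    '[{"key": "q1", "gpt-3": "some reasoning", '
    '"dataset": "semevalcommonsense_gpt3_expl"}]'
)
print(records_match(restored, reference))
```

With a check like this, the debugging key never has to be written into the generated files.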

moyapchen · 2022-11-03T14:04:16Z

projects/roscoe/roscoe_data/restore_annotated.py

+        data = json.loads(line.strip())
+        blob = {}
+        blob["premise"] = data["question"]
+        blob["hypothesis"] = (


lol, I wonder if we could get rid of this and the other code that strips this

moyapchen · 2022-11-03T14:04:27Z

projects/roscoe/roscoe_data/restore_annotated.py

+            parse_cosmos(input_file, model_output_reasoning, save_file)
+            print(f"Saved COSMOSQA dataset in {save_file}")
+        elif dataset == 'semevalcommonsense':
+            # input_file = '/private/home/aslic/scorer/data/semevalcomsense/train_filter_maryam.xml'


"maryam" in the path

spencerp · 2022-11-03T16:10:47Z

projects/roscoe/roscoe_data/download_annotated.sh

-# this file will contain a command to download annotated datasets into "roscoe_data/annotated" folder
-echo "Sorry, Pending data release approval"
+
+PATH_TO_DATA="./projects/roscoe/roscoe_data"


nit: Usually data is downloaded to ParlAI/data/{project}, where the path to that data/ folder is found in opt['datapath']. I think one reason is that data/ is in the .gitignore (and other places in the code may assume this location).
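
The convention described in this comment could be sketched as follows. It assumes `opt` behaves like a dict with a `datapath` entry, as the comment states; the `roscoe` subfolder name and the helper `roscoe_data_dir` are illustrative assumptions, not something this PR defines:

```python
import os


def roscoe_data_dir(opt, project="roscoe"):
    """Build <datapath>/<project>, where datapath comes from opt['datapath']."""
    return os.path.join(opt["datapath"], project)


# Hypothetical opt; in ParlAI the real value would come from the parsed options.
opt = {"datapath": os.path.join("ParlAI", "data")}
print(roscoe_data_dir(opt))
```

Keeping downloads under the shared data folder means they stay out of version control without any extra ignore rules.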

Script to download and restore annotated data

430659e

facebook-github-bot added the CLA Signed label Nov 1, 2022

Golovneva added 2 commits November 1, 2022 12:39

lint

7fc0ebe

lint

bfe6ddc

Golovneva requested review from moyapchen and spencerp November 1, 2022 19:51

update links for public release

fb136fc

Golovneva marked this pull request as ready for review November 3, 2022 13:46

moyapchen approved these changes Nov 3, 2022

remove debugging prints

3effc94

Golovneva merged commit 5f41ba7 into main Nov 3, 2022

Golovneva deleted the olggol/ha-sets branch November 3, 2022 16:08

spencerp reviewed Nov 3, 2022
