This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Incomplete data #3683

Merged
merged 3 commits into from
Jun 2, 2021

mojtaba-komeili
Contributor

Patch description
Skip units that fail internal assertions in Mephisto, and emit a warning telling the user that they are being skipped. This is one issue you can run into when dealing with corrupted data in Mephisto. Skipping these units avoids a potential crash in the data compiler.
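
For readers without the diff at hand, the skip-and-warn behavior described above can be sketched roughly as follows. This is a minimal illustration, not the merged code: `StubDataBrowser` and `collect_units_data` are hypothetical names, and the stub only imitates the assertion message quoted later in this thread rather than calling Mephisto itself.

```python
import logging
from typing import Any, Dict, Iterable, List

logger = logging.getLogger(__name__)


class StubDataBrowser:
    """Hypothetical stand-in for a data browser; not Mephisto's real class."""

    def __init__(self, stored: Dict[str, Dict[str, Any]]) -> None:
        self._stored = stored

    def get_data_from_unit(self, unit: str) -> Dict[str, Any]:
        # Imitates an internal assertion that fails for a unit with no data.
        # Note: assert statements are disabled when Python runs with -O.
        assert (
            unit in self._stored
        ), f"Trying to get completed data from unassigned unit {unit}"
        return self._stored[unit]


def collect_units_data(
    data_browser: StubDataBrowser, units: Iterable[str]
) -> List[Dict[str, Any]]:
    """Gather data for each unit, skipping units whose lookup asserts."""
    results: List[Dict[str, Any]] = []
    for unit in units:
        try:
            results.append(data_browser.get_data_from_unit(unit))
        except AssertionError as err:
            # Warn and move on instead of crashing the whole compilation.
            logger.warning("Skipping unit %s: %s", unit, err)
    return results
```

Catching only `AssertionError` keeps the skip narrow: any other exception still surfaces, so unrelated bugs are not silently hidden.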

Testing steps
I used it to compile my dataset; it skipped the corrupted unit as expected.

@JackUrb
Contributor

JackUrb commented Jun 1, 2021

Out of curiosity, what is the AssertionError that throws here?

@mojtaba-komeili
Contributor Author

Out of curiosity, what is the AssertionError that throws here?

This is what I see after a crash:

Traceback (most recent call last):
  File "data_process/compile_results.py", line 577, in <module>
    wizard_data_compiler.generated_data_set(args.output_folder)
  File "data_process/compile_results.py", line 537, in generated_data_set
    compiled_data = self.compile_results()
  File "data_process/compile_results.py", line 492, in compile_results
    task_units_data.extend(self.get_units_data(task_unit))
  File "/private/home/komeili/dev/ParlAI/parlai/crowdsourcing/utils/analysis.py", line 191, in get_units_data
    unit_data = data_browser.get_data_from_unit(unit)
  File "/private/home/komeili/dev/Mephisto/mephisto/tools/data_browser.py", line 82, in get_data_from_unit
    ), f"Trying to get completed data from unassigned unit {unit}"
AssertionError: Trying to get completed data from unassigned unit MTurkUnit(247954, completed)

Contributor

@JackUrb JackUrb left a comment

Is this data that you're loading from new runs? I don't imagine we should be running into many cases where a completed unit has no data anymore, but if not I'll have to revisit.

Otherwise, this looks fine to me!

@mojtaba-komeili
Contributor Author

Is this data that you're loading from new runs? I don't imagine we should be running into many cases where a completed unit has no data anymore, but if not I'll have to revisit.

Otherwise, this looks fine to me!

Yeah, it is using the new singleton system.

@mojtaba-komeili mojtaba-komeili merged commit b12f024 into master Jun 2, 2021
@mojtaba-komeili mojtaba-komeili deleted the incomplete-data branch June 2, 2021 18:35