[squad] make examples and dataset accessible from SquadDataset object #6710
07ce328 to 1dbe4bf
Codecov Report
@@            Coverage Diff             @@
##           master    #6710      +/-   ##
==========================================
- Coverage   80.06%   78.78%   -1.28%
==========================================
  Files         156      156
  Lines       28386    28391       +5
==========================================
- Hits        22726    22367     -359
- Misses       5660     6024     +364
This looks good to me, but it will break existing saved features. I don't think that's too much of an issue, though. What do you think, @sgugger?
I think the breaking change should be handled more carefully: cached datasets for SQuAD are used by the run_squad.py script, so a lot of users probably have some, and the code will suddenly fail for them.
@sgugger @LysandreJik thanks so much for the comments and suggestions! I have updated the code to support the legacy cache format. I had a question on one comment, but if any other changes are needed, please let me know.
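To make the backward-compatibility idea concrete, here is a minimal sketch of reading a cache file that may be in either an old or a new layout. It is not the code from this PR: the two layouts (a legacy cache holding only features, a new cache also holding `dataset` and `examples`) and the helper names `normalize_cache` and `needs_rebuild` are assumptions made for illustration, and `pickle` stands in for whatever serializer the real cache uses.

```python
import pickle
from typing import Any, Dict, Optional


def normalize_cache(payload: Any) -> Dict[str, Optional[Any]]:
    """Return a dict with 'features', 'dataset', and 'examples' keys.

    Assumed layouts (hypothetical, for illustration only):
      * legacy: a bare list of features, or a dict holding only 'features'
      * new:    a dict holding 'features', 'dataset', and 'examples'
    """
    if isinstance(payload, dict):
        return {
            "features": payload["features"],
            "dataset": payload.get("dataset"),    # None for legacy files
            "examples": payload.get("examples"),  # None for legacy files
        }
    # Legacy bare-list layout: only the features were saved.
    return {"features": payload, "dataset": None, "examples": None}


def needs_rebuild(cache: Dict[str, Optional[Any]]) -> bool:
    """True when the cache lacks what evaluation needs and should be regenerated."""
    return cache["dataset"] is None or cache["examples"] is None


# Round-trip both layouts through pickle to mimic loading from disk.
legacy = normalize_cache(pickle.loads(pickle.dumps(["feat-0", "feat-1"])))
new = normalize_cache(
    pickle.loads(
        pickle.dumps({"features": ["feat-0"], "dataset": ["row-0"], "examples": ["ex-0"]})
    )
)
```

With this shape, an old cache still loads instead of failing, and the caller can decide whether to warn the user or rebuild the cache when `needs_rebuild` returns true.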
Thanks for your changes, it's good to go now!
…huggingface#6710) * [squad] make examples and dataset accessible from SquadDataset object * [squad] add support for legacy cache files
…t object (huggingface#6710)" This reverts commit 83848b6.
To run evaluation on the SQuAD dataset with squad_evaluate, the user needs access to both the examples loaded in the dataset and the TensorDataset, which holds values such as unique_id that are used to construct the list of SquadResult objects. This PR exposes the examples and dataset so that users can access them directly.
For an example of why access to both is needed, see how evaluation is currently done in examples/run_squad.py. The SquadDataset object attempts to wrap up some of this functionality, but without access to examples and dataset, evaluation is not possible.
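The dependency described above can be sketched without the library. The classes below are simplified, hypothetical stand-ins (not the transformers `SquadExample`, `SquadFeatures`, or `SquadResult` types), and the scoring is a toy exact-match, not `squad_evaluate`. The point is only that model outputs are keyed by `unique_id`, so both the features and the original examples are needed to turn them into scored answers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:  # hypothetical stand-in for a loaded SQuAD example
    qas_id: str
    context_tokens: List[str]
    answer: str


@dataclass
class Feature:  # hypothetical stand-in for a tokenized feature
    unique_id: int
    example_index: int  # index of the example this feature came from


@dataclass
class Result:  # hypothetical stand-in for a per-feature model output
    unique_id: int
    start: int
    end: int


def predictions_from_results(
    examples: List[Example], features: List[Feature], results: List[Result]
) -> Dict[str, str]:
    """Join results to features by unique_id, then to examples by index."""
    by_id = {r.unique_id: r for r in results}
    preds = {}
    for feature in features:
        result = by_id[feature.unique_id]
        example = examples[feature.example_index]
        span = example.context_tokens[result.start : result.end + 1]
        preds[example.qas_id] = " ".join(span)
    return preds


def exact_match(examples: List[Example], preds: Dict[str, str]) -> float:
    """Percentage of examples whose predicted text equals the gold answer."""
    hits = sum(preds[ex.qas_id] == ex.answer for ex in examples)
    return 100.0 * hits / len(examples)


examples = [
    Example("q1", "Paris is the capital of France".split(), "Paris"),
    Example("q2", "The sky is blue".split(), "blue"),
]
features = [Feature(1000, 0), Feature(1001, 1)]
# Results may arrive in any order; unique_id is what ties them back.
results = [Result(1001, 3, 3), Result(1000, 0, 0)]

preds = predictions_from_results(examples, features, results)
score = exact_match(examples, preds)
```

If only the tensors were available, the `unique_id` values could be recovered but there would be no `qas_id`, context, or gold answer to score against, which is the gap this PR describes.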