-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add data attestation as a requirement #481
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #481 +/- ##
==========================================
+ Coverage 73.90% 73.96% +0.05%
==========================================
Files 45 45
Lines 2859 2865 +6
Branches 443 444 +1
==========================================
+ Hits 2113 2119 +6
Misses 661 661
Partials 85 85 ☔ View full report in Codecov by Sentry. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: I deleted permissions_obtained: true
from the config file the wizard wrote for me, and then ran everyvoice preprocess config/everyvoice-text-to-spec.yaml
and it ran without error. Instead, it should have failed with the permission message.
everyvoice train text-to-spec config/everyvoice-text-to-spec.yaml
also proceeded without telling me abot the missing permissions.
The concept is good and sufficient: there is no way for us to verify they have permission, all we can do is force the user to declare to the software that they do, and this PR (once the bugs above are fixed) does that.
@field_validator("permissions_obtained") | ||
def check_permissions(cls, permissions_obtained: bool) -> bool: | ||
if not permissions_obtained: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coverage says this raise
statement is not exercised by unit testing. It should be easy to add one test case where you create a dataset without specifying permission=True, maybe just duplicating the test case you had to fix with permissions_obtained=True
, but without that, it would exercise this.
also add a unittest
Thanks for the quick review! I forgot to add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now.
Hum, approved, but something unrelated seems to be failing in CI, it would be best to fix that right away, before merging if you can. |
this should be fixed at a later date by fixing #359
PR Goal?
This PR makes data attestations mandatory for running any model that points to valid data.
Fixes?
#465
Feedback sought?
This is of course able to be circumvented, but do you feel like it's enough of a deterrent for now? I will add more documentation in another PR. Maybe try to run a current model you have that does not have the permissions added and see what happens? It should fail with an intelligible error message.
Priority?
alpha
Tests added?
Tests updated, none added.
How to test?
The wizard should now produce a config with
permissions_obtained: True
for each dataset which should be runnable. Other configs will not be runnable, nor will existing checkpoints.Confidence?
medium
Version change?
it would be normally, but just alpha. It's a breaking change so any existing models will have to add
permissions_obtained: True
for each dataset in the configRelated PRs?
none