Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add data attestation as a requirement #481

Merged
merged 8 commits into from
Jun 21, 2024
Merged

Conversation

roedoejet
Copy link
Member

PR Goal?

This PR makes data attestations mandatory for running any model that points to valid data.

Fixes?

#465

Feedback sought?

This is of course able to be circumvented, but do you feel like it's enough of a deterrent for now? I will add more documentation in another PR. Maybe try to run a current model you have that does not have the permissions added and see what happens? It should fail with an intelligible error message.

Priority?

alpha

Tests added?

Tests updated, none added.

How to test?

The wizard should now produce a config with permissions_obtained: True for each dataset which should be runnable. Other configs will not be runnable, nor will existing checkpoints.

Confidence?

medium

Version change?

it would be normally, but just alpha. It's a breaking change so any existing models will have to add permissions_obtained: True for each dataset in the config

Related PRs?

none

@roedoejet roedoejet requested review from joanise and SamuelLarkin and removed request for joanise June 20, 2024 22:17
Copy link

codecov bot commented Jun 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.96%. Comparing base (181ba61) to head (7eb38e0).

Current head 7eb38e0 differs from pull request most recent head 6eb551e

Please upload reports for the commit 6eb551e to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #481      +/-   ##
==========================================
+ Coverage   73.90%   73.96%   +0.05%     
==========================================
  Files          45       45              
  Lines        2859     2865       +6     
  Branches      443      444       +1     
==========================================
+ Hits         2113     2119       +6     
  Misses        661      661              
  Partials       85       85              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

github-actions bot commented Jun 20, 2024

CLI load time: 0:00.23
Pull Request HEAD: 6eb551eeb1048cbddf3a061361e674f0352610e6
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

Copy link
Member

@joanise joanise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: I deleted permissions_obtained: true from the config file the wizard wrote for me, and then ran everyvoice preprocess config/everyvoice-text-to-spec.yaml and it ran without error. Instead, it should have failed with the permission message.

everyvoice train text-to-spec config/everyvoice-text-to-spec.yaml also proceeded without telling me abot the missing permissions.

The concept is good and sufficient: there is no way for us to verify they have permission, all we can do is force the user to declare to the software that they do, and this PR (once the bugs above are fixed) does that.

@field_validator("permissions_obtained")
def check_permissions(cls, permissions_obtained: bool) -> bool:
if not permissions_obtained:
raise ValueError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage says this raise statement is not exercised by unit testing. It should be easy to add one test case where you create a dataset without specifying permission=True, maybe just duplicating the test case you had to fix with permissions_obtained=True, but without that, it would exercise this.

@roedoejet
Copy link
Member Author

Bug: I deleted permissions_obtained: true from the config file the wizard wrote for me, and then ran everyvoice preprocess config/everyvoice-text-to-spec.yaml and it ran without error. Instead, it should have failed with the permission message.

everyvoice train text-to-spec config/everyvoice-text-to-spec.yaml also proceeded without telling me abot the missing permissions.

The concept is good and sufficient: there is no way for us to verify they have permission, all we can do is force the user to declare to the software that they do, and this PR (once the bugs above are fixed) does that.

Thanks for the quick review! I forgot to add validate_default=True so it wasn't actually validating the default argument (which is permissions_obtained=False) so someone would have actually had to set it to False for it to work. This is now fixed I believe. Thanks!!

@roedoejet roedoejet requested a review from joanise June 21, 2024 17:32
Copy link
Member

@joanise joanise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now.

@joanise
Copy link
Member

joanise commented Jun 21, 2024

Hum, approved, but something unrelated seems to be failing in CI, it would be best to fix that right away, before merging if you can.

@roedoejet roedoejet merged commit c928f9b into main Jun 21, 2024
2 checks passed
@roedoejet roedoejet deleted the dev.ap/data-attestation branch June 21, 2024 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants