restricted summary stats #7619 #7642
- At the dataset level, DDI exports no longer show "dataDscr" information for restricted files. There is only one version of this export, and it is the version suitable for public consumption, with the "dataDscr" information hidden for restricted files.
- Similarly, at the dataset level, the DDI HTML Codebook no longer shows "dataDscr" information for restricted files.
- At the file level, "dataDscr" information is no longer publicly available for restricted files. In practice, it was only possible to get this publicly via API (the download/access button was hidden).
- At the file level, "dataDscr" (variable metadata) information can still be downloaded for restricted files if you have access to download the file.
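The visibility rules in the list above can be sketched roughly as follows. This is only an illustration of the rule, not the code in this pull request: `DataDscrVisibility`, `TabularFile`, `variablesForPublicExport`, and `mayReadVariableMetadata` are hypothetical names, and the real Dataverse classes and permission checks are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the rules described above; not Dataverse code.
final class DataDscrVisibility {

    // Stand-in for a tabular file and its variable metadata (assumed shape).
    static final class TabularFile {
        final String name;
        final boolean restricted;
        final List<String> variableNames;

        TabularFile(String name, boolean restricted, List<String> variableNames) {
            this.name = name;
            this.restricted = restricted;
            this.variableNames = variableNames;
        }
    }

    // Dataset level: there is a single public export, so variable
    // metadata is included only for files that are not restricted.
    static List<String> variablesForPublicExport(List<TabularFile> files) {
        List<String> visible = new ArrayList<>();
        for (TabularFile file : files) {
            if (!file.restricted) {
                visible.addAll(file.variableNames);
            }
        }
        return visible;
    }

    // File level: variable metadata for a restricted file is available
    // only to callers who are allowed to download that file.
    static boolean mayReadVariableMetadata(TabularFile file, boolean callerCanDownload) {
        return !file.restricted || callerCanDownload;
    }
}
```

Under these assumptions, a restricted file contributes nothing to the dataset-level export, while a caller with download access still gets its variable metadata at the file level.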
Overall looks good. One thing that is worth discussing: I had originally assumed we were hiding summary stats, but it looks like it's the entire data description section, including variable names/labels. However, we still index them and make them searchable. Should we be hiding those in the index similarly? Or should we consider leaving them here and just removing summary stats? (If the former, should we consider that in scope or out of scope?)
Good catch. Here's the data I'm testing with (these are my pets, yes my cats are huge)...
... and variable names like "species" are still searchable:
Thanks @pdurbin and @scolapasta. It's OK if we keep these in the index for now. We'll possibly need to revisit this as we support more sensitive data, but for the OpenDP MVP this will support the use case of not spoiling privacy (and making the tool useful in the first place :)). I'll move this to QA since it looks like the other feedback in the issue (regarding the move to bundle) is done.
@pdurbin Things I've noticed:
Otherwise it appears to work as described. I can try testing 1. again.
What this PR does / why we need it:
In the future we want to offer differentially private summary statistics for restricted data, which means that restricted data should not expose full summary statistics (and related information). This pull request corrects this.
Which issue(s) this PR closes:
Closes #7619
Special notes for your reviewer:
Not particularly. See the release note and the code. I'm weirded out by the existing BadRequestException error handling in Access.java and WebApplicationExceptionHandler.java. I just accepted the pattern of putting a string in the former and the real message in the latter.
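For reviewers unfamiliar with that pattern, here is a minimal sketch of the general idea of throwing with a short key and resolving the user-facing message elsewhere. Everything here is hypothetical: `ErrorKeyLookupSketch`, `RESTRICTED_KEY`, the message text, and the use of `IllegalArgumentException` as a stand-in for `BadRequestException` are assumptions for illustration, and the actual code in Access.java and WebApplicationExceptionHandler.java may differ.

```java
import java.util.Map;

// Hypothetical sketch of a "key at the throw site, message in the handler" pattern.
final class ErrorKeyLookupSketch {

    // Assumed key and message; not taken from the Dataverse code base.
    static final String RESTRICTED_KEY = "restricted-variable-metadata";
    static final Map<String, String> MESSAGES = Map.of(
            RESTRICTED_KEY,
            "Variable metadata is not available for restricted files without download access.");

    // Throw site: the exception carries only the short key.
    // IllegalArgumentException stands in for BadRequestException here.
    static void rejectRequest() {
        throw new IllegalArgumentException(RESTRICTED_KEY);
    }

    // Handler side: translate the key into the message shown to the user,
    // falling back to the raw text when the key is unknown.
    static String toUserMessage(RuntimeException ex) {
        String key = ex.getMessage();
        if (key == null) {
            return "Bad request";
        }
        return MESSAGES.getOrDefault(key, key);
    }
}
```

The cost of this design is that the throw site and the handler must agree on the key string, which is part of why the pattern can feel awkward.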
Suggestions on how to test this:
Please see the release note, especially:
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
For restricted files, the DDI HTML Codebook will no longer show summary statistics, variable names, or variable descriptions. Instead, you will only see the row and column counts, the file type, and the UNF, like this:
Is there a release notes update needed for this change?:
Yes, included.
Additional documentation:
Documentation was actually removed because we used to warn people (pull request #6620) about summary statistics being publicly available for restricted data. Now we don't need to.