-
Notifications
You must be signed in to change notification settings - Fork 518
Add "download all" buttons (including size of dataset) to dataset page #7047
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
} | ||
|
||
public void validateAllFilesForDownloadArchival() { | ||
selectAllFiles(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this selectAllFiles()
is what's responsible for the odd UI behavior... when you select "Download" from the Access Dataset dropdown, you get the ZIP download, but now all the files are selected in the files table... that shouldn't be necessary... there has to be a way to get the ZIP without the unexpected UI behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding clearSelection()
to the end like this...
public void validateAllFilesForDownloadArchival() {
selectAllFiles();
boolean guestbookRequired = isDownloadPopupRequired();
boolean downloadOriginal = false;
validateFilesForDownload(guestbookRequired, downloadOriginal);
+ clearSelection();
}
... but now it thinks no files are selected:
So yes, I agree a fix would be nice but I'm not sure what that fix is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the fix be to use a different method than what the one that cares about selected files? i.e
change current method to get selected files and get a list, path that to new private method that takes core of current method but uses this new list.
Create a new method that generates a list and calls new that new private method.
You'd of course have to make sure all the checks for permissions are happening correctly, but I'd expect that to already be in the backend side (i.e. we already shouldn't be relying in the front end to only pass in the permissible values)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scolapasta @mheppler please see 427c883. This pull request is now much bigger but it no longer uses selectedFiles
which is what the checkboxes are connected to.
GetDatasetStorageSizeCommand cmd = new GetDatasetStorageSizeCommand(dvRequestService.getDataverseRequest(), dataset, false, GetDatasetStorageSizeCommand.Mode.DOWNLOAD, workingVersion); | ||
try { | ||
long bytes = commandEngine.submit(cmd); | ||
return FileSizeChecker.bytesToHumanReadable(bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug in the download size displayed. My locally tested datasets are saying the download will be "300 B", but the resulting ZIP is 11 KB, or 11,000 B.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this what I noted when saying "riddle me this" in the description of this pull request (with a screenshot), that there's a mismatch in size. GetDatasetStorageSizeCommand needs discussion in tech hours or #dv-tech (heads up @scolapasta ) or a smaller group. I haven't looked into the implementation but the thing I was wondering about is how we estimate the size of a zip. Text files compress really well. Video files don't. I doubt the number will ever be 100% accurate. But yes, it seems pretty far off right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can discuss next tech hours, sure. But isn't the inaccuracy in the wrong direction?
i.e. the ip should be smaller than the size from GetDatasetStorageSizeCommand? I could imagine a scenario where it would be large (due to zip headers), but in that case, I would expect it to be close in size.
But maybe this is "close" in this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in tech hours, the problem seems to stem from double counting tabular files (counting both the size of the original file and the archival version). I just pushed a fix in 1957776.
Kicked the tires on the analytics a lil more. Built the branch, added style classes for One additional detail that will need to be tweaked in the analytics code is to resolve how the new Access Dataset > Download clicks are recorded in the Google Analytics reports. The way the code works, it pulls the link text to report what button was clicked, so now that the file format and size was added to the new download link text, it makes into the report, which is going to mean you can not group those events by category (see screenshot). |
We discussed and decided to not make the change below. While it would provide a path w/ one less click to a very common use case, it would take away from the scalability of the "Access Dataset" umbrella and would require a potential revisitation in the future.
|
@pdurbin Any resolution to the file size download all mismatch? I'm seeing the zip download is larger than the individual files. |
After rereading and discussing a bit with Leonid, the download size is the uncompressed size (size of the dataset), so issue 2 is a non issue.
Otherwise download all appears to work. Leonid made a suggestion below on calling out the uncompressed size. |
Yes on 2. - that's not really an issue; the size the user is seeing in the download dialog is the compressed size of the zip; we want to report the sizes of the actual files. As for 1., looking at the sizes in the pull down menu in the example above: |
Also, revert changes to GetDatasetStorageSizeCommand and DatasetServiceBean since we won't be using them.
In 7e79f1b I addressed the problem tabular download options showing for versions of datasets that no longer have tabular files (because they were deleted and then the dataset was republished). In 115353c I took another swing at getting GetDatasetStorageSizeCommand to behave properly and I thought I had it working but while testing I found further problems and weirdness so I ultimately decided to write a small command that does only the job I want. It seems to work fine so I switched over to it. See c13a5cc. I'm not doing any error checking within that small method (getDownloadSize in DatasetUtil) but I'm happy to add some if we're worried about nulls in the database for the size of files. I'm moving this to code review. @djbrooke and @TaniaSchlatter I noticed you're still assigned. I deployed the code here if you'd like to check it out but there are no UI changes: http://ec2-54-160-66-157.compute-1.amazonaws.com |
Told Phil that I trusted his work, but I still went and confirmed the download sizes that I had poked at early. Looking much, much better. Didn't test anything larger than the 2.1 MB download here, but it might be worth testing if a 210 MB download is off by 20 MB and determining if that is worth trying to fix. |
@mheppler @pdurbin The numbers above look right to me. What looks like a discrepancy - the fact that you're seeing "2,252,470 bytes (2.3 MB on disk)" in the Apple info box, but "2.1 MB" on the Dataverse side appears to be due to the fact that Apple is interpreting "1 MB" as "1,000,000 bytes"; and Dataverse is (more correctly!) using the "1MB = 1024 * 1024 bytes" definition. What we are doing is more correct. But what Apple is doing is somewhat OK too - both definitions are widely used (unfortunately). And real life users are used to seeing discrepancies between the 2 definitions by now. Can I move it to QA? |
@landreev One thing I learned from my networking class is the 1,000,000 bytes value is used to count network traffic and the 1024 style is for disk storage. I think. Will need to reference my class notes ;) |
@pdurbin @landreev @mheppler So, it looks pretty good -I'm fine with file sizes. However, I did notice one issue: OK, need to do a little more testing on the above since I had also done a replace and restrict file in same dataset and not seeing behavior on just removal of tab. So, kind of a weird issue where part of it preexists but is actually worse now. Maybe the same fix that was applied to the access download all would work here too? Update: This issue exists in 4.20 but is very narrow: requires a tabular file removed, download all original format, as restricted file with terms of use (not just terms of access) enabled. I think there may be a use case here where what we really need to do is confirm md5's or sha's of downloaded files are the same as what is shown on server to confirm downloads are intact. However, since that would need to be run on client side, exercise is up to user but maybe some messaging to suggest that at some point? Out of scope now, I'd imagine and maybe obvious to people who work in this space. |
@pdurbin @landreev @mheppler I've found a weird, preexisting bug with old download all that is technically out of scope for this issue but I'm wondering whether the same fix for hiding tab download options when tab deleted in current version might be easy to add to it? |
@kcondon What's "old download all"? Do you mean selecting all the checkboxes and clicking "download" in the upper right corner of the files table? |
@pdurbin @kcondon @mheppler Yeah, I can see the "original" and "tabular" options under the old style multi-download button. I assumed earlier, that the way it used to work, there was never a real "download all", i.e. it was always a download of the specified list of files from the list of the selected checkboxes... and that way the above condition would never happen, in a version that has no tabular files. But we must have added an optimization at some point - I now have some vague memories of it - for that "select all" checkbox, that actually bypasses the checkboxes and goes to the complete list of files when "all" is selected... So yes, it is probably the same logic error - searching for tabular files in the entire dataset; instead of the current version. And could probably be fixed just as easily. |
Oh, and here's the reason you get that weird API error: because one of the files is restricted, and there are 2 files total, in the end there is only 1 file in the list - and Dataverse is smart enough to then determine that this is NOT a multiple file download case; so it redirects it to the single file download API. apparently it is not smart enough to then drop the "format=original" parameter - and the single download API treats that as a fatal error, when a format that's not applicable to the current file is requested. |
@landreev @pdurbin @mheppler Leonid, thanks for confirming! I don't believe it is technically in scope but the question I was asking is whether it would be easy to adopt the solution for not showing download original/archive on versions that do not have tabular files, like Phil did already. If not, I'll just open a new issue. |
Going once, twice... |
What this PR does / why we need it:
This pull request adds "download all" buttons at the dataset level and shows the size of the dataset.
Which issue(s) this PR closes:
Special notes for your reviewer:
(this should probably be removed, since the issue has been resolved? - At least I'm going to mark it as resolved. -- L.A.)
Riddle me this... why do I get a different "size of dataset" number that what I see for the file? Here's a screenshot:
Also, there has been discussion of adding a "download all files" API but @djbrooke and I have decided to split this off into a separate chunk, represented by #4529.
Suggestions on how to test this:
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
I don't see any mockups linked from #6118 but I believe they are our there somewhere. In addition to the screenshot above, here's one for a dataset with no tabular files:
Is there a release notes update needed for this change?:
I think there's already a release note about general dataset redesign improvements? I can add another note if we want.
Additional documentation:
I didn't touch the User Guide because my understanding is that we are going to update it all at once after the UI changes.