Add "download all" buttons (including size of dataset) to dataset page #7047

pdurbin · 2020-07-01T19:58:22Z

What this PR does / why we need it:

This pull request adds "download all" buttons at the dataset level and shows the size of the dataset.

Which issue(s) this PR closes:

Closes #6118 Make download all files feature more prominent
Closes #6400 Display the total size of a dataset on the dataset page

Special notes for your reviewer:

(this should probably be removed, since the issue has been resolved? - At least I'm going to mark it as resolved. -- L.A.)
Riddle me this... why do I get a different "size of dataset" number than what I see for the file? Here's a screenshot:

Also, there has been discussion of adding a "download all files" API but @djbrooke and I have decided to split this off into a separate chunk, represented by #4529.

Suggestions on how to test this:

Test both tabular and non tabular files.
Test datasets with and without terms of use.
Test datasets with and without a guestbook.
Test restricted files.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

I don't see any mockups linked from #6118 but I believe they are out there somewhere. In addition to the screenshot above, here's one for a dataset with no tabular files:

Is there a release notes update needed for this change?:

I think there's already a release note about general dataset redesign improvements? I can add another note if we want.

Additional documentation:

I didn't touch the User Guide because my understanding is that we are going to update it all at once after the UI changes.

coveralls · 2020-07-01T20:10:51Z

Coverage decreased (-0.04%) to 19.61% when pulling 174ec43 on 6118-download into 4eda5a1 on develop.

src/main/webapp/dataset.xhtml

mheppler · 2020-07-01T23:29:48Z

src/main/java/edu/harvard/iq/dataverse/DatasetPage.java

+    }
+
+    public void validateAllFilesForDownloadArchival() {
+        selectAllFiles();


I assume this selectAllFiles() is what's responsible for the odd UI behavior... when you select "Download" from the Access Dataset dropdown, you get the ZIP download, but now all the files are selected in the files table... that shouldn't be necessary... there has to be a way to get the ZIP without the unexpected UI behavior.

I tried adding clearSelection() to the end like this...

public void validateAllFilesForDownloadArchival() {
    selectAllFiles();
    boolean guestbookRequired = isDownloadPopupRequired();
    boolean downloadOriginal = false;
    validateFilesForDownload(guestbookRequired, downloadOriginal);
+   clearSelection();
}

... but now it thinks no files are selected:

So yes, I agree a fix would be nice but I'm not sure what that fix is.

Wouldn't the fix be to use a different method than the one that cares about selected files? i.e.:

Change the current method to get the selected files as a list and pass that to a new private method that contains the core of the current method but uses this new list.

Create a new method that generates a list of all files and calls that new private method.

You'd of course have to make sure all the permission checks are happening correctly, but I'd expect that to already be on the backend side (i.e., we already shouldn't be relying on the front end to only pass in the permissible values).
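A minimal sketch of the refactoring suggested above, in Java. The names (`FileRef`, `DownloadValidationSketch`, the `canDownload` predicate) are hypothetical stand-ins, not Dataverse classes, and this is not the code that went into 427c883. It only shows the shape: two entry points that each build their own list and share one private method that never reads the checkbox selection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a file in the dataset version being viewed.
record FileRef(long id, boolean restricted) {}

final class DownloadValidationSketch {

    // Bound to the checkboxes in the files table; "download all" never writes to it.
    private final List<FileRef> selectedFiles = new ArrayList<>();
    private final List<FileRef> filesInVersion;
    // Hypothetical permission check; the authoritative check belongs on the backend.
    private final Predicate<FileRef> canDownload;

    DownloadValidationSketch(List<FileRef> filesInVersion, Predicate<FileRef> canDownload) {
        this.filesInVersion = List.copyOf(filesInVersion);
        this.canDownload = canDownload;
    }

    List<FileRef> getSelectedFiles() {
        return selectedFiles;
    }

    // Existing entry point: works from whatever the checkboxes selected.
    List<FileRef> validateSelectedFilesForDownload() {
        return validateFilesForDownload(new ArrayList<>(selectedFiles));
    }

    // New entry point: builds its own list of every file in the version.
    List<FileRef> validateAllFilesForDownload() {
        return validateFilesForDownload(new ArrayList<>(filesInVersion));
    }

    // Shared core: only looks at the list passed in, never at selectedFiles.
    private List<FileRef> validateFilesForDownload(List<FileRef> files) {
        List<FileRef> downloadable = new ArrayList<>();
        for (FileRef file : files) {
            if (canDownload.test(file)) {
                downloadable.add(file);
            }
        }
        return downloadable;
    }
}
```

Because `validateAllFilesForDownload()` never touches `selectedFiles`, the checkboxes stay as the user left them. The permission filter here would only duplicate the backend check, in line with the point above.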

@scolapasta @mheppler please see 427c883. This pull request is now much bigger but it no longer uses selectedFiles which is what the checkboxes are connected to.

mheppler · 2020-07-01T23:32:29Z

src/main/java/edu/harvard/iq/dataverse/DatasetPage.java

+        GetDatasetStorageSizeCommand cmd = new GetDatasetStorageSizeCommand(dvRequestService.getDataverseRequest(), dataset, false, GetDatasetStorageSizeCommand.Mode.DOWNLOAD, workingVersion);
+        try {
+            long bytes = commandEngine.submit(cmd);
+            return FileSizeChecker.bytesToHumanReadable(bytes);


Bug in the download size displayed. My locally tested datasets are saying the download will be "300 B", but the resulting ZIP is 11 KB, or 11,000 B.

Yes, this is what I noted when saying "riddle me this" in the description of this pull request (with a screenshot): there's a mismatch in size. GetDatasetStorageSizeCommand needs discussion in tech hours or #dv-tech (heads up @scolapasta) or a smaller group. I haven't looked into the implementation, but the thing I was wondering about is how we estimate the size of a zip. Text files compress really well. Video files don't. I doubt the number will ever be 100% accurate. But yes, it seems pretty far off right now.

We can discuss next tech hours, sure. But isn't the inaccuracy in the wrong direction?

i.e., the zip should be smaller than the size from GetDatasetStorageSizeCommand? I could imagine a scenario where it would be larger (due to zip headers), but in that case I would expect it to be close in size.

But maybe this is "close" in this scenario.
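Which way the mismatch goes depends on what is being compared. This sketch uses `java.util.zip` to contrast the sum of uncompressed file sizes (the kind of figure a "size of dataset" number reports) with the size of an in-memory ZIP of the same bytes. It assumes a standard deflated ZIP and is only an illustration; `ZipSizeSketch` is hypothetical and is not how Dataverse assembles its archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipSizeSketch {

    // Sum of the raw file sizes: what a "size of dataset" figure adds up.
    static long uncompressedSize(byte[]... files) {
        long total = 0;
        for (byte[] file : files) {
            total += file.length;
        }
        return total;
    }

    // Size of a deflated ZIP holding the same files, built in memory.
    static long zippedSize(byte[]... files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (int i = 0; i < files.length; i++) {
                zip.putNextEntry(new ZipEntry("file" + i));
                zip.write(files[i]);
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    // Repetitive tab-separated rows, which compress very well.
    static byte[] repetitiveRows(int rows) {
        return "1\t2\t3\t4\n".repeat(rows).getBytes(StandardCharsets.UTF_8);
    }
}
```

Repetitive text, such as many tabular files, makes the archive far smaller than the sum. A one-byte file makes it larger, because every entry carries local and central-directory headers. Already-compressed media land close to their original size.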

As discussed in tech hours, the problem seems to stem from double counting tabular files (counting both the size of the original file and the archival version). I just pushed a fix in 1957776.
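The counting rule can be illustrated with a hypothetical `FileSizes` record that holds both the archival (.tab) size and, for ingested files, the original size. This sketch shows only the rule (one size per file, chosen by download mode), not the change made in 1957776. `Mode` here is a local enum, unrelated to `GetDatasetStorageSizeCommand.Mode`.

```java
import java.util.List;

final class DownloadSizeSketch {

    // Hypothetical per-file sizes. For an ingested tabular file both the archival
    // (.tab) size and the original upload's size are recorded; otherwise
    // originalSize is null and archivalSize is simply the file's size.
    record FileSizes(long archivalSize, Long originalSize) {
        boolean isTabular() {
            return originalSize != null;
        }
    }

    // Local enum for this sketch only.
    enum Mode { ORIGINAL, ARCHIVAL }

    // Adds exactly one size per file, never archival + original together.
    static long downloadSize(List<FileSizes> files, Mode mode) {
        long total = 0;
        for (FileSizes file : files) {
            if (file.isTabular() && mode == Mode.ORIGINAL) {
                total += file.originalSize();
            } else {
                total += file.archivalSize();
            }
        }
        return total;
    }
}
```

With one 1,000-byte plain file and one tabular file (300-byte .tab, 700-byte original), the original-format total is 1,700 bytes and the archival total is 1,300 bytes; adding both sizes for the tabular file would give 2,000 bytes in either mode.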

mheppler · 2020-07-02T03:53:07Z

Kicked the tires on the analytics a lil more. Built the branch, added style classes for btn-download, and confirmed that worked. Also realized that the explore link was missing the btn-explore style class, so the latest from develop needs to be merged, as that style class was added in the recently merged 6938 analytics new buttons #7008 PR.

One additional detail that will need to be tweaked in the analytics code is how the new Access Dataset > Download clicks are recorded in the Google Analytics reports. The code pulls the link text to report which button was clicked, so now that the file format and size have been added to the new download link text, they make it into the report, which means you cannot group those events by category (see screenshot).
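One way to keep these events groupable is to strip the dynamic size from the label before reporting it. This Java sketch assumes the size is appended in parentheses, e.g. `Original Format ZIP (62.3 KB)`. The exact link text is not shown in this thread and the real analytics code runs in the browser, so both the label format and the `stableLabel` helper are hypothetical.

```java
import java.util.regex.Pattern;

final class AnalyticsLabelSketch {

    // A trailing size in parentheses, e.g. " (62.3 KB)" or " (300 B)".
    private static final Pattern TRAILING_SIZE =
            Pattern.compile("\\s*\\(\\s*[\\d.,]+\\s*[KMGT]?B\\s*\\)\\s*$");

    // Drops the dynamic size so every click on the same button reports one label.
    static String stableLabel(String linkText) {
        return TRAILING_SIZE.matcher(linkText).replaceFirst("").trim();
    }
}
```

Reporting a fixed value instead of the link text, such as the `btn-download` style class or a dedicated attribute, would avoid the parsing altogether.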

pdurbin · 2020-07-07T21:19:45Z

I'm ready for more code review. I addressed the checkbox issue and am now differentiating between sizes for original files vs. archival. Here's a screenshot:

djbrooke · 2020-07-13T15:44:36Z

We discussed and decided not to make the change below. While it would provide a path with one less click for a very common use case, it would take away from the scalability of the "Access Dataset" umbrella and might need to be revisited in the future.

~~One more change to make here (it was added to another issue by mistake).~~

kcondon · 2020-07-13T20:31:09Z

@pdurbin Any resolution to the file size download all mismatch? I'm seeing the zip download is larger than the individual files.

pdurbin · 2020-07-13T20:38:04Z

@kcondon I thought the size estimate was better as of 1957776 (before that, tabular files were being counted twice). Can you please post a screenshot of the problem you're seeing?

kcondon · 2020-07-13T20:40:49Z

After rereading and discussing a bit with Leonid: the download size is the uncompressed size (size of the dataset), so issue 2 is a non-issue.

1. The download size for both tabular and original is off, but the tabular size is off by a lot.
~~2. the actual size of the zip file is considerably smaller than what's presented in the download, so I am guessing it refers to the uncompressed size?~~
3. I did find that, for displaying the original/archival options, it appears to check the dataset level for tabular files rather than the version you are on, since it also displays the options when the current version does not contain a tabular file because it was deleted (and therefore exists only in a prior version).

Otherwise download all appears to work. Leonid made a suggestion below on calling out the uncompressed size.

kcondon · 2020-07-13T20:41:17Z

kcondon · 2020-07-13T20:42:55Z

This is a screenshot using FF, saving the zip to Windows, downloading the original format:

This is a screenshot using FF, saving the zip to Windows, downloading the archival format:

landreev · 2020-07-13T22:13:21Z

So there are at least two issues:

1. the download size for both tab and original are off but the tab is off by a lot.

2. the actual size of the zip file is considerably smaller than what's presented in the download, so I am guessing it refers to the uncompressed size?

Otherwise download all appears to work

Yes on 2. - that's not really an issue; the size the user is seeing in the download dialog is the compressed size of the zip; we want to report the sizes of the actual files.

As for 1., looking at the sizes in the pull down menu in the example above:
the size for the "original" format appears to be correct (62.3KB); but for the "tabular" it does appear to be off - actually, it looks like it's off by exactly the size of the original; i.e., it keeps counting both format sizes, instead of counting just the tabular size.

Also, revert changes to GetDatasetStorageSizeCommand and DatasetServiceBean since we won't be using them.

pdurbin · 2020-07-16T20:38:51Z

In 7e79f1b I addressed the problem of tabular download options showing for versions of datasets that no longer have tabular files (because they were deleted and then the dataset was republished).
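The difference between checking the whole dataset and checking the version being viewed can be sketched with a hypothetical `VersionFile` record, where each inner list is the file list of one version. This illustrates the logic only; it is not the code in 7e79f1b.

```java
import java.util.List;

final class TabularCheckSketch {

    // Hypothetical stand-in for one file's metadata in one dataset version.
    record VersionFile(String label, boolean tabular) {}

    // Problematic shape: scans the files of every version, so a tabular file
    // deleted before the latest version still switches the options on.
    static boolean datasetHasTabular(List<List<VersionFile>> allVersions) {
        return allVersions.stream()
                .flatMap(List::stream)
                .anyMatch(VersionFile::tabular);
    }

    // Fixed shape: scans only the version being displayed.
    static boolean versionHasTabular(List<VersionFile> workingVersion) {
        return workingVersion.stream().anyMatch(VersionFile::tabular);
    }
}
```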

In 115353c I took another swing at getting GetDatasetStorageSizeCommand to behave properly and I thought I had it working but while testing I found further problems and weirdness so I ultimately decided to write a small command that does only the job I want. It seems to work fine so I switched over to it. See c13a5cc. I'm not doing any error checking within that small method (getDownloadSize in DatasetUtil) but I'm happy to add some if we're worried about nulls in the database for the size of files.
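If nulls in the database are a concern, a defensive version of the summation could skip missing sizes and count them, so the caller can decide whether to present the total as approximate. The names below are hypothetical and do not reflect the real signature of `getDownloadSize` in `DatasetUtil`; the input is assumed to be one nullable size per file.

```java
import java.util.List;

final class NullSafeSizeSketch {

    // Total bytes plus how many files had no usable size recorded.
    record SizeResult(long bytes, int filesWithUnknownSize) {}

    // Hypothetical input: one nullable size per file, as read from the database.
    static SizeResult downloadSize(List<Long> sizes) {
        long total = 0;
        int unknown = 0;
        for (Long size : sizes) {
            if (size == null || size < 0) {
                unknown++; // skip instead of failing with a NullPointerException
            } else {
                total += size;
            }
        }
        return new SizeResult(total, unknown);
    }
}
```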

I'm moving this to code review.

@djbrooke and @TaniaSchlatter I noticed you're still assigned. I deployed the code here if you'd like to check it out but there are no UI changes: http://ec2-54-160-66-157.compute-1.amazonaws.com

mheppler · 2020-07-16T21:04:08Z

Told Phil that I trusted his work, but I still went and confirmed the download sizes that I had poked at earlier. Looking much, much better. Didn't test anything larger than the 2.1 MB download here, but it might be worth testing whether a 210 MB download is off by 20 MB and determining if that is worth trying to fix.

landreev · 2020-07-20T14:49:23Z

@mheppler @pdurbin The numbers above look right to me. What looks like a discrepancy - the fact that you're seeing "2,252,470 bytes (2.3 MB on disk)" in the Apple info box, but "2.1 MB" on the Dataverse side appears to be due to the fact that Apple is interpreting "1 MB" as "1,000,000 bytes"; and Dataverse is (more correctly!) using the "1MB = 1024 * 1024 bytes" definition. What we are doing is more correct. But what Apple is doing is somewhat OK too - both definitions are widely used (unfortunately). And real life users are used to seeing discrepancies between the 2 definitions by now.

Can I move it to QA?
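The arithmetic behind the two figures, using the byte count from the Apple info box: 2,252,470 / 1,000,000 ≈ 2.25, shown as 2.3 MB, while 2,252,470 / (1024 × 1024) = 2,252,470 / 1,048,576 ≈ 2.15, shown as 2.1 MB. The sketch below reproduces both with `String.format`; `SizeUnitsSketch` is hypothetical and makes no claim to mirror `FileSizeChecker.bytesToHumanReadable`, whose rounding may differ.

```java
import java.util.Locale;

final class SizeUnitsSketch {

    // Decimal (SI) megabytes: 1 MB = 1,000,000 bytes.
    static String decimalMegabytes(long bytes) {
        return String.format(Locale.ROOT, "%.1f MB", bytes / 1_000_000.0);
    }

    // Binary megabytes: 1 MB = 1024 * 1024 = 1,048,576 bytes
    // (written "MiB" in IEC notation).
    static String binaryMegabytes(long bytes) {
        return String.format(Locale.ROOT, "%.1f MB", bytes / (1024.0 * 1024.0));
    }
}
```

For 2,252,470 bytes, `decimalMegabytes` returns "2.3 MB" and `binaryMegabytes` returns "2.1 MB", matching the two numbers compared above.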

kcondon · 2020-07-20T15:54:47Z

@landreev One thing I learned from my networking class is the 1,000,000 bytes value is used to count network traffic and the 1024 style is for disk storage. I think. Will need to reference my class notes ;)

mheppler · 2020-07-20T16:05:45Z

@landreev @kcondon if that difference between the two math formulas is true, then I suspect we want to use the storage version of the number to display here in the UI for the end user.

kcondon · 2020-07-20T16:17:16Z

@pdurbin @landreev @mheppler So, it looks pretty good; I'm fine with the file sizes. However, I did notice one issue:
the old-style download all button still presents original and archival formats when the tabular file is removed in the current version. Testing shows this also exists in 4.20; what is different is that when you choose download original, it throws an error on download:
{"status":"ERROR","code":404,"message":"'/api/v1/access/datafile/53792' datafile access error: requested optional service (image scaling, format conversion, etc.) is not supported on this datafile."}

OK, I need to do a little more testing on the above, since I had also replaced and restricted a file in the same dataset, and I'm not seeing the behavior with just the removal of the tabular file.

So, kind of a weird issue where part of it preexists but is actually worse now. Maybe the same fix that was applied to the access download all would work here too?

Update: This issue exists in 4.20 but is very narrow: it requires a removed tabular file, download all in original format, and a restricted file with terms of use (not just terms of access) enabled.

I think there may be a use case here where what we really need to do is confirm that the MD5s or SHAs of downloaded files match what is shown on the server, to confirm downloads are intact. However, since that would need to run on the client side, the exercise is up to the user, but maybe we could add some messaging suggesting it at some point? Out of scope for now, I'd imagine, and maybe obvious to people who work in this space.
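A client-side check could look like the sketch below: stream the downloaded file through `java.security.MessageDigest` and compare the hex digest with the checksum displayed on the dataset page. The algorithm name (e.g. `"MD5"` or `"SHA-1"`) must match whatever the installation displays; `ChecksumSketch` is a hypothetical helper, not part of Dataverse.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumSketch {

    // Streams the file through the digest so large downloads need not fit in memory.
    static String hexDigest(Path file, String algorithm) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // Compares against the checksum shown on the server, ignoring case and whitespace.
    static boolean matches(Path file, String algorithm, String expectedHex) {
        return hexDigest(file, algorithm).equalsIgnoreCase(expectedHex.trim());
    }
}
```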

kcondon · 2020-07-20T17:37:20Z

@pdurbin @landreev @mheppler I've found a weird, preexisting bug with old download all that is technically out of scope for this issue but I'm wondering whether the same fix for hiding tab download options when tab deleted in current version might be easy to add to it?
The bug occurs when a tabular file has been deleted in the existing version, there is a restricted file with terms of use (not just terms of access) enabled, you do not yet have access to the restricted file, and you choose download all, original format only. Kind of narrow, but it throws an ugly white error screen and nothing is downloaded: {"status":"ERROR","code":404,"message":"'/api/v1/access/datafile/53792' datafile access error: requested optional service (image scaling, format conversion, etc.) is not supported on this datafile."}

landreev · 2020-07-20T17:52:52Z

@kcondon What's "old download all"? Do you mean selecting all the checkboxes and clicking "download" in the upper right corner of the files table?
I'm quite confused by the condition you are describing. If it's something pre-existing, I'm surprised nobody has noticed before.

landreev · 2020-07-20T18:22:42Z

@pdurbin @kcondon @mheppler
I'm assuming this is the test dataset in question: https://dataverse-internal.iq.harvard.edu/dataset.xhtml?persistentId=doi%3A10.70122%2FFK2%2FSJSEZX&version=3.1

Yeah, I can see the "original" and "tabular" options under the old-style multi-download button. I assumed earlier that, the way it used to work, there was never a real "download all", i.e. it was always a download of the specified list of files from the selected checkboxes... and that way the above condition would never happen in a version that has no tabular files. But we must have added an optimization at some point (I now have some vague memories of it) for that "select all" checkbox, which actually bypasses the checkboxes and goes to the complete list of files when "all" is selected... So yes, it is probably the same logic error: searching for tabular files in the entire dataset instead of the current version. And it could probably be fixed just as easily.
Whether it should be fixed as part of this PR, seeing how it is pre-existing and in a different code section - I don't have a strong opinion.

landreev · 2020-07-20T18:23:28Z

@pdurbin @kcondon @mheppler

Oh, and here's the reason you get that weird API error: one of the files is restricted, and there are 2 files total, so in the end there is only 1 file in the list, and Dataverse is smart enough to then determine that this is NOT a multiple file download case, so it redirects it to the single file download API. Apparently it is not smart enough to then drop the "format=original" parameter, and the single file download API treats that as a fatal error when a format that's not applicable to the current file is requested.
This is annoying - but is indeed a fairly edge case....
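The fix implied here could be sketched as follows: after filtering out files the user cannot access, if exactly one file remains, forward `format=original` only when that file is tabular. `Candidate` and `redirectIfSingle` are hypothetical; the path prefix is taken from the error message quoted above, and the multi-file branch is deliberately left out.

```java
import java.util.List;
import java.util.stream.Collectors;

final class SingleFileRedirectSketch {

    // Hypothetical per-file facts the redirect decision needs.
    record Candidate(long id, boolean accessible, boolean tabular) {}

    private static final String SINGLE_FILE_PATH = "/api/v1/access/datafile/";

    // Returns the single-file URL if filtering leaves exactly one file, else null
    // (the multi-file download path is outside this sketch).
    static String redirectIfSingle(List<Candidate> requested, boolean originalRequested) {
        List<Candidate> accessible = requested.stream()
                .filter(Candidate::accessible)
                .collect(Collectors.toList());
        if (accessible.size() != 1) {
            return null;
        }
        Candidate only = accessible.get(0);
        String url = SINGLE_FILE_PATH + only.id();
        // Forward "format=original" only when it applies to this file; forwarding
        // it for a non-tabular file is what triggers the 404 described above.
        if (originalRequested && only.tabular()) {
            url += "?format=original";
        }
        return url;
    }
}
```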

kcondon · 2020-07-20T18:28:46Z

@landreev @pdurbin @mheppler Leonid, thanks for confirming! I don't believe it is technically in scope but the question I was asking is whether it would be easy to adopt the solution for not showing download original/archive on versions that do not have tabular files, like Phil did already. If not, I'll just open a new issue.

landreev · 2020-07-20T18:57:43Z

As I just mentioned to @kcondon and @djbrooke , I'm totally fine with merging this PR; and then using #6972 to fix the pre-existing conditions described above. Any objections?

kcondon · 2020-07-20T18:58:12Z

Going once, twice...

pdurbin added 2 commits June 30, 2020 13:12

add download all buttons under access button #6118

c1a6126

add size to "download all" link #6118

3f4ac7f

mheppler reviewed Jul 1, 2020

src/main/webapp/dataset.xhtml (outdated)

mheppler reviewed Jul 1, 2020

src/main/webapp/dataset.xhtml (outdated)

mheppler reviewed Jul 1, 2020


mheppler assigned pdurbin Jul 1, 2020

pdurbin added 2 commits July 2, 2020 11:27

add btn-download style class for analytics #6118

054f249

move ZIP to bundle #6118

140ba41

djbrooke added this to the Dataverse 5 milestone Jul 7, 2020

pdurbin added 2 commits July 7, 2020 14:35

stop using selectedFiles field #6118

427c883

don't double count tabular files in size #6118

1957776

pdurbin removed their assignment Jul 7, 2020

landreev self-requested a review July 8, 2020 17:58

landreev self-assigned this Jul 8, 2020

landreev approved these changes Jul 13, 2020


djbrooke unassigned landreev Jul 13, 2020

kcondon self-assigned this Jul 13, 2020

kcondon assigned pdurbin and unassigned kcondon Jul 13, 2020

djbrooke assigned TaniaSchlatter and djbrooke and unassigned djbrooke Jul 14, 2020

mheppler removed their assignment Jul 16, 2020

pdurbin self-assigned this Jul 16, 2020

pdurbin added 4 commits July 16, 2020 14:05

prevent orig file size from being added to tabular files #6118

115353c

use version to determine if tabular download #6118

7e79f1b

switch to smaller, simpler download size method #6118

c13a5cc

Also, revert changes to GetDatasetStorageSizeCommand and DatasetServiceBean since we won't be using them.

remove unused imports #6118

174ec43

pdurbin removed their assignment Jul 16, 2020

djbrooke removed their assignment Jul 20, 2020

djbrooke unassigned TaniaSchlatter Jul 20, 2020

kcondon self-assigned this Jul 20, 2020

kcondon removed their assignment Jul 20, 2020

kcondon merged commit d3f1f6d into develop Jul 20, 2020

kcondon deleted the 6118-download branch July 20, 2020 18:59

landreev mentioned this pull request Jul 20, 2020

Download: 404 error for pdf downloads in projects with ingested tabular files #6972

Closed
