Dataset: Loading dataset page fails with exception if original file size is null, eg. some legacy Murray #7205

Closed
kcondon opened this issue Aug 18, 2020 · 10 comments · Fixed by #7581

Comments

@kcondon (Contributor) commented Aug 18, 2020

[Leonid]
Updating the issue, since we had some incorrect assumptions back when it was opened (L.A.):

  1. Yes, the dataset page is bombing if the original size is null in the db for one of the tab. files (when it tries to calculate the total size for "download all").
  2. This has nothing to do with whether it can access S3, or whether the actual saved original exists: there is NO attempt to check the size of the stored physical file at this point. A null in the db simply results in a null pointer exception. (The code that looks up the physical file when the db value is null, and then saves the size in the db, does exist; but only in the Access API.)
  3. For most installations seeing this issue, this would be resolved by running /api/admin/datafiles/integrity/fixmissingoriginalsizes. We recommended (but did not require!) running it as part of the 4.10 release; some installations may have missed it.
  4. The code that throws that null pointer should of course be modified so as not to kill the page. (It should simply treat null as 0; see the sketch after this list.)
  5. Some ancient installations (like ours) may have legacy tab. files with no saved originals. In addition to the fix in 4., we should probably also modify the fixmissingoriginalsizes API so that it does not leave any nulls in the db, but replaces them with -1 where the originals are missing. That's what I did manually in our prod. (And then maybe we could remind people with pre-4.10 legacy installations to run that API.)

All of the above should be very straightforward.
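
A minimal sketch of the null-handling proposed in point 4 above (illustrative only; the helper class and the treatment of the proposed -1 sentinel are assumptions, not Dataverse's actual code):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch, not Dataverse's code: sum per-file sizes for a "download all"
// total, treating a null size (never recorded) and the proposed -1 sentinel
// (original known to be missing) as 0, so the page never hits a null pointer.
public class DownloadSizeSketch {

    static long totalDownloadSize(List<Long> storedSizes) {
        long total = 0L;
        for (Long size : storedSizes) {
            if (size == null || size < 0) {
                continue; // unknown or missing original contributes nothing to the total
            }
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        // one normal file, one legacy file with a null original size, one -1 sentinel
        System.out.println(totalDownloadSize(Arrays.asList(1024L, null, -1L))); // prints 1024
    }
}
```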

[Kevin] Original comment
It was discovered that loading the dataset page fails with an exception when the original file size is null and cannot be calculated. This happened with a Murray dataset on a test instance. It can happen if: 1. a legacy file does not have a saved original; 2. the file system is temporarily inaccessible, so the get-size call fails.

It would seem that this should not be a fatal error in loading the dataset. After consulting with Dev, it seems that this behavior was added along with the download-all-files-in-a-dataset feature, because that feature wants to alert you to the size of the download.


From @qqmyers in #6343:

When Amazon had a recent S3 outage (DDoS attack), I noted that Dataverse can no longer display file pages when S3 is down. Previously, the only issues with S3 being out were related to file upload/download operations, and the Dataverse site itself worked. This issue is fairly trivial - it looks like the file size to be displayed on the page is being pulled from S3 metadata and hence, if S3 is down, the page breaks/shows a 500 error. I'm not sure it's worth fixing, but avoiding 500 errors could be as simple as doing a null check on the file size. (In any case, this issue might help someone debug - 500 errors related to file sizes could be due to S3 being misconfigured/down.)

@landreev (Contributor)

The reason this is happening on the test box where the above was observed is that it does not have the credentials to access the production S3 bucket where the files are stored. So it can't look up the size of the stored original (which hasn't been cached in the db yet for some files in this dataset). The failure is a bit too fatal/catastrophic - but this would not be an issue on a properly configured system either.

@kcondon (Contributor, Author) commented Aug 18, 2020

@landreev so the legacy aspect is definitely not an issue then? I can add the credentials to confirm.

@landreev (Contributor)

Should be easier (and definitely safer!) to simulate the "legacy" case with a QA S3 bucket (by deleting the saved original). I would be a little wary of configuring any test boxes with the prod. credentials.
I can run that test too.

@qqmyers (Member) commented Aug 18, 2020

duplicates #6343

@landreev (Contributor)

Just making a reminder note to re-test the file page as well, since #6343 (now closed) seems to suggest that it was the file.xhtml page that was throwing a 500 with no S3 access; or at least it says "file pages". Worth double-checking.

@qqmyers (Member) commented Aug 18, 2020

My guess on that: the file page tries to show the 'friendlysize'. If the size is null and gets sent to FileSizeChecker.bytesToHumanReadable(), that may trigger a 500 even though the file.xhtml page is trying to do an empty check (jsf:rendered="#{!(empty FilePage.file.friendlySize)}"). If it checked the raw size for the rendering test, it could work (and a similar test could fix the dataset page, where friendly sizes are displayed in the filesFragment).
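
The kind of guard being suggested could look roughly like this (a hedged sketch; the method below only mimics what a helper like FileSizeChecker.bytesToHumanReadable() might do and is not its actual signature or implementation):

```java
// Hedged sketch of a null-safe "friendly size" formatter; names and formatting rules
// are illustrative, not the actual FileSizeChecker implementation.
public class FriendlySizeSketch {

    static String bytesToHumanReadable(Long bytes) {
        if (bytes == null || bytes < 0) {
            return ""; // an empty string lets an "empty"-based rendered check hide the field instead of 500-ing
        }
        final String[] units = {"B", "KB", "MB", "GB", "TB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format("%.1f %s", value, units[unit]);
    }

    public static void main(String[] args) {
        System.out.println(bytesToHumanReadable(123456789L)); // 117.7 MB
        System.out.println(bytesToHumanReadable(null));       // "" (no exception)
    }
}
```

Alternatively, as suggested in the comment above, the jsf:rendered test could check the raw filesize rather than the derived friendly string.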

@landreev (Contributor)

I am admittedly tired and stupid r/n, but I don't think that value, from which the "friendlysize" is calculated, should ever be coming from the S3 metadata... It should always be the value of the "filesize" column from the DataFile db table - no?

(There may well be something else on that page that can be bombing w/ no storage access; not disputing that possibility... some direct compute URL silliness? Anyway, we'll retest and figure it out.)

@qqmyers (Member) commented Aug 18, 2020

Yes - there are two parts. The pages all have problems if the datafile table has a null size.

One way to get that is/was with an S3 direct upload, if S3 is slow/flaky such that the upload succeeds but the subsequent call to S3 to get the metadata for the new object fails (us-east-1 is only eventually consistent, so under load, or a DDoS, you may not get the metadata right away, even though the upload itself succeeded and you have a successful response from S3 to prove it). I think since we saw this problem (and with Kevin's testing of very small files) we've added some delays and retries to make this less likely - see the sketch after this comment.

If there are any other current/historic ways to get a null filesize, they would have the same symptoms. (As with S3, temp files and file stores populate the datafile table once, at DataFile creation, so if there is a file system problem at that point I guess you could get a null. In any case, if it isn't/wasn't set at creation, a null filesize persists until corrected.)
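
A rough sketch of the delay-and-retry idea mentioned above, using the AWS SDK's getObjectMetadata() call (the retry count, delay, and surrounding method are assumptions, not the code actually added to Dataverse):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;

// Rough sketch only: after a direct upload, the metadata lookup for the new object
// may briefly fail under eventual consistency, so retry a few times with a pause
// before giving up and leaving the size null (which the pages must then tolerate).
public class S3SizeLookupSketch {

    static Long lookUpSizeWithRetries(AmazonS3 s3, String bucket, String key) {
        final int maxAttempts = 5;
        final long delayMillis = 2000L;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                ObjectMetadata metadata = s3.getObjectMetadata(bucket, key);
                return metadata.getContentLength();
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    return null; // caller has to cope with a null size, which is the point of this issue
                }
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }
}
```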

@landreev (Contributor) commented Aug 18, 2020

So we are talking about a situation where a (temporary) loss of access to S3 during the creation/upload of the file left the files in a messed-up state (with null sizes in the datafile table) - correct?
I.e., we are not talking about the page failing because it actually needs something from S3 in real time. I think I've got it now.

It's a bit of a different situation from what this issue was opened for.
But we could address it the same way it's already addressed for the "saved original" sizes: if the value in the table is NULL, we could open the StorageIO, look up the size, and save it in the db, all in real time.
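
A minimal sketch of that lookup-and-backfill pattern (the StorageLookup and SizeStore interfaces are hypothetical stand-ins for illustration, not Dataverse's StorageIO or JPA code):

```java
// Hypothetical sketch of the "if null in the db, look it up from storage and save it" idea.
public class SizeBackfillSketch {

    interface StorageLookup { long sizeOnStorage(String storageId) throws Exception; }

    interface SizeStore {
        Long getStoredSize(String storageId);
        void saveSize(String storageId, long size);
    }

    static long resolveSize(String storageId, SizeStore db, StorageLookup storage) {
        Long cached = db.getStoredSize(storageId);
        if (cached != null) {
            return cached; // fast path: the size is already recorded in the datafile table
        }
        try {
            long size = storage.sizeOnStorage(storageId); // e.g. an S3 HEAD request or a filesystem stat
            db.saveSize(storageId, size);                 // persist it so the lookup only happens once
            return size;
        } catch (Exception e) {
            return 0L; // storage unreachable or original missing: degrade gracefully instead of a 500
        }
    }
}
```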

landreev changed the title from "Dataset: Loading dataset page fails with exception if original file size is null for some reason and can't be calculated, eg. Murray" to "Dataset: Loading dataset page fails with exception if original file size is null, eg. some legacy Murray" on Jan 27, 2021
@landreev (Contributor)

(I updated the main description of the issue, with the info we learned when helping UVA with this same issue recently)

sekmiller self-assigned this Feb 3, 2021
sekmiller added a commit that referenced this issue Feb 5, 2021
sekmiller removed their assignment Feb 5, 2021
sekmiller added a commit that referenced this issue Feb 5, 2021
kcondon added a commit that referenced this issue Feb 5, 2021