Update Filetypes #21

Closed
djbrooke opened this issue Jun 11, 2019 · 7 comments

@djbrooke
Contributor

IQSS/dataverse#2202 is merged and will be included in the next release (4.15). Once this is deployed to dataverse.harvard.edu, we should use these newfound powers to update filetypes in prod.

@djbrooke
Contributor Author

When we run this, we expect this to resolve https://github.com/IQSS/dataverse/issues/4156

@djbrooke djbrooke self-assigned this Jun 26, 2019
@djbrooke
Contributor Author

  • run the API and expect to see a reduction in the count of unknown files
  • spot check on the output to see if it's what we expect
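The two checks above could be scripted roughly as follows. This is only a sketch under stated assumptions: the `/api/files/{id}/redetect` path and its `dryRun` parameter are assumed to match the redetect endpoint described in the Dataverse API guide and should be verified against the deployed version, and the helper names (`redetect_url`, `unknown_drop`, `sample_changes`) are hypothetical. No request is sent here; the code only builds the URL and compares type labels collected before and after a run.

```python
from collections import Counter
from urllib.parse import urlencode


def redetect_url(server_url, file_id, dry_run=True):
    """Build the redetect URL for one file (path and parameter are assumptions)."""
    query = urlencode({"dryRun": "true" if dry_run else "false"})
    return f"{server_url.rstrip('/')}/api/files/{file_id}/redetect?{query}"


def facet_counts(file_types):
    """Count files per display type, similar to what the facet shows."""
    return Counter(file_types)


def unknown_drop(before, after, label="Unknown"):
    """Return how many fewer files carry the 'Unknown' label after the run."""
    return before[label] - after[label]


def sample_changes(old_types, new_types, limit=5):
    """Return up to `limit` (index, old, new) triples for a manual spot check."""
    changed = [
        (i, old, new)
        for i, (old, new) in enumerate(zip(old_types, new_types))
        if old != new
    ]
    return changed[:limit]
```

A dry run first, followed by a look at `sample_changes`, keeps the spot check cheap before any type is actually rewritten.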

@djbrooke djbrooke removed their assignment Jun 26, 2019
@landreev landreev self-assigned this Jul 18, 2019
@landreev
Collaborator

"Unknown" is no longer the most common type in the facet:

[Image: Screen Shot 2019-07-22 at 11 16 50 AM]

@landreev
Collaborator

> When we run this, we expect this to resolve https://github.com/IQSS/dataverse/issues/4156

Correct, they will be MIME-typed as "application/x-h5", which is displayed as "Hierarchical Data Format" in the UI.
(Most of the affected files are in the next batch to go through the detector API.)

@landreev
Collaborator

Everybody (but especially @mheppler) is invited to watch the number in the "Unknown" facet gradually drop in real time in the next hour and a half or so.

@landreev
Collaborator

No longer in the second place either... How will it end?? Stay tuned to find out!
[Image: Screen Shot 2019-07-22 at 4 08 09 PM]

@landreev
Collaborator

This is the final prod. file type facet in the default "top 5" view:
[Image: Screen Shot 2019-07-22 at 4 52 02 PM]
The "unknown" category is now in 6th place, with 52K indexed files.

A very large portion of this change in the facet values is due to A HUGE NUMBER of .dcm files in a few large datasets; they were previously unrecognized, and now we type them by extension as "application/dicom" and count them as image files. The "display type" for this MIME type is "DICOM Image". But all sorts of other file types have been identified and reclassified as part of this effort too.
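
To illustrate the extension-based typing described above, here is a minimal sketch. It is not the actual Dataverse detector code; the tables hold only the pairs named in this thread, and `UNKNOWN_MIME`, `retype_by_extension`, and `display_type` are hypothetical names chosen for illustration. Treating "application/octet-stream" as the stand-in for an unrecognized file is also an assumption.

```python
import os

# Only the pairs mentioned in this thread; a real installation knows many more.
EXTENSION_TO_MIME = {".dcm": "application/dicom"}
MIME_TO_DISPLAY = {
    "application/dicom": "DICOM Image",
    "application/x-h5": "Hierarchical Data Format",
}

# Assumption: unrecognized files carry this generic type.
UNKNOWN_MIME = "application/octet-stream"


def retype_by_extension(filename, current_mime=UNKNOWN_MIME):
    """Return a more specific MIME type for an unrecognized file, if one is known."""
    if current_mime != UNKNOWN_MIME:
        return current_mime  # leave already-recognized files untouched
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_MIME.get(extension, current_mime)


def display_type(mime):
    """Map a MIME type to the label shown in the facet, defaulting to 'Unknown'."""
    return MIME_TO_DISPLAY.get(mime, "Unknown")
```

In this sketch only files that are still untyped get reclassified, so a file that already has a specific MIME type keeps it even if its extension is `.dcm`.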

Dragging this into QA - since all the code behind this had been reviewed in the process of closing the dev. issue.

@landreev landreev removed their assignment Jul 22, 2019
@kcondon kcondon self-assigned this Jul 22, 2019
@kcondon kcondon closed this as completed Jul 22, 2019