Descriptive summary
This almost certainly affects all versions of Hyrax. It is confirmed for 2.9.0 through 3.4.1 and the main branch.
Rationale
Rolling back to an older version (a.k.a. revision) of a FileSet is the only place that calls CharacterizeJob or CreateDerivativesJob without a filepath. That makes it the only place (outside of one occurrence in rake tasks) that causes Hyrax::WorkingDirectory to pull a copy of the file from the repository into a NOID-based pairtree folder inside working_path.
Aside: the whole WorkingDirectory mechanism was slated for removal in a TODO left on this PR, which says to use JobIOWrapper instead. I may spin off another ticket for that after this one, as it may have been forgotten by now.
The WorkingDirectory.find_or_retrieve() method relies on the filename to decide whether the version that has just been "rolled back to" is the one already cached in that directory. Any old version cached in the working_path will be used if its name matches the current version's original_name. These cached files are never cleared out by the system itself. Admittedly, we do delete uploaded files periodically in heliotrope, and perhaps this is recommended in a Hyrax setup wiki somewhere; I'm not sure.
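To make the failure mode concrete, here is a minimal, self-contained sketch of a cache that is keyed only by filename. It is not the Hyrax::WorkingDirectory code; the class name NameKeyedCache, the id "fs1", the filename "book.jpg", and the block standing in for a repository read are all assumptions made for illustration.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for a name-keyed working directory.
# NOT Hyrax::WorkingDirectory; it only mimics the lookup rule described above.
class NameKeyedCache
  def initialize(root)
    @root = root
  end

  # Returns the path to a local copy. The block (a stand-in for a repository
  # read) is only called when no file with that name is cached for the id.
  def find_or_retrieve(id, filename)
    path = File.join(@root, id, filename)
    return path if File.exist?(path) # name match only; content is never compared

    FileUtils.mkdir_p(File.dirname(path))
    File.binwrite(path, yield)
    path
  end
end

def demo_stale_hit
  Dir.mktmpdir do |root|
    cache = NameKeyedCache.new(root)
    # Roll back to version 1: nothing is cached yet, so version 1 is copied to disk.
    first = File.binread(cache.find_or_retrieve("fs1", "book.jpg") { "version 1 bytes" })
    # Roll back to version 2 (same filename): the cached version 1 copy is reused.
    second = File.binread(cache.find_or_retrieve("fs1", "book.jpg") { "version 2 bytes" })
    [first, second]
  end
end
```

In this sketch both reads return the version 1 bytes, because the second lookup finds a file with a matching name and never consults the repository.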
It may seem unlikely that two rollbacks would occur through the UI where both versions have the same filename. But it's actually very likely that the same name needs to be used if it's pertinent to the content (some sort of ID, or in our case a book ISBN). And, as mentioned, any other call to CharacterizeJob or CreateDerivativesJob made with no filepath parameter will cause this problem too, for example by a dev working in the console or triggering a rake task. The task linked above would cache a working_path file for every FileSet in the system.
Expected behavior
Nothing in the UI should cause CharacterizeJob or CreateDerivativesJob to run on a file that is not the FileSet's current version.
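One way to picture this expectation is a cache lookup that reuses a local copy only when its content matches the current version. The sketch below is purely illustrative and is not a proposed Hyrax patch; the helper name fetch_verified, the choice of SHA-1, and the block standing in for a repository read are assumptions.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical helper, not part of Hyrax: reuse a cached copy only when its
# SHA-1 matches the digest expected for the current version.
def fetch_verified(path, expected_sha1)
  if File.exist?(path) && Digest::SHA1.file(path).hexdigest == expected_sha1
    return path
  end

  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, yield) # block stands in for a read from the repository
  path
end

def demo_verified_fetch
  Dir.mktmpdir do |root|
    path = File.join(root, "fs1", "book.jpg")
    v1 = "version 1 bytes"
    v2 = "version 2 bytes"
    # First rollback caches version 1.
    fetch_verified(path, Digest::SHA1.hexdigest(v1)) { v1 }
    # Second rollback has the same filename but a different digest, so it is refetched.
    File.binread(fetch_verified(path, Digest::SHA1.hexdigest(v2)) { v2 })
  end
end
```

Here the second lookup returns the version 2 bytes even though a file with the same name was already on disk. Passing an explicit filepath to the jobs, as the other call sites do, would avoid the cache question entirely.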
Actual behavior
CharacterizeJob or CreateDerivativesJob will run on a file that is not the FileSet's current version if you ever roll back to different versions with the same name.
Steps to reproduce the behavior
1. Upload an image FileSet to a Work. Let the jobs finish and note the thumbnail, file size and checksum in the UI.
2. Upload a new version to the FileSet: something with the same filename but a different image and size. Again, note the characterization metadata in the UI.
3. In the versions tab, revert the FileSet to the first version. Allow jobs to finish. The thumbnail and metadata will be correct. Note that this is where the working_directory copy was made.
4. Now revert to the second version. Allow jobs to finish. Thumbnail generation and characterization are run on the wrong file, the one cached to disk in step 3.
Related work
TODO