-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backport #3407 #3408
Backport #3407 #3408
Conversation
Let's be honest, there's no happy path here.
`AttachFileToWorkJob` shouldn't create a new `FileSet` for an `UploadedFile` that already has one. This avoids creating and attaching multiple `FileSet`s in the event that the job fails, rendering the job idempotent with respect to `FileSet` attachment. This is a quick fix for `AttachFileToWorksJob` idempotency; connected to #801.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥇 for making a patch for this so quickly after it was reraised as an issue!
I know this was already merged into master but I have a question/request.
actor = Hyrax::Actors::FileSetActor.new(FileSet.create, user) | ||
uploaded_file.update(file_set_uri: actor.file_set.uri) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following why this line was moved up. It looks to me like the uploaded_file
's file_set_uri
is saved eagerly before it is actually added to the file_set
. This appears to leave the chance that if something failed during the attaching that it would never be attached during a retry because it would be skipped. Could you please add a test that ensures this line is at the right place in the execution flow or at least a comment before this line as to why it is important to be where it is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FileSet.create
call on line 16 determines the FileSet
's id/URI.
Moving this line directly below it reduces the chance that an incidental failure in the rest of the creation/attachment process will result in a duplicate, by pairing the two as early as possible.
This does introduce the possibility that the FileSet will be created, only to have some failure occurs during the permission, metadata, and content creation steps preventing attachment. This is why I've called this a "quick fix". Ideally, repeated runs of this job after an uncaught error will not only be idempotent, but guarantee eventual attachment of all the requested files.
I think this is better than the status quo, though. 🤖
Other ideas and contributions are welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that we're choosing between two scenarios:
- Without this change:
This background job fails then retries creating (and potentially attaching) filesets until it hits the max number of retries. - With the change:
This background job fails then retries and succeeds with having created (but potentially not attached) the fileset.
Scenario 1 would give us indication that the job failed whereas scenario 2 wouldn't. Scenario 1 would create multiiple filesets which may be orphaned if they didn't get attached while scenario 2 would create one fileset which may be orphaned. So I'm not really sure which is better or worse. I think my biggest fear is that filesets wouldn't get attached and there wouldn't be any indication since the job would have succeeded on the 2nd run, but I'm not sure of any better way to catch or handle this so ¯_(ツ)_/¯.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scenario 1 would give us indication that the job failed whereas scenario 2 wouldn't.
is this true? in both cases the jobs will raise an uncaught error and retry until they succeed. Scenario 1
may do this indefinitely (depending on QueueAdapter
behavior); Scenario 2
promises to do it maximally once per UploadedFile
, without side effects.
In either case, the error will be reported as a failed job in whatever way the queuing system does that, logged, and captured by honeybadger
or similar (if present).
Both approaches seem to present challenges for remediation, but the addition of the relationship between the UploadedFile
(which is easily identified from the job params) and the FileSet
seems likely to improve things on this front; especially combined with the removal of side effects on retries.
More importantly, I think this fix will be short-lived in production. My expectation is that we can complete the work here by replacing next
with appropriate logic for repairing failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if i understand correctly, the actual behavior people are seeing is that the file attachment does eventually succeed, and the outcome is something of a hybrid of the worst properties of both scenarios:
- multiple
FileSet
s get created (and at least sometimes attached) - the job succeeds and falls out of the retry queue, and no one is really equipped to notice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I was thinking that it didn't succeed and hit the max retries. I guess either way one needs to be vigilant about job failures (even if the retry succeeded) which is probably a good thing to do regardless. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
AttachFileToWorkJob shouldn't create a new FileSet for an UploadedFile
that already has one. This avoids creating and attaching multiple FileSets in
the event that the job fails, rendering the job idempotent with respect to
FileSet attachment.
This is a quick fix for AttachFileToWorksJob idempotency; connected to #801.
@samvera/hyrax-code-reviewers