TODO job should just include ID #20

edsu · 2024-09-27T17:59:08Z

Blocked by #3

To make coordination easier, the SQS message sent to the TODO queue should simply include an id and any options for controlling whisper, not the list of files:

{
  "id": "abc123",
  "options": {
    "model": "large"
  }
}

If there are no files in the S3 bucket at s3://speech-to-text/media/abc123/ then an error message will be included when the job is put into the DONE queue.
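
A rough sketch of how a worker might handle a message of this shape, in case it helps the discussion. This is not the existing speech-to-text code: `handle_todo_message`, the `list_keys` callable, and the `error` / `files` keys on the DONE message are all hypothetical names chosen for illustration. Only the `id` / `options` fields and the `media/<id>/` prefix come from the description above. The bucket listing is passed in as a function so the logic can be tried without AWS credentials.

```python
import json
from typing import Callable, Iterable

BUCKET = "speech-to-text"   # from the s3:// URL in the issue description
MEDIA_PREFIX = "media"      # from the s3:// URL in the issue description


def handle_todo_message(body: str, list_keys: Callable[[str], Iterable[str]]) -> dict:
    """Build a (hypothetical) DONE message for a TODO message that only carries an id.

    `list_keys` is an assumed helper that returns the object keys found
    under a given prefix in the bucket.
    """
    job = json.loads(body)
    job_id = job["id"]
    options = job.get("options", {})

    # The file list is derived from the bucket, not from the message.
    prefix = f"{MEDIA_PREFIX}/{job_id}/"
    files = sorted(list_keys(prefix))

    done = {"id": job_id, "options": options}
    if not files:
        # No media was uploaded for this id: report it in the DONE message.
        done["error"] = f"no files found at s3://{BUCKET}/{prefix}"
    else:
        # Assumed field: the keys that would be handed to whisper.
        done["files"] = files
    return done
```

With boto3, `list_keys` could wrap a `list_objects_v2` call with `Prefix=prefix`, but that wiring is left out here.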


jmartin-sul · 2024-10-02T22:35:17Z

after some mostly inconclusive discussion between @edsu, @peetucket, and i the last couple days on whether to go forward with this or not, we think we've decided to close it for now?

in favor of closing:

- no need to rework the file list logic that's already implemented
- if file list logic changes, it'll be in common-accessioning, which more of the team is familiar with, and so it should be easier for more folks to deal with bugs or feature requests in common-accessioning than in the speech-to-text python code.
- more stuff explicitly stated in job messages that we can look at, which might make debugging easier if it looks like the wrong files are getting processed
  - related, didn't come up in discussion, but i just realized: if something doesn't get written to the bucket for processing, but should've and is included in the file list, we'll get an error instead of a silent skip. but i suspect that we'd get a loud error if we encountered any sort of typical upload failure to S3, since that's what we've seen in e.g. preservation, so this point may actually be moot 🤷

in favor of keeping open:

- simpler job messages (but we don't expect to have huge file lists for STTing for any one object, so not sure this was a practical advantage)

but no one seemed to feel strongly on any of the above reasons, and this isn't a huge change, so it's easy to reopen and run with it if we later think of something compelling that hasn't occurred to us yet.

let me know if i got any of that wrong!

jmartin-sul added blocked and removed blocked labels Sep 27, 2024

edsu self-assigned this Oct 2, 2024

edsu mentioned this issue Oct 2, 2024

small readme and comment touchups #22

Merged

jmartin-sul closed this as completed Oct 2, 2024
