Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transcript command #2623

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Transcript command #2623

wants to merge 7 commits into from

Conversation

kernicPanel
Copy link
Member

Purpose

Adds a transcription command.

Default upload timestamp calculation is extracted to UploadableFileMixin
to avoid the use of get_source_s3_key to get it.
Renaming transcript util to ease patching.
As we want to transcript the whole catalog of videos, we need to add a
management command.
parser.add_argument("--video-id", type=str)

def handle(self, *args, **options):
"""Selects a video to transcript and starts the transcription job."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
"""Selects a video to transcript and starts the transcription job."""
"""Select a video to transcript and start the transcription job."""

Copy link
Member

@jbpenrath jbpenrath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno how you handle CHANGELOG update in this project, but is it normal that you did not add an entry ?

try:
video = Video.objects.get(id=video_id)
except Video.DoesNotExist:
self.stdout.write(f"No video matches the provided id: {video_id}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.stdout.write(f"No video matches the provided id: {video_id}")
self.stdout.write(f"No video matches with the provided id: {video_id}")

@kernicPanel
Copy link
Member Author

Dunno how you handle CHANGELOG update in this project, but is it normal that you did not add an entry ?

As the last one is Add peertube transcript generation, I think this PR is part of it.

def handle(self, *args, **options):
"""Selects a video to transcript and starts the transcription job."""
video_id = options["video_id"]
if video_id:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you extract the content of the if and else in private method ?

if video_id:
try:
video = Video.objects.get(id=video_id)
except Video.DoesNotExist:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we should catch it or not. IMO I would like to know that the video_id is not valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A message is displayed, and the script is stopped.

        except Video.DoesNotExist:
            self.stdout.write(f"No video found with id {video_id}")
            return

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the video_id is invalid (not a uuid) another message will be displayed, and the script will be also stopped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but this command will be probably trigger by another one, especially if the video_id optional argument will be used. And we will have no idea that is has fail

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like nested command calls, especially in this case where the main logic only calls

from marsha.core.utils.transcript_utils import transcript
transcript(video)

We have to talk about whole catalog transcription.

@@ -64,6 +64,12 @@ def is_ready_to_show(self):
"""
return self.uploaded_on is not None

def uploaded_on_stamp(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests on this new method ?

And IMO you are changing the behavior already implemented. Never the timestamp is computed when the uploaded_on is not set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case of this commit do we compute a timestamp if updloaded_on doesn't exists ?
Here, we only extract to_timestamp(self.uploaded_on) in his own function.



@patch.object(transcript_video, "transcript")
class TranscriptVideoTestCase(TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test case is missing :

Should call transcript function if the video has a subtitle or caption but no transcript.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants