Added agent to get video transcripts. #72
base: main
Changes from 6 commits
d90ca52
28d1248
2555fab
a720e6f
91137b6
e9c02f6
4a76907
7f87786
df887b4
ae236a0
4e486b8
8211670
a4bd25e
80df72f
f28a1a8
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
import logging | ||
from director.agents.base import BaseAgent, AgentResponse, AgentStatus | ||
from director.core.session import TextContent, MsgStatus | ||
from director.tools.videodb_tool import VideoDBTool | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
class TranscriptionAgent(BaseAgent): | ||
def __init__(self, session=None, **kwargs): | ||
self.agent_name = "video_transcription" | ||
self.description = ( | ||
"This is an agent to get transcripts of videos" | ||
) | ||
self.parameters = self.get_parameters() | ||
super().__init__(session=session, **kwargs) | ||
|
||
def run(self, collection_id: str, video_id: str, timestamp_mode: bool = False, time_range: int = 2) -> AgentResponse: | ||
""" | ||
Transcribe a video and optionally format it with timestamps. | ||
|
||
:param str collection_id: The collection_id where given video_id is available. | ||
:param str video_id: The id of the video for which the transcription is required. | ||
:param bool timestamp_mode: Whether to include timestamps in the transcript. | ||
:param int time_range: Time range for grouping transcripts in minutes (default: 2 minutes). | ||
:return: AgentResponse with the transcription result. | ||
:rtype: AgentResponse | ||
""" | ||
self.output_message.actions.append("Trying to get the video transcription...") | ||
output_text_content = TextContent( | ||
agent_name=self.agent_name, | ||
status_message="Processing the transcription...", | ||
) | ||
self.output_message.content.append(output_text_content) | ||
self.output_message.push_update() | ||
|
||
videodb_tool = VideoDBTool(collection_id=collection_id) | ||
|
||
try: | ||
transcript_text = videodb_tool.get_transcript(video_id) | ||
except Exception: | ||
logger.error("Transcript not found. Indexing spoken words...") | ||
self.output_message.actions.append("Indexing spoken words...") | ||
self.output_message.push_update() | ||
videodb_tool.index_spoken_words(video_id) | ||
transcript_text = videodb_tool.get_transcript(video_id) | ||
ankit-v2-3 marked this conversation as resolved.
|
||
|
||
if timestamp_mode: | ||
self.output_message.actions.append("Formatting transcript with timestamps...") | ||
grouped_transcript = self._group_transcript_with_timestamps( | ||
transcript_text, time_range | ||
) | ||
output_text = grouped_transcript | ||
else: | ||
output_text = transcript_text | ||
|
||
output_text_content.text = output_text | ||
output_text_content.status = MsgStatus.success | ||
output_text_content.status_message = "Transcription completed successfully." | ||
A message like "Here is your transcription" would be better, since we use it as the title of the transcription.

Makes sense. I’ve made the change in the recent commit.
|
||
self.output_message.publish() | ||
|
||
return AgentResponse( | ||
status=AgentStatus.SUCCESS, | ||
message="Transcription successful.", | ||
data={"video_id": video_id, "transcript": output_text}, | ||
) | ||
|
||
def _group_transcript_with_timestamps(self, transcript_text: str, time_range: int) -> str: | ||
The grouping logic is not correct. If you test it, you will get only one block like this:
Reason: there are no newlines in the transcription text, and even if there were, treating each line as one time range (2 minutes by default) would be wrong. The correct way is to use the transcription dictionary that the VideoDB tool sends; it has timing information, unlike the transcription text used here.

@ashish-spext I’m having a hard time understanding the structure of the transcription dictionary returned by the VideoDB tool. I think the output of the
However, I’m unable to see any changes I’ve made or test the app due to API key limitations:
Can you please provide more information about how timing information is stored in the transcription dictionary?

This issue can be resolved by adding free LLM models. Merging this PR will fix the problem, and it would help with solving similar issues in the future.

@sarfarazsiddiquii We have resolved this issue by adding an OpenAI proxy. An OpenAI key is no longer required. Please pull the latest changes from the main branch, ensure that no OpenAI key is present in the .env file, and test the transcript.

@ankit-v2-3 Thank you for the update; the code is now testable. I’ve fixed the grouping logic in the latest commit: the agent now groups the transcription text into 2-minute intervals by default, unless a different time interval is specified. Let me know if any change is required.
|
||
""" | ||
Group transcript into specified time ranges with timestamps. | ||
|
||
:param str transcript_text: The raw transcript text. | ||
:param int time_range: Time range for grouping in minutes. | ||
:return: Grouped transcript with timestamps. | ||
:rtype: str | ||
""" | ||
lines = transcript_text.split("\n") | ||
grouped_transcript = [] | ||
current_time = 0 | ||
|
||
for i, line in enumerate(lines): | ||
if i % time_range == 0 and line.strip(): | ||
timestamp = f"[{current_time:02d}:00 - {current_time + time_range:02d}:00]" | ||
grouped_transcript.append(f"{timestamp} {line.strip()}") | ||
current_time += time_range | ||
|
||
return "\n".join(grouped_transcript) |
Could you please keep the ellipses (..) to a max of two in all status messages and actions?
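
For reference, here is a minimal sketch of the approach suggested in the review thread on `_group_transcript_with_timestamps`: bucket timed segments by their start time instead of splitting the plain transcript text on newlines. The segment shape (a list of dicts with `start` and `end` in seconds plus a `text` string) is an assumption, since the thread does not show the structure of the transcription dictionary. The function names are hypothetical and not part of this PR.

```python
from typing import Dict, List, Union

# Assumed segment shape: {"start": seconds, "end": seconds, "text": str}.
Segment = Dict[str, Union[float, str]]


def format_mmss(seconds: float) -> str:
    """Render a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def group_segments_by_time(segments: List[Segment], time_range: int = 2) -> str:
    """Group timed segments into windows of `time_range` minutes.

    Each segment is placed in the window that contains its start time.
    Windows with no text are omitted from the output.
    """
    if time_range <= 0:
        raise ValueError("time_range must be a positive number of minutes")

    window = time_range * 60  # window length in seconds
    buckets: Dict[int, List[str]] = {}

    for segment in segments:
        text = str(segment["text"]).strip()
        if not text:
            continue
        index = int(float(segment["start"]) // window)
        buckets.setdefault(index, []).append(text)

    lines = []
    for index in sorted(buckets):
        start = index * window
        end = start + window
        joined = " ".join(buckets[index])
        lines.append(f"[{format_mmss(start)} - {format_mmss(end)}] {joined}")

    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"start": 0.0, "end": 1.5, "text": "Hello"},
        {"start": 61.0, "end": 62.0, "text": "world"},
        {"start": 125.0, "end": 126.0, "text": "again"},
    ]
    print(group_segments_by_time(sample, time_range=2))
```

Keying each window on the segment's start time means the timestamps come from the timing data itself, so the output does not depend on how the transcript text happens to be broken into lines.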