Added agent to get video transcripts. #72

Open: wants to merge 15 commits into base: main

@sarfarazsiddiquii sarfarazsiddiquii commented Nov 19, 2024

Added an agent (in reference to issue #70) to retrieve video transcripts in raw format or with timestamps.

Summary by CodeRabbit

  • New Features

    • Introduced the TranscriptionAgent, enabling video transcription with optional timestamp formatting.
    • Integrated the TranscriptionAgent into the ChatHandler for enhanced functionality.
  • Bug Fixes

    • Improved error handling for cases where transcripts are not found, ensuring a smoother user experience.

@ashish-spext
Contributor

Getting the error:

[BACKEND]     "parameters": self.parameters,
[BACKEND]                   ^^^^^^^^^^^^^^^
[BACKEND] AttributeError: 'VideoTranscriptionAgent' object has no attribute 'parameters'. Did you mean: 'get_parameters'?
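
The traceback suggests that a base-class method reads a `parameters` attribute while the agent only defines a `get_parameters` method. A minimal sketch of that failure mode and one possible fix follows. The `BaseAgent` here is a hypothetical stand-in written for illustration, and `BrokenAgent`, `FixedAgent`, and `to_llm_format` are assumed names; the project's real base class may differ.

```python
class BaseAgent:
    """Hypothetical stand-in for the project's base class (an assumption)."""

    agent_name = "base"
    description = ""

    def get_parameters(self) -> dict:
        # Subclasses describe the arguments their run() method accepts.
        return {"type": "object", "properties": {}, "required": []}

    def to_llm_format(self) -> dict:
        # Reads the attribute, not the method: fails if __init__ never set it.
        return {
            "name": self.agent_name,
            "description": self.description,
            "parameters": self.parameters,
        }


class BrokenAgent(BaseAgent):
    agent_name = "broken"

    def __init__(self):
        pass  # self.parameters is never assigned, so to_llm_format() raises


class FixedAgent(BaseAgent):
    agent_name = "transcription"

    def __init__(self):
        # Assign the attribute once so the base-class method can read it.
        self.parameters = self.get_parameters()
```

Under these assumptions, `BrokenAgent().to_llm_format()` raises `AttributeError`, while `FixedAgent().to_llm_format()` returns a dictionary with a `parameters` entry.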


logger = logging.getLogger(__name__)

class VideoTranscriptionAgent(BaseAgent):
Contributor

@ashish-spext ashish-spext Nov 21, 2024

Naming should be consistent.

1 - Existing agent files don't have an _agent suffix:
transcript_agent -> transcription.py

2 - Agent class name: VideoTranscriptionAgent -> TranscriptionAgent (since we may reuse this for audio transcription in the future)

3 - agent_name: video_transcription -> transcription

Contributor

@sarfarazsiddiquii the agent name is still video_transcription; can you make it simply transcription?

Author

This has been resolved in the recent commit. Sorry for overlooking it.

backend/director/agents/transcript_agent.py (outdated, resolved)
@sarfarazsiddiquii
Author

sarfarazsiddiquii commented Nov 22, 2024

Hey @ashish-spext, the error 'VideoTranscriptionAgent' object has no attribute 'parameters', the naming issues, and the merge conflicts are all resolved in the latest commit.
Thanks.

@ashish-spext
Contributor

Awesome! Let me test it in a while.

data={"video_id": video_id, "transcript": output_text},
)

def _group_transcript_with_timestamps(self, transcript_text: str, time_range: int) -> str:
Contributor

The grouping logic is not correct.

If you test it, you will get only one block, like this:
image

Reason: there are no newlines in the transcription text, and even if there were, treating each line as one time range (2 minutes by default) would be wrong.

The correct way is to use the transcription dictionary that the VideoDB tool returns; it has timing information, unlike the transcription text currently being used.
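
A minimal sketch of grouping by timing information rather than by line breaks, under stated assumptions: each transcript entry is taken to be a dictionary with `start` (seconds), `end` (seconds), and `text` keys. That shape and the `group_by_interval` name are assumptions for illustration, not a confirmed description of what the VideoDB tool returns.

```python
from collections import defaultdict


def group_by_interval(entries, time_range=120):
    """Bucket timed entries into consecutive windows of `time_range` seconds."""
    if time_range <= 0:
        raise ValueError("time_range must be positive")

    buckets = defaultdict(list)
    for entry in entries:
        # Assumed entry shape: {"start": float, "end": float, "text": str}
        index = int(float(entry["start"]) // time_range)
        buckets[index].append(entry["text"])

    # One (window_start, window_end, joined_text) tuple per non-empty window.
    return [
        (index * time_range, (index + 1) * time_range, " ".join(words))
        for index, words in sorted(buckets.items())
    ]
```

Because each entry is assigned to a window by its start time, the result no longer depends on whether the plain transcript text contains newlines. Windows with no entries are simply omitted.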

Author

@ashish-spext I’m having a hard time understanding the structure of the transcription dictionary returned by the VideoDB tool.

I think the output of the get_transcript() method, when called with text=False, will give us transcript details.

However, I’m unable to see any changes I’ve made or test the app due to API key limitations:
image

Can you please provide more information about how timing information is stored in the transcription dictionary?

Author

This issue can be resolved by adding free LLM models. Merging this PR will fix the problem and it would be helpful for solving similar issues in the future.

Collaborator

@sarfarazsiddiquii We have resolved this issue by adding an OpenAI proxy. An OpenAI key is no longer required. Please pull the latest changes from the main branch, ensure that no OpenAI key is present in the .env file, and test the transcript.

Author

@sarfarazsiddiquii sarfarazsiddiquii Nov 25, 2024

@ankit-v2-3, thank you for the update. The code is now testable.

I’ve fixed the grouping logic in the latest commit. The agent now groups the transcription text into 2-minute intervals by default, unless a different time interval is specified.

Let me know if any change is required.

image
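
As an illustration of what timestamped output could look like, here is a small sketch that renders grouped text with `MM:SS` range labels. It assumes the groups are available as `(start_seconds, end_seconds, text)` tuples; that shape, the label format, and the function names are hypothetical and not taken from the agent's code.

```python
def format_timestamp(seconds) -> str:
    """Render a number of seconds as MM:SS (minutes are not capped at 59)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_groups(groups) -> str:
    """Join (start, end, text) tuples into one labelled line per interval."""
    lines = []
    for start, end, text in groups:
        label = f"[{format_timestamp(start)} - {format_timestamp(end)}]"
        lines.append(f"{label} {text}")
    return "\n".join(lines)
```

With the default 2-minute interval, the first two groups would be labelled `[00:00 - 02:00]` and `[02:00 - 04:00]`.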


output_text_content.text = output_text
output_text_content.status = MsgStatus.success
output_text_content.status_message = "Transcription completed successfully."
Contributor

A message like "Here is your transcription" would be better, since we use it as the title of the transcription.

Author

Makes sense. I’ve made the change in the recent commit.

@sarfarazsiddiquii
Author

Hi @ashish-spext, I've made the requested changes.
Could you please review them and let me know if there's anything else you'd like me to add to this PR?

@ashish-spext
Contributor

Thanks for the changes.
Can you please pull the latest changes to resolve the conflicts?
@ankit-v2-3 from our team will take up the review.

Contributor

coderabbitai bot commented Dec 6, 2024

Walkthrough

The changes introduce a new TranscriptionAgent class in transcription.py, which extends BaseAgent and is responsible for transcribing videos. It includes functionality for optional timestamp formatting and error handling during the transcription process. Additionally, the ChatHandler class in handler.py is updated to incorporate the new agent, ensuring it is part of the agent management system.

Changes

File Path: backend/director/agents/transcription.py
Change Summary:
  • Added TranscriptionAgent class extending BaseAgent.
  • Implemented run method for transcription processing.
  • Added _group_transcript_with_timestamps method for formatting transcripts.
  • Included error handling and logging for the transcription process.

File Path: backend/director/handler.py
Change Summary:
  • Imported TranscriptionAgent and added it to the self.agents list in ChatHandler.

Possibly related PRs

  • Ashu/fix launch v1 #96: This PR is unrelated as it focuses on modifications to the README.md file, enhancing documentation rather than any code changes or functionality related to the TranscriptionAgent class or its methods.

Suggested reviewers

  • ashish-spext

Poem

🐰 In the realm of code where agents play,
A new friend hops in to brighten the day.
With transcripts to gather, in timestamps they flow,
Our TranscriptionAgent is ready to show!
So let’s raise a cheer for this coding delight,
Hopping through videos, making words bright! 🌟



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 88b61b0 and f28a1a8.

📒 Files selected for processing (2)
  • backend/director/agents/transcription.py (1 hunks)
  • backend/director/handler.py (2 hunks)
🔇 Additional comments (1)
backend/director/handler.py (1)

25-25: Integration of TranscriptionAgent is implemented correctly

The TranscriptionAgent is properly imported (line 25) and added to the agents list (line 61) in the ChatHandler class. This allows the agent to be utilized within the application as intended.

Also applies to: 61-61

backend/director/agents/transcription.py: 3 resolved review threads (collapsed)
self.output_message.actions.append("Trying to get the video transcription...")
output_text_content = TextContent(
agent_name=self.agent_name,
status_message="Processing the transcription...",
Collaborator

Could you please keep ellipses to a maximum of two dots (..) in all status messages and actions?
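
One way such a convention could be enforced is a small helper applied to messages before they are emitted. The sketch below is an assumption for illustration only; `normalize_ellipsis` is a hypothetical name and is not part of the project.

```python
import re

# Matches runs of three or more dots, or the single-character ellipsis.
_DOT_RUN = re.compile(r"\.{3,}|\u2026")


def normalize_ellipsis(message: str) -> str:
    """Collapse any long ellipsis in `message` to exactly two dots."""
    return _DOT_RUN.sub("..", message)
```

Single periods and existing two-dot ellipses are left untouched, so the helper is safe to apply more than once.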

3 participants