Skip to content

community: add include_comment_forest to RedditSearch #29699

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

hemati
Copy link
Contributor

@hemati hemati commented Feb 9, 2025

  • Description:
    Introduces a new include_comment_forest boolean parameter in the RedditSearchSchema and RedditSearchRun tool. When set to True, this fetches and recursively parses the entire nested comment tree using _parse_comment_forest, and formats it using _format_comment_forest. Updated run and results methods to support displaying the full comment hierarchy.

  • Issue:
    (No specific issue to reference.)

  • Dependencies:
    None.

…and format the entire comment forest

- Introduce `include_comment_forest` as a boolean parameter in the `RedditSearchSchema` and `RedditSearchRun` tool.
- Update `RedditSearchAPIWrapper` methods to fetch and recursively parse all nested comments (`_parse_comment_forest`) when `include_comment_forest` is True.
- Add `_format_comment_forest` to output the entire comment tree in a readable format.
- Extend the `run` and `results` methods to utilize this new parameter and display the retrieved comment hierarchy.
@dosubot dosubot bot added the size:L label Feb 9, 2025
Copy link

vercel bot commented Feb 9, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Feb 20, 2025 8:24am

@dosubot dosubot bot added the community label Feb 9, 2025
Copy link
Collaborator

@ccurme ccurme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes more sense to initialize the tool with include_comment_forest, instead of requiring the model to populate True/False?

Note that as-is, this PR would require all models using this tool to populate the new parameter. Some models will also start filling out True, will pull a huge comment history, and cause problems for maintainers. IMO this should be an opt-in feature.

@ccurme ccurme self-assigned this Feb 12, 2025
@hemati
Copy link
Contributor Author

hemati commented Feb 15, 2025

Thanks for the thoughtful feedback!

My original reason for having include_comment_forest as a parameter in each request was to let the model/agent decide on a per-call basis whether it needs the full comment tree. For example:

  1. Selective Deep-Dive
    • The agent might first retrieve only post metadata. If it finds something important, it can dynamically request the full comment forest (include_comment_forest=True) on a second call.
  2. Contextual User Requests
    • If a user specifically asks, “What are people saying about X in the comments?” the agent can enable include_comment_forest=True only when needed, instead of always pulling comments.
  3. Adaptive Workflows
    • The agent can decide on the fly based on resource limits or user instructions, rather than one static setting for all calls.

However, I understand your concern that this could lead to unintended large data pulls if an LLM or user call sets include_comment_forest=True by default.

Proposed Compromise:

  1. Keep include_comment_forest in the request schema, defaulted to False.
  2. Add a constructor parameter, for example allow_comment_forest, which defaults to False.
  3. If allow_comment_forest is False, the tool will ignore any True value for include_comment_forest in the call, effectively preventing deep comment pulls.
  4. If someone needs dynamic control, they explicitly set allow_comment_forest=True so the agent can opt in on a per-call basis.

This way:

  • Maintain flexibility: Developers who want dynamic deep dives can enable them at runtime.
  • Protect performance: If maintainers want to prevent accidental large data usage, they leave allow_comment_forest=False.

Would that address both the need for runtime flexibility and your concern about possibly large or unintended data pulls?

Copy link
Collaborator

@ccurme ccurme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. This could work but it would be confusing for a model to specify include_comment_forest with no effect. You could dynamically select the schema. But I think ideally we get this out of langchain-community so we can test and version changes like this properly (there also appear to be no tests on this tool, which makes reviewing new features difficult).

I believe you contributed the Discord integration, would you be interested in doing something similar for Reddit?

@hemati
Copy link
Contributor Author

hemati commented Feb 25, 2025

Thanks for the update. This could work but it would be confusing for a model to specify include_comment_forest with no effect. You could dynamically select the schema. But I think ideally we get this out of langchain-community so we can test and version changes like this properly (there also appear to be no tests on this tool, which makes reviewing new features difficult).

I believe you contributed the Discord integration, would you be interested in doing something similar for Reddit?

Thanks for the feedback! Yes, I did contribute the Discord integration. I’d be happy to work on a similar, fully-tested and separately versioned Reddit integration if that’s the direction you prefer.

Do you want to close this PR?

@ccurme
Copy link
Collaborator

ccurme commented Feb 26, 2025

Awesome, yes. Will close for now. Thank you!

@ccurme ccurme closed this Feb 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants