[Feature Request] Document toolkits #1408

Open
1 of 2 tasks
Wendong-Fan opened this issue Jan 7, 2025 · 2 comments
Labels
New Feature P0 Task with high level priority

Comments

@Wendong-Fan
Member

Wendong-Fan commented Jan 7, 2025

Required prerequisites

Motivation

  1. Process any document type, whether it is given as a URL or as a local file. Specific formats are handled by lower-level toolkits such as the video toolkit and the audio toolkit. GraphRAG is used to process the raw text and answer queries.

  2. Support text modification: rewriting existing text to improve its quality, adjust its tone, or better suit a specific audience. This is a fundamental task in content creation and editing. (low priority)
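One way to picture item 1 is a small router that decides, from the source string alone, whether the input is a URL or a local file and which lower-level toolkit should handle it. This is a minimal sketch under stated assumptions: the handler names (`video_toolkit`, `audio_toolkit`, `text_pipeline`) and the extension table are hypothetical placeholders, not the project's actual toolkit API, and the GraphRAG step is only marked where it would plug in.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical extension -> handler table. The handler names are
# placeholders for illustration, not the project's real toolkit classes.
HANDLERS = {
    ".mp4": "video_toolkit",
    ".mov": "video_toolkit",
    ".mp3": "audio_toolkit",
    ".wav": "audio_toolkit",
}
# Hypothetical fallback: extract raw text, then index/query it (e.g. with GraphRAG).
DEFAULT_HANDLER = "text_pipeline"


def route_document(source: str) -> tuple[str, str]:
    """Return (origin, handler) for a URL or a local file path."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        origin, path = "url", parsed.path  # ignore query string when reading the extension
    else:
        origin, path = "local", source
    suffix = Path(path).suffix.lower()
    return origin, HANDLERS.get(suffix, DEFAULT_HANDLER)
```

For example, `route_document("https://example.com/talk.mp4")` gives `("url", "video_toolkit")`, while `route_document("./report.pdf")` falls through to the text pipeline. Routing on the extension keeps the document toolkit thin; a real implementation might also inspect the HTTP `Content-Type` or file magic bytes.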

Some of this has already been implemented by MengKang.

Solution

No response

Alternatives

No response

Additional context

No response

@Wendong-Fan
Member Author

Lead: @willshang76; support & review: @Aaron617, @AveryYay

@Aaron617
Collaborator

After testing the current document toolkit on GAIA, @Ralph-Zhou and I found several problems:

  • Processing longer content essentially turns into a RAG task, and the current RAG capability still lags somewhat behind agents such as H2O. (Current reference repository: https://github.com/shibing624/ChatPDF)
  • The function extract_webpage_content needs enhancement: it may currently miss some content (for example, tables are not processed well).
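The table problem could be approached by converting each HTML `<table>` into a Markdown table before the text is handed to RAG, so that row and column structure survives. This is a minimal sketch using only the standard-library `html.parser`; `TableExtractor` and `tables_to_markdown` are hypothetical helpers and are not part of the existing extract_webpage_content. Nested tables and `colspan`/`rowspan` are deliberately not handled.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect each <table> as rows of cell strings.

    Simplifications: nested tables are not handled; colspan/rowspan are ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[str]]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _flush_cell(self) -> None:
        # Close the current cell (also covers an omitted </td> or </th>),
        # collapsing internal whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _flush_row(self) -> None:
        # Close the current row (also covers an omitted </tr>).
        self._flush_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._flush_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        # Text outside table cells is ignored by this helper.
        if self._cell is not None:
            self._cell.append(data)


def tables_to_markdown(html: str) -> List[str]:
    """Render every table found in `html` as a Markdown table string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    rendered = []
    for rows in parser.tables:
        rows = [r for r in rows if r]  # drop rows with no cells
        if not rows:
            continue
        width = max(len(r) for r in rows)
        padded = [r + [""] * (width - len(r)) for r in rows]
        # Treat the first row as the header row.
        lines = ["| " + " | ".join(padded[0]) + " |", "|" + " --- |" * width]
        lines += ["| " + " | ".join(r) + " |" for r in padded[1:]]
        rendered.append("\n".join(lines))
    return rendered
```

The Markdown output could then be appended to (or substituted into) the page text before chunking, so a retriever sees each table as one coherent block instead of loose cell fragments.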


4 participants