Add 2 firecrawl tools : Scrape and Search #6016

Merged
merged 22 commits into from
Jul 6, 2024

@ahasasjeb ahasasjeb commented Jul 5, 2024

Checklist:

Important

Please review the checklist below before submitting your pull request.

  • Please open an issue before creating a PR or link to an existing issue
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

Description

Add 2 Firecrawl tools: Scrape and Search.
I have tested them using Docker.

Fixes #6015

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update, included: Dify Document
  • Improvement, including but not limited to code refactoring, performance optimization, and UI/UX improvement
  • Dependency upgrade

Testing Instructions

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Test A
  • Test B

ahasasjeb and others added 20 commits July 2, 2024 12:49
Implement the ScrapeTool class inheriting from BuiltinTool in scrape.py, which uses the FirecrawlApp
to scrape data from a given URL. The tool supports custom scraping preferences and can return either
scraped documents or URLs. Also, include the scrape.yaml configuration file to define the tool's
identity, description, and parameters.

BREAKING CHANGE: The addition of the scrape tool may affect existing workflows that do not account
for this new tool. Ensure that your environment is prepared to handle the scrape tool before deploying this change.
By tongyi
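
As a rough illustration of the shape this commit message describes (a tool class that takes a URL plus scraping preferences and delegates to a client), here is a minimal sketch. `BuiltinToolSketch`, `ScrapeToolSketch`, and `scrape_fn` are hypothetical names used only for illustration; they are not Dify's `BuiltinTool` interface or the Firecrawl client API, and the code merged in this PR may look quite different.

```python
from dataclasses import dataclass
from typing import Any, Callable


class BuiltinToolSketch:
    """Hypothetical stand-in for a tool base class (not Dify's real BuiltinTool)."""

    def invoke(self, tool_parameters: dict[str, Any]) -> Any:
        raise NotImplementedError


@dataclass
class ScrapeToolSketch(BuiltinToolSketch):
    # Callable that performs the actual scrape. It is injected so the sketch
    # runs without network access or an API key (an assumption of this sketch).
    scrape_fn: Callable[[str, dict[str, Any]], dict[str, Any]]

    def invoke(self, tool_parameters: dict[str, Any]) -> dict[str, Any]:
        url = str(tool_parameters.get("url", "")).strip()
        if not url:
            raise ValueError("url is required")
        # Treat every other supplied parameter as a scraping preference,
        # dropping the ones the caller left unset.
        options = {
            key: value
            for key, value in tool_parameters.items()
            if key != "url" and value is not None
        }
        return self.scrape_fn(url, options)
```

Injecting the scrape callable keeps the parameter handling testable on its own; a real tool would pass a function that calls the Firecrawl service instead.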
Introduce the SearchTool class within the firecrawl tools, implementing functionality for
searching data using the Firecrawl API. This update also changes the author field for the
scrape tool from 'Richards Tu' to 'ahasasjeb'.

BREAKING CHANGE: The addition of the search tool and the modification of the scrape tool's author
field may affect existing configurations or dependencies. Review and update them
accordingly before deploying.
By tongyi
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jul 5, 2024
@ahasasjeb ahasasjeb changed the title Add 2 Firecrawler tools : Scrape and search Add 2 Firecrawler tools : Scrape and Search Jul 5, 2024
@ahasasjeb ahasasjeb changed the title Add 2 Firecrawler tools : Scrape and Search Add 2 firecrawl tools : Scrape and Search Jul 5, 2024
@crazywoola crazywoola requested a review from Yeuoly July 5, 2024 13:09
@crazywoola crazywoola left a comment

These tools work as expected, but it would be better to have the ability to set the limit and max_depth.

image

@ahasasjeb

It shouldn't be involved here
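
For context on the reviewer's suggestion, the sketch below shows one way optional `limit` and `max_depth` values could be validated and collected before being passed along. The function name `build_tool_options` and the validation rule (positive integers only) are assumptions made for illustration; the thread does not say how, or whether, these parameters would be handled.

```python
from typing import Any, Optional


def build_tool_options(limit: Optional[int] = None,
                       max_depth: Optional[int] = None) -> dict[str, Any]:
    """Collect the optional limits into a dict, skipping unset values."""
    options: dict[str, Any] = {}
    for name, value in (("limit", limit), ("max_depth", max_depth)):
        if value is None:
            continue  # not supplied, so the service default would apply
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
        options[name] = value
    return options
```

Leaving unset values out of the dict, rather than sending `None`, lets the receiving side keep its own defaults.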

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jul 6, 2024
@crazywoola crazywoola merged commit ab847c8 into langgenius:main Jul 6, 2024
5 checks passed
ZhouhaoJiang added a commit that referenced this pull request Jul 7, 2024
* refs/heads/fix/dataset_operator: (33 commits)
  feat: update dataset sort
  feat: add dataset_permissions tenant_id
  chore: optimize memory fetch performance (#6039)
  feat: support moonshot and glm base models for volcengine provider (#6029)
  Optimize db config (#6011)
  fix: token count includes base64 string of input images (#5868)
  chore: skip pip upgrade preparation in api dockerfile (#5999)
  feat(*): Swtich to dify_config. (#6025)
  fix: the input field of tool panel not worked as expected (#6003)
  Add 2 firecrawl tools : Scrape and Search (#6016)
  test(test_rerank): Remove duplicate test cases. (#6024)
  chore: optimize memory messages fetch count limit (#6021)
  Revert "feat: knowledge admin role" (#6018)
  feat: add Llama 3 and Mixtral model options to ddgo_ai.yaml (#5979)
  fix: add status_code 304 (#6000)
  6014 i18n add support for spanish (#6017)
  [Feature] Support loading for mermaid. (#6004)
  fix: update workflow trace query (#6010)
  Removed firecrawl-py, fixed and improved firecrawl tool (#5896)
  fix API tool's schema not support array (#6006)
  ...
@takatost takatost mentioned this pull request Jul 8, 2024
laipz8200 added a commit that referenced this pull request Jul 10, 2024
Co-authored-by: -LAN- <laipz8200@outlook.com>
Development

Successfully merging this pull request may close these issues.

Add 2 Firecrawler tools : Scrape and search
3 participants