Firecrawl example docs #1438
@@ -0,0 +1,103 @@
---
title: "Crawl a URL using Firecrawl"
sidebarTitle: "Firecrawl URL crawl"
description: "This example demonstrates how to crawl a URL using Firecrawl with Trigger.dev."
---

## Overview

Firecrawl is a tool for crawling websites and extracting clean markdown that's structured in an LLM-ready format.

The two examples below show how to use Firecrawl with Trigger.dev: crawling an entire website and scraping a single URL.

## Prerequisites

- A project with [Trigger.dev initialized](/quick-start)
- A [Firecrawl](https://firecrawl.dev/) account and an API key from its dashboard, available to your tasks as the `FIRECRAWL_API_KEY` environment variable

## Example 1: crawl an entire website with Firecrawl

This task crawls a website and returns the `crawlResult` object. You can set the `limit` parameter to cap the number of URLs that are crawled.

```ts trigger/firecrawl-url-crawl.ts
import FirecrawlApp from "@mendable/firecrawl-js";
import { task } from "@trigger.dev/sdk/v3";

// Initialize the Firecrawl client with your API key
const firecrawlClient = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY, // Get this from your Firecrawl dashboard
});

export const firecrawlCrawl = task({
  id: "firecrawl-crawl",
  run: async (payload: { url: string }) => {
    const { url } = payload;

    // Crawl: scrapes all the URLs of a web page and returns content in LLM-ready format
    const crawlResult = await firecrawlClient.crawlUrl(url, {
      limit: 100, // Limit the number of URLs to crawl
      scrapeOptions: {
        formats: ["markdown", "html"],
      },
    });

    if (!crawlResult.success) {
      throw new Error(`Failed to crawl: ${crawlResult.error}`);
    }

    return {
      data: crawlResult,
    };
  },
});
```

Comment on lines +31 to +52

🛠️ Refactor suggestion

Add type safety and improve error handling. The implementation could benefit from:

```diff
+import { CrawlResponse } from "@mendable/firecrawl-js";
+
 export const firecrawlCrawl = task({
   id: "firecrawl-crawl",
-  run: async (payload: { url: string }) => {
+  run: async (payload: { url: string; limit?: number }) => {
     const { url } = payload;
     // Crawl: scrapes all the URLs of a web page and returns content in LLM-ready format
-    const crawlResult = await firecrawlClient.crawlUrl(url, {
+    const crawlResult = await firecrawlClient.crawlUrl(url, {
       limit: 100, // Limit the number of URLs to crawl
       scrapeOptions: {
         formats: ["markdown", "html"],
       },
-    });
+    }) as CrawlResponse;
```

### Testing your task

You can test your task by triggering it from the Trigger.dev dashboard with the following payload, replacing `<url-to-crawl>` with the URL you want to crawl:

```json
{
  "url": "<url-to-crawl>"
}
```
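
The test payload has to be a JSON object with a string `url` field. As a rough sketch of checking that shape before triggering a crawl, the helper below parses and validates a raw payload string. `parseUrlPayload` is a hypothetical helper written for this illustration; it is not part of Trigger.dev or Firecrawl, and the restriction to `http`/`https` is an assumption.

```typescript
// Hypothetical helper for illustration only; not part of Trigger.dev or Firecrawl.
function parseUrlPayload(raw: string): { url: string } {
  // JSON.parse throws a SyntaxError on input that is not valid JSON,
  // e.g. a bare `"url": "..."` pair without the surrounding braces.
  const parsed: unknown = JSON.parse(raw);

  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Payload must be a JSON object");
  }

  const url = (parsed as Record<string, unknown>).url;
  if (typeof url !== "string") {
    throw new Error('Payload needs a string "url" field');
  }

  // The URL constructor throws a TypeError on malformed URLs.
  const { protocol } = new URL(url);
  if (protocol !== "http:" && protocol !== "https:") {
    // Assumption: only web URLs make sense as crawl targets.
    throw new Error(`Unsupported protocol: ${protocol}`);
  }

  return { url };
}
```

Calling `parseUrlPayload('{"url": "https://example.com"}')` returns an object that could be passed to the task as its payload, while a missing brace or a non-web URL fails before any crawl is started.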

## Example 2: scrape a single URL with Firecrawl

This task scrapes a single URL and returns the `scrapeResult` object.

```ts trigger/firecrawl-url-scrape.ts
import FirecrawlApp, { ScrapeResponse } from "@mendable/firecrawl-js";
import { task } from "@trigger.dev/sdk/v3";

// Initialize the Firecrawl client with your API key
const firecrawlClient = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY, // Get this from your Firecrawl dashboard
});

export const firecrawlScrape = task({
  id: "firecrawl-scrape",
  run: async (payload: { url: string }) => {
    const { url } = payload;

    // Scrape: scrapes a URL and gets its content in LLM-ready format (markdown, structured data via LLM Extract, screenshot, html)
    const scrapeResult = (await firecrawlClient.scrapeUrl(url, {
      formats: ["markdown", "html"],
    })) as ScrapeResponse;

    if (!scrapeResult.success) {
      throw new Error(`Failed to scrape: ${scrapeResult.error}`);
    }

    return {
      data: scrapeResult,
    };
  },
});
```

Comment on lines +82 to +84

Avoid type casting and document the response structure. The type casting to …

```diff
-    const scrapeResult = (await firecrawlClient.scrapeUrl(url, {
+    const scrapeResult: ScrapeResponse = await firecrawlClient.scrapeUrl(url, {
       formats: ["markdown", "html"],
-    })) as ScrapeResponse;
+    });
```
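
The `success` check in the task can narrow the result type without a cast when the response is modelled as a discriminated union. The following is a minimal sketch of that pattern, assuming hypothetical local types (`ScrapeSuccess`, `ScrapeFailure`) in place of the SDK's real `ScrapeResponse`, whose exact shape may differ; `unwrapScrape` is likewise an illustrative name.

```typescript
// Hypothetical stand-ins for the SDK's response types. The real types are
// exported by @mendable/firecrawl-js and may have different fields.
type ScrapeSuccess = { success: true; markdown?: string; html?: string };
type ScrapeFailure = { success: false; error: string };
type ScrapeResult = ScrapeSuccess | ScrapeFailure;

// Throws on failure; otherwise returns the result narrowed to the success
// variant, so callers can read `markdown` / `html` without a type assertion.
function unwrapScrape(result: ScrapeResult): ScrapeSuccess {
  if (result.success === false) {
    // Inside this branch TypeScript knows `result` is a ScrapeFailure.
    throw new Error(`Failed to scrape: ${result.error}`);
  }
  // Here `result` has been narrowed to ScrapeSuccess.
  return result;
}
```

Because `success` is a literal `true` or `false` on each variant, the compiler does the narrowing, and a mismatch between the expected and actual shape shows up as a type error rather than being hidden by `as`.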

### Testing your task

You can test your task by triggering it from the Trigger.dev dashboard with the following payload, replacing `<url-to-scrape>` with the URL you want to scrape:

```json
{
  "url": "<url-to-scrape>"
}
```

🛠️ Refactor suggestion

Consider making the `limit` parameter configurable. The hard-coded limit of 100 URLs might not suit all use cases. Consider:
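
One way to read this suggestion is as a small helper that resolves the limit from the payload. The sketch below is an illustration, not the change proposed in the PR: `CrawlPayload`, `resolveCrawlLimit`, and the `MAX_LIMIT` bound of 1000 are hypothetical names and values chosen for the example, and only the default of 100 comes from the task under review.

```typescript
// Hypothetical payload shape: `limit` is optional, as the review suggests.
interface CrawlPayload {
  url: string;
  limit?: number;
}

const DEFAULT_LIMIT = 100; // the value hard-coded in the example task
const MAX_LIMIT = 1000; // assumed upper bound, chosen only for illustration

// Returns a whole number between 1 and MAX_LIMIT. Falls back to
// DEFAULT_LIMIT when the caller omits `limit` or passes NaN/Infinity.
function resolveCrawlLimit(payload: CrawlPayload): number {
  const requested = payload.limit;
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_LIMIT;
  }
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(requested)));
}
```

Inside the task, `limit: resolveCrawlLimit(payload)` would then take the place of `limit: 100`, so existing callers that send only `url` keep the current behavior.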