New puppeteer examples #1355

Merged 6 commits on Sep 25, 2024
31 changes: 30 additions & 1 deletion docs/config/config-file.mdx
@@ -4,6 +4,7 @@ sidebarTitle: "trigger.config.ts"
description: "This file is used to configure your project and how it's built."
---

import ScrapingWarning from "/snippets/web-scraping-warning.mdx";
import BundlePackages from "/snippets/bundle-packages.mdx";

The `trigger.config.ts` file is used to configure your Trigger.dev project. It is a TypeScript file at the root of your project that exports a default configuration object. Here's an example:
@@ -473,6 +474,32 @@ export default defineConfig({
});
```

#### puppeteer

<ScrapingWarning />

To use Puppeteer in your project, add the `puppeteer` build extension to your `trigger.config.ts` file:

```ts trigger.config.ts
import { defineConfig } from "@trigger.dev/sdk/v3";
import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";

export default defineConfig({
  project: "<project ref>",
  // Your other config settings...
  build: {
    extensions: [puppeteer()],
  },
});
```

Then add the following environment variable on the Environment Variables page of your Trigger.dev dashboard:

```bash
PUPPETEER_EXECUTABLE_PATH="/usr/bin/google-chrome-stable"
```

Follow [this example](/examples/puppeteer) to get set up with Trigger.dev and Puppeteer in your project.

#### ffmpeg

You can add the `ffmpeg` build extension to your build process:
@@ -482,7 +509,7 @@ import { defineConfig } from "@trigger.dev/sdk/v3";
import { ffmpeg } from "@trigger.dev/build/extensions/core";

export default defineConfig({
-  //..other stuff
+  // Your other config settings...
build: {
extensions: [ffmpeg()],
},
@@ -505,6 +532,8 @@ export default defineConfig({

This extension will also add the `FFMPEG_PATH` and `FFPROBE_PATH` to your environment variables, making it easy to use popular ffmpeg libraries like `fluent-ffmpeg`.
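As a minimal sketch, assuming the extension sets both variables as described above, a task could read them once and fail early if either is missing. The `resolveFfmpegPaths` helper below is hypothetical and not part of the Trigger.dev SDK:

```typescript
// Paths the ffmpeg build extension is documented to add to the environment.
interface FfmpegPaths {
  ffmpegPath: string;
  ffprobePath: string;
}

// Hypothetical helper: return both binary paths, or throw an error naming
// every variable that is missing or empty.
function resolveFfmpegPaths(env: Record<string, string | undefined>): FfmpegPaths {
  const ffmpegPath = env.FFMPEG_PATH;
  const ffprobePath = env.FFPROBE_PATH;

  if (!ffmpegPath || !ffprobePath) {
    const missing: string[] = [];
    if (!ffmpegPath) missing.push("FFMPEG_PATH");
    if (!ffprobePath) missing.push("FFPROBE_PATH");
    throw new Error(`Missing ffmpeg environment variables: ${missing.join(", ")}`);
  }

  // Both values are narrowed to non-empty strings here.
  return { ffmpegPath, ffprobePath };
}
```

If you use `fluent-ffmpeg`, the paths returned by `resolveFfmpegPaths(process.env)` could be passed to its `setFfmpegPath` and `setFfprobePath` functions before processing.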

Follow [this example](/examples/ffmpeg-video-processing) to get set up with Trigger.dev and FFmpeg in your project.

#### esbuild plugins

You can easily add existing or custom esbuild plugins to your build process using the `esbuildPlugin` extension:
1 change: 1 addition & 0 deletions docs/examples/intro.mdx
@@ -11,6 +11,7 @@ description: "Learn how to use Trigger.dev with these practical task examples."
| [OpenAI with retrying](/examples/open-ai-with-retrying) | Create a reusable OpenAI task with custom retry options. |
| [PDF to image](/examples/pdf-to-image) | Use `MuPDF` to turn a PDF into images and save them to Cloudflare R2. |
| [React to PDF](/examples/react-pdf) | Use `react-pdf` to generate a PDF and save it to Cloudflare R2. |
| [Puppeteer](/examples/puppeteer) | Use Puppeteer to generate a PDF or scrape a webpage. |
| [Resend email sequence](/examples/resend-email-sequence) | Send a sequence of emails over several days using Resend with Trigger.dev. |
| [Sharp image processing](/examples/sharp-image-processing) | Use Sharp to process an image and save it to Cloudflare R2. |
| [Stripe webhook](/examples/stripe-webhook) | Trigger a task from Stripe webhook events. |
217 changes: 217 additions & 0 deletions docs/examples/puppeteer.mdx
@@ -0,0 +1,217 @@
---
title: "Puppeteer"
sidebarTitle: "Puppeteer"
description: "These examples demonstrate how to use Puppeteer with Trigger.dev."
---

import LocalDevelopment from "/snippets/local-development-extensions.mdx";
import ScrapingWarning from "/snippets/web-scraping-warning.mdx";

## Overview

There are 3 example tasks to follow on this page:

1. [Basic example](/examples/puppeteer#basic-example)
2. [Generate a PDF from a web page](/examples/puppeteer#generate-a-pdf-from-a-web-page)
3. [Scrape content from a web page](/examples/puppeteer#scrape-content-from-a-web-page)

<ScrapingWarning/>

## Build configurations

To run the examples on this page, first add the `puppeteer` build extension to your `trigger.config.ts` file:

```ts trigger.config.ts
import { defineConfig } from "@trigger.dev/sdk/v3";
import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";

export default defineConfig({
  project: "<project ref>",
  // Your other config settings...
  build: {
    // This is required to use the Puppeteer library
    extensions: [puppeteer()],
  },
});
```

Learn more about [build configurations](/config/config-file#build-configuration) including setting default retry settings, customizing the build environment, and more.

## Set an environment variable

Set the following environment variable in your [Trigger.dev dashboard](/deploy-environment-variables) or [using the SDK](/deploy-environment-variables#in-your-code):

```bash
PUPPETEER_EXECUTABLE_PATH="/usr/bin/google-chrome-stable"
```
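Puppeteer reads `PUPPETEER_EXECUTABLE_PATH` on its own, but a missing or misspelled variable typically surfaces only as a launch failure. As a hedged sketch, a small guard like the one below fails fast with a clearer error. The `requireEnv` helper is hypothetical and not part of the Trigger.dev SDK; it takes the environment as a parameter so it can be tested without `process.env`:

```typescript
// Hypothetical helper (not part of the Trigger.dev SDK): read a required
// environment variable, throwing a descriptive error if it is unset or blank.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In a task, the result of `requireEnv("PUPPETEER_EXECUTABLE_PATH", process.env)` could be passed explicitly as Puppeteer's `executablePath` launch option, e.g. `puppeteer.launch({ executablePath })`.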

## Basic example

### Overview

In this example, we use [Puppeteer](https://pptr.dev/) to log the title of a web page, in this case the [Trigger.dev](https://trigger.dev) landing page.

### Task code

```ts trigger/puppeteer-basic-example.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const puppeteerTask = task({
  id: "puppeteer-log-title",
  run: async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto("https://trigger.dev");

    const content = await page.title();
    logger.info("Content", { content });

    await browser.close();
  },
});
```

### Testing your task

No payload is required for this task, so you can click "Run test" on the Test page in the dashboard. Learn more about [testing tasks](/run-tests).
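The basic example calls `browser.close()` only on the success path, so an exception from `page.goto` would skip it. One way to guarantee cleanup is to wrap the work in `try`/`finally`. This is sketched with a hypothetical `withResource` helper and a minimal `Closable` type standing in for Puppeteer's `Browser`, so the snippet runs without Puppeteer installed:

```typescript
// Minimal stand-in for anything with an async close(), such as a Puppeteer Browser.
interface Closable {
  close(): Promise<void>;
}

// Hypothetical helper: open a resource, run `use` with it, and always close it,
// even when `use` throws. If `open` itself throws, there is nothing to close.
async function withResource<T extends Closable, R>(
  open: () => Promise<T>,
  use: (resource: T) => Promise<R>,
): Promise<R> {
  const resource = await open();
  try {
    return await use(resource);
  } finally {
    await resource.close();
  }
}
```

Inside a task's `run` function, `open` could be `() => puppeteer.launch()` and `use` could open a page, navigate, and return `page.title()`; the browser is then closed whether or not navigation succeeds.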
Comment on lines +47 to +77

🛠️ Refactor suggestion

⚠️ Potential issue

Improve compatibility and grammar in the basic example

The basic example is clear and functional. However, there are two improvements we can make:

  1. Add launch options for better compatibility across different environments.
  2. Fix a minor grammatical issue in the overview.

Please apply the following changes:

  1. Modify the puppeteer.launch() call in the code snippet:
-    const browser = await puppeteer.launch();
+    const browser = await puppeteer.launch({
+      headless: "new",
+      args: ['--no-sandbox', '--disable-setuid-sandbox']
+    });
  2. Add a comma in the overview sentence:
-In this example we use [Puppeteer](https://pptr.dev/) to log out the title of a web page, in this case from the [Trigger.dev](https://trigger.dev) landing page.
+In this example, we use [Puppeteer](https://pptr.dev/) to log out the title of a web page, in this case from the [Trigger.dev](https://trigger.dev) landing page.

These changes will ensure better compatibility across different environments and improve the grammatical correctness of the documentation.

## Generate a PDF from a web page

### Overview

In this example, we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).

### Task code

```ts trigger/puppeteer-generate-pdf.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";

// Initialize an S3-compatible client pointed at Cloudflare R2
const s3Client = new S3Client({
  region: "auto",
  endpoint: process.env.S3_ENDPOINT,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID ?? "",
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY ?? "",
  },
});

export const puppeteerWebpageToPDF = task({
  id: "puppeteer-webpage-to-pdf",
  run: async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const response = await page.goto("https://trigger.dev");
    const url = response?.url() ?? "No URL found";

    // Generate PDF from the web page
    const generatePdf = await page.pdf();

    logger.info("PDF generated from URL", { url });

    await browser.close();

    // Upload to R2
    const s3Key = `pdfs/test.pdf`;
    const uploadParams = {
      Bucket: process.env.S3_BUCKET,
      Key: s3Key,
      Body: generatePdf,
      ContentType: "application/pdf",
    };

    logger.log("Uploading to R2 with params", uploadParams);

    // Upload the PDF to R2 and return the URL.
    await s3Client.send(new PutObjectCommand(uploadParams));
    // Note: this is an S3-style URL. R2 serves public objects from the bucket's
    // r2.dev subdomain or a custom domain, so adjust it to match your setup.
    const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
    logger.log("PDF uploaded to R2", { url: s3Url });
    return { pdfUrl: s3Url };
  },
});
```

### Testing your task

No payload is required for this task, so you can click "Run test" on the Test page in the dashboard. Learn more about [testing tasks](/run-tests).
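The PDF task always writes to `pdfs/test.pdf`, so each run overwrites the previous file. It also reports a `*.s3.amazonaws.com` URL, but R2 objects are not served from that host; public R2 access goes through the bucket's `r2.dev` subdomain or a connected custom domain. The sketch below shows two hypothetical helpers: one builds a unique key per run, and one joins a public base URL (for example from an environment variable you define yourself) with that key. Neither is part of the Trigger.dev or AWS SDKs:

```typescript
// Hypothetical helper: a unique, time-sortable object key so runs don't overwrite each other.
function buildPdfKey(runId: string, now: Date = new Date()): string {
  // toISOString() yields e.g. 2024-09-25T12:00:00.000Z; swap ':' and '.' for '-'.
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  return `pdfs/${stamp}-${runId}.pdf`;
}

// Hypothetical helper: join a public base URL (r2.dev subdomain or custom domain)
// with an object key, percent-encoding each path segment.
function publicObjectUrl(baseUrl: string, key: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  const path = key.split("/").map(encodeURIComponent).join("/");
  return `${base}/${path}`;
}
```

Assuming you read the task context from the `run` function's second argument, `buildPdfKey(ctx.run.id)` could replace the fixed key, and `publicObjectUrl(...)` could replace the S3-style URL.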
Comment on lines +79 to +140

🛠️ Refactor suggestion

⚠️ Potential issue

Enhance error handling, resource management, and grammar in the PDF generation example

The PDF generation example is functional, but we can improve its robustness and clarity:

  1. Implement better error handling and resource management.
  2. Fix a minor grammatical issue in the overview.
  3. Use environment variables more safely.

Please apply the following changes:

  1. Modify the puppeteerWebpageToPDF task:
 export const puppeteerWebpageToPDF = task({
   id: "puppeteer-webpage-to-pdf",
   run: async () => {
+    let browser;
+    try {
-      const browser = await puppeteer.launch();
+      browser = await puppeteer.launch({
+        headless: "new",
+        args: ['--no-sandbox', '--disable-setuid-sandbox']
+      });
       const page = await browser.newPage();
       const response = await page.goto("https://trigger.dev");
       const url = response?.url() ?? "No URL found";

       // Generate PDF from the web page
       const generatePdf = await page.pdf();

       logger.info("PDF generated from URL", { url });

-      await browser.close();

       // Upload to R2
       const s3Key = `pdfs/test.pdf`;
       const uploadParams = {
-        Bucket: process.env.S3_BUCKET,
+        Bucket: process.env.S3_BUCKET ?? '',
         Key: s3Key,
         Body: generatePdf,
         ContentType: "application/pdf",
       };

       logger.log("Uploading to R2 with params", uploadParams);

       // Upload the PDF to R2 and return the URL.
       await s3Client.send(new PutObjectCommand(uploadParams));
-      const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
+      const s3Url = `https://${process.env.S3_BUCKET ?? ''}.s3.amazonaws.com/${s3Key}`;
       logger.log("PDF uploaded to R2", { url: s3Url });
       return { pdfUrl: s3Url };
+    } catch (error) {
+      logger.error("Error in puppeteerWebpageToPDF", { error });
+      throw error;
+    } finally {
+      if (browser) {
+        await browser.close();
+      }
+    }
   },
 });
  2. Add a comma in the overview sentence:
-In this example we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).
+In this example, we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).

These changes will improve error handling, ensure proper resource cleanup, make the code more robust against potential issues with environment variables, and improve the grammatical correctness of the documentation.

## Scrape content from a web page

### Overview

In this example, we use [Puppeteer](https://pptr.dev/) with a [Browserbase](https://www.browserbase.com/) proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it. See [this list](/examples/puppeteer#proxying) for other proxy services we recommend.

<Note>
When web scraping, you MUST use the technique below, which connects Puppeteer to a browser through a proxy. Direct scraping without using `browserWSEndpoint` is prohibited and will result in account suspension.
</Note>

### Task code

```ts trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";

export const puppeteerScrapeWithProxy = task({
  id: "puppeteer-scrape-with-proxy",
  run: async () => {
    // Connect to a remote Browserbase browser instead of launching one locally
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });

    const page = await browser.newPage();

    // Set up Browserbase proxy authentication
    await page.authenticate({
      username: "api",
      password: process.env.BROWSERBASE_API_KEY || "",
    });

    try {
      // Navigate to the target website
      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });

      // Scrape the GitHub stars count
      const starCount = await page.evaluate(() => {
        const starElement = document.querySelector(".github-star-count");
        const text = starElement?.textContent ?? "0";
        const numberText = text.replace(/[^0-9]/g, "");
        return parseInt(numberText, 10);
      });

      logger.info("GitHub star count", { starCount });

      return { starCount };
    } catch (error) {
      logger.error("Error during scraping", {
        error: error instanceof Error ? error.message : String(error),
      });
      throw error;
    } finally {
      await browser.close();
    }
  },
});
```
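The `evaluate` callback above strips every non-digit character. If the page ever renders an abbreviated count such as `9.5k` (an assumption; the example does not show the page's format), it would be logged as `95`. Below is a hedged sketch of a pure parser that also handles `k`/`m` suffixes and thousands separators; `parseStarCount` is a hypothetical name:

```typescript
// Hypothetical parser for star counts rendered as "1234", "1,234", "9.5k" or "1.2M".
// Returns 0 when no number is found, mirroring the example's "0" fallback.
function parseStarCount(text: string): number {
  const match = text.trim().toLowerCase().match(/(\d[\d,]*(?:\.\d+)?)\s*([km])?/);
  if (!match) return 0;

  // Drop thousands separators before parsing the numeric part.
  const value = Number.parseFloat(match[1].replace(/,/g, ""));
  const suffix = match[2];
  const multiplier = suffix === "k" ? 1000 : suffix === "m" ? 1000000 : 1;

  // Round to absorb floating-point error, e.g. 1.2 * 1000000.
  return Math.round(value * multiplier);
}
```

Because `page.evaluate` serializes its callback and runs it in the browser, helpers defined in the task file are not available inside it. The simplest way to use a parser like this is to return the raw `textContent` from `evaluate` and parse it in the task.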

### Testing your task

No payload is required for this task, so you can click "Run test" on the Test page in the dashboard. Learn more about [testing tasks](/run-tests).

<LocalDevelopment packages={"the Puppeteer library."} />

## Proxying

If you're using Trigger.dev Cloud with Puppeteer or any other tool to scrape content from websites you don't own, you'll need to proxy your requests. **If you don't, you risk getting our IP address blocked, and we will ban you from our service.**

Here is a list of proxy services we recommend:

- [Browserbase](https://www.browserbase.com/)
- [Brightdata](https://brightdata.com/)
- [Browserless](https://browserless.io/)
- [Oxylabs](https://oxylabs.io/)
- [ScrapingBee](https://scrapingbee.com/)
- [Smartproxy](https://smartproxy.com/)
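The scraping example interpolates `process.env.BROWSERBASE_API_KEY` directly into `browserWSEndpoint`, so an unset key produces `apiKey=undefined` rather than an immediate error. Here is a minimal sketch that fails fast and URL-encodes the key. It reuses the endpoint and `apiKey` query parameter exactly as they appear in that example; the `browserbaseEndpoint` name is hypothetical:

```typescript
// Hypothetical helper: build the Browserbase connect URL used with puppeteer.connect().
// The host and `apiKey` parameter are copied from the scraping example above.
function browserbaseEndpoint(apiKey: string | undefined): string {
  if (!apiKey) {
    throw new Error("BROWSERBASE_API_KEY is not set; refusing to scrape without a proxy");
  }
  return `wss://connect.browserbase.com?apiKey=${encodeURIComponent(apiKey)}`;
}
```

It could then be used as `puppeteer.connect({ browserWSEndpoint: browserbaseEndpoint(process.env.BROWSERBASE_API_KEY) })`.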
1 change: 1 addition & 0 deletions docs/mint.json
@@ -324,6 +324,7 @@
"examples/ffmpeg-video-processing",
"examples/open-ai-with-retrying",
"examples/pdf-to-image",
"examples/puppeteer",
"examples/sharp-image-processing",
"examples/stripe-webhook",
"examples/supabase-storage-upload",
3 changes: 3 additions & 0 deletions docs/snippets/web-scraping-warning.mdx
@@ -0,0 +1,3 @@
<Warning>
**WEB SCRAPING:** When web scraping, you MUST use a proxy to comply with our terms of service. Direct scraping of third-party websites without the site owner's permission using Trigger.dev Cloud is prohibited and will result in account suspension. See [this example](/examples/puppeteer#scrape-content-from-a-web-page), which uses a proxy.
</Warning>