
AsyncPlaywrightCrawlerStrategy page-evaluate context destroyed by navigation #304

Merged: 1 commit merged on Nov 29, 2024


dvschuyl (Contributor) commented on Nov 29, 2024

Crawling web pages (https://www.metanous.be/ in the example) with the default crawling strategy sometimes raises the following error:

[INIT].... → Crawl4AI 0.3.744
[ERROR]... × https://www.metanous.be/... | Error: 
┌───────────────────────────────────────────────────────────────────────────────┐
│ × async_crawler_strategy.py:_crawleb(): Page.evaluate: Execution context was  │
│ destroyed, most likely because of a navigation                                │
└───────────────────────────────────────────────────────────────────────────────┘

I tracked the error down to the following line in the _crawl_web function of async_crawler_strategy.py:

await page.evaluate(update_image_dimensions_js)

After inserting await page.wait_for_load_state() just before that line, the error no longer occurs:

[INIT].... → Crawl4AI 0.3.744
[FETCH]... ↓ https://www.metanous.be/... | Status: True | Time: 1.39s
[SCRAPE].. ◆ Processed https://www.metanous.be/... | Time: 33ms
[COMPLETE] ● https://www.metanous.be/... | Status: True | Total: 1.43s
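For illustration, the same fix can be wrapped in a small helper. This is a minimal sketch, not crawl4ai's actual code: safe_evaluate, PageLike and CONTEXT_DESTROYED are hypothetical names, and the single retry on a destroyed context is an assumption layered on top of the wait_for_load_state() call described above. It relies only on the wait_for_load_state() and evaluate() coroutines that a Playwright async Page exposes, and it recognizes the error by its message text rather than by exception type.

```python
from typing import Any, Protocol

# Substring of the Playwright error message shown in the log above.
CONTEXT_DESTROYED = "Execution context was destroyed"


class PageLike(Protocol):
    """The two coroutines this sketch needs; a Playwright async Page has both."""

    async def wait_for_load_state(self, state: str = "load") -> None: ...

    async def evaluate(self, expression: str) -> Any: ...


async def safe_evaluate(page: PageLike, script: str, retries: int = 1) -> Any:
    """Hypothetical helper: wait for the page to load, then evaluate `script`.

    If a navigation still destroys the execution context, wait for the new
    document to load and try again, up to `retries` extra times. Any other
    error, or running out of retries, re-raises the original exception.
    """
    attempt = 0
    while True:
        # The fix from this PR: wait until the current document reaches "load".
        await page.wait_for_load_state("load")
        try:
            return await page.evaluate(script)
        except Exception as exc:  # Playwright raises playwright.async_api.Error
            if CONTEXT_DESTROYED not in str(exc) or attempt >= retries:
                raise
            attempt += 1
```

In _crawl_web, a call like await safe_evaluate(page, update_image_dimensions_js) would stand in for the bare page.evaluate line. Because wait_for_load_state() returns immediately once the current document has already reached the requested state, the retry is meant to cover a client-side redirect that fires after that point.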

FYI, here is the (simplified) code I used to crawl the website:

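import asyncio

from crawl4ai import AsyncWebCrawler, CacheMode


async def main():
    async with AsyncWebCrawler(verbose=True) as crawler:
        # Crawl the given URL, bypassing the cache
        result = await crawler.arun(
            url=...,  # the URL being crawled, e.g. https://www.metanous.be/
            extraction_strategy=None,
            cache_mode=CacheMode.BYPASS,
            magic=True,
        )

        # Print the length of the extracted markdown
        print(len(result.markdown))


asyncio.run(main())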

dvschuyl changed the title from "AsyncPlaywrightCrawlerStrategy page-evaluate navigation destroyed" to "AsyncPlaywrightCrawlerStrategy page-evaluate context destroyed by navigation" on Nov 29, 2024
unclecode merged commit 1ed7c15 into unclecode:main on Nov 29, 2024
unclecode (Owner) commented:

Your pull request has been merged, thanks!
