Not returning any additional links from menu crawl #494

Firelord710 · 2024-09-03T01:09:00Z

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/core

Issue description

Hello, I am running the following code to scrape only the links from this site. It is adapted from the introduction example.

It is not adding any additional links to the queue, and I am unable to figure out why. Thank you in advance.

Code sample

import asyncio

# You don't need to import RequestQueue anymore.
from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
from crawlee import EnqueueStrategy


async def main() -> None:
    crawler = BeautifulSoupCrawler(max_requests_per_crawl=10000)

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:

        # Bypass age confirmation popup
        try:
            await context.soup.wait_for_selector('button:has-text("I am 21 or older")', timeout=5000)
            await context.soup.click('button:has-text("I am 21 or older")')
            await context.soup.wait_for_load_state('networkidle')
        except:
            print("Age confirmation popup not found or already handled.")

        url = context.request.url
        title = context.soup.title.string if context.soup.title else ''
        context.log.info(f'The title of {url} is: {title}.')

        # The enqueue_links function is available as one of the fields of the context.
        # It is also context aware, so it does not require any parameters.
        await context.enqueue_links(selector='button, a', strategy=EnqueueStrategy.ALL)

    # Start the crawler with the provided URLs.
    await crawler.run(['https://dutchie.com/embedded-menu/truormed-dispensary'])


if __name__ == '__main__':
    asyncio.run(main())

Package version

0.3.2b4

Node.js version

Python

Operating system

Windows

Apify platform

Tick me if you encountered this issue on the Apify platform

I have tested this on the `next` release

No response

Other context

No response

Answered by janbuchar

janbuchar (Maintainer) · 2024-09-03T08:10:04Z

Hello, and thank you for your interest in Crawlee for Python! I checked the website you're scraping, and it looks like it's completely client-rendered, which means you need to use a headless browser. Check out `PlaywrightCrawler`. Also, `context.enqueue_links` only works for links (`a` elements), not `button` elements.
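
To illustrate both points, here is a minimal sketch using only the Python standard library. It is not Crawlee's implementation: the `LinkCollector` class, the `collect_link_targets` helper, and both HTML snippets are hypothetical stand-ins. One snippet mimics the nearly empty shell a client-rendered page might return to a plain HTTP client, the other mimics the DOM after a browser has run the page's JavaScript.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags and ignores every other element."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != 'a':
            return  # A <button> carries no URL that could be followed.
        for name, value in attrs:
            if name == 'href' and value:
                self.hrefs.append(value)


def collect_link_targets(html: str) -> list[str]:
    """Return the href of every <a> element found in the given HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical markup: what a client-rendered app might send before any script runs.
STATIC_SHELL = (
    '<html><body><div id="root"></div>'
    '<script src="/app.js"></script></body></html>'
)

# Hypothetical markup: the same page after a browser has executed the script.
RENDERED_DOM = (
    '<html><body><div id="root">'
    '<button>I am 21 or older</button>'
    '<a href="/menu/flower">Flower</a>'
    '<a href="/menu/edibles">Edibles</a>'
    '</div></body></html>'
)

if __name__ == '__main__':
    print(collect_link_targets(STATIC_SHELL))  # []
    print(collect_link_targets(RENDERED_DOM))  # ['/menu/flower', '/menu/edibles']
```

The static shell yields no link targets at all, which matches the symptom in the question: an HTML-only crawler has nothing to add to the queue. Links appear only in the rendered DOM, and even there the button contributes nothing because it has no `href`.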
