Not returning any additional links from menu crawl #494
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/core

Issue description
Hello, I'm running the following code to try to scrape just the links from this site, as an adaptation of the introduction example. It is not adding any additional links to the queue and I am unable to figure out why. Thank you in advance.

Code sample

    import asyncio

    # You don't need to import RequestQueue anymore.
    from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
    from crawlee import EnqueueStrategy


    async def main() -> None:
        crawler = BeautifulSoupCrawler(max_requests_per_crawl=10000)

        @crawler.router.default_handler
        async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
            # Bypass age confirmation popup
            try:
                await context.soup.wait_for_selector('button:has-text("I am 21 or older")', timeout=5000)
                await context.soup.click('button:has-text("I am 21 or older")')
                await context.soup.wait_for_load_state('networkidle')
            except:
                print("Age confirmation popup not found or already handled.")

            url = context.request.url
            title = context.soup.title.string if context.soup.title else ''
            context.log.info(f'The title of {url} is: {title}.')

            # The enqueue_links function is available as one of the fields of the context.
            # It is also context aware, so it does not require any parameters.
            await context.enqueue_links(selector='button, a', strategy=EnqueueStrategy.ALL)

        # Start the crawler with the provided URLs.
        await crawler.run(['https://dutchie.com/embedded-menu/truormed-dispensary'])


    if __name__ == '__main__':
        asyncio.run(main())

Package version
0.3.2b4

Node.js version
Python

Operating system
Windows

Apify platform
I have tested this on the
Replies: 1 comment
Hello and thank you for your interest in Crawlee for Python! I checked out the website you're scraping and it looks like it's completely client-rendered, which means you need to use a headless browser - check out `PlaywrightCrawler`. Also, `context.enqueue_links` only works for links (`a`), not `button` elements.
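
To make the two points in this reply concrete, here is a minimal sketch that uses only the Python standard library. It is not Crawlee's implementation, and both HTML snippets, the `example.com` base URL, and the `LinkCollector` / `extract_links` names are made up for illustration. It shows that a static parser finds nothing to follow in a client-rendered shell, and that even in a rendered DOM only `a` elements carry a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags only."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only anchors carry a URL; a <button> has no href to follow.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every anchor URL found in the given static HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical markup, invented for this example (not copied from the real site).
BASE_URL = "https://example.com/menu/"

# Roughly what a plain HTTP fetch of a client-rendered page returns: an empty shell.
STATIC_SHELL = (
    '<html><body><div id="root"></div>'
    '<script src="/app.js"></script></body></html>'
)

# Roughly what the DOM could look like after JavaScript has run in a browser.
RENDERED_DOM = """
<html><body><div id="root">
  <button>I am 21 or older</button>
  <a href="flower">Flower</a>
  <a href="/edibles">Edibles</a>
</div></body></html>
"""

shell_links = extract_links(STATIC_SHELL, BASE_URL)
rendered_links = extract_links(RENDERED_DOM, BASE_URL)
print(shell_links)
print(rendered_links)
```

The shell yields an empty list, which matches the symptom in the question: an HTML-only crawler never sees the menu links because they are created by JavaScript. In the rendered version the two anchors are picked up and the button is ignored, since it has no `href`. A browser-based crawler such as `PlaywrightCrawler` works on the rendered page, so the anchors exist by the time links are collected.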