Get current URL in customCrawl() #364
Comments
Hey @popstas,
We can use the preRequest option to skip URLs. We can persist the URL, or do anything else with it, in there.
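A minimal sketch of that idea, assuming preRequest receives the request options (including `url`) and that returning false skips the request. The extension list, the `skippedUrls` array, and the helper name are hypothetical, for illustration only:

```javascript
const path = require('path');

// Hypothetical list of file types to record without crawling them.
const SKIPPED_EXTENSIONS = new Set(['.zip', '.doc', '.pdf']);

// Hypothetical in-memory store; a real script might append to a CSV instead.
const skippedUrls = [];

// Intended to be passed as the crawler's preRequest option.
// Assumption: returning false tells the crawler to skip this request.
function preRequest(options) {
  let ext;
  try {
    // Use only the pathname so query strings do not hide the extension.
    ext = path.extname(new URL(options.url).pathname).toLowerCase();
  } catch (err) {
    // Not a parseable absolute URL: let the crawler handle it as usual.
    return true;
  }
  if (SKIPPED_EXTENSIONS.has(ext)) {
    skippedUrls.push(options.url); // remember the URL for later export
    return false;                  // skip the actual request
  }
  return true;
}
```

Something like `HCCrawler.launch({ preRequest, /* other options */ })` would then wire it in. Because the URL arrives as an argument, this should not depend on any shared "current URL" state.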
2 years since the issue was opened, but if others in the future are looking to get the current URL, it's available in the
|
What is the current behavior?
No information about current URL in customCrawl()
What is the motivation / use case for changing the behavior?
I want to skip the request but still add the URL to the CSV for some file types, such as zip, doc, and pdf.
My code that does this: https://github.com/viasite/sites-scraper/blob/59449b1b03/src/scrap-site.js#L240-L255
Proposal
Add crawler to customCrawl:
customCrawl: async (page, crawl, crawler)
I tried to store currentURL using the requeststarted event, but that fails when concurrency > 1. What do you think? I can make a PR.
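To illustrate why a single shared variable breaks down under concurrency, here is a small simulation that does not use the crawler at all. `fakeCrawl` and `demo` are hypothetical stand-ins: each task writes the shared variable when it "starts" and reads it back after an asynchronous wait, roughly like a requeststarted handler followed by customCrawl:

```javascript
// Shared state, as if it were set by a 'requeststarted' event handler.
let currentUrl = null;

// Hypothetical stand-in for one crawl: record the URL, wait, then read it back.
async function fakeCrawl(url, delayMs) {
  currentUrl = url; // "request started"
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  // By now another task may have overwritten currentUrl.
  return { requested: url, observed: currentUrl };
}

// Run two crawls concurrently, as with concurrency > 1.
async function demo() {
  return Promise.all([
    fakeCrawl('https://example.com/a', 20),
    fakeCrawl('https://example.com/b', 10),
  ]);
}
```

Both tasks start before either finishes waiting, so the first task reads back the second task's URL. Passing the URL (or the crawler) into customCrawl directly, as proposed, would avoid this kind of shared state.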