Feature request for crawl_images function #384
Comments
Thanks @caroheymesitf. For now, you can select images using a regex that matches the img src URL. This is not exactly XPath, but it can help in many cases. What do you want the Playwright integration for? Is it for another use case? Thanks!
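To illustrate the regex idea, here is a minimal standalone sketch of filtering image URLs by a pattern. It does not call advertools; the helper name `filter_img_urls` and the example URLs are hypothetical and only show the kind of selection a regex on the src URL allows.

```python
import re


def filter_img_urls(img_urls, include_regex):
    """Keep only the image URLs that match the given regex (hypothetical helper)."""
    pattern = re.compile(include_regex)
    return [url for url in img_urls if pattern.search(url)]


# Made-up src values, as they might be collected from a page
urls = [
    "https://example.com/static/logo.png",
    "https://example.com/products/shoe-1.jpg",
    "https://example.com/products/shoe-2.jpg",
    "https://example.com/ads/banner.gif",
]

# Select only JPG images under the /products/ path
product_images = filter_img_urls(urls, r"/products/.*\.jpg$")
print(product_images)
```

This only works when the wanted images share a recognizable URL pattern, which is why it is not a full replacement for XPath selection.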
Hi Elias, great to get your feedback! Ideally, I need the Playwright integration in the crawl_images function. Some websites that rely heavily on JavaScript do not let images load completely through advertools. Perhaps scrapy-playwright could be a fix. I wasn't able to get the Playwright integration working, and since the function is labeled "THIS FUNCTION IS STILL EXPERIMENTAL. Expect many changes," I was wondering if it would be possible to add XPath selection as well? Thanks a million!
JS: This is going to depend on how each website does it and will need a deeper look, because I don't think there is one approach that handles it easily. I could be wrong though.

XPath: It's not clear to me how to implement that yet. I quickly checked the Scrapy docs and the source for the image pipeline, and I'm still not sure how to do it. I'll let you know if/when I do.
Hi Elias,
@caroheymesitf Thanks for the code! I think this:

    if self.xpath is not None:
        img_src = response.xpath(self.xpath).getall()
    else:
        img_src = response.xpath("//img/@src").getall()

should be modified to:

    response.xpath(f'{USER_SUPPLIED_XPATH}//*//img/@src').getall()

Get the
Otherwise I think it's doable. Just a few naming conventions for consistency with other functions.
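The change suggested above could be sketched as a small helper that builds the XPath expression. This is a sketch under assumptions, not the advertools implementation: `build_img_xpath` and `DEFAULT_IMG_XPATH` are hypothetical names, and the suffix is copied from the comment. The idea is that the user supplies the XPath of a container element, and the image src lookup is scoped to it.

```python
from typing import Optional

# Fallback used when no container XPath is supplied (same as the original code)
DEFAULT_IMG_XPATH = "//img/@src"


def build_img_xpath(user_xpath: Optional[str] = None,
                    suffix: str = "//*//img/@src") -> str:
    """Return an XPath for image src values, optionally scoped to a container.

    Hypothetical helper: `suffix` defaults to the pattern from the comment above.
    """
    if not user_xpath:
        return DEFAULT_IMG_XPATH
    # Drop trailing slashes so the suffix is not doubled up
    return f"{user_xpath.rstrip('/')}{suffix}"


# In a Scrapy callback, the result would be used roughly like this:
#     img_src = response.xpath(build_img_xpath(self.xpath)).getall()
print(build_img_xpath())
print(build_img_xpath('//div[@class="gallery"]'))
```

One thing to check when testing: with the `//*//img` suffix, an img that is a direct child of the selected container is not matched, since at least one element must sit between them. Passing `suffix="//img/@src"` would include direct children as well.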
Hello Elias,
Is there any possibility of adding Playwright integration to the crawl_images function?
Furthermore, I would love to be able to select specific images by their XPath!
Cheers!