This is a collection of issues and improvements discovered during the re-implementation of AdaptivePlaywrightCrawler in Python.
Ensure isolation of contexts for static and client-side (browser) crawling.
Example situation: the rendering type predictor decides that both crawling methods should be used, so the user handler runs twice. The user handler can modify the context, for example `user_data`. As a result, the second run of the handler may operate on a context that the first run has already modified.
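One way to get this isolation is to hand each handler run its own deep copy of the context. The sketch below assumes a simplified, hypothetical `CrawlingContext` dataclass that only models `url` and `user_data`; it is not the Crawlee context class. Real contexts that hold pages, sessions, or other live resources may not be deep-copyable, in which case only the user-owned mutable parts (such as `user_data`) would need copying.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CrawlingContext:
    # Hypothetical, simplified stand-in for a crawling context.
    url: str
    user_data: dict[str, Any] = field(default_factory=dict)


def run_handler_isolated(
    handler: Callable[[CrawlingContext], None], context: CrawlingContext
) -> CrawlingContext:
    # Each run gets its own deep copy, so mutations made by one run
    # (e.g. to user_data) are invisible to the other run and to the original.
    isolated = copy.deepcopy(context)
    handler(isolated)
    return isolated


def user_handler(context: CrawlingContext) -> None:
    # A handler that mutates user_data, as in the example situation.
    context.user_data["visits"] = context.user_data.get("visits", 0) + 1


original = CrawlingContext(url="https://example.com", user_data={})
static_ctx = run_handler_isolated(user_handler, original)
browser_ctx = run_handler_isolated(user_handler, original)

print(static_ctx.user_data, browser_ctx.user_data, original.user_data)
# {'visits': 1} {'visits': 1} {}
```

Both runs start from the same unmodified state, so their results can be compared fairly.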
The default result comparator checks only dataset changes. Consider also comparing the added (enqueued) links. This has to be done carefully, however, because some sites generate additional query parameters when crawled with a browser. Example of the "same" link:
Static: https://sdk.apify.com/docs/guides/getting-started
Browser: https://sdk.apify.com/docs/guides/getting-started?__hsfp=1136113150&__hssc=7591405.1.1735494277124&__hstc=7591405.e2b9302ed00c5bfaee3a870166792181.1735494277124.1735494277124.1735494277124.1
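A link comparison could normalize URLs before comparing them, dropping query parameters that are known to be noise. The sketch below is a hypothetical comparator, not part of Crawlee; it assumes that parameters whose names start with `__hs` (the prefix seen in the example above) are tracking noise, and that fragments can be ignored. Which parameters are safe to drop is site-specific, so the prefix list would need to be configurable.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters with these prefixes are tracking noise added
# during browser crawling ("__hs" is taken from the example above).
IGNORED_PARAM_PREFIXES = ("__hs",)


def normalize_url(url: str, ignored_prefixes: tuple[str, ...] = IGNORED_PARAM_PREFIXES) -> str:
    parts = urlsplit(url)
    # Keep only query parameters whose names do not start with an ignored prefix.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(ignored_prefixes)
    ]
    kept.sort()  # parameter order should not affect equality
    # The fragment is dropped; it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def links_match(static_links: list[str], browser_links: list[str]) -> bool:
    # Hypothetical comparator: compare the sets of normalized added links.
    return {normalize_url(u) for u in static_links} == {normalize_url(u) for u in browser_links}


static = "https://sdk.apify.com/docs/guides/getting-started"
browser = (
    "https://sdk.apify.com/docs/guides/getting-started?__hsfp=1136113150"
    "&__hssc=7591405.1.1735494277124"
    "&__hstc=7591405.e2b9302ed00c5bfaee3a870166792181.1735494277124.1735494277124.1735494277124.1"
)
print(links_match([static], [browser]))  # True
```

Strict set equality still reports a mismatch when the browser run discovers genuinely new links; a more lenient comparator might accept the browser set being a superset of the static one.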
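Document a possible edge case of undesired mutation of global state.
If static crawling fails, browser crawling is used as a fallback. If the `context.use_state` method was already called during the static attempt, the global state may already have been modified by the time the browser attempt runs.
TBD ... more will be added during migration.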
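For the `context.use_state` edge case described above, one possible mitigation is to snapshot the global state before the static attempt and restore it if that attempt fails. This is a sketch under the assumption that the global state is a plain dict; `crawl_with_fallback`, `static_attempt`, and `browser_attempt` are hypothetical names, not Crawlee APIs.

```python
import copy
from typing import Any, Callable

State = dict[str, Any]


def crawl_with_fallback(
    state: State,
    static_attempt: Callable[[State], None],
    browser_attempt: Callable[[State], None],
) -> None:
    # Snapshot the shared state before the static attempt may mutate it.
    snapshot = copy.deepcopy(state)
    try:
        static_attempt(state)
    except Exception:
        # Static crawling failed: roll back its changes in place, so every
        # holder of a reference to `state` sees the restored contents,
        # then run the browser fallback on the clean state.
        state.clear()
        state.update(snapshot)
        browser_attempt(state)


def static_attempt(state: State) -> None:
    state["counter"] = state.get("counter", 0) + 1
    raise RuntimeError("static crawling failed")


def browser_attempt(state: State) -> None:
    state["counter"] = state.get("counter", 0) + 1


global_state: State = {}
crawl_with_fallback(global_state, static_attempt, browser_attempt)
print(global_state)  # {'counter': 1}; without the rollback it would be 2
```

A rollback like this is only safe if nothing else writes to the same state between the snapshot and the restore; with concurrent requests sharing the state, it would also discard their changes, which is one reason documenting the edge case may be preferable to silently restoring.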