Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: session.markBad() on requestHandler error #1709

Merged
merged 2 commits into from
Dec 9, 2022
Merged

Conversation

barjin
Copy link
Contributor

@barjin barjin commented Dec 7, 2022

fixes #1635

Enforces the access to the same Session instance throughout the inheritance chain (e.g. BrowserCrawler and BasicCrawler). This is required because of rogue Session mutations (e.g. here).

@metalwarrior665
Copy link
Member

@barjin Isn't that Browser pool part just buggy? Seems like it always restarts the session as the one from launchContext.

@barjin
Copy link
Contributor Author

barjin commented Dec 8, 2022

@metalwarrior665 To be fair, I'm not exactly sure - to me, it seems that the BrowserPool thing pairs the sessions with the Page instances, which seems alright - but better ask the original (git blame) author - @B4nan?

@@ -943,7 +943,7 @@ export class BasicCrawler<Context extends CrawlingContext = BasicCrawlingContext

// reclaim session if request finishes successfully
request.state = RequestState.DONE;
session?.markGood();
crawlingContext.session?.markGood();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing how this helps since both session and crawlingContext.session are pointing out to the same reference. Or is there some issue with the session initialization?

Copy link
Contributor Author

@barjin barjin Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, run it and see - they are not referencing the same object :)

The call to requestHandler at L930 passes the crawlingContext as a reference to the request handling machinery - which eventually, here, overwrites the internal session property with a new copy. The session variable in this scope (used to create the crawlingContext) still points to the old session object, but it's not the session that's in the crawlingContext now - that one that was used during the request handling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Indeed, those mutations...

@B4nan
Copy link
Member

B4nan commented Dec 9, 2022

i dont think i am the original author, you'd have to check the history in apify-ts repo :] i moved it here in one commit, so i am author of pretty much every line prior to crawlee release in git blame :D

anyway, lets merge, refactoring can wait.

@B4nan B4nan merged commit e87eb1f into master Dec 9, 2022
@B4nan B4nan deleted the fix/sessionMarkBad branch December 9, 2022 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Session state is not changed after failed request
3 participants