Visit_id is not set at the beginning of some page visits. #123
This doesn't seem to be an issue with the socket (or the … So the question is, under what condition are we sending a null …
Let's use the string …
This was implemented in 0ec97d1. Let's check the latest crawl data to see which URLs have this issue.
While converting the March 2017 25k ID-detection crawl, I found exactly one http_requests row with visit_id set to -1. It was the first request of a page load (visit_id=6052, top_url='http://supersport.com'). We believe this is the result of a race condition in the extension: when the browser restarts, it starts fetching the first page before the visit_id has been set. We are also concerned that there may be cases where a new site visit is tagged with an old visit_id, and we aren't catching that.

Attempt to measure: in the 2017-03 ID-detection crawl, it looks like this MAY have happened for about 200 of the 25,000 sites. These are the sites for which http_requests contains no row where top_url == url.

Ideally, the browser would block until the extension confirms it has received the new visit_id before visiting the next page. See #135. In the meantime, it may be best to drop these rows if the problem occurs on a small enough scale. The problem seems to affect only the first request of a page load.
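A rough sketch of how the crawl data could be checked and cleaned, using Python's built-in sqlite3 module. It assumes a simplified http_requests table with only visit_id, url and top_url columns (the real schema has more) and treats -1 as the placeholder visit_id, as observed in the March 2017 crawl. The helper names are hypothetical. The "no row where top_url == url" test is only a heuristic: depending on how top_url is recorded, it may also flag benign cases such as redirects.

```python
import sqlite3

# Assumption: -1 is the placeholder visit_id, as seen in the March 2017 crawl.
UNSET_VISIT_ID = -1


def requests_with_unset_visit_id(conn):
    """Return (top_url, url) pairs for requests logged before visit_id was set."""
    return conn.execute(
        "SELECT top_url, url FROM http_requests WHERE visit_id = ? ORDER BY top_url, url",
        (UNSET_VISIT_ID,),
    ).fetchall()


def visits_missing_top_level_request(conn):
    """Return visit_ids with no row where url == top_url (possibly affected visits)."""
    rows = conn.execute(
        """
        SELECT DISTINCT r.visit_id FROM http_requests AS r
        WHERE r.visit_id != ?
          AND NOT EXISTS (
            SELECT 1 FROM http_requests AS s
            WHERE s.visit_id = r.visit_id AND s.url = s.top_url)
        ORDER BY r.visit_id
        """,
        (UNSET_VISIT_ID,),
    ).fetchall()
    return [visit_id for (visit_id,) in rows]


def drop_unset_rows(conn):
    """Delete rows carrying the placeholder visit_id; return how many were removed."""
    cur = conn.execute("DELETE FROM http_requests WHERE visit_id = ?", (UNSET_VISIT_ID,))
    conn.commit()
    return cur.rowcount


def build_demo_db():
    # Toy data: the -1 row mirrors the reported case, the rest is made up.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE http_requests (visit_id INTEGER, url TEXT, top_url TEXT)")
    conn.executemany(
        "INSERT INTO http_requests VALUES (?, ?, ?)",
        [
            (-1, "http://supersport.com", "http://supersport.com"),
            (6052, "http://supersport.com/app.js", "http://supersport.com"),
            (6053, "http://example.com", "http://example.com"),
            (6053, "http://example.com/logo.png", "http://example.com"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(requests_with_unset_visit_id(conn))
    print(visits_missing_top_level_request(conn))
    print(drop_unset_rows(conn))
```

On the toy data, visit 6052 is flagged because its top-level request was logged under -1 instead, which matches the symptom described above.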
The proposed fix is in #135.
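The blocking approach suggested in this thread (the browser waits until the extension confirms the new visit_id before loading the page) could look roughly like the sketch below. This is not necessarily what #135 implements: VisitIdHandshake, extension_ack, wait_for and visit are hypothetical names, and the extension is simulated by a timer thread. A timeout is included so a lost acknowledgement fails loudly instead of hanging the crawl.

```python
import queue
import threading
import time


class VisitIdHandshake:
    """Hypothetical helper: the browser side waits for the extension to ack a visit_id."""

    def __init__(self):
        self._acks = queue.Queue()

    def extension_ack(self, visit_id):
        # Called (e.g. from a socket listener thread) once the extension stored visit_id.
        self._acks.put(visit_id)

    def wait_for(self, visit_id, timeout=5.0):
        # Block until this visit_id is acked; acks for any other visit_id are discarded.
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError(f"extension never confirmed visit_id={visit_id}")
            try:
                acked = self._acks.get(timeout=remaining)
            except queue.Empty:
                raise TimeoutError(f"extension never confirmed visit_id={visit_id}") from None
            if acked == visit_id:
                return


def visit(url, visit_id, handshake, send_visit_id, load_page, timeout=5.0):
    """Tell the extension about the new visit, wait for its ack, then load the page."""
    send_visit_id(visit_id)
    handshake.wait_for(visit_id, timeout)
    load_page(url)


if __name__ == "__main__":
    hs = VisitIdHandshake()
    loaded = []

    def fake_extension(visit_id):
        # Simulate the extension acknowledging asynchronously after a short delay.
        threading.Timer(0.05, hs.extension_ack, args=(visit_id,)).start()

    visit("http://supersport.com", 6052, hs, fake_extension, loaded.append)
    print(loaded)
```

Discarding acks that do not match the requested visit_id is meant to address the second concern above: a stale acknowledgement from a previous visit cannot unblock the next page load.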
While this error report is still valid and unfixed, the fix for it is being discussed in another issue, so this one doesn't provide any additional value, especially since the observable behaviour is now completely different.
If a socket closes improperly, calls to sock.send() will fail and the error won't be caught. This causes the TaskManager to crash instead of shutting down gracefully (for instance, the TaskManager won't be able to save the browser profile).
An example of this in the wild: the first stack trace was caused by an error in the DataAggregator, and the second is the actual socket error. It is unknown whether the two are related.
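A minimal sketch of catching the send failure so the caller can still clean up, assuming a hypothetical send_or_shutdown wrapper and an on_failure callback standing in for whatever saves the browser profile. It uses sendall rather than send because send may write only part of the buffer. BrokenPipeError, ConnectionResetError and the "bad file descriptor" error from a closed socket are all OSError, so one except clause covers them.

```python
import logging
import socket

logger = logging.getLogger(__name__)


def send_or_shutdown(sock, payload, on_failure):
    """Send payload; if the socket is broken, run cleanup instead of letting the error escape.

    Returns True on success, False if the send failed and on_failure() was called.
    """
    try:
        sock.sendall(payload)
        return True
    except OSError:
        # Log the traceback, then let the caller shut down gracefully (e.g. save the profile).
        logger.exception("socket send failed; shutting down gracefully")
        on_failure()
        return False


if __name__ == "__main__":
    saved = []
    a, b = socket.socketpair()
    print(send_or_shutdown(a, b"ok", lambda: saved.append("profile")))
    a.close()  # simulate a socket that was closed out from under us
    print(send_or_shutdown(a, b"lost", lambda: saved.append("profile")))
    print(saved)
    b.close()
```

Returning a flag rather than re-raising is a design choice for the sketch; a real TaskManager might prefer to re-raise after cleanup so the failure is still visible to its caller.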