When recording HTTP requests, we record a number of details for every request (including subresource requests) that occurs on a page. This includes the top_level_url, which is the URL of the tab in which the request occurs. All of the request information is collected with the webRequest API through a variety of event listeners, as described here.
However, in our tests we find that requests are sometimes attributed to the incorrect top-level URL, forcing us to re-label them manually. That's fine for tests, but it means that in real data collection we will also have to deal with mislabeled top-level URLs (which can't be easily corrected). Requests at the tail end of a visit may be mis-attributed to the following top-level URL if the tab navigates.
See https://github.com/mozilla/openwpm-webext-instrumentation/issues/37 for a discussion of the reasons a correct top-level URL may not be available.
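To make the failure mode concrete, here is a minimal sketch of one way a tail-end request could pick up the next site's URL. It assumes the top-level URL is looked up from the tab's current state when the record is written, rather than when the request starts; that is an assumption for illustration and may not match the actual cause discussed in the linked issue. All names (`RequestEvent`, `tabUrls`, `recordWithLateLookup`, `recordWithSnapshot`) are hypothetical and the browser is simulated with plain maps instead of the real webRequest listeners.

```typescript
// Hypothetical, simplified model of a request event. The field names loosely
// follow the webRequest details object, not the instrumentation's schema.
interface RequestEvent {
  requestId: string;
  tabId: number;
  url: string;
}

interface RequestRecord extends RequestEvent {
  topLevelUrl: string | undefined;
}

// Current URL of each tab, updated whenever the tab navigates.
const tabUrls = new Map<number, string>();

// Top-level URL captured when a request is first seen, keyed by request ID.
const urlAtStart = new Map<string, string | undefined>();

function onTabNavigated(tabId: number, url: string): void {
  tabUrls.set(tabId, url);
}

function onRequestStarted(ev: RequestEvent): void {
  // Snapshot the tab URL while the originating page is still loaded.
  urlAtStart.set(ev.requestId, tabUrls.get(ev.tabId));
}

// Late lookup: reads the tab URL at the time the record is written.
function recordWithLateLookup(ev: RequestEvent): RequestRecord {
  return { ...ev, topLevelUrl: tabUrls.get(ev.tabId) };
}

// Snapshot lookup: uses the URL captured when the request started.
function recordWithSnapshot(ev: RequestEvent): RequestRecord {
  return { ...ev, topLevelUrl: urlAtStart.get(ev.requestId) };
}

// Simulated timeline: a request starts on site A, the tab then navigates
// to site B, and only afterwards is the request written out.
onTabNavigated(1, "https://site-a.example/");
const pixel: RequestEvent = {
  requestId: "r1",
  tabId: 1,
  url: "https://tracker.example/pixel.gif",
};
onRequestStarted(pixel);
onTabNavigated(1, "https://site-b.example/");

const late = recordWithLateLookup(pixel); // attributed to site B
const snap = recordWithSnapshot(pixel);   // attributed to site A
```

In this simulation the late lookup labels the request with the following site, while the snapshot keeps the site that issued it. Whether a snapshot is always available in the real extension is a separate question.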
Hopefully we can resolve this issue in the instrumentation, but if not, we may have more options to address it in the crawler itself. For example, we could open a new tab and close the previous tab when submitting a new site. There may be other similar tricks to play.
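The new-tab idea could work roughly as follows: if every site gets its own tab, the tab ID alone identifies the visit, so a straggling request can be attributed without looking at the tab's current URL. This is only a sketch under that assumption. `openTabForSite`, `attribute`, and `siteByTab` are hypothetical names, and tab creation is simulated with a counter rather than the real tabs API.

```typescript
// Hypothetical minimal view of a recorded request.
interface ObservedRequest {
  tabId: number;
  url: string;
}

// One entry per tab the crawler has ever opened. Entries are kept after the
// tab is closed so that late-arriving requests can still be attributed.
const siteByTab = new Map<number, string>();

// Stand-in for tab IDs that the browser would assign.
let nextTabId = 1;

// Called by the crawler when it submits a new site: open a fresh tab
// (simulated here) and remember which site that tab belongs to.
function openTabForSite(siteUrl: string): number {
  const tabId = nextTabId++;
  siteByTab.set(tabId, siteUrl);
  return tabId;
}

// Attribute a request by the tab it came from. Each tab only ever visits
// one site, so the answer does not change when the crawler moves on.
function attribute(req: ObservedRequest): string | undefined {
  return siteByTab.get(req.tabId);
}

// Simulated crawl: site A is visited, then site B in a new tab.
const tabA = openTabForSite("https://site-a.example/");
const tabB = openTabForSite("https://site-b.example/");

// A request from site A's tab that is only recorded after site B has started.
const straggler: ObservedRequest = {
  tabId: tabA,
  url: "https://tracker.example/beacon",
};
const attributed = attribute(straggler);
```

The trade-off is that attribution now depends on the crawler's bookkeeping rather than on the browser's view of the tab, and requests with no associated tab would still need separate handling.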