
Ensure all records saved by openwpm-webext-instrumentation are attributed to the correct top-level URL #245

Closed
englehardt opened this issue Nov 29, 2018 · 2 comments
Labels
extension Relates to the WebExtension written in TS and JS

Comments

@englehardt
Collaborator

See https://github.com/mozilla/openwpm-webext-instrumentation/issues/37 for a discussion of the reasons a correct top-level URL may not be available.

Hopefully we can resolve this issue in the instrumentation, but if not we may have more options to address it in the crawler itself. For example we open a new tab and close the previous tab when submitting a new site. There may be other similar tricks to play.

@englehardt added the extension (Relates to the WebExtension written in TS and JS) and FF60-Upgrade labels on Nov 29, 2018
@englehardt
Collaborator Author

When recording HTTP requests, we record a number of details for every request (including subresource requests) that occurs on a page. This includes the top_level_url, which is the URL of the tab in which the request occurs. All of the request information is collected through the webRequest API using a variety of event listeners, as described here.

The top-level url is retrieved from the tab object, which is itself retrieved from the tabId pulled off the details object passed to the handler.
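
For illustration, the lookup described above might look roughly like the following. This is a minimal sketch, not the extension's actual code: `topLevelUrlViaTab`, `TabLookup`, and `RequestDetailsLike` are hypothetical names, and the tab lookup is passed in as a parameter so the snippet runs outside a browser. In the extension, the lookup would presumably be `browser.tabs.get`.

```typescript
// Hypothetical minimal shape of the webRequest `details` object;
// only the field needed for this sketch is declared.
interface RequestDetailsLike {
  tabId: number;
}

// Hypothetical type for the tab lookup. In a WebExtension this role
// would be played by browser.tabs.get(tabId), which resolves to a Tab.
type TabLookup = (tabId: number) => Promise<{ url?: string }>;

// Resolve the top-level URL for a request by looking up its tab.
// Note that the URL is whatever the tab reports when the lookup
// resolves, not necessarily what it was when the request started.
async function topLevelUrlViaTab(
  details: RequestDetailsLike,
  getTab: TabLookup
): Promise<string | undefined> {
  // webRequest uses tabId -1 for requests not associated with a tab.
  if (details.tabId < 0) {
    return undefined;
  }
  const tab = await getTab(details.tabId);
  return tab.url;
}

// Assumed usage inside a listener (not runnable outside a browser):
//   const url = await topLevelUrlViaTab(details, (id) => browser.tabs.get(id));
```

Because the tab is queried asynchronously, the value returned reflects the tab's state at lookup time.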

However, in our tests we find that requests are sometimes attributed to the incorrect top-level URL, forcing us to re-label them manually. That's fine for tests, but it means that for real data collection we will also have to deal with mislabeled top-level URLs (which can't be easily corrected). Requests at the tail end of a visit may be mis-attributed to the following top-level URL if the tab navigates.

Additional context is provided in #361. It seems like the frame ancestors approach described by Alexei in https://bugzilla.mozilla.org/show_bug.cgi?id=1470537#c11 might work. It would require us to switch the listener over to onBeforeRequest to access frameAncestors. See Privacy Badger's WIP workaround to this issue (EFForg/privacybadger#2198).
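
As a rough sketch of the frame ancestors idea (not the fix that eventually landed, and not Privacy Badger's code), the top-level URL could be derived from the request details alone, without a tab lookup. The function name `topLevelUrlFromDetails` and the interfaces below are hypothetical. The sketch assumes Firefox's documented behavior that `frameAncestors` lists ancestor documents from the immediate parent up to the top-level document, and that `documentUrl` is the URL of the document the resource loads into.

```typescript
// Hypothetical, trimmed-down shapes of the Firefox webRequest details.
interface FrameAncestor {
  url: string;
  frameId: number;
}

interface RequestDetails {
  url: string;
  type: string; // e.g. "main_frame", "sub_frame", "image", "script"
  documentUrl?: string;
  frameAncestors?: FrameAncestor[];
}

// Derive the top-level URL from the request itself.
function topLevelUrlFromDetails(details: RequestDetails): string | undefined {
  const ancestors = details.frameAncestors || [];
  if (ancestors.length > 0) {
    // Assumption: the last entry describes the top-level document.
    return ancestors[ancestors.length - 1].url;
  }
  if (details.type === "main_frame") {
    // The request is the top-level navigation itself.
    return details.url;
  }
  // No ancestors: assume the request belongs to the top-level document,
  // so documentUrl is taken to be the top-level URL.
  return details.documentUrl;
}

// Assumed usage inside an onBeforeRequest listener:
//   browser.webRequest.onBeforeRequest.addListener(
//     (details) => { record(topLevelUrlFromDetails(details)); },
//     { urls: ["<all_urls>"] }
//   );
// where `record` stands in for whatever saves the row.
```

The appeal of this approach is that the URL comes from the same event as the request, so a later navigation of the tab should not change the value that gets recorded.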

@englehardt
Collaborator Author

Fixed in #469 and #488.
