Arbitrary WebAPI JS instrumentation #642

Merged
merged 113 commits into from
Jul 8, 2020

@birdsarah (Contributor) commented May 9, 2020

Fixes #641

To do:

  • Getting the error pasted below when instrumenting XMLHttpRequest. These all appear to be Can't redefine non-configurable property "XXX". Need to handle this property type. There may be other cases that come up. wontfix - these errors don't prevent instrumentation from occurring.
  • Implement API as below. Can also pass in a logSettings object.
  • Ensure we do not instrument the same object twice - repeat instrumentation creates issues because the new prototype ends up with separate getters and setters.
  • Can pass, for example, "excludedProperties" or "nonExistingPropertiesToInstrument" in a logSettings object if desired.
  • Tests
  • Add support to instrument arrays (plugins and mimeTypes are currently missing from fingerprinting.json too) - I've added back a version of the instrumentation that could be sufficient.
  • Performance tests that:
    • (a) time how long it takes the instrumentation to load
    • (b) hit an API over and over and see if we miss any calls. Decided there's no point in doing this as the instrumentation definitely loads first.
    • (c) will not cover how much instrumenting n APIs slows down a page.
  • Regression - add a test for not propagating propertiesToInstrument down, and fix the regression I introduced.
  • Add JSON parsing to crawler.py
  • Update README.
    • From @englehardt's review - "Now that folks can instrument arbitrary JS objects, we should (a) mention that, (b) describe what a shortcut is, and which ones exist already, and (c) provide a short example or two." "I like that we've moved the documentation for instrumentObject out of JS (which is good, since it's user configurable). I'm not sure how someone who wants to write their own settings file would be able to figure out which options are available. Is the only way to read the schema file? That would be pretty clunky. (Perhaps this will be covered by a README update)."
    • Add developer docs about rendering schemas to markdown.
      js_instrument:true,
      js_instrument_modules: [
        // Shortcut
        "fingerprinting",
        // APIs
        "Storage",
        {"XMLHttpRequest": ["send"]},
        // Specific instances on window
        {"window.document": ["cookie", "referrer"]},
        {"window": ["name", "localStorage", "sessionStorage"]}
      ],
      http_instrument:true,
      callstack_instrument:true,
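
A minimal sketch of how the first and third to-do items could be handled together: check the property descriptor before redefining (which avoids the "can't redefine non-configurable property" TypeError) and remember which objects have already been wrapped (which avoids repeat instrumentation). This is not the PR's implementation; `instrumentOnce`, `alreadyInstrumented`, the log record shape, and `fakeXhr` are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch: none of these names come from the OpenWPM code base.
const alreadyInstrumented = new WeakSet();

// Wrap the listed own properties of `target` so reads and writes call `log`.
// Properties that cannot be redefined are reported in `skipped`.
function instrumentOnce(target, propertyNames, log) {
  // Guard against wrapping the same object twice.
  if (alreadyInstrumented.has(target)) {
    return { instrumented: [], skipped: [], repeated: true };
  }
  alreadyInstrumented.add(target);

  const instrumented = [];
  const skipped = [];
  for (const name of propertyNames) {
    const descriptor = Object.getOwnPropertyDescriptor(target, name);
    // Redefining a non-configurable property throws a TypeError, so skip it.
    if (!descriptor || !descriptor.configurable) {
      skipped.push(name);
      continue;
    }
    let current = descriptor.value; // only meaningful for data properties
    Object.defineProperty(target, name, {
      configurable: true,
      enumerable: descriptor.enumerable,
      get() {
        log({ symbol: name, operation: "get" });
        return descriptor.get ? descriptor.get.call(this) : current;
      },
      set(value) {
        log({ symbol: name, operation: "set" });
        if (descriptor.set) {
          descriptor.set.call(this, value);
        } else if (descriptor.writable) {
          current = value;
        }
      },
    });
    instrumented.push(name);
  }
  return { instrumented, skipped, repeated: false };
}

// Stand-in object: "UNSENT" is non-configurable, "timeout" is configurable.
const fakeXhr = {};
Object.defineProperty(fakeXhr, "UNSENT", { value: 0, enumerable: true });
Object.defineProperty(fakeXhr, "timeout", {
  value: 0, writable: true, enumerable: true, configurable: true,
});

const events = [];
const first = instrumentOnce(fakeXhr, ["UNSENT", "timeout"], (e) => events.push(e));
fakeXhr.timeout = 500; // recorded as a "set"
const second = instrumentOnce(fakeXhr, ["timeout"], (e) => events.push(e));
console.log(first.skipped, second.repeated, events.length);
```

Skipping rather than throwing matches the "wontfix" note above: the constants stay uninstrumented, and everything else is still wrapped.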

For follow-on issues:

  • Update openwpm-crawler with new input
  • Add an "all" option (nope - crashes everything)
  • Do you want to be able to specify just a property? e.g. XMLHttpRequest.send vs XMLHttpRequest (this could be a follow-on PR)

Questions:

  • For "fingerprinting" - do we want to keep the current set of instrumented APIs which is some whole modules, and sometimes just a limited set of properties? For example, we only take pixelDepth and colorDepth from window.screen instead of the full 15 or so options. The difference might be starker on other modules, I haven't done a thorough review.
    • Keep the same for now.
  • I'm leaving all logSettings options default. As best as I can tell we've never used them. Thoughts? (Given that we haven't used it, I'm tempted to propose removing it to simplify the code).
  • Why not have it so settings are passed around as JSON and JSON is handled webext side?
    • Because by making a magic js string we are actually referring to the window object as opposed to having a string in JSON. It's the difference between '{object: window.CanvasRenderingContext2D.prototype, instrumentedName: "CanvasRenderingContext2D",...}' and {object: "window.CanvasRenderingContext2D.prototype", instrumentedName: "CanvasRenderingContext2D",...}. In the latter case where we're passing around JSON, on the JS side we then have to find a way to turn the string "window.CanvasRenderingContext2D.prototype" into the object window.CanvasRenderingContext2D.prototype.
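
To make the extra step concrete: if settings did travel as plain JSON, the receiving side would need something like the lookup below to turn the string back into an object. This is only a sketch of that step, not code from this PR; `resolvePath` and `fakeWindow` are hypothetical, and `fakeWindow` stands in for the real `window` so the snippet runs outside a browser.

```javascript
// Hypothetical helper: walk a dotted path such as
// "window.CanvasRenderingContext2D.prototype" starting from `root`.
function resolvePath(root, path) {
  let current = root;
  for (const part of path.split(".")) {
    if (current === null || current === undefined || !(part in Object(current))) {
      throw new ReferenceError(`Cannot resolve "${part}" in "${path}"`);
    }
    current = current[part];
  }
  return current;
}

// Stand-in for the browser global; like the real one, window.window === window.
function FakeCanvasRenderingContext2D() {}
const fakeWindow = { CanvasRenderingContext2D: FakeCanvasRenderingContext2D };
fakeWindow.window = fakeWindow;

// After JSON transport, `object` is just a string...
const viaJson = JSON.parse(
  '{"object": "window.CanvasRenderingContext2D.prototype", "instrumentedName": "CanvasRenderingContext2D"}'
);
// ...so it has to be resolved before anything can be instrumented.
const resolved = resolvePath(fakeWindow, viaJson.object);
console.log(typeof viaJson.object, resolved === FakeCanvasRenderingContext2D.prototype);
```

The "magic js string" approach avoids this lookup because the expression is evaluated as JavaScript and already refers to the object.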

birdsarah added 9 commits May 8, 2020 19:49
Getting errors like

OpenWPM: Error name: TypeError post_request_ajax.html:237:17
OpenWPM: Error message: can't redefine non-configurable property "UNSENT" post_request_ajax.html:238:17
@birdsarah (Contributor Author)

@englehardt - how would you feel if "fingerprinting" instrumented a few more things e.g. whole of navigator and CanvasRenderingContext2D?

@englehardt (Collaborator)

> @englehardt - how would you feel if "fingerprinting" instrumented a few more things e.g. whole of navigator and CanvasRenderingContext2D?

If it seems like it can be used for fingerprinting I think it's fair game. My only warning is that some APIs are incredibly noisy.

@birdsarah (Contributor Author)

> My only warning is that some APIs are incredibly noisy.

Yeah, I think I need to leave a more flexible solution open.
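
One shape a more flexible solution could take, sketched under assumptions: instrument a whole object by default and let a logSettings-style object trim the noisy parts. The option names `excludedProperties` and `propertiesToInstrument` appear in this PR, but the semantics shown here, the `selectProperties` helper, and `fakeNavigator` are assumptions made for illustration.

```javascript
// Hypothetical helper: decide which own properties of `target` to instrument.
// - propertiesToInstrument (optional): explicit allow-list; empty means "all".
// - excludedProperties (optional): names to drop, e.g. very noisy ones.
function selectProperties(target, logSettings = {}) {
  const excluded = new Set(logSettings.excludedProperties || []);
  const wanted = logSettings.propertiesToInstrument;
  const candidates =
    wanted && wanted.length > 0 ? wanted : Object.getOwnPropertyNames(target);
  return candidates.filter((name) => !excluded.has(name));
}

// Stand-in for a browser object so the snippet runs anywhere.
const fakeNavigator = { userAgent: "ua", plugins: [], platform: "p" };

const everything = selectProperties(fakeNavigator);
const withoutNoisy = selectProperties(fakeNavigator, {
  excludedProperties: ["plugins"],
});
const onlyListed = selectProperties(fakeNavigator, {
  propertiesToInstrument: ["userAgent", "plugins"],
  excludedProperties: ["plugins"],
});
console.log(everything, withoutNoisy, onlyListed);
```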

@birdsarah (Contributor Author)

@englehardt - how do you feel about this as a starting point API for this PR. It will meet all my needs and the needs of the current fingerprinting instrumentation. We can follow-up with more implementation as needs arise (see "follow-on prs" above).

      js_instrument:true,
      js_instrument_modules: [
        // Shortcut
        "fingerprinting",
        // APIs
        "Storage",
        {"XMLHttpRequest": ["send"]},
        // Specific instances on window
        {"window.document": ["cookie", "referrer"]},
        {"window": ["name", "localStorage", "sessionStorage"]}
      ],
      http_instrument:true,
      callstack_instrument:true,
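
A rough sketch of how a list like this could be normalized into uniform per-object requests before it is handed to the instrumentation. It is not the parsing code from this PR: `normalizeModules` and `SHORTCUTS` are hypothetical, the shortcut table holds only a tiny illustrative subset of "fingerprinting" (the `window.screen` properties mentioned in the description), and treating a bare API name as "all properties" is an assumption.

```javascript
// Illustrative stand-in for the "fingerprinting" preset, not its real contents.
const SHORTCUTS = {
  fingerprinting: [
    { object: "window.screen", properties: ["pixelDepth", "colorDepth"] },
  ],
};

// Turn the mixed list (shortcut names, API names, {object: [props]} maps)
// into a flat array of { object, properties } records.
function normalizeModules(modules) {
  const requests = [];
  for (const entry of modules) {
    if (typeof entry === "string") {
      if (Object.prototype.hasOwnProperty.call(SHORTCUTS, entry)) {
        // Expand a shortcut into its predefined requests (copied, not shared).
        for (const preset of SHORTCUTS[entry]) {
          requests.push({ object: preset.object, properties: [...preset.properties] });
        }
      } else {
        // Assumption: a bare API name means "instrument every property".
        requests.push({ object: entry, properties: null });
      }
    } else if (entry && typeof entry === "object" && !Array.isArray(entry)) {
      for (const [object, properties] of Object.entries(entry)) {
        if (!Array.isArray(properties)) {
          throw new TypeError(`Properties for "${object}" must be a list`);
        }
        requests.push({ object, properties: [...properties] });
      }
    } else {
      throw new TypeError(`Unsupported entry: ${JSON.stringify(entry)}`);
    }
  }
  return requests;
}

const requests = normalizeModules([
  "fingerprinting",
  "Storage",
  { XMLHttpRequest: ["send"] },
  { "window.document": ["cookie", "referrer"] },
  { window: ["name", "localStorage", "sessionStorage"] },
]);
console.log(JSON.stringify(requests, null, 2));
```

Normalizing up front keeps the three input styles (shortcut, API, specific instance) out of the instrumentation code itself.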

@englehardt (Collaborator)

> @englehardt - how do you feel about this as a starting point API for this PR. It will meet all my needs and the needs of the current fingerprinting instrumentation. We can follow-up with more implementation as needs arise (see "follow-on prs" above).
>
>       js_instrument:true,
>       js_instrument_modules: [
>         // Shortcut
>         "fingerprinting",
>         // APIs
>         "Storage",
>         {"XMLHttpRequest": ["send"]},
>         // Specific instances on window
>         {"window.document": ["cookie", "referrer"]},
>         {"window": ["name", "localStorage", "sessionStorage"]}
>       ],
>       http_instrument:true,
>       callstack_instrument:true,

This looks like a great starting point! Thanks!

birdsarah added 3 commits May 18, 2020 12:56
* We build and mandate LogSettings.
* We have a new JSInstrumentatinRequest that everything runs through
* Preset, fingerprinting, will be specified in JSON
Enum for Operation
@birdsarah birdsarah marked this pull request as draft May 18, 2020 23:40
@birdsarah birdsarah changed the title [WIP] Arbitrary WebAPI JS instrumentation Arbitrary WebAPI JS instrumentation May 18, 2020
@birdsarah birdsarah changed the base branch from master to issue-443 May 19, 2020 04:43
@birdsarah birdsarah marked this pull request as ready for review June 26, 2020 04:18
@birdsarah birdsarah requested a review from englehardt June 26, 2020 04:18
@birdsarah (Contributor Author)

@englehardt This PR is finished. I've incorporated all your suggestions / questions / concerns and the things we discussed in person the other day. The only thing I did not do was "JSON everywhere". Halfway through the change, I remembered why I don't do that. Here's why:

(copied from front matter on this PR)

  • Why not have it so settings are passed around as JSON and JSON is handled webext side?
    • Because by making a magic js string we are actually referring to the window object as opposed to having a string in JSON. It's the difference between '{object: window.CanvasRenderingContext2D.prototype, instrumentedName: "CanvasRenderingContext2D",...}' and {object: "window.CanvasRenderingContext2D.prototype", instrumentedName: "CanvasRenderingContext2D",...}. In the latter case where we're passing around JSON, on the JS side we then have to find a way to turn the string "window.CanvasRenderingContext2D.prototype" into the object window.CanvasRenderingContext2D.prototype.

Whilst it may be possible to do it via JSON being passed to the webext, I don't think it should hold up this PR. Users still write a Python list or JSON; it's just the transport between Python and the webext that's finicky, and I don't think this PR needs to guarantee that that interface will be stable.

@englehardt (Collaborator) left a comment

r+ with a few comments and nits. Thanks for taking this all the way and adding some thorough tests for the changes!

> Whilst it may be possible to do it via JSON being passed to the webext, I don't think it should hold up this PR. Users still write a Python list or JSON; it's just the transport between Python and the webext that's finicky, and I don't think this PR needs to guarantee that that interface will be stable.

Okay that makes sense. I guess we could eval those strings, but that seems messy. Either way, no need to hold up this PR.

README.md (outdated, resolved)
README.md (outdated, resolved)
README.md (outdated, resolved)
@@ -45,11 +45,13 @@ export interface WebNavigationOnCommittedEventDetails
transitionQualifiers?: TransitionQualifier[];
}

/*

Collaborator

Why comment these out?

Contributor Author

Whoops. I was planning to delete them. They're not used anywhere, so I was just cleaning up. That can be a separate PR.

@@ -5,7 +5,7 @@

 curdir = os.path.dirname(os.path.realpath(__file__))
 schema_path = os.path.join(
-    curdir, 'js_instrumentation', 'js_instrument_modules.schema'
+    curdir, os.pardir, 'schemas', 'js_instrument_settings.schema.json'

Collaborator

Is .schema.json the accepted naming pattern, or should this be _schema.json?

Contributor Author

It's the naming pattern that jsonschema2md uses by default. I'm not wed to it; I don't think it would be a big change to the jsonschema2md call to switch.

Collaborator

If it's the default that's fine. No need to change, I just wanted to make sure this wasn't a bug.

package.json (outdated, resolved)
test/openwpm_jstest.py (resolved)
test/test_js_instrument.py (resolved)
test/test_js_instrument.py (outdated, resolved)
test/test_js_instrument_py.py (resolved)
birdsarah and others added 6 commits July 1, 2020 23:20
Co-authored-by: Steven Englehardt <englehardt@gmail.com>
* Add title
* Fix typo in mac-osx hyperlink
We're not using the js in two htmls now, so unify like other test files
* pyside test must instrument browser apis
* add more to readme to clarify instrumenting
@birdsarah (Contributor Author)

@englehardt this is done. The bad test needed a bigger tweak (#642 (comment)) but nothing wild.

@englehardt englehardt merged commit aefa048 into master Jul 8, 2020
@englehardt englehardt deleted the js-instrumentation branch July 8, 2020 22:55
Linked issue that this pull request may close: Instrument arbitrary JS modules
3 participants