-
-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Community learning discussion draft #2715
base: master
Are you sure you want to change the base?
Conversation
…is enabled - Create new SettingsMap variable, shareLearning, and default to false - pass tab_id to _recordPrevalence so that it can determine whether learning is enabled on a tab - update isLearningEnabled so that it returns true if either local or community learning is enabled; let _recordPrevalence figure out which kind of learning should be done - Create stub for function to share data with remote server
Since _recordPrevalence now checks for local learning enabled internally, update BadgerStorage.merge() to modify the blocklist directly.
update some comments
This is great! And much appreciated how thought out it is from the get. Starting at high level, definitely agreed that this DB for now should be used just for observational purposes, and no blocker lists should be made from it... yet. |
@@ -173,6 +173,14 @@ | |||
"message": "Learn in Private/Incognito windows", | |||
"description": "Checkbox label on the general settings page" | |||
}, | |||
"options_community_learning_setting": { | |||
"message": "Enable community learning and share data about trackers", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think changing this message to "Enable community learning and share data about trackers" would answer some questions that user might have without them having to read the lengthier warning message (which is great and should still be included on top of this)
if (Math.random() < constants.CL_PROBABILITY) { | ||
// check if we've shared this tracker recently | ||
// note that this check comes after checking against the snitch map | ||
let tr_str = page_host + '+' + tracker_host + '+' + tracker_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some string sanitization and/or sanity checking the input values in this method wouldn't hurt, since they're about to get launched off in a POST
"message": "Enable community learning and share data about trackers", | ||
"description": "Checkbox label on the general settings page" | ||
}, | ||
"options_community_learning_warning": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the link that explains what community learning is (and the decision making that led to it) be included in this message or the one above?
@@ -40,6 +40,17 @@ var exports = { | |||
TRACKING_THRESHOLD: 3, | |||
MAX_COOKIE_ENTROPY: 12, | |||
|
|||
// The max amount of time (in milliseconds) that PB will wait before sharing a | |||
// tracking action with EFF for community learning | |||
MAX_CL_WAIT_TIME: 5 * 60 * 1000, // five minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call on reducing network load with the reporting timeouts, but I'm curious why 5 minutes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
completely arbitrary!
e9a532f
to
86432ed
Compare
#1299
This is the first stab at adding community learning to Privacy Badger -- giving users the option to share de-identified information about the trackers they collect with EFF. Users can now choose to opt-in to community learning. Then, whenever Privacy Badger observes a new tracker for the first time, it will share the following information:
Other things:
constants.CL_PROBABILITY
. Right now it's set to 1.0, meaning every action will be reported.constants.MAX_CL_WAIT_TIME
, currently set to 5 minutes.HeuristicBlocker.previouslySharedTrackers
. Tracking actions that are already present in the cache will not be reported again. Obviously, when the user restarts their browser, the cache will be cleared, but the goal is just to prevent crazy amounts of dupes. The size of the cache is capped byconstants.CL_CACHE_SIZE
.This PR makes requests to "localhost:8080." If you want, you can set up a toy SQL server to test the logging.
On the server side, tracking actions will be stored in a SQL table that looks like this:
The server side is more up in the air, but for now, IP addresses and other identifying information will not be stored. Before this is deployed, we'll have some kind of privacy policy explaining exactly how logs etc. will be handled. We're still thinking through how best to prevent malicious reporting or other kinds of griefing.
We're thinking that the first version of community learning will be for informational purposes only -- the CL database will not automatically populate any block lists. This should reduce the incentive for bad actors to populate the database with bad data. We want to use the data to identify deficiencies in Badger Sett: what domains are we missing? on which domains do we need to crawl more widely? are there places where trackers don't appear unless a user logs in?
Eventually, it might make sense to auto-generate a community learning list, but for now, we just want to see what kind of data people are generating.
Feedback, criticism, concern are welcome. What do you think?