This repository has been archived by the owner on Dec 11, 2019. It is now read-only.

refactor isThirdPartyHost and block referer based on base domain #13820

Merged
diracdeltas merged 1 commit into master from fix/etld on Apr 17, 2018

Conversation

diracdeltas
Member

@diracdeltas diracdeltas commented Apr 12, 2018

fix #13779
fix #11778
also removes TODO for isThirdPartyHost to handle IP addresses and adds more tests

Submitter Checklist:

  • Submitted a ticket for my issue if one did not already exist.
  • Used Github auto-closing keywords in the commit message.
  • Added/updated tests for this change (for new code or code which already has tests).
  • Ran git rebase -i to squash commits (if needed).
  • Tagged reviewers and labelled the pull request as needed.
  • Request a security/privacy review as needed. (Ask a Brave employee to help if you cannot access this document.)

Test Plan:

  1. unit tests pass
  2. open Brave, make sure cookie setting is block all or block 3rd party
  3. go to docs.google.com and login
  4. documents should appear
  5. open devtools and go to 'network' tab
  6. on requests to non-google.com domains like gstatic.com, the referer
    header should be 'https://gstatic.com' or whatever the domain is, instead of
    'https://docs.google.com...'
  7. turn cookie setting to 'allow all'
  8. repeat step 6. now you should see some requests to third party domains where the referer header is 'https://docs.google.com...'
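
The behaviour that steps 6 to 8 check can be sketched roughly as follows. This is only an illustration, not the code in app/filtering.js: `refererFor`, `toyIsThirdParty`, and the setting names `'blockAll'`, `'block3rdParty'`, and `'allowAll'` are all hypothetical, and the toy third-party check compares the last two hostname labels instead of consulting the public suffix list.

```javascript
// Toy third-party check (assumption): compares the last two hostname labels.
// The real check consults the public suffix list.
const toyIsThirdParty = (hostA, hostB) =>
  hostA.split('.').slice(-2).join('.') !== hostB.split('.').slice(-2).join('.')

// Hypothetical helper; the setting names are illustrative, not Brave's.
function refererFor (requestUrl, firstPartyUrl, cookieSetting) {
  const request = new URL(requestUrl)
  const firstParty = new URL(firstPartyUrl)
  const blocking = cookieSetting === 'blockAll' || cookieSetting === 'block3rdParty'
  if (blocking && toyIsThirdParty(request.hostname, firstParty.hostname)) {
    // Third-party request: expose only the target's own origin.
    return request.origin
  }
  // First-party request, or cookies allowed: leave the referer untouched.
  return firstPartyUrl
}

const page = 'https://docs.google.com/document/u/0/'
console.log(refererFor('https://fonts.gstatic.com/s/font.woff2', page, 'block3rdParty'))
console.log(refererFor('https://apis.google.com/js/api.js', page, 'block3rdParty'))
console.log(refererFor('https://fonts.gstatic.com/s/font.woff2', page, 'allowAll'))
```

Because apis.google.com shares a base domain with docs.google.com, it counts as first-party in this sketch and keeps the full referer even when cookies are blocked.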

Reviewer Checklist:

  • Request a security/privacy review as needed if one was not already requested.

Tests

  • Adequate test coverage exists to prevent regressions
  • Tests should be independent and work correctly when run individually or as a suite ref
  • New files have MPL2 license header

fix #13779
fix #13779
also removes TODO for isThirdPartyHost to handle IP addresses and adds
tests

Test plan:
1. unit tests pass
2. open Brave, make sure cookie setting is block all or block 3rd party
3. go to docs.google.com and login
4. documents should appear
5. open devtools and go to 'network' tab
6. on a request to a non-google.com domain like gstatic.com, the referer
   header should be 'https://gstatic.com' or whatever the domain is, instead of
   'https://docs.google.com...'
7. turn cookie setting to 'allow all'
8. repeat step 6. now the referer header should be
   'https://docs.google.com...'
@diracdeltas diracdeltas self-assigned this Apr 12, 2018
@diracdeltas diracdeltas requested review from bbondy and bsclifton April 12, 2018 23:18
@@ -44,6 +47,11 @@ module.exports.getBaseDomain = function (hostname) {
    return baseDomain
  }

  // If the hostname is a TLD, return '' for the base domain
  if (hostname in publicSuffixes) {
    return ''
Member Author

previously for an effective TLD like co.uk, this would return co.uk as the base domain. that seems incorrect so i just set it to the empty string.

note that some eTLDs like github.io are resolvable
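
A minimal sketch of the behaviour described in this comment, assuming a heavily trimmed suffix table and a simple longest-suffix match. The real js/lib/baseDomain.js works from the full public suffix list and does more; the `publicSuffixes` contents and the loop below are illustrative only.

```javascript
// Hypothetical, heavily trimmed stand-in for the real public suffix list.
const publicSuffixes = {
  'com': true,
  'uk': true,
  'co.uk': true,
  'io': true,
  'github.io': true
}

const isPublicSuffix = (name) =>
  Object.prototype.hasOwnProperty.call(publicSuffixes, name)

// Simplified sketch: longest-suffix match only, no wildcard or exception rules.
function getBaseDomain (hostname) {
  // A hostname that is itself a public suffix has no registrable base domain.
  if (isPublicSuffix(hostname)) {
    return ''
  }
  const labels = hostname.split('.')
  // Walk from the longest candidate suffix to the shortest.
  for (let i = 1; i < labels.length; i++) {
    if (isPublicSuffix(labels.slice(i).join('.'))) {
      // The base domain is the suffix plus one more label (eTLD+1).
      return labels.slice(i - 1).join('.')
    }
  }
  // No known suffix: fall back to the hostname unchanged.
  return hostname
}

console.log(JSON.stringify(getBaseDomain('co.uk')))
console.log(getBaseDomain('www.example.co.uk'))
console.log(getBaseDomain('foo.github.io'))
```

With this table, `co.uk` and `github.io` both yield the empty string, while `foo.github.io` is its own base domain.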

@codecov-io

codecov-io commented Apr 12, 2018

Codecov Report

Merging #13820 into master will increase coverage by 0.06%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #13820      +/-   ##
==========================================
+ Coverage   56.54%    56.6%   +0.06%     
==========================================
  Files         283      283              
  Lines       28798    28808      +10     
  Branches     4774     4776       +2     
==========================================
+ Hits        16284    16307      +23     
+ Misses      12514    12501      -13
| Flag | Coverage Δ |
|---|---|
| #unittest | 56.6% <100%> (+0.06%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| js/lib/baseDomain.js | 92.98% <100%> (+22.61%) ⬆️ |
| app/filtering.js | 17.79% <100%> (+0.25%) ⬆️ |
| app/browser/isThirdPartyHost.js | 100% <100%> (+7.14%) ⬆️ |

}

if (ip.isV4Format(host1) || ip.isV4Format(host2)) {
Contributor

&&?

Member Author

@jumde the problem with && is that getBaseDomain thinks google.com.0.1 and 127.0.0.1 have the same base domain. so if either is an IPv4 address, we should do literal string comparison.
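
To make the difference concrete, here is a rough sketch comparing the two conditions. It assumes a toy base-domain function that keeps only the last two labels, which reproduces the collision described above; the real getBaseDomain uses the public suffix list. `isV4Format` here is a local regex stand-in for the `ip` package's function of the same name, and both `isThirdPartyWith...` functions are hypothetical.

```javascript
// Local stand-in for ip.isV4Format: checks the dotted-quad shape only.
const isV4Format = (host) => /^(\d{1,3}\.){3}\d{1,3}$/.test(host)

// Toy base-domain function (assumption): keeps the last two labels.
const toyBaseDomain = (host) => host.split('.').slice(-2).join('.')

// Variant from the PR: literal comparison if EITHER host is an IPv4 address.
function isThirdPartyWithOr (host1, host2) {
  if (isV4Format(host1) || isV4Format(host2)) {
    return host1 !== host2
  }
  return toyBaseDomain(host1) !== toyBaseDomain(host2)
}

// Suggested variant: literal comparison only if BOTH hosts are IPv4 addresses.
function isThirdPartyWithAnd (host1, host2) {
  if (isV4Format(host1) && isV4Format(host2)) {
    return host1 !== host2
  }
  return toyBaseDomain(host1) !== toyBaseDomain(host2)
}

console.log(isThirdPartyWithOr('google.com.0.1', '127.0.0.1'))
console.log(isThirdPartyWithAnd('google.com.0.1', '127.0.0.1'))
```

In this sketch the `&&` variant falls through to the base-domain comparison, where both hosts reduce to `0.1`, so it wrongly treats them as first-party. The `||` variant compares the strings directly and reports them as third-party.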

Member

@darkdh darkdh left a comment

I was thinking of using the more efficient GURL::DomainIs, since we can access it through muon's url bindings, but it behaves just like our current implementation.

@diracdeltas
Member Author

++ for doing this in muon/chromium in the future.

@riastradh-brave
Contributor

Fixes #13779 twice, or is there another issue you meant to cite?

@riastradh-brave
Contributor

Is there a maintained upstream? Should we submit this upstream? Should we be pulling updates from it?

Contributor

@riastradh-brave riastradh-brave left a comment

Seems reasonable to me. Some noncritical suggestions about comments and tests.

})
it('Handles IPv4', function () {
assert.ok(isThirdPartyHost('172.217.6.46', '173.217.6.46'))
assert.ok(!isThirdPartyHost('172.217.6.46', '172.217.6.46'))
Contributor

Test equivalent decimal representations of IP addresses?

No harm if we say that 127.1 and 127.0.0.1, or 1.2.3.4 and 001.002.003.004, are distinct, but we should be explicitly intentional about it.

Member Author

I think it's clear from https://github.com/brave/browser-laptop/pull/13820/files#diff-3e19062054041e338c28b2593b82e8d9R23 that this is not intended to handle IP representations that are equivalent but don't have the same string representation for now

Member Author

or are you suggesting adding a test for the equivalent case returning third party so that it's explicit based on tests what the behavior should be? i can do that

Member Author

^ probably best to do in a follow-up PR since it's non-critical and adding it here would dismiss the existing reviews
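
For reference, a follow-up test that pins down this behaviour might look something like the sketch below. It is not the project's test code: `isThirdPartySketch` is a hypothetical re-creation using a regex stand-in for `ip.isV4Format` and a toy last-two-labels base domain, so it only illustrates why differently spelled but equivalent addresses come out as third-party under literal string comparison.

```javascript
// Stand-in for ip.isV4Format from the npm `ip` package (dotted-quad shape only).
const looksLikeIPv4 = (host) => /^(\d{1,3}\.){3}\d{1,3}$/.test(host)

// Toy base domain (assumption): last two labels, no public suffix list.
const lastTwoLabels = (host) => host.split('.').slice(-2).join('.')

function isThirdPartySketch (host1, host2) {
  // If either side is a dotted-quad IPv4 address, compare the raw strings.
  if (looksLikeIPv4(host1) || looksLikeIPv4(host2)) {
    return host1 !== host2
  }
  return lastTwoLabels(host1) !== lastTwoLabels(host2)
}

// Pairs that name the same address but are spelled differently.
const equivalentSpellings = [
  ['127.1', '127.0.0.1'],
  ['1.2.3.4', '001.002.003.004']
]

for (const [a, b] of equivalentSpellings) {
  // Each pair is reported as third-party because the strings differ.
  console.log(a, b, isThirdPartySketch(a, b))
}
```

Writing the expectation down this way would make the "distinct on purpose" decision explicit, as suggested in the review.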

assert.ok(!isThirdPartyHost('[2001:db8:85a3::8a2e:370:7334]', '[2001:db8:85a3::8a2e:370:7334]'))
assert.ok(!isThirdPartyHost('2001:db8:85a3::8a2e:370:7334', '2001:db8:85a3::8a2e:370:7334'))
assert.ok(isThirdPartyHost('[2001:db8:85a3::8a2e:370:7334]', '[2002:db8:85a3::8a2e:370:7334]'))
assert.ok(isThirdPartyHost('2001:db8:85a3::8a2e:370:7334', '2002:db8:85a3::8a2e:370:7334'))
Contributor

Test v4-mapped IPv6 addresses vs equivalent IPv4 addresses?

Member Author

same as above

* Checks if two hosts are third party. Subdomains count as first-party to the
* parent domain. Uses hostname (no port).
* @param {host1} string - First hostname to compare
* @param {host2} string - Second hostname to compare
Contributor

Just to be clear: This is supposed to be the complement of an equivalence relation now, yes? I.e., its complement is supposed to be transitive, reflexive, and symmetric? Can this be noted in the comment?

Member Author

I'm pretty sure the previous implementation's complement was also transitive, reflexive, and symmetric (at least it was supposed to be), but this refactoring makes those properties more obvious
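
One way to check these properties mechanically is a brute-force test over a small set of hosts. The sketch below is an assumption-laden illustration: `checkEquivalence` is a hypothetical helper, and the two candidate functions use a regex stand-in for `ip.isV4Format` plus a toy last-two-labels base domain, not the real implementation.

```javascript
// Returns 'ok' if "not third-party" is reflexive, symmetric and transitive
// over the given hosts, otherwise a message naming the first violation.
function checkEquivalence (isThirdParty, hosts) {
  const same = (a, b) => !isThirdParty(a, b)
  for (const a of hosts) {
    if (!same(a, a)) return 'not reflexive: ' + a
    for (const b of hosts) {
      if (same(a, b) !== same(b, a)) return 'not symmetric: ' + a + ', ' + b
      for (const c of hosts) {
        if (same(a, b) && same(b, c) && !same(a, c)) {
          return 'not transitive: ' + a + ', ' + b + ', ' + c
        }
      }
    }
  }
  return 'ok'
}

// Toy helpers (assumptions): regex IPv4 check and last-two-labels base domain.
const isV4 = (host) => /^(\d{1,3}\.){3}\d{1,3}$/.test(host)
const base = (host) => host.split('.').slice(-2).join('.')

// Literal comparison if either host is IPv4 (the condition used in the PR).
const withOr = (h1, h2) =>
  (isV4(h1) || isV4(h2)) ? h1 !== h2 : base(h1) !== base(h2)

// Literal comparison only if both hosts are IPv4.
const withAnd = (h1, h2) =>
  (isV4(h1) && isV4(h2)) ? h1 !== h2 : base(h1) !== base(h2)

const hosts = ['127.0.0.1', 'google.com.0.1', '10.0.0.1', 'www.example.com', 'example.com']
console.log(checkEquivalence(withOr, hosts))
console.log(checkEquivalence(withAnd, hosts))
```

Under these toy assumptions the `||` variant passes, while the `&&` variant fails transitivity: 127.0.0.1 and 10.0.0.1 each look first-party to google.com.0.1 but not to each other.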

@diracdeltas
Member Author

@riastradh-brave updated the issue to not fix the same issue twice, good catch

@diracdeltas
Member Author

Is there a maintained upstream? Should we submit this upstream? Should we be pulling updates from it?

tools/updatepsl.sh updates the public suffix list from https://publicsuffix.org/list/public_suffix_list.dat.

WRT the code itself, I originally copied it from https://github.com/EFForg/privacybadger/blob/master/src/lib/basedomain.js (because it was a much faster implementation than any of the NPM libraries I found to do the same thing). It looks like they already have an implementation of the IP checks which doesn't depend on Node, so our patch would be redundant. I could pull their fix but it seemed cleaner to use NPM's IP parsing library (which we already use elsewhere) instead of regexes.
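
For readers unfamiliar with the list format, here is a rough sketch of turning public_suffix_list.dat text into a lookup object. It is not what tools/updatepsl.sh does (that script is not shown here); `parsePublicSuffixList` is a hypothetical helper. It relies only on the published list format: lines starting with `//` are comments, blank lines are ignored, and a rule ends at the first whitespace. Wildcard (`*.`) and exception (`!`) rules are stored verbatim and not interpreted.

```javascript
// Hypothetical helper: converts public suffix list text into a lookup object.
function parsePublicSuffixList (text) {
  const suffixes = {}
  for (const rawLine of text.split('\n')) {
    // A rule ends at the first whitespace on the line.
    const rule = rawLine.trim().split(/\s+/)[0]
    // Skip blank lines and comment lines.
    if (rule === '' || rule.startsWith('//')) continue
    suffixes[rule] = true
  }
  return suffixes
}

const sample = [
  '// ===BEGIN ICANN DOMAINS===',
  'com',
  '',
  'co.uk',
  '*.ck',
  '!www.ck'
].join('\n')

console.log(Object.keys(parsePublicSuffixList(sample)))
```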

@diracdeltas diracdeltas added this to the 0.24.x (Nightly Channel) milestone Apr 17, 2018
@diracdeltas diracdeltas merged commit 5a77c8b into master Apr 17, 2018
@diracdeltas
Member Author

master / 0.24.x: 5a77c8b

@diracdeltas diracdeltas deleted the fix/etld branch April 17, 2018 00:22
bsclifton pushed a commit that referenced this pull request May 2, 2018
refactor isThirdPartyHost and block referer based on base domain
@bsclifton
Member

Uplifted to 0.23.x with 21b9274

Development

Successfully merging this pull request may close these issues.

  • block referer based on eTLD+1 instead of full origin
  • Google Docs never load files
6 participants