Implement the tldts package for identifying first and third-party domains. #49

Merged (8 commits) on Jun 30, 2023

@mohdsayed mohdsayed commented Jun 27, 2023

Description

Since we consider subdomains as first-party, it is crucial to accurately determine whether a URL is a subdomain of the provided URL.

Subdomains can have various formats and can be nested to multiple levels. Additionally, different domain name patterns and top-level domains (TLDs) can introduce further complexity.
While a basic regular expression could capture subdomains in simple cases, it would not cover all possible scenarios, such as internationalized domain names (IDNs), complex TLDs, subdomains with hyphens, or subdomains nested at different levels.

Examples:

Internationalized Domain Names (IDNs):

URL: https://日本.example.com
Subdomain: 日本

Complex TLDs:

URL: https://example.co.uk
Subdomain: Empty (no subdomain)

Subdomains with Hyphens:

URL: https://demo.sub-domain.example.com
Subdomain: demo.sub-domain

Subdomains with Different Levels:

URL: https://sub1.sub2.example.com
Subdomain: sub1.sub2

Therefore it is important to use a library that can handle all of these cases.
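
To illustrate why a fixed pattern falls short, here is a minimal sketch (not the code in this PR) comparing a naive "last two labels" rule with a suffix-aware lookup. The names `DEMO_SUFFIXES`, `naiveSubdomain` and `suffixAwareSubdomain` are hypothetical, and the three-entry suffix set is only a stand-in for the full Public Suffix List that a library such as tldts relies on.

```typescript
// Hypothetical, deliberately tiny stand-in for the Public Suffix List.
// A real library ships and maintains the complete list.
const DEMO_SUFFIXES = new Set<string>(["com", "uk", "co.uk"]);

// Naive approach: assume the registrable domain is always the last two labels.
function naiveSubdomain(hostname: string): string {
  return hostname.split(".").slice(0, -2).join(".");
}

// Suffix-aware approach: find the longest known public suffix, keep one more
// label as the registrable domain, and treat everything before it as the subdomain.
function suffixAwareSubdomain(hostname: string): string {
  const labels = hostname.split(".");
  // Candidates shrink as i grows, so the first match is the longest suffix.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (DEMO_SUFFIXES.has(candidate)) {
      // labels[i - 1] is the registrable label; labels before it form the subdomain.
      return labels.slice(0, Math.max(i - 1, 0)).join(".");
    }
  }
  // Unknown suffix: fall back to the naive guess.
  return naiveSubdomain(hostname);
}

console.log(naiveSubdomain("example.co.uk"));                 // "example" (wrong)
console.log(suffixAwareSubdomain("example.co.uk"));           // "" (no subdomain)
console.log(suffixAwareSubdomain("sub1.sub2.example.com"));   // "sub1.sub2"
```

The naive rule mistakes `example` for a subdomain of `co.uk`, which is exactly the kind of case a maintained suffix list avoids.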

Relevant Technical Choices

  • Use the tldts package.
  • Add unit tests.
  • Add the missing meta tag to devtools.html.

Testing Instructions

Go to any website (for example, https://indianexpress.com/) and check whether the first-party and third-party classification is correct.

@mohdsayed mohdsayed self-assigned this Jun 27, 2023
@mohdsayed
Collaborator Author

@ayushnirwal

third-party-cookie-phaseout says:

"SameSite=None; Secure;" from the blocked cookies in the “Cookies” tab. These are third-party cookies.

I am thinking that even if we use SameSite=None; Secure; to designate a cookie as third-party in the response header, we will still have to depend on URL comparison to identify a third-party cookie in the request header.

@mohdsayed
Collaborator Author

mohdsayed commented Jun 28, 2023

I think developer.mozilla.org gives a good definition:

If the cookie domain and scheme match the current page, the cookie is considered to be from the same site as the page, and is referred to as a first-party cookie.

If the domain and scheme are different, the cookie is not considered to be from the same site, and is referred to as a third-party cookie.

@mohdsayed mohdsayed changed the title from "[WIP] Implement the tldts package for identifying first and third-party domains." to "Implement the tldts package for identifying first and third-party domains." Jun 29, 2023
@mohdsayed mohdsayed marked this pull request as ready for review June 29, 2023 01:19
@mohdsayed mohdsayed requested a review from ayushnirwal June 29, 2023 01:19
@amedina amedina added the Cookies label (Issue/feature related to Cookies) Jun 29, 2023
@ayushnirwal
Contributor

ayushnirwal commented Jun 30, 2023

I am thinking that even if we use SameSite=None; Secure; to designate a cookie as third-party in the response header, we will still have to depend on URL comparison to identify a third-party cookie in the request header.

We can just store the name of the cookie from the request headers and query its SameSite and Secure attributes using chrome.cookies.get.

But that would not be enough. Take the cookie '1P_JAR', for example. I have seen this cookie used on docs.google.com and edition.cnn.com. It has SameSite=None; Secure; as its attributes, but on one site it is first-party and on the other it is third-party. To actually know whether a cookie is 3p for a given site, a comparison to the site's top-level domain is needed.
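
A rough sketch of that comparison, not the merged implementation: the same cookie domain is classified differently depending on the page it is seen on. `DomainResolver`, `isThirdPartyCookie` and `lastTwoLabels` are hypothetical names, the cookie domain `.google.com` is an assumption made for the example, and the toy resolver only stands in for a suffix-list lookup (for instance the `getDomain` function that tldts provides).

```typescript
// Resolves a hostname to its registrable domain, or null if it cannot.
// Injected so that a Public Suffix List library can be plugged in.
type DomainResolver = (hostname: string) => string | null;

function isThirdPartyCookie(
  cookieDomain: string,
  pageHostname: string,
  resolve: DomainResolver
): boolean {
  // Cookie domains may carry a leading dot (".google.com").
  const cookieHost = cookieDomain.replace(/^\./, "");
  const cookieSite = resolve(cookieHost);
  const pageSite = resolve(pageHostname);
  // If either side cannot be resolved, conservatively treat it as third-party.
  if (cookieSite === null || pageSite === null) return true;
  return cookieSite !== pageSite;
}

// Toy resolver for this demo only: keeps the last two labels.
// It is wrong for suffixes such as co.uk, which is why a library is preferable.
const lastTwoLabels: DomainResolver = (hostname) => {
  const labels = hostname.split(".");
  return labels.length >= 2 ? labels.slice(-2).join(".") : null;
};

// Same cookie domain (assumed), two different pages:
console.log(isThirdPartyCookie(".google.com", "docs.google.com", lastTwoLabels)); // false
console.log(isThirdPartyCookie(".google.com", "edition.cnn.com", lastTwoLabels)); // true
```

The cookie's own attributes never enter the decision here; only the relationship between the cookie domain and the page decides the classification.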

Contributor

@ayushnirwal ayushnirwal left a comment

LGTM. Just needs one more test case as mentioned above.

@mohdsayed mohdsayed merged commit 6c5c2f6 into develop Jun 30, 2023
@mohdsayed mohdsayed deleted the fix/third-party-domain branch June 30, 2023 06:43
@mohdsayed mohdsayed mentioned this pull request Jul 3, 2023