This repository has been archived by the owner on Dec 17, 2021. It is now read-only.

Unnecessary prepending of "www." on domains of form "www#." #304

Open
obrien-j opened this issue May 15, 2019 · 3 comments


@obrien-j

We recently noticed scan failures for domains of the form www[0-9]. (for example, www1.).

Case in point: www1.canada.ca. pshtt recognized the relevant domains, but sslyze then failed because it was scanning www.www1.canada.ca.

Relevant code references are below. Just curious whether others have seen this behaviour and have a better idea for a fix than mine.

if utils.domain_uses_www(domain, cache_dir=cache_dir):

which calls
if domain.startswith("www."):

I suggest a basic change: drop the "." from the startswith condition, so that it checks domain.startswith("www") instead.

# Check whether we have HTTP behavior data cached for a domain.
# If so, check if we know it canonically prepends 'www'.
def domain_uses_www(domain, cache_dir="./cache"):
    # Don't prepend www to www.
    if domain.startswith("www."):
        return False
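To make the difference concrete, here is a minimal sketch comparing the current check with the suggested one. The two helper names are hypothetical and exist only for this illustration; only the two startswith conditions come from the thread.

```python
# Hypothetical helpers for illustration; not part of the scanner's code.
def skips_prepend_current(domain):
    # Current condition: only a literal "www." prefix counts.
    return domain.startswith("www.")


def skips_prepend_suggested(domain):
    # Suggested condition: the "." is dropped from the prefix.
    return domain.startswith("www")


for domain in ("www.canada.ca", "www1.canada.ca", "canada.ca", "wwwexample.com"):
    print(domain, skips_prepend_current(domain), skips_prepend_suggested(domain))
```

One trade-off to keep in mind: without the dot, the check also matches names such as wwwexample.com, which merely begin with the letters "www", so the looser condition is broader than just the www[0-9] case.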

Thoughts?

@obrien-j
Author

Alternatively, the code snippet below is also likely broken, or I'm just sleep deprived.

return (

Example: canonical URL: https://www1.canada.ca

The return evaluates to (False or True) == True, which then causes www. to be prepended.
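The evaluation can be reproduced with a short sketch. It assumes the check compares the URL against the prefixes http://www and https://www with no trailing dot; the actual return expression is not shown in full in this thread, so those prefixes are an assumption.

```python
# Assumed prefixes (no trailing dot); the real expression is elided in this thread.
url = "https://www1.canada.ca"

http_match = url.startswith("http://www")
https_match = url.startswith("https://www")

# The combined result is what would trigger the extra "www." prefix.
print(http_match, https_match, http_match or https_match)  # False True True
```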

@obrien-j
Author

Lines 580-582. :/

@echudow
Collaborator

echudow commented May 22, 2019

If I'm reading it correctly, it is just checking whether the canonical endpoint starts with www., in which case the calling function may want to prepend www. when running its own checks. So I think the fix should be to change lines 581-582 to explicitly check for the . after the www:

    return (
        url.startswith("http://www.") or
        url.startswith("https://www.")
    )
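As a quick sanity check, the proposed condition can be wrapped in a small function and run against a few URLs. The function name canonical_url_uses_www and the test URLs other than https://www1.canada.ca are hypothetical, chosen only for this sketch.

```python
# Hypothetical wrapper around the proposed condition, for illustration only.
def canonical_url_uses_www(url):
    return (
        url.startswith("http://www.") or
        url.startswith("https://www.")
    )


cases = {
    "https://www.canada.ca": True,    # genuine www. host
    "http://www.canada.ca": True,
    "https://www1.canada.ca": False,  # the case reported in this issue
    "https://canada.ca": False,
}

for url, expected in cases.items():
    assert canonical_url_uses_www(url) == expected, url
```

With the dot included, www1.canada.ca no longer counts as a www. host, so a caller relying on this result would not prepend a second www. to it.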
