PSL Private Section Domains WHOIS Checker #2014

Merged
merged 5 commits into from
Jul 22, 2024

groundcat
Contributor

@groundcat groundcat commented Jun 27, 2024

This PR is related to #1996. It introduces the tools/private_domains_checker, a Python script created to fetch data from the PSL and check the domain status and expiry dates of the private section domains. It performs WHOIS checks on these domains and saves the results into CSV files for manual review.

Please feel free to make any edits!

The README file has been updated to reflect the usage instructions.

Example CSV outputs from real PSL data:

@groundcat groundcat marked this pull request as ready for review June 27, 2024 05:46
@simon-friedberger
Contributor

simon-friedberger commented Jun 28, 2024

I know this would be a big ask, but would you be interested in trying to integrate this with the Go validator in https://github.com/publicsuffix/list/tree/master/tools/internal/parser? The idea is that this could eventually be used to automatically add DNS and WHOIS information to PRs with a GitHub Action. However, that requires a little more effort from the parser to determine which sections have changed, so that only problems in the relevant section are displayed. Integration would probably just mean providing a function that takes a URL and returns a summary of the status, like "expires in >2y/has expired/...".
This is not super important because the two things do something fairly different but it might still be nice if we could share code here.
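
A minimal sketch of such a summary function, written in Python to match the script in this PR rather than the Go validator. The name `summarize_expiry`, the "unknown" and "expires in <2y" labels, and the 730-day reading of "2 years" are assumptions for illustration; only the ">2y" and "has expired" labels come from the comment above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "2 years" is approximated as 730 days.
TWO_YEARS = timedelta(days=2 * 365)


def _as_utc(moment):
    """Treat naive datetimes as UTC so they can be compared safely."""
    if moment.tzinfo is None:
        return moment.replace(tzinfo=timezone.utc)
    return moment


def summarize_expiry(expiry, now=None):
    """Return a short, human-readable summary of a domain's expiry date."""
    if expiry is None:
        # WHOIS lookups frequently fail to yield an expiry date at all.
        return "unknown"
    now = _as_utc(now) if now is not None else datetime.now(timezone.utc)
    expiry = _as_utc(expiry)
    if expiry <= now:
        return "has expired"
    if expiry - now > TWO_YEARS:
        return "expires in >2y"
    return "expires in <2y"
```

Passing `now` explicitly keeps the function deterministic, which would matter if it were ever run from a CI job.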

Contributor

@simon-friedberger simon-friedberger left a comment

The author of the whois package has declared it unsupported, is there an alternative?

@dnsguru
Member

dnsguru commented Jun 28, 2024

This PR is related to #1996. It introduces the tools/private_domains_checker, a Python script created to fetch data from the PSL and check the domain status and expiry dates of the private section domains. It performs WHOIS checks on these domains and saves the results into CSV files for manual review.

Please feel free to make any edits!

The README file has been updated to reflect the usage instructions.

Example CSV outputs from real PSL data:

Separate from the dialog here, I noted in an issue in @groundcat's repo that identifying the [client|server]Hold status domain names into a separate file would be beneficial. Domains with either of those statuses almost always get that status for a reason that means the domain should not be on the PSL. In addition, the domain would be NXD, because those statuses cause the domain name not to be listed with NS delegation in its TLD zone file.

def check_dns_status(domain):
    def make_request():
        try:
            url = f"https://dns.google/resolve?name={domain}&type=NS"
Member

Google is a good source for this, but I do want to mention that this places authority onto Google: they could theoretically intervene in the resolution process on a given record.

Would we perhaps want to select randomly from an array of such resolvers?

Contributor Author

Updated. Good point about relying solely on Google for DNS: it centralizes authority and carries a risk of intervention. Lookups also sometimes fail for random reasons such as network fluctuations, so randomly selecting from multiple resolvers would still not be error-proof. The updated implementation therefore queries two resolvers (Google and Cloudflare) and only accepts a result if both return the same status. If the results are inconsistent or there are errors, it retries up to 5 times.
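
The approach described above could be sketched roughly as follows. This is an illustration and not the code merged in this PR: the helper names, the status strings, the `lookups` parameter, reading "up to 5 times" as 5 attempts in total, and the one-second delay are all assumptions. It uses the JSON DNS-over-HTTPS endpoints of both providers, where a `Status` of 0 means NOERROR and 3 means NXDOMAIN.

```python
import json
import time
import urllib.parse
import urllib.request
from functools import partial

# JSON DNS-over-HTTPS endpoints for the two resolvers.
RESOLVERS = {
    "google": "https://dns.google/resolve",
    "cloudflare": "https://cloudflare-dns.com/dns-query",
}


def query_status(base_url, domain, timeout=5):
    """Return 'ok', 'NXDOMAIN', or 'ERROR' for an NS lookup on one resolver."""
    query = urllib.parse.urlencode({"name": domain, "type": "NS"})
    request = urllib.request.Request(
        f"{base_url}?{query}", headers={"Accept": "application/dns-json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            rcode = json.load(response).get("Status")
    except (OSError, ValueError):
        # Network failures, HTTP errors, timeouts, and malformed JSON.
        return "ERROR"
    if rcode == 0:
        return "ok"
    if rcode == 3:
        return "NXDOMAIN"
    return "ERROR"


def check_dns_status(domain, lookups=None, max_attempts=5, delay=1.0):
    """Accept a status only when every resolver agrees; otherwise retry."""
    if lookups is None:
        lookups = [partial(query_status, url) for url in RESOLVERS.values()]
    for attempt in range(max_attempts):
        results = {lookup(domain) for lookup in lookups}
        if len(results) == 1 and "ERROR" not in results:
            return results.pop()
        if attempt < max_attempts - 1:
            time.sleep(delay)
    # The resolvers never agreed on an error-free answer.
    return "ERROR"
```

Making `lookups` injectable allows the agreement logic to be exercised with stub resolvers, without touching the network.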

- Replaced with whois utility and whoisdomain package
- Used two DNS providers to check DNS
- Added checks for expiration (> 2 yrs)
- Added _psl TXT checker
@groundcat
Contributor Author

groundcat commented Jun 28, 2024

The author of the whois package has declared it unsupported, is there an alternative?

Thank you for spotting this issue. I have replaced it with the whoisdomain package, which is recommended by the author of the retired whois package.

Update:
I just realized that I was initially using the python-whois package, not the deprecated whois package, even though both were developed by the same author. The python-whois package appears to be under active development. The main difference is that python-whois maintains its own list of WHOIS servers, while the whoisdomain package relies on the Linux whois utility: it takes the WHOIS output from the OS and parses the information from it. I might use both packages and make one a fallback when the other returns null results.
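
The fallback idea could look something like the sketch below. It deliberately takes the lookups as plain callables instead of importing either package, so the function name, the `(name, lookup)` pairs, and the "None means no data" convention are assumptions for illustration. How each callable is wired to python-whois or whoisdomain should be checked against that package's documentation.

```python
def lookup_with_fallback(domain, lookups):
    """Try each (name, lookup) pair in order and return the first usable result.

    Each lookup is a callable that takes a domain and returns a value
    (for example an expiry date), or None when it has no data.
    """
    for name, lookup in lookups:
        try:
            result = lookup(domain)
        except Exception:
            # WHOIS clients raise a wide variety of errors (timeouts,
            # parse failures, unsupported TLDs); treat any of them as a miss.
            continue
        if result is not None:
            return name, result
    # No backend produced a usable result.
    return None, None
```

Returning the backend name alongside the result makes it possible to record in the CSV which source each value came from.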

@groundcat
Contributor Author

I noted in an issue in @groundcat's repo that identifying the [client|server]Hold status domain names into a separate file would be beneficial

Thanks for the input. I added a new filter that produces a CSV list of domains with any form of hold status, and another filter that produces a CSV file of domains expiring within 2 years. I guess the latter might not be very useful at the moment, since a handful of them expire in less than 2 years and were probably submitted before the requirement was established, so the owners might not be aware of it. The same applies to the requirement to keep the _psl TXT records in place at all times.
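
For illustration, a hold-status filter along these lines could be as small as the sketch below. The function names, the column names, and the input shape (pairs of domain and status list) are hypothetical and do not reflect the CSV layout the script actually uses.

```python
import csv
import io


def is_on_hold(statuses):
    """True if any status mentions a hold, e.g. clientHold or serverHold."""
    return any("hold" in status.lower() for status in statuses)


def write_hold_csv(rows, out):
    """Write domains with a hold status to `out`; return how many were written.

    `rows` is an iterable of (domain, statuses) pairs, where `statuses`
    is a list of status strings as reported by WHOIS.
    """
    writer = csv.writer(out)
    writer.writerow(["domain", "statuses"])  # hypothetical column names
    count = 0
    for domain, statuses in rows:
        if is_on_hold(statuses):
            writer.writerow([domain, "; ".join(statuses)])
            count += 1
    return count


# Usage with an in-memory buffer; a real run would pass an open file instead.
buffer = io.StringIO()
sample = [
    ("a.example", ["clientTransferProhibited"]),
    ("b.example", ["ok", "clientHold"]),
]
held = write_hold_csv(sample, buffer)
```

The substring match is intentionally loose so that values such as `serverHold https://icann.org/epp#serverHold` are caught as well.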

@simon-friedberger
Contributor

My stance is that we want to be strict on the _psl DNS entry but lax on expiration times, because it is often impossible for us to check them and often impossible for the requester to get >2y.
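
One way to encode that stance is sketched below. It assumes that a valid _psl TXT record contains a link to a pull request on publicsuffix/list, which should be verified against the PSL guidelines. The function names and verdict strings are hypothetical, and the expiry summary is assumed to be a label such as "expires in >2y" computed elsewhere.

```python
import re

# Assumption: a valid _psl TXT value links to a PR on the PSL repository.
PSL_PR_PATTERN = re.compile(r"https://github\.com/publicsuffix/list/pull/(\d+)")


def psl_pr_numbers(txt_values):
    """Extract PSL pull request numbers from a domain's _psl TXT values."""
    numbers = []
    for value in txt_values:
        match = PSL_PR_PATTERN.search(value)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def review_verdict(txt_values, expiry_summary):
    """Strict on the _psl record, lax on expiry: expiry only ever warns."""
    if not psl_pr_numbers(txt_values):
        return "fail: missing _psl TXT record"
    if expiry_summary != "expires in >2y":
        return "pass with warning: " + expiry_summary
    return "pass"
```

With this split, an unverifiable or short expiry surfaces as a warning for manual review, while a missing _psl record is the only hard failure.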

@simon-friedberger simon-friedberger merged commit ad79d67 into publicsuffix:master Jul 22, 2024
1 check passed