
Use fresh top500 list from moz.com #10

Open: oh2fih wants to merge 7 commits into main

oh2fih commented Jul 17, 2024

The current top500 site list is rather old (from Feb 8, 2022). This PR replaces the static list with one that is downloaded and refreshed automatically.

  • Refactor sites.py into a class that caches the top 500 sites in top500.json.
    • Refresh the top500 list if the cache is older than 18 hours (64800 seconds).
    • Use the cached list as a backup if the download fails or returns unexpected data.
  • Fix logging.
    • Do not repeat earlier messages and errors on new log lines.
    • Never print anything directly; always emit output as JSON through the log handler.
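A minimal sketch of the logging change described above, assuming Python's standard `logging` module. `JsonFormatter`, `get_json_logger` and the JSON field names are hypothetical and not taken from sites.py. Each log call formats only its own record, so earlier messages are never repeated, and all output goes through a single JSON-formatting handler instead of `print`.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object (field names are assumptions)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # only this record's message
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_json_logger(name: str = "sites") -> logging.Logger:
    """Return a logger whose only output path is a JSON-formatting handler."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also emit through the root logger
    return logger


if __name__ == "__main__":
    log = get_json_logger()
    log.info("Loaded top500 list from cache")
    log.warning("Download failed; using cached top500.json")
```

Guarding `addHandler` matters if the module is imported more than once in a process; otherwise every line would be emitted once per attached handler.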

oh2fih added 3 commits July 17, 2024 16:09
Refresh top500 list every time sites.py is imported.
Keep cached list in top500.json and use it as a backup.
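The caching behaviour from these commits and the PR description (refresh when top500.json is older than 64800 seconds, otherwise fall back to the cached file) could look roughly like the sketch below. The class name `Top500Sites`, its methods, and the validity check are assumptions; `fetch` is a hypothetical stand-in for the real moz.com download and parsing, which is not reproduced here.

```python
import json
import time
from pathlib import Path
from typing import Callable, List, Optional

CACHE_MAX_AGE = 64800  # 18 hours, as stated in the PR description


class Top500Sites:
    """Sketch of a cached top-500 list (names are hypothetical).

    `fetch` stands in for the real download; it should return a list of
    domain strings and may raise on network errors.
    """

    def __init__(self, fetch: Callable[[], List[str]],
                 cache_path: Path = Path("top500.json"),
                 max_age: int = CACHE_MAX_AGE) -> None:
        self.fetch = fetch
        self.cache_path = Path(cache_path)
        self.max_age = max_age

    def _cache_age(self) -> Optional[float]:
        """Seconds since the cache file was last written, or None if missing."""
        try:
            return time.time() - self.cache_path.stat().st_mtime
        except FileNotFoundError:
            return None

    def _read_cache(self) -> List[str]:
        with self.cache_path.open(encoding="utf-8") as f:
            return json.load(f)

    @staticmethod
    def _looks_valid(sites: object) -> bool:
        # Assumed sanity check: a non-empty list of non-empty strings.
        return (isinstance(sites, list) and len(sites) > 0
                and all(isinstance(s, str) and s for s in sites))

    def get(self) -> List[str]:
        age = self._cache_age()
        if age is not None and age <= self.max_age:
            return self._read_cache()  # cache is fresh enough
        try:
            sites = self.fetch()
        except Exception:
            sites = None  # treat a failed download like unexpected data
        if self._looks_valid(sites):
            self.cache_path.write_text(json.dumps(sites, indent=2), encoding="utf-8")
            return sites
        if age is not None:
            return self._read_cache()  # stale, but better than nothing
        raise RuntimeError("no valid top500 list and no cache to fall back on")
```

The commit refreshes on import; this sketch makes the refresh an explicit `get()` call so it can be exercised without network access.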
oh2fih force-pushed the dynamic-top500-sites branch from 0a69e3f to 4679524 on July 17, 2024 21:14
Since the GitHub workflow updates the site daily at the same time,
CACHE_MAX_AGE has to be less than 24 hours.
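The arithmetic behind this commit can be checked with a small simulation. It assumes the cache is refreshed only when it is strictly older than the maximum age, and that each run writes top500.json a little after the run starts (`write_delay`, a hypothetical parameter). The next day's run then sees an age of slightly less than 86400 seconds, so a 24-hour limit skips every other refresh, while 64800 seconds refreshes daily as long as the write happens within 6 hours of the start.

```python
DAY = 24 * 60 * 60  # 86400 seconds between scheduled workflow runs


def needs_refresh(cache_age: float, max_age: float) -> bool:
    """Refresh only when the cache is strictly older than max_age (assumed rule)."""
    return cache_age > max_age


def refresh_days(max_age: float, write_delay: float, runs: int = 6) -> list:
    """Simulate daily runs and return the run indices on which a refresh happens.

    write_delay: hypothetical seconds between a run's start and the moment
    the cache file is written, so the next run sees an age of DAY - write_delay.
    """
    refreshed = []
    last_write = None
    for day in range(runs):
        start = day * DAY
        age = None if last_write is None else start - last_write
        if age is None or needs_refresh(age, max_age):
            refreshed.append(day)
            last_write = start + write_delay
    return refreshed


print(refresh_days(max_age=DAY, write_delay=30))    # [0, 2, 4]
print(refresh_days(max_age=64800, write_delay=30))  # [0, 1, 2, 3, 4, 5]
```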
oh2fih (author) commented Aug 24, 2024

@OllieJC It is good to see findsecuritycontacts.com updating daily again! Could you review this PR so that the domain list stays up to date, too?

I just checked (with python3 sites.py) that the top 500 list from Jul 17 (top500.json) in this PR is still current.
