An auto-refreshing dataset of major news domains and their X (formerly Twitter) accounts, complete with real-time stats like follower counts and engagement metrics. Made for tracking media trends and analytics at scale.

charlescol/news-domains-X-tracker

US & International News Domains Twitter Stats Tracker

Project Overview

This project compiles a list of major news domains along with their associated X (formerly Twitter) accounts. The repository includes an auto-refreshing job that fetches up-to-date statistics for these X accounts, such as follower count, tweet activity, and engagement metrics. You can find the dataset in news-domains-x.csv.

  • Top 100 accounts (sorted by followers) are updated daily.
  • The other records are updated daily in batches of 300.

The current dataset contains around 4000 accounts collected from multiple sources and will be continuously enriched and updated.
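As a quick way to inspect the dataset, the sketch below reads the header and counts the rows without assuming any particular column names, since the column layout is not described here. The helper name `summarize_csv` is hypothetical; only the file name news-domains-x.csv comes from this README.

```python
import csv
from pathlib import Path


def summarize_csv(path: Path) -> tuple[list[str], int]:
    """Return (column names, number of data rows) for a CSV file."""
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        row_count = sum(1 for _ in reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return list(reader.fieldnames or []), row_count


# Only runs when the dataset file is present in the working directory.
dataset = Path("news-domains-x.csv")
if dataset.exists():
    columns, rows = summarize_csv(dataset)
    print(f"{rows} rows, columns: {columns}")
```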

This project uses multiple free-tier X API accounts to implement its refresh strategy. Each API account (token) can retrieve data for up to 100 X accounts per day, a limit imposed by the X API free tier.
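To make the refresh cadence concrete, here is a rough estimate of how long one full pass over the non-priority records takes. It assumes the figures stated in this README (about 4000 accounts, a top-100 priority set, 3 tokens at 100 lookups each); the function name and parameters are illustrative and are not part of the repository.

```python
import math


def full_cycle_days(total_accounts: int, priority: int = 100,
                    tokens: int = 3, per_token: int = 100) -> int:
    """Days needed to refresh every non-priority account once."""
    remaining = max(total_accounts - priority, 0)
    daily_capacity = tokens * per_token
    return math.ceil(remaining / daily_capacity)


print(full_cycle_days(4000))  # 13
```

Under these assumptions, a record outside the top 100 is refreshed roughly every 13 days, and adding tokens shortens the cycle proportionally.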


Auto-Update Process

The project leverages GitHub Actions to automatically update the statistics for tracked X accounts:

Workflow:

  1. Job 1 (Real-time priority refresh):

    • Updates the top 100 most-followed accounts daily.
  2. Jobs 2, 3, and 4 (Incremental updates):

    • These jobs run in parallel, each processing one batch of 100 accounts. With 3 tokens currently available, 300 records are updated per day.
    • The progress is tracked using a JSON file (state/progress.json) to ensure no accounts are skipped.
  3. Reordering and Cleaning:

    • Once the entire list has been processed, it is re-sorted based on the number of followers.
    • Inactive or suspended accounts are removed automatically.
  4. Commit to GitHub:

    • The updated data is committed back to the repository, ensuring the latest statistics are always available.
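The incremental steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `offset` key in the progress file and the `followers` and `active` row keys are assumptions, since the real layouts of state/progress.json and the CSV are not documented here.

```python
import json
from pathlib import Path

BATCH_SIZE = 100  # per-token daily limit described in this README


def load_offset(path: Path) -> int:
    """Return the saved cursor, or 0 if no progress file exists yet."""
    if not path.exists():
        return 0
    return int(json.loads(path.read_text()).get("offset", 0))  # key name is assumed


def save_offset(path: Path, offset: int) -> None:
    """Persist the cursor so the next run resumes where this one stopped."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"offset": offset}))


def next_batches(accounts: list[str], offset: int,
                 tokens: int) -> tuple[list[list[str]], int, bool]:
    """Slice one batch per token, starting at offset.

    Returns (batches, new_offset, finished_pass). The offset wraps to 0
    once the end of the list is reached.
    """
    batches = []
    for _ in range(tokens):
        chunk = accounts[offset:offset + BATCH_SIZE]
        if not chunk:
            break
        batches.append(chunk)
        offset += len(chunk)
    finished = offset >= len(accounts)
    return batches, (0 if finished else offset), finished


def reorder_and_clean(rows: list[dict]) -> list[dict]:
    """Drop rows flagged inactive, then sort by followers, highest first."""
    active = [r for r in rows if r.get("active", True)]  # hypothetical flag
    return sorted(active, key=lambda r: r["followers"], reverse=True)
```

Because the cursor is saved after each run, an interrupted or failed job resumes from the last recorded position rather than skipping accounts, and the re-sort only needs to happen when a pass reports that it has finished.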

Current Data Sources

The data currently used in this project has been sourced from the following repositories:

  1. ercexpo/us-news-domains
  2. palewire/news-homepages

More sources will be added over time.


Contributing

We welcome contributions to expand the dataset and improve automation workflows. Feel free to submit issues and pull requests.
