Do stats collection from Netlify in aggregate #13

Open
wants to merge 1 commit into
base: main

kingdonb
Member

This is the laziest way to get all the Top Pages info: a quick Ruby script I threw together.

Future enhancements could include going back to get older data, or some improved slicing and dicing/visualization of the data. Unfortunately what we've learned from looking over the data this once is that Netlify only provides analytics about the top 10 pages each day.

So unless a page made it into the top 10 on at least one day, it won't show up on this list at all.

Still, better than nothing! Here is the output from running this script today. In the future I imagine we'll have something better for reporting aggregate statistics over arbitrary spans, and this script goes away completely.

$ ./do-netlify-stats-aggr.sh
Switched to branch 'netlify-stats'
Your branch is up to date with 'upstream/netlify-stats'.
Already up to date.
Switched to branch 'main'
...
[SNIP] - removed a bunch of noise
...
Previous HEAD position was f1cef3ee [ci skip] fluxcd.io stats from netlify-analytics
HEAD is now at 92ff87e4 [ci skip] fluxcd.io stats from netlify-analytics


{
                                                  "/" => 80339,
                                             "/docs/" => 24987,
                                "/docs/installation/" => 18940,
                                 "/docs/get-started/" => 16796,
          "/docs/components/kustomize/kustomization/" => 11653,
                "/docs/components/helm/helmreleases/" => 10913,
                         "/docs/guides/image-update/" => 8408,
                         "/docs/guides/helmreleases/" => 7402,
                                      "/docs/guides/" => 5721,
                 "/docs/guides/repository-structure/" => 4394,
                        "/docs/guides/notifications/" => 3509,
                         "/docs/guides/mozilla-sops/" => 3334,
                                             "/blog/" => 1714,
           "/docs/components/source/gitrepositories/" => 1395,
                        "/blog/2022/04/march-update/" => 790,
                "/blog/2022/04/contributing-to-flux/" => 688,
    "/blog/2022/03/flagger-adds-gateway-api-support/" => 503,
       "/blog/2022/03/flux-puts-the-git-into-gitops/" => 290,
                                   "/docs/use-cases/" => 190,
                                          "/roadmap/" => 102
}

---
days_counted: 43
Previous HEAD position was 92ff87e4 [ci skip] fluxcd.io stats from netlify-analytics
Switched to branch 'main'
Your branch is up to date with 'upstream/main'.

We can run this script again tomorrow and it will show 44 days of accumulated data; the day after, 45. It's all based on when we started taking snapshots: the script scans through each git commit our automation has made every day, each of which records the numbers from the Netlify "pages" metric.
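The accumulation described above can be sketched roughly as follows. This is a minimal illustration, not the script in this PR: it assumes each daily snapshot has already been read out of git history into a hash of path to view count, and the `aggregate_snapshots` name and the sample numbers are made up.

```ruby
# Hypothetical sketch: sum per-day "top pages" snapshots into running totals.
# Each snapshot is assumed to be a Hash of path => views for a single day.
def aggregate_snapshots(snapshots)
  totals = Hash.new(0)
  snapshots.each do |day|
    day.each { |path, views| totals[path] += views }
  end
  # Sort by descending view count, like the output above.
  sorted = totals.sort_by { |_path, views| -views }.to_h
  { totals: sorted, days_counted: snapshots.length }
end

# Made-up sample data: two days of snapshots.
sample = [
  { "/" => 100, "/docs/" => 40 },
  { "/" => 120, "/blog/" => 15 }
]
result = aggregate_snapshots(sample)
```

Note that a page only contributes on the days it appears in a snapshot, which is why anything that never reached the daily top 10 is absent from the totals.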

I think that Netlify actually keeps the data for longer than 30 days; we just need a way to add it to the git history that won't be too hard to sort out later. We could add a script to go back and poll the rest of the days we've missed since the first of the year, depending on when we enabled analytics and how interested we are in that historical data.

I realize this is not the easiest way to consume this information, but we are limited by what we can access from this API, what we've planned for, and what has already been done for us upstream at NiklasMerz/netlify-analytics-collector. Most of this work was already done there.

The next obvious enhancement would be to see how much further back we can go. I think the 30-day limitation is imposed by the GitHub Action and how it works, not by Netlify's upstream API, but I can't be sure without spending some more time on it and trying to write my own API client. That doesn't sound like too much work; maybe one for next week.

This is the laziest way to get all the Top Pages info, future
enhancements could include going back to get older data, or some
improved slicing and dicing/visualization of the data.

Signed-off-by: Kingdon Barrett <kingdon@weave.works>
@kingdonb
Member Author

I think the data will be more interesting if it's tagged and posted to a database of some kind, so you can drill down into a particular week and find out what moved the needle during that week. Happy to take any requests if there are ideas for how to make this better.
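As a rough idea of what "drilling down into a particular week" could look like before any database exists, here is a minimal sketch that buckets dated snapshots by ISO week. It assumes the snapshots are available as a hash keyed by ISO 8601 date string; the `weekly_totals` name, the key format, and the sample data are all hypothetical.

```ruby
require "date"

# Hypothetical sketch: bucket dated snapshots by ISO week for drill-down.
# Input is assumed to be a Hash of "YYYY-MM-DD" => { path => views }.
def weekly_totals(dated_snapshots)
  weekly = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  dated_snapshots.each do |date_str, pages|
    date = Date.iso8601(date_str)
    # cwyear/cweek are the ISO 8601 week-based year and week number.
    key = format("%04d-W%02d", date.cwyear, date.cweek)
    pages.each { |path, views| weekly[key][path] += views }
  end
  weekly
end

# Made-up sample data spanning two ISO weeks.
sample_days = {
  "2022-04-18" => { "/" => 10 },
  "2022-04-24" => { "/" => 5, "/blog/" => 2 },
  "2022-04-25" => { "/" => 7 }
}
by_week = weekly_totals(sample_days)
```

The same per-week hashes could later be written as rows to whatever store we pick, with the week key serving as the tag.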

git checkout netlify-stats && git pull --ff-only
git checkout main && git pull --ff-only

bundle exec --gemfile collect-stats/Gemfile collect-stats/collect-stats.rb
Member Author

@kingdonb kingdonb Apr 25, 2022


If you're testing and this hasn't been merged yet, you'll want to run this bundle exec command directly instead of the wrapper script (as the script won't be present in the main branch yet).

It does a bunch of git checkout stuff and branch switching internally, so I just added these to avoid leaving the clone in an unpredictable or detached state after the script runs.
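An alternative to the wrapper lines would be for the Ruby side to restore the starting branch itself. The following is only a sketch of that idea, not what this PR does: `with_restored_ref` and `REAL_GIT` are hypothetical names, and the runner is passed in as a callable so the logic can be exercised without touching a real clone.

```ruby
require "open3"

# Hypothetical helper: remember the starting branch, run the block, then
# check that branch out again even if the block raises.
# `git` is any callable that takes git arguments and returns stdout.
def with_restored_ref(git)
  original = git.call("rev-parse", "--abbrev-ref", "HEAD").strip
  yield
ensure
  # `rev-parse --abbrev-ref HEAD` prints "HEAD" when already detached,
  # in which case there is no branch name to go back to.
  git.call("checkout", "--quiet", original) if original && original != "HEAD"
end

# One possible real runner (assumes `git` is on PATH).
REAL_GIT = lambda do |*args|
  out, status = Open3.capture2("git", *args)
  raise "git #{args.join(' ')} failed" unless status.success?
  out
end
```

Usage would look like `with_restored_ref(REAL_GIT) { ... }` wrapped around the part of the script that walks the commits.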
