"No more counting dollars, we'll be counting stars"
~ OneRepublic - https://youtu.be/hT_nvWreIhg?t=15s
This mini-project helps us track ⭐ for projects on GitHub and answer interesting questions about the data.
A big part of achieving our goals in DWYL requires tracking certain "Metrics" so that we can see trends and derive actionable insights from our data.
GitHub ⭐ are one of the main (quantitative) measures we have for discovering interesting Open Source projects on GitHub.
Counting ⭐ helps us know if the learning materials we are producing are useful to other people. [1]
Encouraging people to ⭐ our projects is important for "exposure", and is something you can help us with if you haven't already...
The more people ⭐ dwyl repos, the more it will help their friends/followers to discover our useful projects/content.
The other benefit of tracking ⭐ on our projects is that it allows us to understand who is interested in our work, which allows us to discover new & interesting people.
Finally, we think the GitHub API for ⭐ is limited: for example, it does not let us answer interesting questions such as:
find all people who are members of an org who have starred xyz project
or
who in the org has the most/least stars
or
which project in the org increased/decreased its stars most this week
So we decided to solve this mini-challenge with some code.
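As a rough illustration of what answering two of these questions could look like once the data has been collected, here is a minimal sketch. The data shapes (a list of member usernames, a list of `{ repo, user }` records and per-repo star-count snapshots) and the function names are assumptions made for this example; they are not the format or code the dwyl scripts use.

```javascript
// Hypothetical data shapes, chosen for illustration only.
// members:    array of usernames belonging to the org
// stargazers: array of { repo, user } records

// Which org members have starred a given repo?
function membersWhoStarred(members, stargazers, repo) {
  const starredBy = new Set(
    stargazers.filter((s) => s.repo === repo).map((s) => s.user)
  );
  return members.filter((m) => starredBy.has(m));
}

// Which repo gained (or lost) the most stars between two snapshots?
// Each snapshot is an object mapping repo name -> star count.
function biggestChange(lastWeek, thisWeek) {
  let best = null;
  for (const repo of Object.keys(thisWeek)) {
    const delta = thisWeek[repo] - (lastWeek[repo] || 0);
    if (best === null || Math.abs(delta) > Math.abs(best.delta)) {
      best = { repo, delta };
    }
  }
  return best;
}
```

Both functions work on plain in-memory data, so they apply equally to records loaded from a CSV file or fetched from an API.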
“When you have mastered numbers, you will in fact no longer be reading numbers, any more than you read words when reading books. You will be reading meanings.” ~ W.E.B. Du Bois
GitHub lets its "users" ⭐ projects (repositories) in order to "favourite" or "bookmark" them.
Both the person starring the project (that interests them) and the rest of the community can see the stars, which then act as a signal of "interesting" or even "quality".
For example Natalia has the following projects starred: https://github.com/NataliaLKB?tab=stars
Some people use their stars "sparingly", which is ok because they may only want to "bookmark" a handful of things on GitHub. However, other "power users" ⭐ many things ... e.g:
https://github.com/feross?tab=stars&q=summer
The immediate question we are going to answer with this project is:
how many distinct people have found dwyl code/tutorials useful
The answer is:
See "how" section below for exactly how this number is derived.
How would you go about tackling this challenge...?
We wrote a few scripts to fetch the data from GitHub:
You will need node.js installed on your computer. If you don't already have it, go to: https://nodejs.org/en/download/
Run the following commands:
npm install # install dependencies
npm run crawl # crawls all pages on dwyl's github for stargazers
npm run combine # combines all stargazers into
npm run unique # tallies how many unique people have starred a dwyl repo
npm run learners # just the people who have starred a learn-x repo
or run a single command:
npm run all
The output will be 4 files:
- stargazers.csv - the list of all repos and people who have starred them
- unique.csv - unique people that have starred any dwyl repo
- unique_learners.csv - unique people who have starred a learn-x repo
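To make the "unique" and "learners" tallies concrete, here is a minimal sketch of how such a count could be derived from a list of stars. It assumes a CSV with a header row and two columns, `repo,user`; that layout, the sample data and the function name `uniqueStargazers` are assumptions for this example, not necessarily what the project's scripts produce or use.

```javascript
// Assumed input: CSV text with a header row and "repo,user" columns.
// repoFilter decides which repos count; by default every repo does.
function uniqueStargazers(csvText, repoFilter = () => true) {
  const users = new Set(); // a Set keeps each username only once
  const lines = csvText.trim().split('\n').slice(1); // skip the header row
  for (const line of lines) {
    const [repo, user] = line.split(',').map((cell) => cell.trim());
    if (repo && user && repoFilter(repo)) users.add(user);
  }
  return [...users].sort();
}

// Made-up sample data for illustration.
const sample = [
  'repo,user',
  'learn-tdd,alice',
  'learn-travis,alice',
  'hq,bob',
  'learn-tdd,carol',
].join('\n');

const everyone = uniqueStargazers(sample);
const learners = uniqueStargazers(sample, (repo) => repo.startsWith('learn-'));
console.log(everyone.length, learners.length); // 3 2
```

Using a `Set` means a person who has starred several repos is still counted once, which is the point of the "distinct people" question.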
npm install && npm run local
You should see something like this:
Sorting the avatars by the color of the avatar requires a little "magic". We first need to download all the profile images so that our script can "analyse" them.
Run this script (and go for a walk/coffee):
npm run people
Note: this will take about 50 minutes to run because we don't want to "DDOS" GitHub with 6k requests at once (and get our IP address blocked!!)
Run this script and go for a quick bathroom break:
npm run people
Note: this will take about 20 minutes to run, again because we don't want to flood the GitHub CDN with 6k requests at once.
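The idea behind both notes is to space requests out rather than fire them all at once. The sketch below shows one simple way to do that: run tasks one at a time with a fixed pause between them. The helper names (`sleep`, `runThrottled`, `estimateMinutes`) and the 500 ms delay in the comment are assumptions for illustration; the delay the project's scripts actually use is not stated here.

```javascript
// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run `task(item)` for each item, one at a time, pausing between calls.
async function runThrottled(items, task, delayMs) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(await task(items[i]));
    if (i < items.length - 1) await sleep(delayMs); // no pause after the last item
  }
  return results;
}

// Rough duration estimate, ignoring the time each request itself takes.
function estimateMinutes(count, delayMs) {
  return (count * delayMs) / 60000;
}

// With an assumed 500 ms pause, 6000 requests take roughly 50 minutes.
console.log(estimateMinutes(6000, 500)); // 50
```

In a real run, `task` would be a function that downloads one profile or image; here it is left as a parameter so the throttling logic stays independent of any particular HTTP library.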
Find the line that looks like this in faces.js:
// var img_base = '/data/img/'; // get avatar from localhost
var img_base = 'https://avatars2.githubusercontent.com/u/';
Comment out the GitHub URL and un-comment the relative one.
Do the same thing again for the lines:
// var src = img_base + uid + '.jpg'; // get avatar from localhost
var src = img_base + uid + '?v=3&s=200'; // GET images from GitHub
Now when you run npm run local, wait 60 seconds for the page to load all the images ... then once they are loaded they will be sorted into a rainbow!
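One common way to get a "rainbow" ordering is to reduce each image to a single representative colour and sort by its hue. The sketch below shows that idea using the standard RGB-to-hue conversion; how faces.js actually analyses and orders the avatars is not shown here, so the `{ uid, rgb }` shape and the function names are assumptions.

```javascript
// Convert an RGB colour (0-255 per channel) to a hue angle in degrees [0, 360).
function rgbToHue(r, g, b) {
  const max = Math.max(r, g, b);
  const min = Math.min(r, g, b);
  const delta = max - min;
  if (delta === 0) return 0; // grey has no defined hue; use 0 by convention
  let hue;
  if (max === r) hue = ((g - b) / delta) % 6;
  else if (max === g) hue = (b - r) / delta + 2;
  else hue = (r - g) / delta + 4;
  return (hue * 60 + 360) % 360; // map to degrees and keep the result positive
}

// avatars: array of { uid, rgb: [r, g, b] }, where rgb is assumed to be
// the average colour of the avatar image. Returns a new, sorted array.
function sortByHue(avatars) {
  return [...avatars].sort(
    (a, b) => rgbToHue(...a.rgb) - rgbToHue(...b.rgb)
  );
}
```

Sorting by hue alone puts reds first, then yellows, greens, blues and purples; greys and near-white avatars all land at hue 0, so a fuller version might group them separately.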
- "One Metric that Matters": http://leananalyticsbook.com/one-metric-that-matters/ discuss at: dwyl/hq#149
- Actionable Insights: http://online-metrics.com/actionable-insights/
P.S: we prefer counting the other type of stars, but for now this is a great start. 😉
[1] Note: while dwyl's "mission" is not simply to produce good learning materials, we think that having good learning tutorials is essential for our mission! If other people find our tutorials useful and they contribute to improving them, then everyone benefits, not just the members of the dwyl team building the dwyl "product" #WinWin