
Import river position data and make available via the MetaCPAN API #460


Closed
neilb opened this issue Apr 23, 2016 · 7 comments

@neilb

neilb commented Apr 23, 2016

In the River discussion at the QAH 2016 we agreed one of the things that would help most is if a dist's river position were available via the MetaCPAN API, and also displayed on a dist's home page on MetaCPAN (I'll raise a separate ticket for that).

I'm calculating river position for all dists every week, and have agreed to put this data somewhere. After a chat with @oalders, we agreed that initially this will be simple JSON data like the following:

[
    {
        "dist":      "System-Command",
        "total":     92,
        "immediate": 4,
        "bucket":    2
    },
    {
        "d": "Text-Markdown",
        "t": 92,
        "i": 56,
        "b": 2
    }
]

Here's what those fields are:

  • immediate is the number of immediate downstream dependents (considering required, non-developer prereqs) that the dist has
  • total is the total number of downstream dependents, counting both immediate and indirect ones
  • bucket is a number between 0 and 5, the logarithmic binning of total. 0 means no downstream deps, and 5 is the head of the river.
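The thread doesn't say how the counts are computed or exactly how total is binned. As a rough illustration only, here is a Python sketch that derives all three fields from a hypothetical reverse-dependency map (`reverse_deps`: dist name to the set of dists that directly require it). The `bucket_for` formula (the number of decimal digits in total, capped at 5) is an assumption, chosen because it reproduces the total 92, bucket 2 rows in the sample data; the real binning may differ.

```python
from collections import deque


def bucket_for(total):
    """Hypothetical log10 binning: 0 when there are no downstream deps,
    otherwise the number of decimal digits in `total`, capped at 5."""
    if total <= 0:
        return 0
    return min(5, len(str(total)))


def river_position(dist, reverse_deps):
    """Compute immediate/total downstream counts and bucket for `dist`.

    `reverse_deps` is a hypothetical mapping of dist name -> set of dists
    that directly require it (required, non-developer prereqs only).
    """
    direct = reverse_deps.get(dist, set())
    immediate = len(direct)

    # Breadth-first walk over reverse dependencies, collecting every
    # downstream dist (direct or indirect) exactly once. The `seen` set
    # also stops us looping forever on circular dependencies.
    seen = set()
    queue = deque(direct)
    while queue:
        dependent = queue.popleft()
        if dependent in seen or dependent == dist:
            continue
        seen.add(dependent)
        queue.extend(reverse_deps.get(dependent, set()))
    total = len(seen)

    return {
        "dist": dist,
        "total": total,
        "immediate": immediate,
        "bucket": bucket_for(total),
    }
```

For example, if B and C both require A, and D requires B and C, then A has 2 immediate dependents but a total of 3 (B, C and D), which lands in bucket 1 under this assumed binning.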

JSON always seems stupidly verbose, so for internal things I tend to use shorter names, like the second example above. Or would you rather go with verbosity?
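If both styles were ever in circulation, a consumer could normalise them on load. A minimal Python sketch, assuming the short keys are exactly d, t, i and b as in the second example; `SHORT_TO_LONG` and `load_river` are hypothetical names, not part of any MetaCPAN code:

```python
import json

# Hypothetical mapping from the short internal keys to the verbose ones.
SHORT_TO_LONG = {"d": "dist", "t": "total", "i": "immediate", "b": "bucket"}


def expand_keys(records):
    """Rewrite each record so short keys become verbose ones; keys that
    are already verbose (or unknown) pass through unchanged."""
    return [
        {SHORT_TO_LONG.get(key, key): value for key, value in record.items()}
        for record in records
    ]


def load_river(text):
    """Parse river JSON text and normalise every record to verbose keys."""
    return expand_keys(json.loads(text))
```

The verbose form is what a reader of the raw file sees without needing the mapping, which is the trade-off discussed in the next comment.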

Once we've agreed on the above format, I'll decide where to publish it so you can grab a first version, then I'll set up something to ensure it's getting regularly updated.

@oalders
Member

oalders commented Apr 23, 2016

I'd rather go with verbose, I think. We have to deal with a bunch of different web services and making keys obvious makes for less head scratching. If you're worried about file size, you could compress it.
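Repeated verbose keys compress very well, so the verbosity costs little once the file is gzipped. A minimal Python sketch using only the standard library; `compress_river` and `decompress_river` are illustrative names, and writing the resulting bytes to a file such as river-of-cpan.json.gz is left to the caller:

```python
import gzip
import json


def compress_river(records):
    """Serialise records as verbose JSON and gzip the UTF-8 bytes."""
    raw = json.dumps(records, indent=4).encode("utf-8")
    return gzip.compress(raw)


def decompress_river(blob):
    """Inverse of compress_river: gunzip, decode UTF-8, parse JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

With thousands of dists, each repeating the same four key names, gzip's back-references mean the key strings add only a small amount to the compressed size.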

@jberger
Contributor

jberger commented Apr 24, 2016

#462 was merged

@neilb
Author

neilb commented Apr 26, 2016

Here's a first version of the river data:

http://neilb.org/river-of-cpan.json.gz

Need to think about where to put this on an ongoing basis. Maybe on GitHub; then we'll have history data, but the repo (history) will end up getting very big.

@oalders
Member

oalders commented Apr 28, 2016

@jberger did you want to look at tweaking your work to deal with the compressed file?

@oalders oalders reopened this Apr 28, 2016
@jberger
Contributor

jberger commented Apr 28, 2016

ah good point, yeah thanks

@jberger
Contributor

jberger commented Apr 28, 2016

With Mojo::UserAgent, if I munge the headers it will just transparently decode the gzip. I'm investigating whether similar behavior is possible with LWP. If it works, then we should see if we can get @neilb to serve the file with Content-Encoding: gzip and Content-Type: application/json.
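To illustrate why those two headers matter, here is a minimal Python sketch (not the Perl code from the linked commits) that decodes a raw, still-encoded response body. It gunzips when the server declares Content-Encoding: gzip, and also falls back to sniffing the gzip magic bytes for the case where the .gz file is served as an opaque download. `decode_river_response` is a hypothetical helper, and `headers` is assumed to be a plain dict:

```python
import gzip
import json

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def decode_river_response(body, headers):
    """Parse river JSON from a raw (not yet decoded) HTTP response body.

    Gunzips first if the server declares Content-Encoding: gzip, or if the
    body itself starts with the gzip magic bytes.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    encoding = lowered.get("content-encoding", "").strip().lower()
    if encoding == "gzip" or body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))
```

In practice, many HTTP clients handle the declared case for you (Python's `requests`, for instance, decodes `Content-Encoding: gzip` automatically, while `urllib.request` does not), so serving the file with the proper headers lets consumers skip this step entirely.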

@oalders
Member

oalders commented Apr 28, 2016

At the very least, it should work with WWW::Mechanize, so feel free to switch to that.

jberger added a commit that referenced this issue Apr 30, 2016
This was referenced Apr 30, 2016
jberger added a commit that referenced this issue May 1, 2016
3 participants