
PyPi downloads badge showing "no longer available" #1671

Closed
niccokunzmann opened this issue May 3, 2018 · 13 comments · Fixed by #2131
Labels
service-badge New or updated service badge

Comments

@niccokunzmann
Contributor

Currently, the PyPI downloads badge looks like this:
https://img.shields.io/badge/downloads-no%20longer%20available-lightgray.svg
But I would like to see the actual download counts in some way.

Examples of the affected PyPI badges:

  • https://img.shields.io/pypi/dm/Django.svg
  • https://img.shields.io/pypi/dw/Django.svg
  • https://img.shields.io/pypi/dd/Django.svg

But these badges work:

  • PyPI - License https://img.shields.io/pypi/l/Django.svg

Source code:

@niccokunzmann niccokunzmann changed the title Pypi Downloads not available PyPi Downloads not available May 3, 2018
@niccokunzmann
Contributor Author

niccokunzmann commented May 3, 2018

Searching for the figures in https://pypi.org/pypi/crc8/json, I see no way to get them: they are all set to -1 ("downloads": -1).

It looks like this is due to the new PyPI, which does not serve download counts yet.
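
For reference, a minimal check against that endpoint (a sketch assuming the `requests` library; where exactly the counters sit in the payload is my reading of the JSON quoted above):

```python
# Minimal sketch: inspect the download counters the PyPI JSON API returns.
# Assumes the `requests` library; the "info.downloads" location is an
# assumption based on the payload described in this comment.
import requests

data = requests.get("https://pypi.org/pypi/crc8/json", timeout=10).json()
print(data["info"]["downloads"])
# At the time of this issue this printed something like:
# {'last_day': -1, 'last_month': -1, 'last_week': -1}
```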

@niccokunzmann
Contributor Author

It seems like this issue can be taken on once this issue is resolved: pypi/warehouse#699

@niccokunzmann niccokunzmann changed the title PyPi Downloads not available PyPi downloads badge showing "no longer available" May 3, 2018
@ale5000-git

Download statistics on PyPI should now be retrieved using Google's BigQuery.

Is it possible to add them?

@chris48s
Member

chris48s commented May 4, 2018

PyPI currently doesn't serve download stats via its API. See #716

It is unclear if they'll ever add it back, but as noted in the linked issue, it isn't something the core Warehouse team is working on.

I think @espadrine has already looked at the possibility of using BigQuery and decided against it. I'd assume the same limitation applies now.

@chris48s chris48s added the service-badge New or updated service badge label May 4, 2018
@paulmelnikow
Member

I'd be happy to incubate a new project in the badges org, hosted separately, that would run a nightly job to fetch the BigQuery data, perhaps push it all to S3 or Google Cloud Storage, and if necessary provide an API. Something like that would be a great service to the community.

@chris48s
Member

If you wanted to take that on as a 'microservice' (i.e. outside the shields codebase and hence not constrained to using JavaScript), I think the most developed wrapper for working with that data is https://github.com/ofek/pypinfo

Maybe a small JSON API which queries on demand but caches the result on S3 (which can handle the invalidation itself with expiration dates/rules) might be a good approach. Caching on demand might be easier than trying to process the entire Python package registry every day. It would save fetching a bunch of stuff you don't need.

It's all fun and games until loads of people start using it, though. There is a reason PyPA don't host this themselves anymore.
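
To make that concrete, here is a rough sketch of the on-demand approach (in Python rather than JavaScript, since the whole point is that it would live outside the shields codebase). The BigQuery dataset name, the S3 bucket, and the key layout are illustrative assumptions, not an actual shields setup:

```python
# Sketch: query BigQuery for one package's 30-day download count and cache the
# result on S3, letting an S3 lifecycle/expiration rule handle invalidation.
# Dataset, bucket and key names below are assumptions for illustration only.
import json

import boto3
from google.cloud import bigquery

BUCKET = "pypi-download-counts"  # hypothetical bucket

def monthly_downloads(package: str) -> int:
    client = bigquery.Client()
    query = """
        SELECT COUNT(*) AS downloads
        FROM `bigquery-public-data.pypi.file_downloads`
        WHERE file.project = @package
          AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("package", "STRING", package)]
    )
    rows = client.query(query, job_config=job_config).result()
    return next(iter(rows)).downloads

def cached_monthly_downloads(package: str) -> int:
    s3 = boto3.client("s3")
    key = f"dm/{package}.json"
    try:
        # Serve from the cache while the object has not yet been expired by S3.
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        return json.loads(body)["downloads"]
    except s3.exceptions.NoSuchKey:
        count = monthly_downloads(package)
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps({"downloads": count}))
        return count
```

An S3 lifecycle rule on the `dm/` prefix (say, expire after one day) would then bound both staleness and BigQuery spend.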

@paulmelnikow
Member

Hmm, yea, these BigQuery charges seem like they would add up quickly.

It's a good point that the vast majority of the data wouldn't be used.

I wonder what a minimal version of this would be. Do people use the monthly, weekly, yearly, or total badges the most? Or which did they use, back when the badges worked? I wish we had per-badge stats.

Based on what pypinfo is outputting, it seems like fetching a single package's data for one day costs the same as all the packages' data for that day. (Which is roughly $.01.) It makes sense not to put resources into processing counts for packages nobody is interested in, though I don't think hitting BigQuery on demand is going to work…

Fetching all the data needed to support yearly, monthly, and daily queries costs about $.50 and is about 22 MB. Those could be refreshed once a day and stored on S3 or Google Cloud ($15/month). The application could snag those files when they change, and put the results into an in-memory database. Seems like something like the Zeit Now OSS plan could handle the load on the order of Shields' requests.
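
On the consumer side, "snag those files and put the results into an in-memory database" could be as small as this sketch (hypothetical bucket, key, and CSV layout of package,downloads pairs):

```python
# Sketch: load a hypothetical daily CSV dump of per-package monthly totals
# from S3 into an in-memory dict at startup. Bucket/key names are made up.
import csv
import io

import boto3

def load_download_counts(bucket: str = "pypi-download-counts",
                         key: str = "monthly-totals.csv") -> dict:
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    return {package: int(downloads) for package, downloads in csv.reader(io.StringIO(body))}

counts = load_download_counts()
print(counts.get("django", 0))
```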

@espadrine
Member

> is about 22 MB. Those could be refreshed once a day and stored on S3 or Google Cloud

22 MB? That's smaller than I thought.

With the <$4/mo VPS we use, we have 2GB of RAM and 20 GB of SSD, so we'd have plenty of space to keep that data in RAM and dump it to SSD for reboot persistence.

So we could have that service for $20/mo.

I expect most people not to care about the per-day figure, so we can probably stay at roughly $5/mo. Can we afford that?

> I wish we had per-badge stats.

(We have an awkward per-badge statistic through the rate limit monitoring API: https://github.com/badges/shields/blob/master/lib/sys/monitor.js#L68.)


We could also have fun with memcached-like systems :) To be honest, I started working on jsonsync because I wanted to synchronize the https://img.shields.io/$analytics/v1 endpoint.

@paulmelnikow
Member

It's nice to talk with you about this stuff!


An aside about RAM: if we do have RAM to spare, I would like to use it to bump up the request-handler cache. That's one of the low-hanging fruits when it comes to performance boosting. We've observed this based on the home page badges and the frequency at which their corresponding API calls are triggered. Since these badges are rendered all the time, they should stay in the cache until expiration, yet they seem to be evicted after minutes.

https://github.com/badges/shields/blob/master/lib/request-handler.js#L28

If I recall, you had reduced this to avoid OOM conditions, though it would be great to crank it up by 5–10x.

For that matter, since you mention the SSD: a persistent backing seems like a great candidate for this kind of caching! The entries are precious, and we don't care about the difference between microseconds and milliseconds. I was just looking at some already-tested key–value stores that can be backed by disk or other cloud storage, for another project.
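
As a shape for that idea, here is a minimal disk-backed key-value cache sketch using only Python's standard-library sqlite3; it illustrates the get/set-with-persistence pattern, not the store shields would actually adopt (RocksDB comes up below):

```python
# Minimal sketch of a disk-backed key-value cache (standard-library sqlite3).
# Illustrates the pattern only; not a claim about shields' actual cache.
import sqlite3

class DiskCache:
    def __init__(self, path: str = "badge-cache.sqlite3"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )

    def get(self, key: str):
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def set(self, key: str, value: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()
```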

Preferably we'd sync this across machines, which would give us another 3x boost. Would rsync be a candidate for that?

Aha! So let's hold off on the RAM change and switch that over to a persistent on-disk cache, instead. I'll open a new issue. Curious your thoughts on syncing…


Okay, back to Python. If it seems safe to keep it in memory, let's do that! If setting up rsync is easy, let's use SSD as the backing. Otherwise I would rather use cloud storage, which is still fast enough to load on startup, and would trivially sync across machines. It also means the BigQuery refresh job could trivially run on another cloud provider, which means more people can have access.

Yea, we can afford $5–20. We've gotten some good-sized contributions, and have an expectation of cash flow and also some runway.

I don't think we've ever had daily stats. Weekly on up. I think daily refresh would be nice for reducing latency in the weekly totals and being more transparent. If we update at midnight UTC, our numbers would be easy for anyone to match using their own query.

@paulmelnikow
Member

A CSV with the project name and the monthly download total is 2.2 MB. That's an admittedly dense format, though. Probably it would take up more space as an ES6 Map. Any idea how to measure that?

@espadrine
Member

> If I recall, you had reduced this to avoid OOM conditions, though it would be great to crank it up by 5–10x.

/me shivers

> For that matter, since you mention the SSD: a persistent backing seems like a great candidate for this kind of caching!

Sounds good! In this area, and for SSDs, Facebook's RocksDB is the go-to choice. Facebook uses it as its MySQL backing store, but so do Ceph, CockroachDB, TiDB…

> Preferably we'd sync this across machines, which would give us another 3x boost. Would rsync be a candidate for that?

If I understand rsync correctly, its rolling checksum will flag all the file blocks as changed and cause a whole-file transmission, so it will be equivalent to an rcp (except in the rare case where nothing changed).

Which is fine. We can have a leader that downloads the updates on a clock and batches the writes to its local key-value store. It then sends the data to its followers, which do the same. Each server already knows everyone else's IP address (they are part of secrets.json), and the first server is a natural leader (it is s0, a Canadian server).

@paternal

paternal commented Sep 7, 2018

pypistats.org provides such information (daily, weekly and monthly downloads) via a JSON API (no need to fetch data from Google BigQuery).
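
For example (the /recent endpoint and response shape here are what pypistats.org documents; treat the details as an assumption rather than shields' final integration):

```python
# Quick check of the pypistats.org JSON API mentioned above.
import requests

resp = requests.get("https://pypistats.org/api/packages/django/recent", timeout=10)
print(resp.json()["data"])
# Expected shape: {'last_day': ..., 'last_week': ..., 'last_month': ...}
```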

@chris48s
Member

chris48s commented Sep 7, 2018

Thanks for posting that @paternal - looks like exactly what we need :)
