This repository has been archived by the owner on Aug 27, 2023. It is now read-only.

Feature request: expose serial/timestamp of last modification #204

Open
GaretJax opened this issue Jan 29, 2019 · 3 comments

Comments

@GaretJax

We would like to replace our old devpi instance with pypicloud, but we operate an intelligent cache in front of our indexes that uses the PyPI and devpi serials to determine whether anything needs to be updated, and what.

We don't need a full changelog of events (like PyPI offers), but a simple way to check if we're up to date before initiating a full scan of all packages might come in handy.

The simplest form would be to just store the timestamp of the last upload/deletion and expose it as a response header in all API responses.
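A minimal sketch of that simplest form, assuming a plain WSGI deployment: a process-local tracker records the time of the last upload or deletion, and a middleware appends it to every response. The header name `X-Last-Modified-Timestamp` and the `LastModified` / `last_modified_middleware` names are hypothetical; pypicloud does not define them, and the value here lives in a single process only.

```python
import threading
import time
from typing import Optional

# Hypothetical header name; pypicloud does not define one.
HEADER_NAME = "X-Last-Modified-Timestamp"


class LastModified:
    """Holds the time of the most recent upload/deletion (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._timestamp = 0.0

    def touch(self, when: Optional[float] = None) -> None:
        """Record a modification; the stored value never moves backwards."""
        now = time.time() if when is None else when
        with self._lock:
            self._timestamp = max(self._timestamp, now)

    @property
    def timestamp(self) -> float:
        with self._lock:
            return self._timestamp


def last_modified_middleware(app, tracker: LastModified):
    """Wrap a WSGI app so every response carries the tracker's timestamp."""

    def wrapped(environ, start_response):
        def _start_response(status, headers, exc_info=None):
            # Copy the app's headers and append ours before passing them on.
            headers = list(headers) + [(HEADER_NAME, "%.6f" % tracker.timestamp)]
            return start_response(status, headers, exc_info)

        return app(environ, _start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in for the real index application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]
```

Upload and delete handlers would call `tracker.touch()`, and the wrapped app is served as `last_modified_middleware(demo_app, tracker)`.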

@stevearc
Owner

stevearc commented Feb 3, 2019

Pypicloud used to do something like this by keeping track of the most recent versions of all uploaded packages. It was removed in one of the larger releases because it was not very useful, and (more importantly) it greatly increased the complexity of keeping the cache in sync with the file storage. In particular, the "graceful reload" logic was nearly unreadable, and had to be duplicated across the different cache types.

If this is just an optimistic check to speed up certain operations, maybe there's a simpler solution with weaker guarantees that might work almost as well but be easier to implement. Could you give me a few more details about your use case? It sounds like you have a bunch of packages, some new, and you want to be able to tell which ones need to be uploaded without doing a full scan?

@GaretJax
Author

GaretJax commented Feb 7, 2019

Thanks for the comment, @stevearc. It would just be an optimistic check, and I would be happy with a single global timestamp of the last modification (last package update/release/removal, but also role/permission changes) that we can compare with the date of our last synchronization.

We currently operate a proxy that fetches packages from remote indexes (PyPI, devpi, pypicloud, ...) and compiles wheels for different platforms. For PyPI we use the changelog, and for devpi the serial. For pypicloud, a simple check to decide whether to scan everything or drop out immediately would be enough: we could then poll every minute or so and run a full sync only when we know that something might have changed.

We're not planning on hosting thousands of packages on pypicloud, so for now this solution would be more than enough.
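On the proxy side, the optimistic check could look roughly like this. It assumes the server sends a hypothetical `X-Last-Modified-Timestamp` header (no such header exists in pypicloud today) and that the proxy stores the server value it saw at its last successful sync. `parse_last_modified`, `needs_full_sync` and `fetch_last_modified` are illustrative names, not part of any pypicloud API.

```python
import urllib.request
from typing import Mapping, Optional

# Hypothetical header name; it must match whatever the server would send.
HEADER_NAME = "X-Last-Modified-Timestamp"


def parse_last_modified(headers: Mapping[str, str]) -> Optional[float]:
    """Return the server timestamp from response headers, or None if absent/invalid."""
    value = headers.get(HEADER_NAME)
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def needs_full_sync(remote_ts: Optional[float], last_seen_ts: Optional[float]) -> bool:
    """Decide whether to run a full scan; errs on the side of syncing."""
    if remote_ts is None or last_seen_ts is None:
        return True  # unknown state: fall back to a full scan
    # Any difference counts as a change, even if the server clock went backwards.
    return remote_ts != last_seen_ts


def fetch_last_modified(url: str, timeout: float = 10.0) -> Optional[float]:
    """Issue a HEAD request against the index and read the hypothetical header."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_last_modified(resp.headers)
```

Comparing against the server's own previously seen value, rather than the proxy's local clock, sidesteps clock skew between the two machines; a missing or unparseable value simply triggers a full scan.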

@stevearc
Owner

I think this could be done without too much trouble. The cache backends would need to track the most recent modification time of each package, but that's more doable if there are weak guarantees.

My normal development computer is getting repaired (I'm on a 6-year-old Chromebook atm), but I'll take a closer look when it's back. Alternatively, I could point you to the parts of the code that would need to be changed if you'd like to try making the modification yourself.
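A rough sketch of per-package tracking with weak guarantees, assuming an in-process cache backend: each upload or package removal bumps a timestamp, role/permission changes bump only the global value, and nothing is persisted or reconciled with the file storage or other workers. `ModificationTracker` and its methods are hypothetical and do not correspond to pypicloud's actual cache classes.

```python
import threading
import time
from typing import Dict, Optional


class ModificationTracker:
    """Hypothetical per-package modification tracker (not pypicloud's API).

    Weak guarantees: state is process-local, lost on restart, and not
    reconciled with the file storage or with other workers.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._per_package: Dict[str, float] = {}
        self._global = 0.0  # max over every recorded change

    @staticmethod
    def _now(when: Optional[float]) -> float:
        return time.time() if when is None else when

    def package_changed(self, name: str, when: Optional[float] = None) -> None:
        """Record an upload or new release of `name`."""
        ts = self._now(when)
        with self._lock:
            self._per_package[name] = max(self._per_package.get(name, 0.0), ts)
            self._global = max(self._global, ts)

    def package_deleted(self, name: str, when: Optional[float] = None) -> None:
        """Record removal of the whole package; still counts as a global change."""
        ts = self._now(when)
        with self._lock:
            self._per_package.pop(name, None)
            self._global = max(self._global, ts)

    def acl_changed(self, when: Optional[float] = None) -> None:
        """Role/permission changes only affect the global timestamp."""
        ts = self._now(when)
        with self._lock:
            self._global = max(self._global, ts)

    def last_modified(self, name: Optional[str] = None) -> Optional[float]:
        """Global timestamp if `name` is None, else the package's (or None)."""
        with self._lock:
            if name is None:
                return self._global
            return self._per_package.get(name)
```

Using `max` everywhere keeps each value monotonic within a process, which is all an optimistic "has anything changed?" check needs; a restart resets the state, which at worst causes one unnecessary full scan.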
