-
-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLORA-223] Progressive index import #387
Conversation
f90853f
to
8fdfa3a
Compare
@Mergifyio rebase |
❌ Base branch update has failedGit reported the following error:
err-code: 8A708 |
@@ -0,0 +1,6 @@ | |||
create table if not exists metadata ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have instead a table package_indexes
with the following:
- package_index uuid primary key;
- repository text not null;
- timestamp timestamptz not null;
And an index on the repository field.
:-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does having a repository make sense quite yet? At some point we'll want to point to a repo and then will make sense, but right now we're importing from a directory or an index file, which doesn't have a repository attached to it. Maybe it makes the most sense to default to hackage.haskell.org
and perhaps have an argument to the mirror-cli for specifying the repository?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, at the moment the only repo is Hackage so you can default to it (in the Haskell code, not the SQL).
A later workflow for CHaP will have to make the distinction between the two, but we're in the clear for now :)
8fdfa3a
to
39ed03f
Compare
@RaoulHC do you want to mark this PR as ready for review? Or are there other stuff you want done before the merge? |
It could do with some stuff in the test suite, and maybe it wants a flag when importing for whether to update the timestamp table or not, but I think the main logic is good for review. Will try and finish off those bits this week. |
This should make it a lot easier to do incremental mirroring later.
39ed03f
to
ffb15ad
Compare
Right think this is all good for review, I've added an extra field to releases so we can check the provenance of where packages came from which helps check what we want to check up to. Will update the changelog if you're happy with this approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok overall I'm very satisfied with this PR. Only one point to resolve. :)
@@ -0,0 +1,2 @@ | |||
alter table releases | |||
add repository text default null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have thought a foreign key to the package_index_id would be the most appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. This does mean we have to add the package index without a timestamp before we start importing releases else you get foreign key violations. We could take the approach of adding/updating the timestamp as we import each release but I think this is probably better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it's not stupid to have package indexes that we could know about before having importing them, especially if we want to provision the database with public stuff like hackage & chap (and others in the future).
ffb15ad
to
dcccd48
Compare
Now when mirroring from an index tarball we'll only import packages newer than this timestamp from the same repository. We store the repository provenance in the release table, which means the timestamp is set to only the relevant packages.
dcccd48
to
942d2fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you very much, it's highly appreciated and extremely valuable.
Proposed changes
This adds an option to import directly from index tarballs rather than needing to unpack them, and then uses the timestamp to avoid re importing anything that is older and should have already been importing. It might be that in the future we want to use the timestamp to grab only that latest part of the index tarball, and avoid checking each cabal file for its timestamp, but we'll still want to be storing the timestamp.
Needs some tests and I'm unsure of the metadata table, but could do with some review in its current form.
I also have a commit to update the freeze file which was out of sync with horizon, but this may be superseded by #386.
Contributor checklist