Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLORA-223] Progressive index import #387

Merged
merged 5 commits into from
Jun 26, 2023

Conversation

RaoulHC
Copy link
Contributor

@RaoulHC RaoulHC commented Jun 10, 2023

Proposed changes

This adds an option to import directly from index tarballs rather than needing to unpack them, and then uses the timestamp to avoid re importing anything that is older and should have already been importing. It might be that in the future we want to use the timestamp to grab only that latest part of the index tarball, and avoid checking each cabal file for its timestamp, but we'll still want to be storing the timestamp.

Needs some tests and I'm unsure of the metadata table, but could do with some review in its current form.

I also have a commit to update the freeze file which was out of sync with horizon, but this may be superseded by #386.

Contributor checklist

  • My PR is related to <insert ticket number>
  • I have read and understood the CONTRIBUTING guide
  • I have inserted my change and a link to this PR in the CHANGELOG

@RaoulHC RaoulHC changed the title [Flora-223] Progressive index import [FLORA-223] Progressive index import Jun 10, 2023
@tchoutri
Copy link
Contributor

@Mergifyio rebase

@mergify
Copy link
Contributor

mergify bot commented Jun 10, 2023

rebase

❌ Base branch update has failed

Git reported the following error:

Rebasing (1/5)
Auto-merging cabal.project.freeze
CONFLICT (content): Merge conflict in cabal.project.freeze
error: could not apply 149ec95... Update freeze file
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 149ec95... Update freeze file

err-code: 8A708

@@ -0,0 +1,6 @@
create table if not exists metadata (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have instead a table package_indexes with the following:

  • package_index uuid primary key;
  • repository text not null;
  • timestamp timestamptz not null;

And an index on the repository field.

:-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does having a repository make sense quite yet? At some point we'll want to point to a repo and then will make sense, but right now we're importing from a directory or an index file, which doesn't have a repository attached to it. Maybe it makes the most sense to default to hackage.haskell.org and perhaps have an argument to the mirror-cli for specifying the repository?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, at the moment the only repo is Hackage so you can default to it (in the Haskell code, not the SQL).
A later workflow for CHaP will have to make the distinction between the two, but we're in the clear for now :)

@tchoutri
Copy link
Contributor

@RaoulHC do you want to mark this PR as ready for review? Or are there other stuff you want done before the merge?

@RaoulHC
Copy link
Contributor Author

RaoulHC commented Jun 19, 2023

@RaoulHC do you want to mark this PR as ready for review? Or are there other stuff you want done before the merge?

It could do with some stuff in the test suite, and maybe it wants a flag when importing for whether to update the timestamp table or not, but I think the main logic is good for review. Will try and finish off those bits this week.

@RaoulHC RaoulHC marked this pull request as ready for review June 26, 2023 10:12
@RaoulHC
Copy link
Contributor Author

RaoulHC commented Jun 26, 2023

Right think this is all good for review, I've added an extra field to releases so we can check the provenance of where packages came from which helps check what we want to check up to. Will update the changelog if you're happy with this approach.

Copy link
Contributor

@tchoutri tchoutri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok overall I'm very satisfied with this PR. Only one point to resolve. :)

@@ -0,0 +1,2 @@
alter table releases
add repository text default null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have thought a foreign key to the package_index_id would be the most appropriate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. This does mean we have to add the package index without a timestamp before we start importing releases else you get foreign key violations. We could take the approach of adding/updating the timestamp as we import each release but I think this is probably better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it's not stupid to have package indexes that we could know about before having importing them, especially if we want to provision the database with public stuff like hackage & chap (and others in the future).

nix/shell-config.nix Show resolved Hide resolved
Now when mirroring from an index tarball we'll only import packages
newer than this timestamp from the same repository. We store the
repository provenance in the release table, which means the timestamp is
set to only the relevant packages.
@tchoutri tchoutri self-requested a review June 26, 2023 19:07
Copy link
Contributor

@tchoutri tchoutri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you very much, it's highly appreciated and extremely valuable.

@tchoutri tchoutri added the squash+merge me Tell Mergify bot to squash and merge the PR label Jun 26, 2023
@tchoutri tchoutri merged commit 2876fe0 into flora-pm:development Jun 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
squash+merge me Tell Mergify bot to squash and merge the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants