Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return single package when multiple indexers are used #849

Merged
merged 6 commits into from
Jul 27, 2022

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Jul 26, 2022

Combined indexer just merges the results of each individual indexer.
Select the latest version of each package when not all versions are expected.
In case the same version of a package is available in multiple indexers, the first one found is used.

@jsoriano jsoriano requested a review from a team July 26, 2022 11:59
@jsoriano jsoriano self-assigned this Jul 26, 2022
@elasticmachine
Copy link

elasticmachine commented Jul 26, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-07-27T10:40:11.464+0000

  • Duration: 6 min 41 sec

Test stats 🧪

Test Results
Failed 0
Passed 213
Skipped 0
Total 213

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this.

},
},
{
title: "single package unsorted versions",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a case where both indexers contain the same package version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the case where both indexers contain the same package version, what would the behavior be for a query with ?all=true?

I would think even in this case the response should only contain one item for that package version. The reason being that the other APIs don't provide a way to differentiate between the two items (e.g. the only specifiers are name and version in the download request /epr/example/example-1.0.1.zip). Assuming the solution is to only return one in the search/?all=true response then it would be good to specify somewhere what the priority order is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that in that case we have to decide that some indexers have precendence over the others, I will take a look to this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a change so indexes defined later have precedence over the ones defined earlier. I have made that because currently we are defining the zip indexer, with support for signatures, after the plain file indexer, so now if a package is in both indexers, the zip with signature will have precedence.

return packages, nil
}

func latestPackagesVersion(source packages.Packages) (result packages.Packages) {
sort.Sort(byNameVersion(source))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call sort.Sort on the source, it won't alter the original packages slice? I'm double-checking if this is concurrency-safe.

Copy link
Member Author

@jsoriano jsoriano Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will modify the packages slice defined in the CombinerIndexer.Get, that is built on every call to this method and shouldn't be used concurrently in any other place.

CHANGELOG.md Outdated
@@ -10,6 +10,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Bugfixes

* Return only the latest version of each package when a combined index is used. [#849](https://github.com/elastic/package-registry/pull/849)
* Return only last appearance of the same version of a package when it is available in multiple indexes. [#849](https://github.com/elastic/package-registry/pull/849)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss following scenario:

Developer is working on an integration locally. The package manifest defines the same version as for the embedded package already available in the Docker image, so there are 2 different package revisions published under the same version. Will EPR serve the local mounted package or the Docker embedded one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends, if both paths are defined in the same file system indexer, it will return the first one it finds, the second one is discarded as duplicated.

If they are defined in different indexers (plain file system vs zip) in a composable indexer, it will return the last one it finds. This is what is being implemented in this PR.

I can make the composable indexer to return the first package found, to be consistent with a file system indexer with multiple paths. But then I think that I would also reorder the initialization of the zip and plain file system indexers so the zip one takes precedence.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I'm asking is that users will struggle to figure out which package is served when. Maybe it's just better to change the order of indexers to ensure priorities:

  1. Local ZIP indexer.
  2. Local "unpacked" indexer.
  3. Google Storage Indexer.

In the nearest future, I'd like to introduce another indexer, a Proxy indexer?, which will connect to https://epr.elastic.co to pull results from them. This way it won't be necessary to ship EPR as heavy Docker images if the user has access to the internet and can stream relevant images.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0116753.

@mtojek mtojek self-requested a review July 27, 2022 10:49
@jsoriano jsoriano merged commit 672baef into elastic:main Jul 27, 2022
@jsoriano jsoriano deleted the combined-indexer-multiple-versions branch July 27, 2022 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants