Relevance for Lucene needs to be tweaked #411
Comments
I said "at the top", but I really mean "closer to the top". It's currently 7th on the list even though it has way more downloads than the other search results. While I think we have to be careful not to give download count too much weight in a relevance search, I think giving it some weight is appropriate. I also think part of the problem might be that we need to give partial ID matches even more weight compared to partial title matches, and that the relative weight of description might be too high.
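To make the weighting idea concrete, here is a minimal sketch of field-weighted scoring in plain Python. It is not the Gallery's Lucene code: the weight values, the `field_score` helper, and the sample packages are all hypothetical, chosen only so that a partial ID match outweighs a partial title match and description counts least.

```python
# Hypothetical field weights: partial ID > partial title > description.
FIELD_WEIGHTS = {"id": 3.0, "title": 2.0, "description": 0.5}

def field_score(query, package):
    """Sum the weights of every field that contains the query (case-insensitive)."""
    q = query.lower()
    return sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if q in package.get(field, "").lower()
    )

# Illustrative records only, not real Gallery data.
packages = [
    {"id": "Some.Helpers", "title": "Helpers",
     "description": "Helpers for entity classes."},
    {"id": "EntityFramework", "title": "EntityFramework",
     "description": "Object-relational mapper."},
]

# Highest score first: the ID and title match beats the description-only match.
ranked = sorted(packages, key=lambda p: field_score("Entity", p), reverse=True)
```

Lowering the `description` weight relative to `id` is the knob discussed above; the real values would need tuning against actual queries.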
There's a way to add weights to records as a whole; we could try experimenting with that. The only trouble is that we'd have to recreate the index every so often (perhaps every couple of days?).
Just playing devil's advocate here, but we should make sure that it's easy for new packages to get seen. I would want to make sure exact ID matches always outweigh download count so a new package can get exposure.
@osbornm Yes, I agree that exact ID matches should definitely have the strongest weight, and should be at the top.
It does that today. Unfortunately, the match has to be strict: EntityFramework would result in an exact match, but Entity Framework (with a space) would not. I could tweak that.
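One way to loosen the exact match while still keeping it ahead of download count is to normalize both sides before comparing and sort in tiers. The sketch below assumes a hypothetical `normalize_id` helper and made-up download numbers; it only illustrates the ordering, not how Lucene would be configured.

```python
def normalize_id(text):
    """Lowercase and remove all whitespace, so 'Entity Framework' == 'EntityFramework'."""
    return "".join(text.split()).lower()

def sort_key(query, package):
    """Exact (normalized) ID matches sort first; downloads only break ties within a tier."""
    exact = normalize_id(query) == normalize_id(package["id"])
    # Tuples compare element by element, so the tier always wins over downloads.
    return (0 if exact else 1, -package["downloads"])

# Illustrative records only; the download counts are invented.
packages = [
    {"id": "EntityFramework.Extras", "downloads": 1_000_000},
    {"id": "EntityFramework", "downloads": 10},
]

results = sorted(packages, key=lambda p: sort_key("Entity Framework", p))
```

With this ordering a brand-new package whose ID matches the query still lands above a far more popular partial match.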
@half-ogre +1. We run the same commit of NuGetGallery as nuget.org internally, and a lot of our packages are named something like "Organisation.NameSpace.ArchitectCrap.Widget.Library". Searching by ID is almost impossible with the existing toolset (Gallery or NuGet), as it now seems to default to a Lucene search on a '.'-split keyword or something similar (really bad with a lot of packages like the one above). It would be nice for an explicit, exact ID match to return only 1 package (even if it has to be quoted), or at the very least to put that package at the top. Today, if you search for "NuGet.Extensions" on http://nuget.org, it turns up midway down the second page, despite the exact package ID being entered.
This specific issue is fixed. We're always reviewing our Lucene stuff, though.
We need to tweak the relevance weights for the search. As an example, searching for 'Entity' should put EntityFramework at the top: it's a partial ID match, partial title match, and has a crazy amount of downloads. This might be as easy as adding the download count as a factor.
I don't think we need to get this perfect from day 1 (relevance is something we might always tweak), but this is a good example of one we should be able to get right.
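As a rough illustration of "adding the download count as a factor" without letting it dominate, the sketch below multiplies a text relevance score by a log-damped popularity boost. The `boosted_score` function and its `weight` parameter are assumptions for illustration, not anything the Gallery is known to use.

```python
import math

def boosted_score(text_score, downloads, weight=0.1):
    """Scale a text relevance score by a log-damped download boost.

    log10 keeps the boost small: going from 1,000 to 1,000,000 downloads
    adds only about 3 * weight to the multiplier.
    """
    return text_score * (1.0 + weight * math.log10(1 + downloads))

# A package with no downloads keeps its text score unchanged.
no_downloads = boosted_score(1.0, 0)

# A strong text match with no downloads still beats a weak match
# with a very large (hypothetical) download count.
strong_new = boosted_score(5.0, 0)
weak_popular = boosted_score(0.5, 10_000_000)
```

Because the boost is multiplicative, a package that does not match the query at all (text score 0) is never pulled into the results by downloads alone.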