Suboptimal Searches (Dev, Trial and any unindexed releases not to be included here) #1231
I'm looking for File::Temp. https://metacpan.org/search?q=tmpfile (result 8) vs http://search.cpan.org/search?query=tmpfile&mode=all (result 2) |
Thanks for kicking this off.
http://search.cpan.org/search?query=mop&mode=all vs https://metacpan.org/search?q=mop (a known issue, but the most painful manifestation of it)
http://search.cpan.org/search?query=dbix+helper&mode=all vs https://metacpan.org/search?q=dbix+helper (note how the only thing coming up is the deprecated one) |
@ribasushi I think the main issue with the dbix+helper search is that the MetaCPAN search results are collapsed. If you follow through on the link for more results you get https://metacpan.org/search?q=distribution:DBIx-Class-Helpers+dbix%20helper which is much more helpful. I'm not invalidating your comment. I'm just trying to work through what we're seeing. Obviously showing a deprecated module as the first result is not helpful. We should look at tweaking the collapsed search in this kind of case. One other problem may be that the search is for "helper" and not "helpers". The collapsed results for "helpers" look better: https://metacpan.org/search?q=dbix+helpers |
@oalders Does ES provide a way to calculate a "churn coefficient"? In other words, can it rank the entries by "most changes since" and thus give you a sane collapse criterion? |
You need a way to specify a search by module name -- you effectively have this for the search box autocomplete, but something like |
More smarts on start matching: I want to find all Plack::Middleware::** modules that have 'time', e.g. https://metacpan.org/search?q=plack%3A%3Amiddleware+time. This might be a new feature rather than a suboptimal search, but I thought I'd mention it here. |
What @dagolden is proposing is something we can do relatively easily, so I think we should make that a priority. We'd just need to sort out the syntax. The single colon is part of Lucene's search syntax. We also need to advertise that you can use Lucene's syntax to constrain searches. A good example is https://metacpan.org/search?q=plack+author%3ADAGOLDEN |
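For anyone who wants to script such constrained searches, here is a minimal sketch of building the URL above. The helper name `search_url` is made up for illustration; only the `q` parameter and the `author:` constraint come from the example link.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://metacpan.org/search"


def search_url(terms, **constraints):
    """Build a search URL from free-text terms plus field:value constraints.

    `search_url` is a hypothetical helper, not part of any MetaCPAN client.
    """
    parts = list(terms)
    # Each constraint becomes a Lucene-style field:value clause.
    parts += [f"{field}:{value}" for field, value in constraints.items()]
    # urlencode turns spaces into "+" and ":" into "%3A".
    query = urlencode({"q": " ".join(parts)})
    return f"{SEARCH_URL}?{query}"


print(search_url(["plack"], author="DAGOLDEN"))
# https://metacpan.org/search?q=plack+author%3ADAGOLDEN
```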
Putting To clarify term filters, they are for exact values (like not_analyzed strings). The reference docs do use the word "contain" (which isn't very clear) but they also say "not_analyzed":
which means it won't be tokenized (hence the exact match requirement). The book ("definitive guide") is slightly more specific:
Also note that the "term" operator doesn't analyze the input, so for example However you can't see that difference using the search box because of the query_string query (which does analyze the input). So, since we have several "fields" for module name, using an analyzed field can get you what you want: module.name.analyzed:MooseX |
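To make the exact-match vs analyzed distinction concrete, here is a toy model in plain Python. It is only an illustration: the `analyze` function is an assumed stand-in (split on non-alphanumerics, lowercase), not the analyzer actually configured on the index, and requiring every query token to match is a simplification.

```python
import re


def analyze(text):
    """Toy analyzer (an assumption): split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]


def term_match(stored_value, query):
    """not_analyzed behaviour: the whole stored value must equal the query."""
    return stored_value == query


def analyzed_match(stored_value, query):
    """Analyzed behaviour: both sides are tokenized, then tokens are compared."""
    stored = set(analyze(stored_value))
    return all(tok in stored for tok in analyze(query))


name = "MooseX::Types"
print(term_match(name, "MooseX"))         # False: not the whole value
print(term_match(name, "moosex::types"))  # False: case differs, input not analyzed
print(term_match(name, "MooseX::Types"))  # True: exact value
print(analyzed_match(name, "MooseX"))     # True: token "moosex" is present
```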
This is not user friendly. Instead of making us jump through hoops to know, understand and remember your data model and search engine behaviors, why not just intercept the search box contents before they go to Lucene and create the right search for us?
Or, if you don't like colon separators, do something like DDG: |
My preference here would be to go with the colon separators because that's what people are used to. We could use some other character for stuff that people want to pass directly to ES/Lucene. Aside from the distribution search, I don't think we use this syntax at all. Nobody really seems to be aware of it, so it follows that hardly anybody is taking advantage of it. Also, you really need to know a fair bit about the internals to use it. So, I'd say, let's make this as friendly as possible. If someone wants the old behaviour, they can preface the query with some syntax that doesn't get in the way. |
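A rough sketch of what intercepting the search box could look like, assuming a small table of friendly prefixes. The `dist` shorthand, the `raw:` escape prefix and the function name are all hypothetical; only the `module.name.analyzed` and `distribution` field names appear in this thread. It does not escape the colons inside module names, which a real implementation would have to handle.

```python
import re

# Hypothetical mapping from friendly prefixes to index fields.
FRIENDLY_FIELDS = {
    "module": "module.name.analyzed",
    "dist": "distribution",
}

# Hypothetical escape hatch: pass the rest of the query through untouched.
RAW_PREFIX = "raw:"

# A prefix must start a token and be followed by a value, so the "::"
# inside names such as Plack::Middleware is never treated as a prefix.
_PREFIX_RE = re.compile(r"(?<!\S)(\w+):(?=[^\s:])")


def rewrite_query(user_input):
    """Translate friendly prefixes into field names before querying."""
    text = user_input.strip()
    if text.startswith(RAW_PREFIX):
        return text[len(RAW_PREFIX):].lstrip()

    def swap(match):
        field = FRIENDLY_FIELDS.get(match.group(1))
        # Unknown prefixes (e.g. author:) are left exactly as typed.
        return f"{field}:" if field else match.group(0)

    return _PREFIX_RE.sub(swap, text)


print(rewrite_query("module:MooseX"))
print(rewrite_query("raw:module:MooseX"))
```

Leaving unknown prefixes alone means existing constrained searches such as `author:DAGOLDEN` keep working unchanged.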
I wasn't suggesting that people should know how to work that (or that it was good enough), I was just trying to clarify what Thomas was experiencing. We actually do have some special casing for |
There is also some DDG-like operator in there, but I'm not sure how that works. We obviously could use a page to explain what's available and how it works. @oalders FWIW, In the search results there's a link that says "search in distribution" which just redoes the current search with an added |
@rwstauner Yeah, that's what I meant with "Aside from the distribution search, I don't think we use this syntax at all". :) |
Yeah, I guess so. I was looking at the next sentence and thinking you were |
@rwstauner Thanks for the great explanation. I was looking at the Lucene docs, which I swear mentioned something about being contains not equals, but I don't see it now. And then to make it more confusing I conflated I wrote the user-friendly version which munges "module:..." as PR #1246. |
Vanity searches for PAUSE IDs seem to return weird results for modules: You can see a similar result searching for https://metacpan.org/search?q=FREW |
Searching for GetOpt yields a weird, apparently unsorted set of results. |
@andreeap Despite the fact that this ticket is on metacpan-web, most of the fixes here would involve a deep dive into Elasticsearch rather than front end work, so this is perfect for the scope of your OPfW time. You can pick searches from this list which interest you, create new issues for them and then link those issues back to this one so that we can track their progress. |
I should note that a bunch of search-related issues can also be found here https://github.com/CPAN-API/metacpan-web/labels/group:Search |
https://metacpan.org/search?q=IO%3A%3AAsync%3A%3ATimer%3A%3APeriod should find https://metacpan.org/pod/IO::Async::Timer::Periodic, but inexplicably finds something else |
I'm trying to find something to parse XML, so I searched for "xml". Most of the first results are from modules with last uploads circa 2000. Giving more weight to modules with more recent upload dates may be helpful. |
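One way this might be expressed, assuming the index runs an Elasticsearch version that supports `function_score`, is a decay function on the upload date. The sketch below only builds the request body; the field name `date`, the one-year scale and the helper name are assumptions, not the project's actual mapping or query.

```python
import json


def recency_weighted_query(user_query, date_field="date", scale="365d", decay=0.5):
    """Wrap a query_string search in a function_score with a date decay.

    With these settings a release uploaded `scale` ago keeps `decay` (half)
    of its text score; older releases keep progressively less.
    """
    return {
        "query": {
            "function_score": {
                "query": {"query_string": {"query": user_query}},
                "functions": [
                    {
                        "gauss": {
                            # date_field is an assumed field name.
                            date_field: {
                                "origin": "now",
                                "scale": scale,
                                "decay": decay,
                            }
                        }
                    }
                ],
                # Multiply the text relevance by the decay factor.
                "boost_mode": "multiply",
            }
        }
    }


print(json.dumps(recency_weighted_query("xml"), indent=2))
```

A multiplicative decay never removes old modules from the results; it only pushes them down relative to equally relevant newer uploads.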
From IRC: This search - https://metacpan.org/search?q=uri - places a module from 1998 with no upvotes or reviews above URI.pm which has 71 upvotes and three 5-star reviews. Furthermore, https://metacpan.org/search?q=XSLT does not find XML::LibXSLT anywhere in the top results. |
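To illustrate how upvotes could be folded into the ranking, here is a toy re-ranking in plain Python. The hit names, scores and the blending formula are invented for the example (only the 71 upvotes figure comes from the report above); this is not how the search currently scores results.

```python
import math


def rerank(hits, weight=1.0):
    """Sort hits by text score boosted by a logarithm of the upvote count.

    log1p keeps a module with zero upvotes at its plain text score and
    stops very popular modules from swamping everything else.
    """
    def blended(hit):
        return hit["score"] * (1.0 + weight * math.log1p(hit["favorites"]))

    return sorted(hits, key=blended, reverse=True)


# Hypothetical hits: an old module with a slightly better text score
# and no upvotes, versus URI with 71 upvotes.
hits = [
    {"name": "Some::Old::URI::Module", "score": 1.2, "favorites": 0},
    {"name": "URI", "score": 1.0, "favorites": 71},
]

for hit in rerank(hits):
    print(hit["name"])
```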
If you search for either |
https://metacpan.org/search?q=overload In a search for |
I'm opening this issue as a place to collect searches which could be improved. Individual searches can be broken into issues as they are tackled, but this is essentially a place to get the conversation started.