Search Results - Improve Results #1928

Closed
mheppler opened this issue Apr 8, 2015 · 11 comments
Labels
Feature: Search/Browse; UX & UI: Design (This issue needs input on the design of the UI and from the product owner)

Comments

@mheppler
Contributor

mheppler commented Apr 8, 2015

On our test server, with the migrated production data, I ran a simple search for "murray" in an attempt to locate the "Murray Research Archive Original Collection Dataverse". (FULL DISCLOSURE: Being able to search for AND FIND a dataverse is a very important feature for me, personally. I was hired to customize dataverses, and I can't tell you the number of man-hours I wasted my first few months here, trying to find them in our old UI.)

In the 1,371 results that were returned (see attached), there were some weird things we should improve upon in order to improve the usability of our search results.

  • Distributor Logo URL and Distributor Name were the hits on the first result cards. The results seem to prioritize those fields over things like Dataverse Name and Parent Dataverse Name.
  • Also, we should combine Distributor Logo URL and Distributor Name into just Distributor, and display it as a compound, like we do for the metadata tab on the dataset page.
  • After we get past all the Distributor datasets, file descriptions containing "Murray Archive" start turning up.
  • I didn't go through every page, because I gave up after 20, but it wasn't until page 54 of the results that I saw my first dataset from the Murray Research Archive Original Collection Dataverse. That one also hit on Distributor.
  • After going page by page through the first 20 and last 20 pages of results, I have no idea where the actual dataverse landed. There seems to be no rhyme or reason to the results.

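[image: screen shot 2015-04-08 at 5 09 51 pm]
https://cloud.githubusercontent.com/assets/687227/7055538/25ee8aee-de12-11e4-91b2-f5e3989c93b0.png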

mheppler added the Feature: Search/Browse and UX & UI: Design labels Apr 8, 2015
mheppler added this to the In Review - 4.0.x milestone Apr 8, 2015
mheppler modified the milestones: In Review - 4.0, In Review - 4.0.x Apr 8, 2015
@sbarbosadataverse

I noticed that in searching today too...v6.

@scolapasta
Contributor

Moving to Beta 15, as critical to investigate what we might be able to do.

@pdurbin
Member

pdurbin commented Apr 9, 2015

I created a Dataverse 4.0 Search Relevance Functional Requirements doc we can collaborate on: https://docs.google.com/a/harvard.edu/document/d/1W5UM076mMO8xFtl6t0X_eTnYmcb3G5HOUEQo_O-BsuQ/edit?usp=sharing

pdurbin added a commit that referenced this issue Apr 9, 2015
We only expose the score if debug is enabled in the GUI or if
show_relevance is specified via the Search API.
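
For anyone who wants to look at scores while testing relevance, here is a minimal sketch of building a Search API request that sets show_relevance. Only the show_relevance parameter name comes from the commit message above. The host name, the /api/search path, the q parameter, and the value "true" are assumptions made for illustration, and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchRelevanceUrl {

    // Builds a Search API URL; appends show_relevance only when requested.
    // The "/api/search" path and "q" parameter are assumptions, not taken from this thread.
    static String build(String baseUrl, String query, boolean showRelevance) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("/api/search?q=")
                .append(URLEncoder.encode(query, StandardCharsets.UTF_8));
        if (showRelevance) {
            url.append("&show_relevance=true");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Hypothetical host, used only to show the shape of the request.
        System.out.println(build("https://demo.example.org", "murray", true));
    }
}
```

Keeping the flag optional mirrors the commit's intent: scores stay hidden unless the caller asks for them.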
@pdurbin
Member

pdurbin commented Apr 10, 2015

On an internal server we are playing around with "boosting" certain fields like this:

<str name="defType">edismax</str>
<str name="qf">
dvName^170
dvSubject^160
dvDescription^150
dvAffiliation^140
title^130
subject^120
keyword^110
topicClassValue^100
dsDescriptionValue^90
authorName^80
authorAffiliation^70
publicationCitation^60
producerName^50
fileName^40
fileDescription^30
variableLabel^20
variableName^10
text^1.0
</str>

We're collecting feedback via email and a Google doc.

In an ancient commit at 60e640b I introduced this line...

solrQuery.setParam("qt", "/spell")

... which had the effect (as far as I understand) of switching us to using the "/spell" endpoint rather than the default "/select" endpoint. So the XML above has to be put inside the requestHandler element named "/spell" in solrconfig.xml.
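
While experimenting, the same boosts could in principle be sent as request parameters instead of being edited into solrconfig.xml each time. The sketch below only assembles the parameter names and values: defType and qf are standard Solr query parameters, and the "/spell" value for qt comes from the line above. The class and method names are hypothetical, and sending the boosts per request is an assumption for illustration, not what the internal server does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class BoostParams {

    // Joins field boosts into the space-separated "field^boost" form used by qf.
    static String toQf(Map<String, Integer> boosts) {
        return boosts.entrySet().stream()
                .map(e -> e.getKey() + "^" + e.getValue())
                .collect(Collectors.joining(" "));
    }

    // Collects the request parameters in insertion order.
    static Map<String, String> params(Map<String, Integer> boosts) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("qt", "/spell");        // handler named in this thread
        p.put("defType", "edismax");  // query parser under evaluation
        p.put("qf", toQf(boosts));    // per-field boosts
        return p;
    }

    public static void main(String[] args) {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("dvName", 170);
        boosts.put("title", 130);
        boosts.put("fileDescription", 30);
        // With SolrJ, each entry would be passed via solrQuery.setParam(name, value).
        params(boosts).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A LinkedHashMap keeps the fields in the order they were added, which makes the generated qf string easy to compare against the XML list.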

I'd also like to research "edismax" a lot more. From what I understand we are using the default "defType" but according to https://wiki.apache.org/solr/DisMax it would probably be good for us to use dismax or edismax instead of the default (I'm not sure what it's called). More research is needed.
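
As background for that research, here is a simplified sketch of how a disjunction-max ("dismax") query combines per-field scores: the best-scoring field dominates, and the remaining fields contribute only through a tie-breaker multiplier. The per-field scores below are invented for illustration, real Lucene scoring involves more factors, and the class and method names are hypothetical.

```java
public class DisMaxSketch {

    // Combines already-boosted, non-negative per-field scores:
    // result = max + tie * (sum of the other scores).
    static double combine(double[] boostedFieldScores, double tie) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : boostedFieldScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // Made-up raw score of 1.0 per matching field, multiplied by boosts from the qf list.
        double dataverseNameHit = combine(new double[] {1.0 * 170}, 0.0);
        double fileDescriptionHit = combine(new double[] {1.0 * 30}, 0.0);
        System.out.println(dataverseNameHit > fileDescriptionHit);
    }
}
```

Under these assumptions a hit on dvName outranks a hit on fileDescription, which is the behavior the boosts are meant to produce.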

@tercer

tercer commented May 22, 2015

I just searched on "Experiment" and half my results on the first two pages were "Experience" in a Field Term. Is this an issue caused by boosting fields in search? Here's an image from the middle of the first page of search results:

[image: screen shot 2015-05-21 at 9 37 00 pm]

@pdurbin
Member

pdurbin commented May 22, 2015

@tercer hmm, yep, I can easily replicate this. I mentioned it at http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2015-05-22 but no one jumped in with an explanation. Weird. Thanks.

Also, @mcrosas just mentioned that a search for https://dataverse.harvard.edu/?q=datapass (sadly) does not find https://dataverse.harvard.edu/dataverse/datapass presumably because it has a hyphen in it ("Data-PASS").
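
To illustrate why a hyphen could cause this, here is a simplified stand-in for index-time tokenization. It is not Solr's actual analysis chain, and the thread only presumes the hyphen is the cause. If "Data-PASS" is split into "data" and "pass", the single query token "datapass" has nothing to match, while also indexing the joined form would make it findable. Solr's word delimiter filter has a catenateWords option along these lines. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HyphenSketch {

    // Lowercases and splits on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Same as tokens(), but also adds the joined form of each multi-part word,
    // for example "data-pass" -> "datapass".
    static List<String> tokensWithCatenation(String text) {
        List<String> out = tokens(text);
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            List<String> parts = tokens(word);
            if (parts.size() > 1) {
                out.add(String.join("", parts));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Data-PASS").contains("datapass"));
        System.out.println(tokensWithCatenation("Data-PASS").contains("datapass"));
    }
}
```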

@scolapasta and @kcondon what's the plan for these tickets that are still open from old beta pushes? Should a new ticket be opened for improving search results? Should we continue to use this one and change the milestone? Should we continue to work on the Dataverse 4.0 Search Relevance Functional Requirements doc I started?

@pdurbin
Member

pdurbin commented Jul 14, 2015

It was mentioned today that perhaps we should document in the Installation Guide the "boosting" configuration described at #1928 (comment) that we are running in production.

@kcondon
Contributor

kcondon commented Jan 16, 2016

@scolapasta This was assigned to me way back. It seems like there have been specific search issues found but not yet addressed? Phil asked how we should approach improving search results generally.

I can try more testing, with input from others since they're good at finding cases, but sounds like we already have a basis for work, no?

I think this may be a case of iterative improvements like we experienced with the first search implementation: we do our best, verify common results work well, release and get further feedback.

@scolapasta
Contributor

@pdurbin @kcondon Should we just close this and, if there are any specific things to be done, open them as separate issues?

@pdurbin
Member

pdurbin commented Jan 27, 2016

The issues @tercer and @mcrosas mentioned are real issues.

scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016
@pdurbin
Member

pdurbin commented Jan 29, 2016

I just discussed this issue with @scolapasta and we agreed to close this issue (which I'll do now) after I create two issues to note the issues mentioned by @tercer and @mcrosas, which are, respectively:


7 participants