Search Results - Improve Results #1928

mheppler · 2015-04-08T21:32:15Z

On our test server, with the migrated production data, I ran a simple search for "murray" in an attempt to locate the "Murray Research Archive Original Collection Dataverse". (FULL DISCLOSURE: Being able to search for AND FIND a dataverse is a very important feature for me, personally. I was hired to customize dataverses, and I can't tell you how many man hours I wasted my first few months here, trying to find them in our old UI.)

In the 1,371 results that were returned (see attached), there were some weird things we should improve upon in order to improve the usability of our search results.

Distributor Logo URL and Distributor Name were the hits on the first result cards. The results seem to prioritize those fields over things like Dataverse Name and Parent Dataverse Name.
Also, we should combine Distributor Logo URL and Distributor Name into just Distributor, and display it as a compound, like we do for the metadata tab on the dataset page.
File descriptions are turning up "Murray Archive", after we get past all the Distributor datasets.
I didn't go through every page, because I gave up after 20 pages, but it wasn't until page 54 of the results that I saw my first dataset from the Murray Research Archive Original Collection Dataverse. That too hit on Distributor.
After going page by page through the first 20, and last 20 pages of results, I have no idea where the actual dataverse landed. There seems to be no rhyme or reason to the results.

sbarbosadataverse · 2015-04-08T21:42:53Z

I noticed that in searching today too...v6.
On Apr 8, 2015 5:32 PM, "Michael Heppler" notifications@github.com wrote:

After going page by page through the first 20, and last 20 pages of
results, I have no idea where the actual dataverse landed. There seems to
be no rhyme or reason to the results. I know there are facets there, and
those are great, but these results look like a pile of vomit.

[image: screen shot 2015-04-08 at 5 09 51 pm]
https://cloud.githubusercontent.com/assets/687227/7055538/25ee8aee-de12-11e4-91b2-f5e3989c93b0.png


scolapasta · 2015-04-09T00:27:49Z

Moving to Beta 15, as it is critical to investigate what we might be able to do.

pdurbin · 2015-04-09T02:18:21Z

I created a Dataverse 4.0 Search Relevance Functional Requirements doc we can collaborate on: https://docs.google.com/a/harvard.edu/document/d/1W5UM076mMO8xFtl6t0X_eTnYmcb3G5HOUEQo_O-BsuQ/edit?usp=sharing

We only expose the score if debug is enabled in the GUI or if show_relevance is specified via the Search API.
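
For anyone who wants to look at scores while testing, here is a minimal sketch of building a Search API request with show_relevance set. The host name is a placeholder, and the /api/search path and the "true" value are assumptions on my part; only the show_relevance parameter name comes from the note above.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, show_relevance: bool = True) -> str:
    """Build a Search API URL, optionally asking for relevance details.

    The "/api/search" path and the "true" value are assumptions for
    illustration; only the show_relevance parameter name is from the thread.
    """
    params = {"q": query}
    if show_relevance:
        params["show_relevance"] = "true"
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# Placeholder host, not a real installation.
print(build_search_url("https://demo.example.org", "murray"))
```

Fetching that URL and comparing scores across the first few result cards should make it easier to see which fields are winning.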

pdurbin · 2015-04-10T18:57:50Z

On an internal server we are playing around with "boosting" certain fields like this:

<str name="defType">edismax</str>
<str name="qf">
dvName^170
dvSubject^160
dvDescription^150
dvAffiliation^140
title^130
subject^120
keyword^110
topicClassValue^100
dsDescriptionValue^90
authorName^80
authorAffiliation^70
publicationCitation^60
producerName^50
fileName^40
fileDescription^30
variableLabel^20
variableName^10
text^1.0
</str>
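
To experiment with different weights without editing solrconfig.xml each time, the same boosts can be sent as request parameters, since Solr accepts defType and qf per request. This is only a sketch: it uses a subset of the fields listed above, and FIELD_BOOSTS, build_qf and build_params are hypothetical names, not anything in the Dataverse code.

```python
from urllib.parse import urlencode

# Subset of the field weights from the XML above, for illustration only.
FIELD_BOOSTS = {
    "dvName": 170,
    "dvSubject": 160,
    "title": 130,
    "fileDescription": 30,
    "text": 1.0,
}


def build_qf(boosts):
    """Join field^weight pairs with spaces, the format the qf parameter expects."""
    return " ".join(f"{field}^{weight}" for field, weight in boosts.items())


def build_params(query, boosts):
    """Request parameters that select edismax and apply the field boosts."""
    return {"q": query, "defType": "edismax", "qf": build_qf(boosts)}


params = build_params("murray", FIELD_BOOSTS)
print(params["qf"])
print(urlencode(params))  # query string to append to the Solr handler URL
```

Higher weights make a match in that field count for more, so a hit on dvName should outrank a hit on fileDescription for the same term.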

We're collecting feedback via email and a Google doc.

In an ancient commit at 60e640b I introduced this line...

solrQuery.setParam("qt", "/spell")

... which had the effect (as far as I understand) of switching us to using the "/spell" endpoint rather than the default "/select" endpoint. So the XML above has to be put under <requestHandler name="/spell" in solrconfig.xml.

I'd also like to research "edismax" a lot more. From what I understand we are using the default "defType" but according to https://wiki.apache.org/solr/DisMax it would probably be good for us to use dismax or edismax instead of the default (I'm not sure what it's called). More research is needed.

tercer · 2015-05-22T01:39:39Z

I just searched on "Experiment" and half my results on the first two pages were "Experience" in a Field Term. Is this an issue caused by boosting Fields in search? Here's an image from the middle of the first page of search results:

pdurbin · 2015-05-22T14:15:28Z

@tercer hmm, yep, I can easily replicate this. I mentioned it at http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2015-05-22 but no one jumped in with an explanation. Weird. Thanks.

Also, @mcrosas just mentioned that a search for https://dataverse.harvard.edu/?q=datapass (sadly) does not find https://dataverse.harvard.edu/dataverse/datapass presumably because it has a hyphen in it ("Data-PASS").
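
To illustrate the hyphen guess, here is a toy model of what could be happening. It is a simplified stand-in and not Solr's actual analyzer chain: it assumes the indexed name is split on non-alphanumeric characters and lowercased, and that every query term must match an indexed token. The tokenize and matches helpers are hypothetical.

```python
import re


def tokenize(text):
    """Split on anything that is not a letter or digit, then lowercase.

    A deliberately simplified stand-in for a search analyzer.
    """
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def matches(query, field_value):
    """True if every query token appears among the field's indexed tokens."""
    indexed = set(tokenize(field_value))
    return all(term in indexed for term in tokenize(query))


print(tokenize("Data-PASS"))
print(matches("datapass", "Data-PASS"))
print(matches("data pass", "Data-PASS"))
```

Under those assumptions "Data-PASS" is indexed as two tokens, so the single term "datapass" never matches while "data pass" does. Whether that is the real cause here would need checking against the actual field types in the schema.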

@scolapasta and @kcondon what's the plan for these tickets that are still open from old beta pushes? Should a new ticket be opened for improving search results? Should we continue to use this one and change the milestone? Should we continue to work on the Dataverse 4.0 Search Relevance Functional Requirements doc I started?

pdurbin · 2015-07-14T17:24:31Z

It was mentioned today that perhaps we should document in the Installation Guide the "boosting" configuration described at #1928 (comment) that we are running in production.

kcondon · 2016-01-16T12:04:44Z

@scolapasta This was assigned to me way back. It seems like there have been specific search issues found but not yet addressed? Phil asked how we should approach improving search results generally.

I can try more testing, with input from others since they're good at finding cases, but sounds like we already have a basis for work, no?

I think this may be a case of iterative improvements like we experienced with the first search implementation: we do our best, verify common results work well, release and get further feedback.

scolapasta · 2016-01-27T22:45:52Z

@pdurbin @kcondon Should we just close this and, if there are any specific things to be done, open them as separate issues?

pdurbin · 2016-01-27T23:13:07Z

The issues @tercer and @mcrosas mentioned are real issues.

pdurbin · 2016-01-29T21:41:00Z

I just discussed this issue with @scolapasta and we agreed to close this issue (which I'll do now) after I create two issues to note the issues mentioned by @tercer and @mcrosas, which are, respectively:

A search for "experiment" yields results for "experience" #2897
Search: dataverse alias not searchable #2898

mheppler added the Feature: Search/Browse and UX & UI: Design (This issue needs input on the design of the UI and from the product owner) labels Apr 8, 2015

mheppler added this to the In Review - 4.0.x milestone Apr 8, 2015

mheppler added the Priority: Critical label Apr 8, 2015

mheppler modified the milestones: In Review - 4.0, In Review - 4.0.x Apr 8, 2015

scolapasta modified the milestones: Beta 15 - Dataverse 4.0, In Review - 4.0 Apr 9, 2015

scolapasta assigned pdurbin Apr 9, 2015

pdurbin added a commit that referenced this issue Apr 9, 2015

Search: expose score/relevance #1928

f1a73ff

We only expose the score if debug is enabled in the GUI or if show_relevance is specified via the Search API.

scolapasta added the Status: QA label Apr 10, 2015

scolapasta assigned kcondon and unassigned pdurbin Apr 10, 2015

pdurbin mentioned this issue Jul 24, 2015

update jetty header http size (mydata branch) #2311

Closed

pdurbin mentioned this issue Sep 4, 2015

Searching for "digital" finds "digit" #2484

Closed

mercecrosas modified the milestones: Beta 15 - Dataverse 4.0, In Review Nov 30, 2015

scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016

scolapasta assigned pdurbin and unassigned kcondon Jan 29, 2016

This was referenced Jan 29, 2016

A search for "experiment" yields results for "experience" #2897

Closed

Search: dataverse alias not searchable #2898

Closed

pdurbin closed this as completed Jan 29, 2016

pdurbin mentioned this issue Feb 10, 2016

Installation Guide improvements following rewrite for 4.2.4 #2944

Closed

28 tasks

matthew-a-dunlap mentioned this issue Sep 20, 2018

Solr search result ordering broken #4938

Closed

pdurbin removed their assignment Feb 13, 2019

pdurbin mentioned this issue Feb 4, 2020

Search: Accented characters, ampersands, negative numbers and other special characters #820

Closed

pdurbin mentioned this issue Mar 8, 2021

Solr 8.8 upgrade - remaining issues with solrconfig.xml #7662

Open

mheppler commented Apr 8, 2015 • edited by djbrooke
