Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

full-text search using Lunr.js #1726

Merged
merged 48 commits into from
May 26, 2020
Merged

full-text search using Lunr.js #1726

merged 48 commits into from
May 26, 2020

Conversation

ZoeLeBlanc
Copy link
Member

@ZoeLeBlanc ZoeLeBlanc commented Mar 25, 2020

As an alternative to List.js, we could also use Lunr.js which has more robust search capacity.

This branch has my current attempt to get the two of these working together. It requires pulling down the pre-built indices and corpora, then filtering them for a search term, trying to reuse some of the existing UX by having the search results appear where the lesson abstract usually does.

One issue I'm having is that I'm not getting the right position for the search terms in the documents. @mdlincoln I'm using your code from the gist so if you wouldn't mind taking a look I would really appreciate it!

closes #1018

Checklist

If you are having difficulty fixing Travis errors, first consult https://github.com/programminghistorian/jekyll/wiki/Making-Technical-Contributions carefully, especially "Common Travis Errors". Then contact the technical team if you need further help.

@mdlincoln
Copy link
Contributor

I think you're getting the right search position, but applying it to the wrong documents! E.g. if I search georeferencing it doesn't bring up the georeferencing lesson at all, but three other ones that don't even mention the term.

There's an indexing issue somewhere in your code, where you're picking the wrong lessons from the search results and then trying to applying the highlighting to the text bodies and thus getting bizarre results - haven't pinpointed it yet, but that's where I'd look.

@ZoeLeBlanc
Copy link
Member Author

Thanks for figuring this out @mdlincoln !!! Definitely appreciate the help!! I wonder if it the search corpus and index are mismatched somehow 🤔

Otherwise it would likely be the results from the idx.search or where I'm creating the var docs 🕵️‍♀️

js/lessonfilter.js Outdated Show resolved Hide resolved
@mdlincoln
Copy link
Contributor

When I inspect the results from idx.search(searchString); and compare to the ids in search.json they are definitely correct! So at least that part is working.

Where is the portion of the code where you try to match that id number to the entries in the <ul> on the lessons page? Because those integer IDs only exist rn when creating the search.json file 🤔

@mdlincoln
Copy link
Contributor

mdlincoln commented Mar 26, 2020

huh, so in the corpus object you create, the id value is totally wrong, but the index location in the corpus object is correct?

e.g. at position 34 in corpus, the item has id: 84. A search for georeferencing returns the id 34, and the doc in position 34 in corpus is the right doc, it's just that the id doesn't correspond.

@ZoeLeBlanc
Copy link
Member Author

Thanks for the feedback @mdlincoln !! I updated the reference to the search index and I think fixed the issue. You were totally right about the doc.id being the wrong thing. Turns out our Lunr ref was looking for url, so I updated the search-index repo and now everything seems to be working. Gonna add in the logic for multiple languages and then I think we're ready for testing 🎉

@ZoeLeBlanc
Copy link
Member Author

So a weird bug that I could use your help with @mdlincoln. If you've noticed sometimes it gives a blank answer or doesn't show the updated search results or doesn't erase them when you empty out the input.

After lots of testing turns out it does all of that if you click enter at the end of the search query, which I'm pretty sure is an issue with Chrome not updating the new html, but could have to do with List.JS as well...

I've tried using jquery trigger, as well as resizing the window, and using setTimeout to hide and show the search results. So far nothing has worked... any thoughts or ideas of what we should try???

@mdlincoln mdlincoln marked this pull request as draft April 12, 2020 17:28
@ZoeLeBlanc
Copy link
Member Author

Sorry for the very similar commits here, for some reason I wasn't seeing my pushes in the Github UI so overdid it a bit.

Anyways, pretty much have search fully working before tomorrow's meeting 🎉

You can now click on a topic or activity, search within those lessons, undo your search, and still have those lessons selected.

I also added logic for a button or enter press which is pretty much fixing the display bug 98% of the time (sometimes you need a second enter press).

Looking forward to hearing how it works for others 😅

@mdlincoln
Copy link
Contributor

mdlincoln commented Apr 21, 2020

  • Instead of "enable search lessons?" I might write "Start searching" in the initial search button.

  • I type in a search term and click Search Lessons. I then edit the search text and click Search Lessons again, but nothing happens. I would expect that it would execute a new search, but instead I first need to click the "Reset" button

  • I type in a search term and click Search Lessons. I then click on one of the faceting buttons above (e.g. Digital Publishing), and the list expands to include all the digital publishing lessons plus the text search lessons. I would expect that it would show the intersect of the two queries.

    • HOWEVER If I do the above and then click Search Lessons again, we get the intersect.
  • I love that the search query text entry is automatically capitalized, which is a useful way to suggest that its text insensitive. It IS case-insensitive, right?

  • Is the index creation and index search language-sensitive when it does word stemming? Can we set that up if it's not already? (e.g. in English search, "lesson" will also match "lessons" but in Spanish, "lección" will not match "lecciones"
    Questions:

  • You'd mentioned we could enable more advanced wildcard searching. Do we think this is really necessary, or overkill? I lean towards thinking it'd be overkill - we don't even have 100 lessons right now.

@jenniferisasi
Copy link
Contributor

  • 'Enable search lessons?' to 'Start searching' or 'Search in lessons'
  • I looked for text*, then I do text (thinking from a general to specific search) and the search is correct or accurate. Now, upon deleting the work, in this case text the result don't clear out. When I click the return key or the 'search lessons' tab it takes back to the previous search results, here text*, then another return or search lessons and it clears out to the lesson index.
  • In Spanish I realize it shows some singular/plurals and some verb conjugations in the search but not always (not using the * here). Examples:
    -- valor valores
    -- texto textos
    -- dato datos
    -- escribe escribí
    -- cerrar cerrará
    -- guardar guardará (so this indicates I think it takes also whatever follows the characters in the search
  • In the same way, it didn't do all the plurals or follow the characters...
    -- red, didn't show 'redes'
  • After a few searches for the above experiment, it "run out of searches" in other words, I had to reset to see all lessons (had to click a couple of times)
  • It is accent mark sensitive - if missing it won't find the word
  • The * search retrieves "all" possible results for the use of a root, ie. 'utiliza*' (verbs + nouns)

I did some searches in French and this is what I found there:

  • 'avancée' retrieves avancée, avancées, avancés.

  • 'caractéristique' retrieves the plural 'caractéristiques'

  • l'etu(missing the

  • It doesn't retrieve a string of characters that doesn't show the word completely? I do 'orig' thinking of original.
    -- origin , retrieves original; original retrieves origin

  • It looks to me that after a few searches the search lesson stops working until you reset.

@mdlincoln
Copy link
Contributor

Thanks Zoe I see your commits - hoping to peer through them this weekend and then we can launch 🎷

Copy link
Contributor

@mdlincoln mdlincoln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move the lunr stemming code into a js/vendor subdirectory just to make clear that it's drop-in code that didn't need to be customized for our site?

Thinking ahead, how difficult will it e to add in support for Portuguese stemming? Can we make sure that process is documented in the code comments or elsewhere?


/* Function to update lesson-list using featureList and sort direction
- uses URI to generate sort directions
*/
console.log("applying sort from URI");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the console log functions now that we're ready to push this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were there originally (i.e. before I started working on search) but happy to remove them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I also remove where we console.log the new URI?

js/lessonfilter.js Outdated Show resolved Hide resolved
js/lessonfilter.js Outdated Show resolved Hide resolved
@ZoeLeBlanc
Copy link
Member Author

@mdlincoln I like the idea of the js/vendor and happy to document (though it is almost entirely just following the lunr docs). One question I've had is where to document everything? Do you have any suggestions?

Also for the search-info pop out. Are we planning to keep it? If yes, I can add it to our snippets. Finally should I take out all the console.logs or just the one you highlighted. There's quite a few of them so let me know. Thanks!

@mdlincoln
Copy link
Contributor

Ditch all the console.log()

A high-level overview page similar to the wiki I wrote for the translation links would be very helpful. You don't need to explain every single line of code bc the comments are quite good, but mainly to document the architecture.

And that mini-help text is great. Let's add it to snippets

@ZoeLeBlanc
Copy link
Member Author

Alrighty! All console.logs are gone and snippet added 🎉

The example of Twitter and Network in the search info works for Spanish but not French, based on the lessons available. I would recommend for French maybe doing +Python -Ruby but it raises the question of whether we may want to standardize examples across languages or just let each team decide which example search terms would be most useful to readers.

I'll start working on the documentation tutorial this week but seems like we're in the final mile 🙌

@mdlincoln
Copy link
Contributor

@rivaquiroga @programminghistorian/spanish-team
@spapastamkou @programminghistorian/french-team

If you can please help translate the final snippet that provides search documentation for users: https://github.com/programminghistorian/jekyll/pull/1726/files#diff-a407e023d60dfefccac66d5a7959003cR375-R384

spapastamkou
spapastamkou previously approved these changes May 26, 2020
Copy link
Contributor

@mdlincoln mdlincoln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZoeLeBlanc please put the boilerplate JS code into a /vendor directory as discussed above

@ZoeLeBlanc
Copy link
Member Author

@mdlincoln thanks for catching that! Just moved the files to vendor directory and updated the lesson index to only load appropriate stemmer for each language (rather than loading them all on each page).

@mdlincoln
Copy link
Contributor

mdlincoln commented May 26, 2020

ohhh good catch, I didn't even think to check it was loading all that extra JS!

@ZoeLeBlanc
Copy link
Member Author

hmm weirdly isn't showing that I created vendor and moved the files 🤔 Should I try pushing up again??

@mdlincoln
Copy link
Contributor

oh shoot - at one point I think vendor/ was added to our gitignore b/c of the way that ruby gems are installed locally at /vendor/bundle/ruby

It ought to be /vendor in gitignore so that it doesn't ignore all directories that have the name vendor in them, just the one at the root

@ZoeLeBlanc
Copy link
Member Author

ahh that makes more sense! I'll go ahead and fix that in gitignore, and then push up again

@mdlincoln mdlincoln merged commit 850d467 into gh-pages May 26, 2020
@mdlincoln mdlincoln deleted the search-lunr-1018 branch May 26, 2020 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Full-text search
7 participants