Full-text search #1018

Closed
alsalin opened this issue Sep 20, 2018 · 40 comments · Fixed by #1726

Comments

@alsalin
Contributor

alsalin commented Sep 20, 2018

Based off of the Community Survey - we had several readers note that there's no easy way to search the website to find lessons. Is this possible? If so, is it hard? If not, what's the possibility of us implementing search capabilities?

On a related note - improving browsing and filtering. Some folks noted that it's not always easy to find things under specific filter headings.

@mdlincoln
Contributor

Yes! From 2018-12-20 meeting:

We need to develop user stories to help guide our implementation:

  • what is the pathway for someone entering the site who doesn't know anything about the site and wants to e.g. see pathways, see different categories of lessons
  • what is the pathway for someone who is looking for a specific tech they know has been discussed (e.g. SPARQL) but needs to find the lesson again

other ?s

  • does this search run across all lessons? should it be faceted by language?
  • searching blog posts / guidelines too? (I'm thinking no, but it's worth asking)

@drjwbaker
Member

  • What is the pathway for someone who has just finished a lesson and wants to find another (not necessarily directly related to the one they've done)?

@mdlincoln
Contributor

From 2020-03-19 call:

The big bottleneck for creating full-text search (FTS) is generating the textual index object. The pure "Jekyll" way of doing this is to have the user's browser create this index on the fly. However, our texts are too long and numerous: building the index this way caused our browsers to hang and crash.

We sketched out an alternative solution:

  1. Generate JSON files in this repo with the unindexed full text of each lesson (similar to this code)
  2. Create a small Node.js app that can pull down that JSON file from the website, create the index, and then post the generated index file (TBD where @ZoeLeBlanc ?) online.
  3. Now when the user navigates to the site search page, their browser just has to download the much smaller prebuilt index, without doing any computing.
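
The three steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration of building an index ahead of time so that the browser only has to load it and do lookups. The lesson fields (`url`, `title`, `body`), the sample data, and the function names `buildIndex` and `search` are assumptions made for illustration; a real implementation would more likely hand the indexing to Lunr than hand-roll an inverted index like this.

```javascript
// Hypothetical sample of the per-lesson JSON from step 1.
// Field names (url, title, body) are assumptions, not the site's real schema.
const lessons = [
  { url: "/en/lessons/sparql", title: "Using SPARQL", body: "Query linked open data with SPARQL." },
  { url: "/en/lessons/python-intro", title: "Python Introduction", body: "Install Python and run a script." },
];

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Step 2: build an inverted index mapping each term to the lesson URLs containing it.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const term of new Set(tokenize(`${doc.title} ${doc.body}`))) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(doc.url);
    }
  }
  return index;
}

// The serialized index is the artifact that step 2 would publish online.
const serialized = JSON.stringify([...buildIndex(lessons)]);

// Step 3: the browser loads the serialized index and only performs lookups.
function search(serializedIndex, query) {
  const index = new Map(JSON.parse(serializedIndex));
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  // Keep only the lessons that contain every query term.
  return terms
    .map((term) => index.get(term) || [])
    .reduce((acc, urls) => acc.filter((url) => urls.includes(url)));
}

console.log(search(serialized, "sparql data"));
```

In this sketch the expensive part (`buildIndex`) runs once on the build side, and the search page would only need `tokenize` and `search`.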

@ZoeLeBlanc will generate 1 and 2 and a mockup of 3 by the next @programminghistorian/technical-team meeting in April for initial feedback.

@mdlincoln
Contributor

For hosting the built index, perhaps we just push it to a separate GH pages site? https://docs.travis-ci.com/user/deployment/pages/

@mdlincoln
Contributor

Also suggest that we use https://docs.travis-ci.com/user/cron-jobs to periodically re-generate the index

@ZoeLeBlanc
Member

@mdlincoln I've pretty much finished 1 & 2 of the list, but ofc there are a few unforeseen issues that could use your input 👇

So for 1. I need the site to actually build the search json files for en, es, fr (though code is now in branch issue-1018), but that will only happen if my PR gets merged in right?

Then for 2. Once I merge in issue-1018, then I need to update the code in the search-index repo to actually read the updated json files (though that should take two minutes to add).

Otherwise the cron job is set up on Travis and the search-index repo is working to deploy the search indexes 🎉 (can see the file here https://github.com/programminghistorian/search-index/tree/master/indices)

Thanks for your help 😊

@mdlincoln
Contributor

@ZoeLeBlanc Sure, open up a PR from the issue-1018 branch and I'll review & merge so that you can start the search-index repo off of the live JSON.

@mdlincoln
Contributor

https://github.com/programminghistorian/jekyll/compare/issue-1018

did something odd happen to the permissions or something on your files? This comparison shows you've touched every single file! Might be better to open a fresh branch and re-do the JSON :/

@ZoeLeBlanc
Member

@mdlincoln super weird that happened but thanks for catching it! I just recreated the branch and files, and made a PR. Thanks!!

@ZoeLeBlanc
Member

so a bit of a crazy update for you @mdlincoln. I hadn't really looked into List.js at all before this (docs are here https://listjs.com/) and turns out it has some search functionality already built-in. So I decided to try it out with our current structure and it seems to work 😅. I just pushed up branch search-1018 and if you pull that down, you should be able to search on each language-specific lessons page. Right now I'm searching the abstract, title, and contents of each lesson, but we could switch that up with barely any additional logic.
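
To make the behaviour concrete, here is a minimal sketch of exact matching across the title, abstract, and contents of each lesson, written in plain JavaScript rather than with List.js itself. The field names, the sample lessons, and the `exactSearch` function are assumptions for illustration; this shows the idea and is not List.js's implementation.

```javascript
// Hypothetical lesson records. The three fields mirror the ones mentioned above
// (title, abstract, contents), but the exact names are assumptions.
const lessonList = [
  { title: "Principles of Linked Open Data", abstract: "Concepts behind LOD.", content: "Triples, URIs and SPARQL queries." },
  { title: "Working with Text Files in Python", abstract: "Read and write files.", content: "Open a file and print its lines." },
];

// Return the lessons where any of the chosen fields contains the query as a
// case-insensitive substring. Changing `fields` changes what is searched.
function exactSearch(items, query, fields = ["title", "abstract", "content"]) {
  const needle = query.trim().toLowerCase();
  if (!needle) return items;
  return items.filter((item) =>
    fields.some((field) => String(item[field] || "").toLowerCase().includes(needle))
  );
}

// Searching all three fields finds the term in the lesson contents.
console.log(exactSearch(lessonList, "sparql").map((lesson) => lesson.title));
// Restricting the search to titles changes the result.
console.log(exactSearch(lessonList, "sparql", ["title"]).length);
```

Switching which fields are searched is then a matter of passing a different `fields` list.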

Let me know what you think but this might be a lightweight enough solution to use instead of going ahead with Lunr 😄

@mdlincoln
Contributor

HUH that is... quite impressive! It's like a ctrl-F for the lesson content. Looks like the drawbacks are:

  • exact search only, so searching e.g. principles linked open data won't return the "Principles of Linked Open Data" lesson
  • can't show the keyword-in-context result
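
The first drawback can be illustrated with a small sketch. Under exact substring matching, the query `principles linked open data` misses the title because the word "of" breaks the contiguous match, whereas matching on individual words does not have that problem. The function names are hypothetical, and this only illustrates the difference; it is not how Lunr or List.js are implemented.

```javascript
const title = "Principles of Linked Open Data";
const query = "principles linked open data";

// Exact matching: the whole query must appear as one contiguous substring.
function matchesExact(text, q) {
  return text.toLowerCase().includes(q.toLowerCase());
}

// Token matching: every word of the query must appear somewhere in the text.
function matchesAllTokens(text, q) {
  const words = new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
  return q.toLowerCase().split(/\s+/).filter(Boolean).every((word) => words.has(word));
}

console.log(matchesExact(title, query));     // false
console.log(matchesAllTokens(title, query)); // true
```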

But I like that this keeps the architecture super simple and we wouldn't need to go around building indices with a separate service. It also seems to hook nicely into the existing facet buttons.

Make a PR of this so we can all give it a try?

@ZoeLeBlanc
Member

Link to PR here #1720

I wrote this in the PR note but a few things to consider/test.

  1. List.js does have fuzzy searching but for some reason I get fewer hits. Might be worth it to dig into the code or have others test out fuzzySearch?
  2. We could still use Lunr or write our own fuzzy search and then use List.js filtering
  3. We could still do keyword-in-context, though that would involve a few more tweaks to the current lesson_describe.html
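
For item 3, a keyword-in-context excerpt could be produced along these lines. This is a minimal sketch: `keywordInContext` and its `radius` parameter are hypothetical names, the sample sentence is made up, and wiring the result into lesson_describe.html is not shown.

```javascript
// Return a short excerpt around the first case-insensitive occurrence of
// `keyword` in `text`, or null when the keyword is empty or does not occur.
// `radius` is the number of characters of context kept on each side.
function keywordInContext(text, keyword, radius = 20) {
  const position = text.toLowerCase().indexOf(keyword.toLowerCase());
  if (keyword === "" || position === -1) return null;
  const start = Math.max(0, position - radius);
  const end = Math.min(text.length, position + keyword.length + radius);
  // Mark truncation on either side with an ellipsis.
  const prefix = start > 0 ? "…" : "";
  const suffix = end < text.length ? "…" : "";
  return prefix + text.slice(start, end) + suffix;
}

const sample = "This lesson shows how to query linked open data with SPARQL endpoints.";
console.log(keywordInContext(sample, "sparql", 10));
```

The excerpt is cut at character offsets, so it can start or end mid-word; trimming to word boundaries would be a further tweak.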

Let me know if fuzzySearch works for you and I might test writing out the Lunr example with List.js, just so we can compare 😄

@mdlincoln
Contributor

I think seeing a comparison using the Lunr code would be very useful for the tech team - would you be able to create a separate PR using Lunr instead?

@mariajoafana
Contributor

mariajoafana commented May 5, 2020

This ticket has been open for a while now. Are there any pending actions here? The full-text search using listjs ticket is closed, so can we close this one too?

@mdlincoln
Contributor

The active PR is #1726; this issue is very much active.

7 participants