A new external search solution #1704

Closed
ang-zeyu opened this issue Dec 14, 2021 · 1 comment

ang-zeyu commented Dec 14, 2021

Hi folks,

Not sure if this is the right place to post this; feel free to close this issue.

I've just published a little pet project I have been working on this year.

It consists of a cli file indexer, a search library powered by wasm (rust / typescript), and a search ui (typescript).

To sum it up, there are 2 main motivations behind this project: more scalability, and a complete (indexer -> search library -> ui) prebuilt index search solution.
  • Scalability
    My original motivation in creating this project was primarily this issue: https://github.com/olivernn/lunr.js/issues/222. Most (if not all, afaik) existing client side search libraries can only generate a monolithic prebuilt index. This has obvious implications when your collection scales (i.e. you'll need to start considering a search SaaS / server), because it becomes near impossible to download the entire index in a timely manner. (There are memory usage implications too.)

    The primary difference in approach here is the option of fragmenting the index into many separate files. At search time, only the files needed for the terms being searched are retrieved. The tradeoff between scalability, file bloat, and response time can be configured at varying levels.

    Use of WebAssembly
    I also wanted to see how far I could push the boundaries with compression schemes, and therefore pivoted to WebAssembly. The entire thing was originally built in pure typescript, which is measurably slower for low-level byte-wise processing (not an apples-to-apples comparison, but worth mentioning that indexer speed also went from 10min -> 10s =P).

    There were several other reasons, mostly related to more "fancy" features that I wanted to implement efficiently, for example:

    • query term proximity ranking
    • phrase queries
    • get my feet wet with wasm (my first project with it =P)
  • A complete, "offline" prebuilt index search solution
    The secondary motivation, quite simply, is providing a complete (indexer -> search library -> ui) search solution / replacement that can be built into other software (without something like algolia docsearch, which isn't always an option).
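
To make the fragmented-index idea from the Scalability point concrete, here is a minimal sketch. It is not how this project actually lays out its files: the hash-based sharding, the `NUM_SHARDS` value, and every function name below are assumptions made up for illustration. It only shows the general shape, where terms are grouped into shard files at index time and a query touches only the shards its terms map to.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical shard count; a real indexer would tune this to the collection.
const NUM_SHARDS: u64 = 4;

/// term -> ids of the documents containing it
type Postings = BTreeMap<String, Vec<u32>>;

/// FNV-1a hash, used here because it is deterministic across runs and builds.
fn fnv1a(term: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in term.bytes() {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Which shard file a term's postings list lives in (assumed scheme).
fn shard_of(term: &str) -> u64 {
    fnv1a(term) % NUM_SHARDS
}

/// Index time: split one monolithic postings map into per-shard maps,
/// each of which would be written out as its own file.
fn build_shards(postings: &Postings) -> BTreeMap<u64, Postings> {
    let mut shards: BTreeMap<u64, Postings> = BTreeMap::new();
    for (term, docs) in postings {
        shards
            .entry(shard_of(term))
            .or_default()
            .insert(term.clone(), docs.clone());
    }
    shards
}

/// Search time: the set of shard files that must be fetched for a query.
/// Terms are lowercased and split on whitespace, nothing more.
fn shards_needed(query: &str) -> BTreeSet<u64> {
    query
        .split_whitespace()
        .map(|term| shard_of(&term.to_lowercase()))
        .collect()
}
```

With a layout like this, a single-term query fetches exactly one shard file regardless of how large the whole collection is; raising the shard count shrinks each file at the cost of more files overall.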

Back to why I'm posting this here

I've created an mdbook plugin that is basically a replacement for the search function here. It's built on top of the cli indexer and search ui.

It does a few extras vs the generic library:

  • theming (using mdbook's css variables)
  • automatic scaling: there are various ways to configure the indexer / search in terms of scalability, file bloat, and response time (see here if interested). The preprocessor detects the collection size using a simple ch.content.len() summation and adjusts these settings accordingly.
  • partially replicates the "navigate searched terms" behaviour (quite literally just copy-pasted the doSearchOrMarkFromUrl function here for now =X)
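
The automatic scaling point above can be sketched roughly as follows. The `Chapter` struct is a stand-in for mdbook's chapter type so that the snippet is self-contained, and the preset names and byte thresholds are assumptions picked for illustration, not the plugin's real settings.

```rust
/// Stand-in for mdbook's chapter type; only the field used here is modelled.
struct Chapter {
    content: String,
}

/// Hypothetical presets trading file bloat against response time.
#[derive(Debug, PartialEq)]
enum ScalingPreset {
    Small,
    Medium,
    Large,
}

/// The "simple ch.content.len() summation": total size in bytes
/// (`String::len` counts UTF-8 bytes, not characters).
fn total_content_len(chapters: &[Chapter]) -> usize {
    chapters.iter().map(|ch| ch.content.len()).sum()
}

/// Map the collection size to a preset. Thresholds are made up.
fn pick_preset(total_bytes: usize) -> ScalingPreset {
    const SMALL_MAX: usize = 1_000_000; // assumed cutoff, ~1 MB
    const MEDIUM_MAX: usize = 10_000_000; // assumed cutoff, ~10 MB
    if total_bytes <= SMALL_MAX {
        ScalingPreset::Small
    } else if total_bytes <= MEDIUM_MAX {
        ScalingPreset::Medium
    } else {
        ScalingPreset::Large
    }
}
```

A preprocessor would compute `total_content_len` once over all chapters and then pass the chosen preset on to the indexer configuration.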

At this point, I'd also like to think of this plugin as just a proof of concept for the cli indexer and search ui.
Some small things aren't quite on par with the implementation here yet, but should be straightforward to add in coming iterations:

  • no option for breadcrumbs, yet (replaced with title -> heading -> body)
  • keyboard event integration isn't quite there yet (e.g. escape)
  • no search icon (the search bar is always there) - still trying to find a way to add this in within a preprocessor
  • it depends on some existing implementations within mdbook (the search css theme variables mostly), but should be straightforward to pull over completely as well.
  • the indexer cli tool has to be installed separately. I might find a way to build this in shortly.

Some other pluses this tool offers vs the default:

  • typo tolerance
  • phrase and boolean queries
  • a few more I'll leave to the docs 😁

There are some obvious general downsides I would highlight as well:

  • ❗ use of wasm -- no IE support (I might look into this in the future, but I think it's not quite worth it given the daily decreasing IE usage)
    • I also don't see this replacing mdbook's main search feature as such, at least, as long as IE is a supported target.
  • no client side indexing (unrelated to mdbook, but relevant more generally as a client side search library); this is not within scope

Would love to hear your thoughts!

@ang-zeyu
Author

Closing this after all as there's nothing actionable (in case anyone would like to completely decouple / separate search from mdbook) given the IE limitation. =X
