Roadmap to launch #32

Open
2 tasks done
valencik opened this issue Mar 27, 2023 · 2 comments
valencik commented Mar 27, 2023

Roadmap

Note: This is a work in progress; some steps are missing.

Goal
Provide building blocks for search on Typelevel sites

Summary

  • We need to expand the build matrices of textmogrify, lucille, and protosearch to include Scala 2.12
  • We need to build an sbt plugin to index documents, and modify sbt-typelevel to add search to the generated site
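
For context on the 2.12 requirement: sbt 1.x plugins are compiled against Scala 2.12, so any library the indexing plugin depends on needs a 2.12 artifact. A minimal sketch of what the cross-build setting could look like in each library's build follows. The version numbers are placeholders chosen for illustration, not the projects' actual settings.

```scala
// build.sbt (sketch only; versions below are illustrative placeholders)
// Adding a 2.12.x entry publishes an artifact that an sbt 1.x plugin can depend on.
ThisBuild / crossScalaVersions := Seq("2.12.17", "2.13.10", "3.2.2")
```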

Libraries Overview

  • Text Analysis -> textmogrify
    • text analysis is a preprocessing step at both indexing time and query parsing time
  • Query Parsing -> lucille
    • parse user input into a query tree, provide common helpers to rewrite/modify queries
  • Indexing / Query Execution -> protosearch
    • building an index from raw documents given an analyzer, executing queries against that index given a query parser
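
To make the division of labour concrete, here is a toy sketch of how the three stages could fit together. Everything in it (`SearchSketch`, `Analyzer`, `Query`, `buildIndex`, `search`) is a hypothetical name invented for illustration; none of it is the actual API of textmogrify, lucille, or protosearch.

```scala
object SearchSketch {
  // Text analysis: the same function is used at indexing time and at query parsing time.
  type Analyzer = String => List[String]
  val analyzer: Analyzer =
    text => text.toLowerCase.split("\\s+").filter(_.nonEmpty).toList

  // Query parsing: turn user input into a (very small) query tree.
  sealed trait Query
  final case class Term(value: String) extends Query
  final case class And(queries: List[Query]) extends Query

  def parse(input: String, analyze: Analyzer): Query =
    And(analyze(input).map(Term(_)))

  // Indexing: map every analyzed term to the ids of the documents containing it.
  type Index = Map[String, Set[Int]]

  def buildIndex(docs: List[String], analyze: Analyzer): Index =
    docs.zipWithIndex
      .flatMap { case (doc, id) => analyze(doc).map(term => term -> id) }
      .groupBy(_._1)
      .map { case (term, pairs) => term -> pairs.map(_._2).toSet }

  // Query execution: evaluate the query tree against the index.
  def search(index: Index, query: Query): Set[Int] = query match {
    case Term(t) => index.getOrElse(t, Set.empty[Int])
    case And(qs) =>
      qs.map(q => search(index, q)).reduceOption(_ intersect _).getOrElse(Set.empty[Int])
  }
}
```

Because the query goes through the same analyzer as the documents, a query for "CATS" matches a document containing "cats".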

Analysis

  • Currently we are not leveraging textmogrify
  • Currently protosearch uses a minimal tokenizer: split(" ")
  • If we want textmogrify to be the analysis library, it needs a pure Scala implementation of at least the StandardAnalyzer
  • For an MVP, a simple whitespace tokenizer would probably be enough
  • Lucille probably needs some way to leverage an analyzer/tokenizer during parsing
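
As a rough illustration of the gap between `split(" ")` and a whitespace tokenizer, here is a small sketch. `Tokenizers` and its methods are hypothetical names for this example, not code from protosearch or textmogrify.

```scala
object Tokenizers {
  // The approach described above: split on a single space character.
  def splitOnSpace(text: String): List[String] =
    text.split(" ").toList

  // Possible MVP: split on runs of any whitespace and drop empty tokens.
  def whitespace(text: String): List[String] =
    text.split("\\s+").filter(_.nonEmpty).toList
}

// Tokenizers.splitOnSpace("cats  effect\tio")  // List("cats", "", "effect\tio")
// Tokenizers.whitespace("cats  effect\tio")    // List("cats", "effect", "io")
```

Splitting on a single space leaves empty tokens for repeated spaces and does not break on tabs or newlines, which matters for text extracted from Markdown sources.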

sbt-plugin

Integrate with Laika site

  • We'll need to modify the Laika template sbt-typelevel is using to actually show and power a search bar somewhere.
  • Should we use a framework here, like calico, or stick to pure Scala.js?

valencik commented Mar 27, 2023

What kind of compatibility guarantees should we provide on the index's binary representation?

If possible, I effectively want to provide zero guarantees for now.
I am pretty confident the index structure will change over time, and if we're optimizing for small document collections, there's no reason we can't be frequently rebuilding the index.

@valencik valencik pinned this issue Apr 5, 2023

Analysis

To provide a reasonable search over the types of documents we have in cats-effect, http4s, cats, and fs2, we likely want to do some sub-document search, turning every header section into a "sub document".
With this, we might be able to get away without highlighting support initially.
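
A minimal sketch of the sub-document idea, assuming the input is raw Markdown with ATX-style (`#`) headings. `SubDocuments`, `SubDocument`, and `split` are hypothetical names; a real integration would more likely work from Laika's parsed document tree than from raw text.

```scala
object SubDocuments {
  final case class SubDocument(title: String, body: String)

  // Assumption: a heading is a line of one to six '#' characters, whitespace, then the title.
  private val Heading = """#{1,6}\s+(.*)""".r

  // Split a Markdown string into one sub-document per heading section.
  // Text before the first heading becomes a sub-document with an empty title.
  def split(markdown: String): List[SubDocument] = {
    val start = (Vector.empty[SubDocument], "", Vector.empty[String])
    val (finished, lastTitle, lastBody) =
      markdown.split("\\r?\\n").foldLeft(start) {
        case ((acc, title, body), Heading(next)) =>
          (flush(acc, title, body), next.trim, Vector.empty[String])
        case ((acc, title, body), line) =>
          (acc, title, body :+ line)
      }
    flush(finished, lastTitle, lastBody).toList
  }

  // Close the current section, skipping it if it has neither a title nor any text.
  private def flush(
      acc: Vector[SubDocument],
      title: String,
      body: Vector[String]
  ): Vector[SubDocument] = {
    val text = body.mkString("\n").trim
    if (title.isEmpty && text.isEmpty) acc else acc :+ SubDocument(title, text)
  }
}
```

Each sub-document can then be indexed on its own, so a hit points at a specific section rather than a whole page. One known limitation of this sketch: a `#` line inside a fenced code block would be treated as a heading.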

@valencik valencik added this to the First Release milestone Apr 22, 2023
@valencik valencik added the laika Laika integration label Dec 7, 2023