feat(lyra): Add WebAssembly support #194
}
}

if found == 0 {
@jkomyno I'm not sure about this. Why are we returning early?
… tf-idf score is the same
PR working fine on Node.js via ESM and CJS.
LGTM!
Life is pain, but this PR is suffering, folks
Ok so @ShogunPanda, @jkomyno, this PR is ready to get merged. My main concern is that it will force us to introduce a breaking change by making the The problem is not with the breaking change per se, but with the fact that using a promise could reduce the throughput when running multiple My idea would be either to:
A nice fact is that given how Lyra is built, functions are totally isolated and tree-shakable, so it's up to the user to decide if they want to import the WASM implementation or use the default JS fallback. Pros for making
Cons of making
I think maintaining two separate Let me know what you think folks 🙏 |
How would it reduce the throughput? Also, my 2 cents is to introduce the breaking change by making |
I'd go for the async search; I don't see why performance would be inferior. When I changed the tree I was thinking of proposing it because I wanted to make |
There's a lot of literature on promise performance, and I am wondering whether that is a legitimate concern for Lyra; just to name a couple of great articles (there are also benchmarks in there):
But you're the expert here Rafael, I trust you 🙂 I know @ShogunPanda has a different opinion on promises though, so let's wait for him too. One other concern I have is about DX and consistency: Lyra will expose the following fundamental functions:
While the reason why About @marco-ippolito comment: I think we could easily make a couple of That said, I'd also personally prefer making |
LGTM.
strategy:
  fail-fast: true
  matrix:
    os: [ubuntu-latest]
We might need to include OSX and Windows here
I already mentioned this to @micheleriva directly, but I'm leaving my opinion here just in case. I think we should go with an approach similar to fastify and other packages: the search function might accept the In other words, the function will become something like this (pseudocode):
function search<T>(/* */): T | Promise<T> {
  // Do something
  const intersected = intersectTokenScores(/* ... */); // Note that intersectTokenScores comes from arguments
  // `typeof` yields a string, so compare against 'function'
  if (typeof intersected.then === 'function') {
    return intersected.then(sets => finalizeSearch(sets))
  } else {
    return finalizeSearch(intersected)
  }
}
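The dual-return idea in the pseudocode above can be sketched as runnable TypeScript. This is only an illustration under assumptions: the TokenScore shape, the isThenable helper, the naive intersectSync and intersectAsync stand-ins, and the body of finalizeSearch are all hypothetical and not taken from Lyra.

```typescript
// Hypothetical shapes and helpers for illustration; none of this is Lyra's real code.
type TokenScore = [string, number]; // assumed shape: [document id, score]
type MaybePromise<T> = T | Promise<T>;
type Intersect = (sets: TokenScore[][]) => MaybePromise<TokenScore[]>;

// Detects promises (and other thenables) by checking for a callable `then`.
function isThenable(value: unknown): value is PromiseLike<unknown> {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { then?: unknown }).then === "function"
  );
}

// Stand-in for the finalizeSearch step: rank ids by descending score.
function finalizeSearch(scores: TokenScore[]): string[] {
  return [...scores].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Returns a plain value when `intersect` is synchronous, a promise otherwise.
function search(sets: TokenScore[][], intersect: Intersect): MaybePromise<string[]> {
  const intersected = intersect(sets);
  if (isThenable(intersected)) {
    return intersected.then(finalizeSearch);
  }
  return finalizeSearch(intersected);
}

// Naive synchronous intersection: keep entries of the first set whose id
// appears in every other set (scores are taken from the first set).
const intersectSync: Intersect = (sets) => {
  const first = sets[0] ?? [];
  const rest = sets.slice(1);
  return first.filter(([id]) => rest.every((set) => set.some(([other]) => other === id)));
};

// Asynchronous variant, standing in for a WASM-backed implementation.
const intersectAsync: Intersect = async (sets) => intersectSync(sets);
```

With these stand-ins, search(sets, intersectSync) returns an array directly, while search(sets, intersectAsync) returns a promise, so callers that accept both have to branch on the result or always await it.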
@ShogunPanda my main concern with this approach is that it can be confusing for some developers... here's my proposal. I'd ship WASM as an experimental feature for now, that can be enabled in the following way:
import { create } from '@lyrasearch/lyra'
const db = create({
schema: { foo: 'string' },
optimizations: {
wasm: true
}
})
The Given that we always pass a Lyra instance to the
import { create, search } from '@lyrasearch/lyra'
const db = create({
schema: { foo: 'string' },
optimizations: {
wasm: true
}
})
search(db, { term: 'foo' }) // <--- search knows if wasm interop is enabled
Given that any Lyra instance is essentially an object, we can override this value with ease on demand:
import { create } from '@lyrasearch/lyra'
const db = create({
schema: { foo: 'string' },
optimizations: {
wasm: true
}
})
db.optimizations.wasm = false
I'm not sure I like it, but might be useful for benchmarks and tests while in an experimental stage. With that being said, we could easily change the I'd propose then to either choose to move on with a |
I'm usually not a big fan of this approach. It's kinda hard to maintain (fast-jwt verify 💀) and to build new components on top of, because you always have to think about whether it will return a promise or not. Most of the time you will put an await in front regardless, just to avoid dealing with the dual return type.
I'd choose whether to go |
@micheleriva @marco-ippolito I see both your points. Let's try something different. What about exposing two different interfaces and forbidding optimizations from being passed to the regular one? Something like allowing both of the following snippets to be valid:
import { create, search } from '@lyrasearch/lyra'
/*
This create does not allow for optimizations.
We specifically validate this in JavaScript (instead of relying on TS types) so that we
can guide the user in picking the right one.
*/
const db = create(/* ... */)
const results = search(db)
and
import { create, search } from '@lyrasearch/lyra/async'
/*
Technically create should not be async (yet).
But since we are establishing the new interface, we declare it as async now so
that we don't need a breaking change later.
Create here accepts optimizations.
*/
const db = await create(/* ... */)
const results = await search(db)
At the beginning of each method in both versions, I would add a quick check to avoid mixing the two APIs.
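One way to picture the "quick check" is to tag each instance with the entry point that created it and verify the tag at the start of every method. The sketch below is hypothetical: Instance, Mode, assertMode, and the createSync/createAsync/searchSync/searchAsync names are invented for illustration and are not Lyra's API.

```typescript
// Hypothetical sketch; these names are illustrative and not part of Lyra.
type Mode = "sync" | "async";

interface Instance {
  readonly mode: Mode; // which entry point created this instance
  readonly docs: string[];
}

interface CreateOptions {
  optimizations?: { wasm?: boolean };
}

// The quick check: fail fast when an instance is used with the other API.
function assertMode(db: Instance, expected: Mode): void {
  if (db.mode !== expected) {
    throw new TypeError(
      `Instance was created with the ${db.mode} API but used with the ${expected} API`
    );
  }
}

// Regular entry point: rejects optimizations at runtime, not only via types.
function createSync(docs: string[], options: CreateOptions = {}): Instance {
  if (options.optimizations !== undefined) {
    throw new TypeError("optimizations require the async entry point");
  }
  return { mode: "sync", docs };
}

function searchSync(db: Instance, term: string): string[] {
  assertMode(db, "sync");
  return db.docs.filter((doc) => doc.includes(term));
}

// Async entry point: accepts optimizations (ignored in this sketch).
async function createAsync(docs: string[], _options: CreateOptions = {}): Promise<Instance> {
  return { mode: "async", docs };
}

async function searchAsync(db: Instance, term: string): Promise<string[]> {
  assertMode(db, "async");
  return db.docs.filter((doc) => doc.includes(term));
}
```

Under these assumptions, passing an instance from createAsync to searchSync throws a TypeError immediately instead of failing later in a less obvious way.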
@ShogunPanda not a bad idea. So we'd basically alias the methods right?
Pretty much.
@ShogunPanda I love this idea. it's like the
How do you know where I copied the idea from? 😁
@jkomyno I keep on getting the following error when importing the compiled JS binding:
Here is the full CI log: any idea?
Update: we decided that ALL the Lyra functions will be async. Currently working on this.
All Lyra functions are now async. WASM support will be experimental, and users will have to opt in via the following interface:
import { create } from '@lyrasearch/lyra'
import { intersectTokenScores } from '@lyrasearch/lyra/dist/esm/wasm/intersectTokenScores' // placeholder
await create({
schema: {
foo: 'string'
},
components: {
algorithms: {
intersectTokenScores
}
}
})
This will provide a unified interface that will allow people to bring their own optimizations, and optionally opt in to the built-in ones. Gonna merge this and provide a separate build system for our WASM optimizations. Thank you all folks, what a ride!
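To make the opt-in mechanism concrete, here is a minimal sketch of how a create function could pick up a user-supplied intersectTokenScores and fall back to a JavaScript default otherwise. Everything here is an assumption for illustration: the Config, Db, and Algorithms shapes, the resolveAlgorithms helper, and the naive default implementation are hypothetical and may differ from what Lyra actually does.

```typescript
// Hypothetical shapes; the real Lyra types may differ.
type TokenScore = [string, number]; // assumed shape: [document id, score]
type IntersectTokenScores = (sets: TokenScore[][]) => TokenScore[] | Promise<TokenScore[]>;

interface Algorithms {
  intersectTokenScores: IntersectTokenScores;
}

interface Config {
  schema: Record<string, string>;
  components?: { algorithms?: Partial<Algorithms> };
}

interface Db {
  schema: Record<string, string>;
  algorithms: Algorithms;
}

// Naive JavaScript fallback, used when no override is supplied.
const defaultIntersectTokenScores: IntersectTokenScores = (sets) => {
  const first = sets[0] ?? [];
  const rest = sets.slice(1);
  return first.filter(([id]) => rest.every((set) => set.some(([other]) => other === id)));
};

// Merges user overrides with the defaults, one algorithm at a time.
function resolveAlgorithms(overrides: Partial<Algorithms> = {}): Algorithms {
  return {
    intersectTokenScores: overrides.intersectTokenScores ?? defaultIntersectTokenScores,
  };
}

// Stand-in for an async create() that wires the chosen algorithms into the instance.
async function create(config: Config): Promise<Db> {
  return {
    schema: config.schema,
    algorithms: resolveAlgorithms(config.components?.algorithms),
  };
}
```

Because the override is just a function with the same signature, a WASM-backed implementation and a pure JavaScript one are interchangeable from the caller's point of view.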
Context
This PR introduces Rust+WebAssembly support in Lyra, as asked privately by @micheleriva.
I'm currently targeting Node.js only, but you can extend this PR as needed for supporting Deno, browsers, and other JS runtimes.
As a motivating example, we were asked to write a skeleton for the intersectTokenScores function originally defined here in favor of the intersect_token_scores defined in the new lyra-utils crate here (and exposed to TypeScript via the lyra-utils-wasm crate here).
Tests
It should be noted that tests are currently failing, but we believe that is only due to a different ordering strategy used by Rust for intersect_token_scores. However, we invite the Lyra authors to carefully check that's the case.
How to build Rust → Wasm artifacts
With Rust 1.6.5 and Node 16.15.1 or superior
cargo update -p wasm-bindgen
cargo install -f wasm-bindgen-cli@0.2.83
cd ./rust
(cd ./scripts && npm i)
export LYRA_WASM_PROFILE="release"
export LYRA_WASM_TARGET="nodejs"
node ./scripts/wasmAll.mjs
With docker (used by the CI)
docker buildx build --load \
  -f Dockerfile --build-context rust=rust \
  . -t lyrasearch/lyra-wasm \
  --progress plain
docker create --name tmp lyrasearch/lyra-wasm
docker cp tmp:/opt/app/src/wasm ./src/wasm
docker rm -f tmp
In both cases, you should observe the following artifacts in ./src/wasm/:
lyra_utils_wasm_bg.wasm
lyra_utils_wasm_bg.wasm.d.ts
lyra_utils_wasm.d.ts
lyra_utils_wasm.js
These artifacts will need to be included in the bundler of your choice. Moreover, you likely do not wish to store them in the repo, but would rather generate them on the fly in the CI. Feel free to change this as you see fit.