Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(evaluator): add evaluateSync #272

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Conversation

ricokahler
Copy link

@ricokahler ricokahler commented Feb 17, 2025

Summary

This PR introduces a synchronous evaluation mode (evaluateSync) to groq-js, allowing certain queries (e.g., permission checks) to run synchronously without async overhead. The core evaluation logic has been refactored to normalize both async and sync execution paths, minimizing code duplication while ensuring correctness.

Use Case & Motivation

This change was driven by the need to evaluate permissions synchronously for a document store with local optimistic edits. The previous async-only model added unnecessary overhead in cases where real-time sync evaluation was required. By introducing evaluateSync, evaluations can now be performed in the same sync execution frame, improving performance and reducing complexity in sync-dependent workflows.

Key Changes

  • Unified execution mode ('sync' | 'async'):

    • All evaluators, operators, and functions now receive a mode parameter to determine execution behavior.
    • In sync mode, functions avoid Promises and return static values.
    • Updated operator functions (+, in, match, etc.) to handle both sync and async contexts.
  • New co helper function for unified control flow:

    • Allows evaluators to use a generator-based approach while supporting both sync and async execution.
    • Helps avoid deep Promise chains in async mode.
  • Enhancements to StreamValue and StaticValue

    • Introduced .reduce() and .first() to both StreamValue and StaticValue for unified reductions and lookups. co can normalize between both of these usages for shared control flow.
  • StaticValue introduced for sync evaluation:

    • Instead of using StreamValue (which is async), sync evaluation paths now return StaticValue for immediate execution.
  • Updated evaluator functions (evaluate, EXECUTORS, etc.):

    • All evaluators pass mode recursively.
    • Parameter resolution, object access, and operators now handle sync execution.
  • New API: evaluateSync exported

    • Enables synchronous query execution.
    • Throws if a Promise is accidentally returned.

Remaining Work & Open Questions

  • Typing issues with co helper

    • TypeScript's generator typings don’t support proper yield overrides, leading to some typing issues.
    • Are there better approaches to achieve this?
  • Unit Tests:

    • More tests are needed to ensure correct sync execution.
    • Where would be the best place to put them?
  • Performance considerations:

    • Are there cases where this approach may introduce unintended overhead?
    • Are there alternative ways to structure the sync execution path?

Request for Feedback

Since this is still a draft PR, I’d appreciate any feedback on:

  • The overall approach—does this seem like a good way to handle sync vs. async evaluation?
  • Any better patterns for handling sync execution without duplicating logic.
  • Thoughts on the co helper—should we explore other options?
  • Best places to add more unit tests.

Would love any input before continuing to refine this further! 🚀

Copy link
Collaborator

@judofyr judofyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking over the code now I think we wouldn't actually even need an evaluateSync-mode, but could instead say that it all depends on the implementation of the dataset: If dataset is a synchronous value I think we can guarantee that the result will also be a synchronous value.

Shouldn't we now be able to use the same co-trick, but generate a value which represents an array behind the scenes, and still be incremental? Basically change StreamValue to take in a similar coroutine, and then it can also act in synchronous mode if needed? 🤔

Comment on lines +521 to +525
if (mode === 'sync') {
const data = (yield value.get()) as unknown[]
const next: unknown[] = []

for (const item of data) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we here also be able to use the same generator approach to actually iterate over the values instead of converting to/from JS? 🤔

@ricokahler
Copy link
Author

ricokahler commented Feb 18, 2025

Looking over the code now I think we wouldn't actually even need an evaluateSync-mode, but could instead say that it all depends on the implementation of the dataset: If dataset is a synchronous value I think we can guarantee that the result will also be a synchronous value.

I like this. we could also add some function overloads to the types to that assert a value is synchronous if a synchronous iterator was passed as the dataset or not, however to fully realize this idea, we may want to take a step further and return an async iterator as the result ourselves. this may not be very practical though as a lot of groq operations require exhausting the iterable so hard to say how much benefit there is to it but it's in the spirit of it.

I also figured part of the reason why this lib is async is to attempt to break up the execution to allow queries on large datasets to not block the main thread. I see that StreamValue awaits nextTick before yielding. If we change this default behavior, does that constitute a break change and if so, are we willing to do that?

Shouldn't we now be able to use the same co-trick, but generate a value which represents an array behind the scenes, and still be incremental? Basically change StreamValue to take in a similar coroutine, and then it can also act in synchronous mode if needed? 🤔

The co trick works for unwrapping promises but it does not normalize between async and sync iteration. (for await vs just for). It's possible to normalize these if the Values themselves had higher level operations that returned a promise or not (i did this by adding first() to both StaticValue and StreamValue) but otherwise, two separate implementations for sync and async are needed.

for operations that require collecting all the values before emitting (e.g. scoring, unique), there shouldn't really be a disadvantage in collecting to an array first and processing synchronously but for certain functions (intersects) the result could be returned without exhausting the iterable which is very appealing for async iterables.

some specific questions:

  1. how important is breaking up execution (i.e. await nextTick()) to avoid blocking the main thread while processing in "an async mode"? this could be toggled regardless if the input is a synchronous iterable or not and has its own benefits and drawbacks. this is the only place i see where we are intentionally deferring execution
  2. if we change the behavior to synchronous execution when passed in a synchronous iterable (an array) and also remove the await nextTick() delay, is this considered a breaking change? if so, are we willing to move forward with that?
  3. how important is proper async iteration (possible to finish execution without exhausting the input async iterable)? if this is important, then we'd most likely need to implement two separate implementations for certain operations and have some sort of mode flag brought through the entire evaluation

If 1 is not important, then we can remove this. If we remove the await nextTick, the only reason to support async execution is to benefit from the lazy execution when providing an async iterable. this might be beneficial if say streaming in an NDJSON file from
the network. However this makes the answer to 3 be very important. If this is important, then are we okay with a mode flag being passed throughout the evaluator like it currently is in this draft implementation?

If the answer to 1 and 3 are not very important, then I recommend we only have a synchronous evaluator and we remove all the async logic but that's also a large change. Is that change breaking? Is that the direction we want to move forward with?

If 1 is important and 3 is not important then we can support and async and sync mode without needing to pass a "mode" flag around but the async iterable will always be collected and processed synchronously. we can still yield between operations though to break up execution but yielding between items in arrays seems contradictory give the next operation can't continue until all of it is collected anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants