feat(hog): lambdas #24369

Closed
wants to merge 60 commits into from

@mariusandra (Collaborator) commented Aug 14, 2024

Problem

We need to support lambdas in order to support many ClickHouse functions (arrayExists, arrayMap, etc). This is required to match element chain texts in filters in Hog destinations.

Changes

Language features

Functions are now first-class values and can be written as lambdas:

fn func1 () { ... }
let func2 := () -> { ... }
let func3 := x -> x * 2

We can now call () any expression, not just identifiers (name()):

let func := x -> x * 2
let arr := [func]

print(func(2))
print(arr[1](2))
print((x -> x * 2)(2))

Closures

Implements closures and upvalues, making the following work as expected:

// example 1
fn outer() {
  let x := 'outside'
  fn inner() {
    print(x)
  }
  return inner
}
let closure := outer()
closure() // prints 'outside'

// example 2
let var := 5
let varify := x -> x * var
print(varify(2)) // 10
var := 10
print(varify(2)) // 20
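
For readers unfamiliar with upvalues, the idea can be sketched roughly in Python. This is only an illustration of the general technique, not the HogVM code: the Upvalue class, its methods, and the list-based "stack" are all hypothetical names made up for this sketch.

```python
class Upvalue:
    """A captured variable (hypothetical sketch, not the HogVM class).

    While open, it reads and writes a live stack slot. Once closed,
    it owns a private copy of the value.
    """

    def __init__(self, stack, index):
        self.stack = stack
        self.index = index
        self.closed = False
        self.value = None

    def get(self):
        return self.value if self.closed else self.stack[self.index]

    def set(self, value):
        if self.closed:
            self.value = value
        else:
            self.stack[self.index] = value

    def close(self):
        # Called when the enclosing frame is about to disappear.
        self.value = self.stack[self.index]
        self.closed = True


# Second example above: the lambda observes later writes to `var`.
stack = [5]                      # slot 0 holds `var`
var_upvalue = Upvalue(stack, 0)  # captured by the lambda
varify = lambda x: x * var_upvalue.get()

first = varify(2)   # 10
stack[0] = 10       # var := 10
second = varify(2)  # 20

# First example above: `outer` returns, so the upvalue is closed
# and keeps its value after the frame is gone.
outer_stack = ['outside']
x_upvalue = Upvalue(outer_stack, 0)
x_upvalue.close()
outer_stack.pop()
captured = x_upvalue.get()  # 'outside'
```

The open/closed split is what the "Closing of upvalues" TODO item refers to in spirit: a captured variable has to outlive the stack frame that declared it.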

Inline STL

Adds support for arrayMap, arrayExists, arrayFilter:

print(arrayMap(x -> x * 2, [1,2,3]))
print(arrayExists(x -> x like '%nana%', ['apple', 'banana', 'cherry']))
print(arrayFilter(x -> x like '%e%', ['apple', 'banana', 'cherry']))

This is where it gets a bit tricky. These functions are added via an "inlined STL": the bytecode compiler does a round of static analysis to figure out whether any of these STL functions will be called. If so, it prepends the source code for each such function to your script.

All three functions are written as Hog functions behind the scenes, for example:

fn arrayExists(func, arr) {
  for (let i in arr) {
    if (func(i)) {
      return true
    }
  }
  return false
}
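
The inlining step described above could look roughly like the Python sketch below. It is a simplification under stated assumptions: the real compiler presumably analyses the parsed AST, whereas this sketch scans the source text with a regex. INLINE_STL, find_called_names, and prepend_inline_stl are hypothetical names. Only the arrayExists source is taken from this PR description.

```python
import re

# Hypothetical table of STL function name -> Hog source.
# Only arrayExists comes from the PR description.
INLINE_STL = {
    "arrayExists": (
        "fn arrayExists(func, arr) {\n"
        "  for (let i in arr) {\n"
        "    if (func(i)) {\n"
        "      return true\n"
        "    }\n"
        "  }\n"
        "  return false\n"
        "}\n"
    ),
}


def find_called_names(source):
    """Return every identifier that is directly followed by '('.

    A regex stands in for real static analysis here (assumption).
    """
    return set(re.findall(r"\b([A-Za-z_]\w*)\s*\(", source))


def prepend_inline_stl(source):
    """Put the source of each called STL function in front of the script.

    Not handled in this sketch: STL functions that call other STL
    functions, and user functions that shadow an STL name.
    """
    called = find_called_names(source)
    needed = [name for name in sorted(INLINE_STL) if name in called]
    prelude = "".join(INLINE_STL[name] for name in needed)
    return prelude + source


script = "print(arrayExists(x -> x like '%nana%', ['apple', 'banana', 'cherry']))"
compiled_input = prepend_inline_stl(script)
```

A script that never calls an inlined function comes back unchanged, so unused STL functions add nothing to the compiled bytecode.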

This was easier (and safer) to get working than implementing a back-and-forth layer between the VM and "native code". Today we can call native code from the VM (all those STL functions), but doing an UNO reverse and calling a VM function from native code... requires a refactor too big for this PR.

Currently those STL functions are inlined into the Hog bytecode. I'm not sure if we want to keep it that way, or ship the function bytecodes with each HogVM itself, but this can be changed later.

There are some positive things about having an STL written within Hog: 1) it's easier to extend, and 2) it requires fewer changes across the different implementations of Hog (Python vs TS vs future Rust?).

Bytecode versions

Until now bytecode was in the format: ['_h', /* rest of bytecode */]

That's considered version 0.

Now compiled bytecode looks like ['_H', 1, /* rest of bytecode */]

The 1 is the bytecode version field. I had to make some breaking changes from v0 due to optional function arguments (needed to flip something around), but didn't want to break all existing compiled bytecode. Versions are a way to do this.
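
A VM could tell the two formats apart along these lines. This is a minimal Python sketch assuming only the two header shapes described above. read_bytecode_header is a hypothetical helper, and the error handling is an assumption rather than the VM's actual behaviour.

```python
def read_bytecode_header(bytecode):
    """Return (version, index of the first instruction).

    Hypothetical helper: '_h' marks version 0, and '_H' is followed
    by an explicit integer version, as described in the PR.
    """
    if not bytecode:
        raise ValueError("empty bytecode")
    if bytecode[0] == "_h":
        return 0, 1
    if bytecode[0] == "_H":
        # Assumption: a missing or non-integer version is an error.
        if len(bytecode) < 2 or not isinstance(bytecode[1], int):
            raise ValueError("missing version after '_H'")
        return bytecode[1], 2
    raise ValueError(f"unknown bytecode header: {bytecode[0]!r}")


# "op" is a placeholder, not a real HogVM instruction.
old_header = read_bytecode_header(["_h", "op"])
new_header = read_bytecode_header(["_H", 1, "op"])
```

The VM can then branch on the returned version wherever v0 and v1 differ, which is what lets old compiled bytecode keep running until it is recompiled.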

I still need to verify, but all existing bytecode should work with both VMs. The slow migration path would be:

  1. We get this in, all code should keep working as it was.
  2. We recompile all filters and bytecodes.
  3. Eventually I can remove support for bytecode version 0 from the codebase.

Global access

Accessing undefined globals is now a compile-time error. Previously they would just silently return null.

TODO

  • Provide STL function argument counts
  • Backfill with Nones if optional arguments (for missing HogError payloads)
  • Make a decision on missing globals --> throw or null?
  • Get provided library functions working well
  • Implement closures
  • Implement upvalues
  • Recursion
  • Closing of upvalues
  • Return a function and call it directly (x()())
  • Implement map and filter
  • Rename splice_stack_1 to something better
  • Add tests for calling different functions with more or less args
  • Fix double ()(...):
let decode := () -> base64Decode
let sixtyFour := base64Encode
print(decode()(sixtyFour('http://www.google.com')))
  • Move function in front of args in bytecode for CALL_LOCAL
  • Test with old filters and hog functions bytecode
  • Make sure all old bytecode keeps working
  • Capture upvalues in splice_stack_2?

How did you test this code?

WIP

@posthog-bot (Contributor)

It looks like the code of hogql-parser has changed since last push, but its version stayed the same at 1.0.36. 👀
Make sure to resolve this in hogql_parser/setup.py before merging!

@mariusandra (Collaborator, Author)

All code has been extracted from this branch, thus closing the PR. 🫡

@mariusandra deleted the hog-lambdas branch on August 28, 2024, 09:49