Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel][Expressions] Make the default expression handler evaluation lazy #2541

Closed
2 of 8 tasks
allisonport-db opened this issue Jan 18, 2024 · 2 comments · Fixed by #2853
Closed
2 of 8 tasks

[Kernel][Expressions] Make the default expression handler evaluation lazy #2541

allisonport-db opened this issue Jan 18, 2024 · 2 comments · Fixed by #2853
Labels
enhancement New feature or request good medium issue Good for those with Delta Lake experience kernel
Milestone

Comments

@allisonport-db
Copy link
Collaborator

allisonport-db commented Jan 18, 2024

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

The expression handler should short-circuit on logical operators. This requires making all expression evaluation lazy since evaluation needs to be on a per-row basis.

First step: Convert all comparators (e.g. =, > etc) to be lazy.
Rough idea on how to do that is:

  • Create a comparator based on the data type of inputs. Both inputs are expected to have the same inputs.
interface VectorComparator {
   /*
    * Compare the entry at given `rowId` in `left` and `right` vectors and return the comparison
    * result (-1, 0, 1)
   int compare(ColumnVector left, ColumnVector right, int rowId)
}
  • We need utility method that can create these comparator. Pseudo code looks like below
  DataType dataType = left.getDataType();

   if (dataType instance BooleanType) {
      return new VectorComparator() {
        int compare(ColumnVector left, ColumnVector right, int rowId) {
            boolean leftResult = left.getBoolean(rowId);
            boolean rightResult = right.getBoolean(rowId);
            return Boolean.compare(leftResult, rightResult);
        }
     } else if (dataType instance of IntegerType) {
    ....
     }
     .. more types...
   }
  • Return a ColumnVector that wraps the above created comparator and returns result based on the comparator operator type
return new ColumnVector() {
   public getDataType() { return BooleanType.BOOLEAN;}

   public getSize() { argResults.leftResult.getSize(); }

   public isNullAt(int rowId) {
       boolean isLeftNull = argResults.leftResult.isNullAt(rowId);
       boolean isRightNull = argResults.rightResult.isNullAt(rowId);
       return isLeftNull || isRightNull;
   }

  public getBoolean(int rowId) {
      // check both are null. if yes return true, otherwise..

     // `==` will change depending upon the operator we are evaluating.
      compartor.compare(leftResult, rightResult, rowId) == 0
   }
}

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
@allisonport-db allisonport-db added enhancement New feature or request kernel labels Jan 18, 2024
@vkorukanti vkorukanti added the good medium issue Good for those with Delta Lake experience label Feb 19, 2024
@vkorukanti vkorukanti added this to the 3.2.0 milestone Feb 22, 2024
@zzl-7
Copy link
Contributor

zzl-7 commented Apr 3, 2024

FYI , I am working on this right now per recommendation from #2830 (comment)
Will submit a PR in a few days
Thanks :)

@zzl-7
Copy link
Contributor

zzl-7 commented Apr 5, 2024

Hi @vkorukanti I added lazy evaluation support for comparator expressions
#2853
I have some question on the behavior of null comparison, if you have time can you take a look
Thank you

@vkorukanti vkorukanti modified the milestones: 3.2.0, 3.3.0 Apr 24, 2024
vkorukanti pushed a commit that referenced this issue May 30, 2024
## Description
Resolves #2541

## How was this patch tested?
Existing tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good medium issue Good for those with Delta Lake experience kernel
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants