[5201] feat(client-python): Implement expressions in python client #5646

Open
wants to merge 14 commits into
base: main

SophieTech88
Contributor

@SophieTech88 SophieTech88 commented Nov 22, 2024

What changes were proposed in this pull request?

Implement the expression classes from the Java implementation, including:

  • Expression.java
  • FunctionExpression.java
  • NamedReference.java
  • UnparsedExpression.java
  • distributions/
  • literals/
  • sorts/
  • transforms/

Convert them to the Python client and add a unit test for each class.

Why are the changes needed?

We need to support expressions in the Python client.

Fix: #5201

Does this PR introduce any user-facing change?

No

How was this patch tested?

All unit tests need to pass.

@SophieTech88 SophieTech88 marked this pull request as draft November 22, 2024 06:24
@@ -0,0 +1,76 @@
# Licensed to the Apache Software Foundation (ASF) under one

Why do we call this a named reference?
Is there a case where a reference has no name?
Do we have to differentiate these two types of references?

Contributor Author

@xunliu Do you have any idea about the name for this feature?

Member

@mchades Do you have any suggestions?

Contributor

Why do we call this a named reference?

It represents a reference to a field or column by its name, which is the most common way to refer to columns in SQL, e.g. SELECT name FROM table

Is there a case where a reference has no name?

Yes, examples include:

  • Positional references: SELECT $1
  • Expression results: SELECT (a + b)
  • Anonymous subquery columns

Do we have to differentiate these two types of references?

We currently don't have UnnamedReference in Gravitino because we haven't encountered the usage scenarios.

So my suggestion is to keep it as is, matching the Java implementation.
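
To make the idea concrete, here is a minimal sketch of what a by-name reference could look like in Python. The class shape, the `field` factory, and the dotted string form are assumptions for illustration only; they are not taken from this PR's diff.

```python
from typing import List, Tuple


class NamedReference:
    """Hypothetical sketch: a reference to a field by its (possibly nested) name."""

    def __init__(self, field_name: List[str]):
        # Store the parts as a tuple so the reference is immutable and hashable.
        self._field_name: Tuple[str, ...] = tuple(field_name)

    @classmethod
    def field(cls, *parts: str) -> "NamedReference":
        # Convenience factory (assumed name): NamedReference.field("address", "city")
        return cls(list(parts))

    def field_name(self) -> List[str]:
        # Return a copy so callers cannot mutate internal state.
        return list(self._field_name)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, NamedReference):
            return NotImplemented
        return self._field_name == other._field_name

    def __hash__(self) -> int:
        return hash(self._field_name)

    def __str__(self) -> str:
        # Dotted form chosen for readability; the real client may format differently.
        return ".".join(self._field_name)
```

A positional reference such as `$1` would need a different class, since it carries an index rather than a name; that is the `UnnamedReference` case that has no usage scenario yet.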


Really appreciate this clarification. Thanks.

self._function_name == other._function_name
and self._arguments == other._arguments
)
return False

Gut feeling is that you may want to leave a TODO here ...

Contributor Author

Thanks for your comment. Updated the TODO comment. Does that work for you?
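
For readers without the full diff, the fragment under review looks like the tail of an `__eq__` method. The following is a rough sketch of how such a method might be laid out, assuming a hypothetical class name and constructor; the wording of the TODO is a placeholder, because the thread does not say what it should cover.

```python
from typing import Any, List


class FuncExpressionSketch:
    """Hypothetical stand-in for the function-expression class in the diff."""

    def __init__(self, function_name: str, arguments: List[Any]):
        self._function_name = function_name
        self._arguments = list(arguments)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, FuncExpressionSketch):
            return (
                self._function_name == other._function_name
                and self._arguments == other._arguments
            )
        # TODO: placeholder only; the thread does not spell out the intended follow-up.
        return False

    def __hash__(self) -> int:
        # Defining __eq__ disables the default __hash__, so restore it explicitly.
        # Assumes every argument is itself hashable.
        return hash((self._function_name, tuple(self._arguments)))
```

Keeping `__hash__` consistent with `__eq__` lets equal expressions be used interchangeably as dictionary keys or set members.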

@jerryshao
Contributor

@mchades @xunliu would you please also help to review this PR?

@SophieTech88 SophieTech88 marked this pull request as ready for review November 25, 2024 05:21
@xunliu
Member

xunliu commented Nov 25, 2024

Hi @SophieTech88, I will help you improve this PR.
Please review my commit.

@tengqm tengqm left a comment

I'd suggest we split this into a few smaller PRs.

pass

@abstractmethod
def data_type(self):

It sounds like the above two methods are to be implemented by subclasses anyway.
If that is true, I don't think we should use pass here.
We may want to raise an exception instead.

Contributor Author

Agree. Just updated the code to raise NotImplementedError() for those 2 functions.
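
As an illustration of the pattern discussed here, the sketch below shows abstract methods whose bodies raise `NotImplementedError` instead of using `pass`. Only `data_type` appears in the diff fragment above; the class names, the `value` method, and the string return type are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Any


class LiteralSketch(ABC):
    """Hypothetical base class; only data_type() is visible in the diff."""

    @abstractmethod
    def value(self) -> Any:
        # Assumed second method; raises if reached through super().
        raise NotImplementedError("Subclasses must implement value().")

    @abstractmethod
    def data_type(self) -> str:
        raise NotImplementedError("Subclasses must implement data_type().")


class IntegerLiteralSketch(LiteralSketch):
    """Hypothetical concrete subclass used only to exercise the base class."""

    def __init__(self, value: int):
        self._value = value

    def value(self) -> int:
        return self._value

    def data_type(self) -> str:
        return "integer"
```

Because the base class derives from `ABC`, instantiating it directly already fails with `TypeError`; the explicit raise mainly guards against a subclass calling `super().data_type()` and silently receiving `None`.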


def __init__(
self,
value: Union[int, float, str, datetime, time, date, bool, Decimal, None],

Why do we have a Decimal here?

Member


Got it.

@xunliu xunliu changed the title from "[5201]Implement expressions in python client" to "[5201] feat(client-python): Implement expressions in python client" on Nov 26, 2024
Contributor

@mchades mchades left a comment

I'd suggest we split this into a few smaller PRs.

Agree with @tengqm, the current PR is too large to review.

As this PR is focused on expressions, I suggest moving distributions, sorts, and transforms to separate PRs. This will make it easier for us to review this PR. WDYT? @SophieTech88

Development

Successfully merging this pull request may close these issues.

[Subtask] Support Expression system
5 participants