Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rename PPL function #618

Merged
merged 7 commits into from
Oct 1, 2024
Merged

Conversation

kt-eliatra
Copy link
Contributor

Description

This PR implements the rename PPL function.

Issues Resolved

#509

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kt-eliatra
Copy link
Contributor Author

There is one problem, when the rename command is not followed by the fields command,
the results contain duplicate values - from both the original and the renamed column, e.g https://github.com/opensearch-project/opensearch-spark/pull/618/files#diff-cb628f2b7ed981862c709fe9b25149c53b5d7eb2fe87a8bb1bbfce4c702352b1R105-R108.
I'm not sure how to solve this. Any ideas? Or maybe this can be accepted as a known limitation?

@YANG-DB
Copy link
Member

YANG-DB commented Sep 6, 2024

There is one problem, when the rename command is not followed by the fields command, the results contain duplicate values - from both the original and the renamed column, e.g https://github.com/opensearch-project/opensearch-spark/pull/618/files#diff-cb628f2b7ed981862c709fe9b25149c53b5d7eb2fe87a8bb1bbfce4c702352b1R105-R108. I'm not sure how to solve this. Any ideas? Or maybe this can be accepted as a known limitation?

please check out the next code that should help remove the double projected fields:

@YANG-DB
Copy link
Member

YANG-DB commented Sep 6, 2024

please also add README examples of this command usage

@kt-eliatra
Copy link
Contributor Author

There is one problem, when the rename command is not followed by the fields command, the results contain duplicate values - from both the original and the renamed column, e.g https://github.com/opensearch-project/opensearch-spark/pull/618/files#diff-cb628f2b7ed981862c709fe9b25149c53b5d7eb2fe87a8bb1bbfce4c702352b1R105-R108. I'm not sure how to solve this. Any ideas? Or maybe this can be accepted as a known limitation?

please check out the next code that should help remove the double projected fields:

* [Context projected fields list](https://github.com/opensearch-project/opensearch-spark/blob/88ad15fd208b78c50b061226eb45e9d15f998ce3/ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/CatalystPlanContext.java#L73)

* [Usage of this list](https://github.com/opensearch-project/opensearch-spark/blob/88ad15fd208b78c50b061226eb45e9d15f998ce3/ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/utils/ParseStrategy.java#L43)
drop table if exists test;

create table test (
  name STRING, age INT, email STRING, street_address STRING
);

insert into test values 
("Alice", 30, "alice@example.com", "123 Main St, Seattle"),
("Bob", 55, "bob@test.org", "456 Els St, Portland"),
("Charlie", 65, "charlie@domain.net", "789 Pine St, San Francisco"),
("David", 19, "david@anotherdomain.com", "101 Maple St, New York");

source=test | parse email '.+@(?<email>.+)' | fields email;

returns

[AMBIGUOUS_REFERENCE] Reference `email` is ambiguous, could be: [`email`, `spark_catalog`.`default`.`test`.`email`].

Documentation says that parse command can override existing field, at least in the "main sql plugin" https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/cmd/parse.rst#example-2-override-an-existing-field

@YANG-DB
Copy link
Member

YANG-DB commented Sep 11, 2024

@kt-eliatra thanks - this looks like an issue in the parse command ?
in addition can you plz add README examples of this command usage

@salyh
Copy link
Contributor

salyh commented Sep 12, 2024

@kt-eliatra thanks - this looks like an issue in the parse command ? in addition can you plz add README examples of this command usage

We opened this issue regarding the parse command: #650 (thx @kt-eliatra)

But our assumption is that replacing a existing column with new computed column with same name is impossible to do in spark.

@YANG-DB YANG-DB added Lang:PPL Pipe Processing Language support backport 0.5 backport 0.5-nexus labels Sep 12, 2024
Copy link
Member

@YANG-DB YANG-DB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some CI test are failing - plz review
Please add usage example in the readme.md doc

@kt-eliatra
Copy link
Contributor Author

Some CI test are failing - plz review Please add usage example in the readme.md doc

it should be fine now

Copy link
Member

@YANG-DB YANG-DB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can u plz resolve the latest conflict ?
thanks

@kt-eliatra
Copy link
Contributor Author

Can u plz resolve the latest conflict ? thanks

done

@YANG-DB
Copy link
Member

YANG-DB commented Sep 27, 2024

@kt-eliatra sorry agains - plz resolve latest conflict...

Signed-off-by: Kacper Trochimiak <kacper.trochimiak@eliatra.com>
Signed-off-by: Kacper Trochimiak <kacper.trochimiak@eliatra.com>
Signed-off-by: Kacper Trochimiak <kacper.trochimiak@eliatra.com>
Signed-off-by: Kacper Trochimiak <kacper.trochimiak@eliatra.com>
Signed-off-by: Kacper Trochimiak <kacper.trochimiak@eliatra.com>
@kt-eliatra
Copy link
Contributor Author

@kt-eliatra sorry agains - plz resolve latest conflict...

np, done :)

@LantaoJin
Copy link
Member

LantaoJin commented Sep 27, 2024

OH, there is a PR for rename, I opened a duplicated issue #697. I am starting to working on the implementation, but I am okey you have already started this. I will stop mine.
However, the rename is not just an alias of field, make sure the source field never be accessed again.
For example. source=t | rename a as b | fields a should throw exception.

Further, may not need to be done in this PR, the a field after renaming can't be accessed again in any commands:
e.g. source=t | rename a as b | stats sum(a).
And the identity a could be reused without conflicts:
e.g source=t | rename a as b | eval a = c + 1

@YANG-DB
Copy link
Member

YANG-DB commented Sep 28, 2024

@kt-eliatra plz fix scalafmtCheck checkstyle

Signed-off-by: Hendrik Saly <hendrik.saly@eliatra.com>
@salyh
Copy link
Contributor

salyh commented Sep 29, 2024

@kt-eliatra plz fix scalafmtCheck checkstyle

done


test("test single renamed field in fields command") {
val frame = sql(s"""
| source = $testTable | rename age as renamed_age | fields name, renamed_age
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test the following case:

| source = $testTable | rename age as renamed_age | fields age

Ref #618 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, the query source = $testTable | rename age as renamed_age | fields age has worked "correctly"; that is, the field age was present in the query result. I introduced corrections and updated the tests. Currently, the execution of the previously mentioned query causes an error.

Signed-off-by: Lukasz Soszynski <lukasz.soszynski@eliatra.com>
@YANG-DB YANG-DB merged commit deaa83e into opensearch-project:main Oct 1, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.6 Lang:PPL Pipe Processing Language support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants