Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native] Fix bug when parsing SqlFunctionHandle #24059

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pdabre12
Copy link
Contributor

@pdabre12 pdabre12 commented Nov 16, 2024

Resolves: #24077

If release note is NOT required, use:

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Nov 16, 2024
@pdabre12 pdabre12 changed the title [native] Fix bug when handling multiple params and no params functions [native] Fix bug when parsing SqlFunctionHandle Nov 18, 2024
@pdabre12 pdabre12 marked this pull request as ready for review November 18, 2024 20:20
@pdabre12 pdabre12 requested a review from a team as a code owner November 18, 2024 20:20
@pdabre12
Copy link
Contributor Author

pdabre12 commented Nov 18, 2024

CC : @czentgr @aditi-pandit

Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pdabre12

@@ -271,4 +271,8 @@ class VeloxBatchQueryPlanConverter : public VeloxQueryPlanConverterBase {
};

void registerPrestoPlanNodeSerDe();
void parseSqlFunctionHandle(
std::shared_ptr<protocol::SqlFunctionHandle> sqlFunction,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use const&

break;
}
auto argumentType = functionId.substr(start + 1, pos - start - 1);
rawInputTypes.push_back(stringToType(argumentType, typeParser));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add VELOX_CHECK that !argumentType.empty()

facebook::presto::parseSqlFunctionHandle(
sqlFunctionHandle, actualRawInputTypes, typeParser);
ASSERT_TRUE(actualRawInputTypes.empty());
EXPECT_EQ(expectedRawInputTypes.size(), actualRawInputTypes.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need the next 3 lines, since the expectedRawInputTypes is empty ?

std::vector<TypePtr> actualRawInputTypes;
std::vector<TypePtr> expectedRawInputTypes;
TypeParser typeParser;
auto sqlFunctionHandle =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about what is the significance of the event_based_revenue argument in the json if the SQLFunction handle doesn't have any arguments ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I blindly copied the test cases from https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/presto_protocol/tests/CallExpressionTest.cpp.
I just needed to have my hands on an instance of a SqlFunctionHandle for the unit test, because the parseSqlFunctionHandle function had a bug which failed in the parsing of the functionId string.
Do you think this is a good way to test it? Please let me know if you have any better ideas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdabre12 : Where do you anticipate this json to be generated ? You can run a query with an aggregate function and pick up the json from the web-console.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will only be generated for functions in the new function namespace manager : #23358.
Below is the json for the array_constructor function :

                "@type" : "call",
                  "sourceLocation" : {
                    "line" : 1,
                    "column" : 74
                  },
                  "displayName" : "array_constructor",
                  "functionHandle" : {
                    "@type" : "native",
                    "signature" : {
                      "name" : "native.default.array_constructor",
                      "kind" : "SCALAR",
                      "typeVariableConstraints" : [ ],
                      "longVariableConstraints" : [ ],
                      "returnType" : "array(double)",
                      "argumentTypes" : [ "double" ],
                      "variableArity" : false
                    },
                    "functionId" : "native.default.array_constructor;double",
                    "version" : "1"
                  },

The problem is without #23358 landing, native execution does not understand the native functionHandle type, I suppose I could pick the json but keep the functionHandle type as json_file as we really only are testing the parsing of the functionId string because that's where the bug manifests, let me know if that sounds okay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdabre12 : Yes, that is fine. Thanks.

facebook::presto::parseSqlFunctionHandle(
sqlFunctionHandle, actualRawInputTypes, typeParser);
ASSERT_FALSE(actualRawInputTypes.empty());
EXPECT_EQ(expectedRawInputTypes.size(), actualRawInputTypes.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a small inline function to take expectedRawInputTypes and sqlFunctionHandle as inputs and do the parse and compare actualRowInputTypes and expectedRawInputTypes.

This snippet is repeated in all the functions.

"@type": "call",
"arguments": [
{
"@type": "variable",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'm confused since the type of this variable is real, but the SQLFunctionHandle has an integer parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look at the above comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add to the above comment,
This is the string that was present in the CallExpression unit tests.

std::string str = R"(
          {
            "@type": "call",
            "arguments": [
              {
                "@type": "variable",
                "name": "event_based_revenue",
                "type": "real"
              }
            ],
            "displayName": "sum",
            "functionHandle": {
              "@type": "json_file",
              "functionId": "json.x4.sum;INTEGER;INTEGER",
              "version": "1"
            },
            "returnType": "double"
          }
    )"

ASSERT_NE(p.functionHandle, nullptr);

std::vector<TypePtr> actualRawInputTypes;
std::vector<TypePtr> expectedRawInputTypes{BIGINT(), BIGINT()};
Copy link
Contributor

@aditi-pandit aditi-pandit Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a function with all the different input types (like INT, BIGINT, REAL, DOUBLE, VARCHAR etc) to ensure they are all parsed correctly ?

How are complex types handled in this function ? Do you have an example ?

class PrestoToVeloxQueryPlanTest : public ::testing::Test {};

TEST_F(PrestoToVeloxQueryPlanTest, parseSqlFunctionHandleWithZeroParam) {
std::string str = R"(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to add a test corresponding to the error that you saw previously that led you to debugging this issue.

Copy link
Contributor Author

@pdabre12 pdabre12 Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the error was very generic.
Whenever we tried to parse any SqlFunctionHandle with zero / multiple params, because of the wrong exit condition it lead to a never-ending loop, for eg: "functionId": "json.x4.sum;" failed because of zero params, and "functionId": "json.x4.sum;bigint;bigint;" will fail too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failure to parse SqlFunctionHandle
3 participants