Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(query): support javascript/python script User Defined Aggregate Function #17054

Merged
merged 23 commits into from
Dec 24, 2024

Conversation

forsaken628
Copy link
Collaborator

@forsaken628 forsaken628 commented Dec 16, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

part of #16729

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@forsaken628 forsaken628 force-pushed the udaf branch 2 times, most recently from 5a56a73 to 2318578 Compare December 18, 2024 03:13
@forsaken628 forsaken628 changed the title WIP feat(query): User Defined Aggregate Function script feat(query): support User Defined Aggregate Function script Dec 18, 2024
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Dec 18, 2024
@forsaken628 forsaken628 changed the title feat(query): support User Defined Aggregate Function script feat(query): support javascript/python script User Defined Aggregate Function Dec 20, 2024
Signed-off-by: coldWater <forsaken628@gmail.com>
@forsaken628 forsaken628 marked this pull request as ready for review December 23, 2024 02:01
@forsaken628 forsaken628 requested a review from sundy-li December 23, 2024 02:50
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the metadata section of this PR.

For the remaining sections, I recommend having them reviewed by team members with direct expertise in those components, as they would be better positioned to evaluate the specific implementation details.
@sundy-li

Reviewed 7 of 52 files at r1.
Reviewable status: 7 of 52 files reviewed, 7 unresolved discussions (waiting on @forsaken628 and @sundy-li)


src/meta/proto-conv/tests/it/v115_add_udaf_script.rs line 100 at r1 (raw file):

    common::test_pb_from_to(func_name!(), want())?;
    common::test_load_old(func_name!(), bytes.as_slice(), 115, want())
}

Is this case related to the new type UDAFScript? It looks like a duplicate of the UDF test.

Code quote:

#[test]
fn test_decode_udf_script() -> anyhow::Result<()> {
    let bytes: Vec<u8> = vec![
        10, 5, 109, 121, 95, 102, 110, 18, 21, 84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 100,
        101, 115, 99, 114, 105, 112, 116, 105, 111, 110, 50, 78, 10, 9, 115, 111, 109, 101, 32, 99,
        111, 100, 101, 18, 5, 109, 121, 95, 102, 110, 26, 6, 112, 121, 116, 104, 111, 110, 34, 17,
        154, 2, 8, 58, 0, 160, 6, 115, 168, 6, 24, 160, 6, 115, 168, 6, 24, 42, 17, 154, 2, 8, 74,
        0, 160, 6, 115, 168, 6, 24, 160, 6, 115, 168, 6, 24, 50, 6, 51, 46, 49, 50, 46, 50, 160, 6,
        115, 168, 6, 24, 42, 23, 49, 57, 55, 48, 45, 48, 49, 45, 48, 49, 32, 48, 48, 58, 48, 48,
        58, 48, 48, 32, 85, 84, 67, 160, 6, 115, 168, 6, 24,
    ];

    let want = || UserDefinedFunction {
        name: "my_fn".to_string(),
        description: "This is a description".to_string(),
        definition: UDFDefinition::UDFScript(UDFScript {
            code: "some code".to_string(),
            handler: "my_fn".to_string(),
            language: "python".to_string(),
            arg_types: vec![DataType::Number(NumberDataType::Int32)],
            return_type: DataType::Number(NumberDataType::Float32),
            runtime_version: "3.12.2".to_string(),
        }),
        created_on: DateTime::<Utc>::default(),
    };

    common::test_pb_from_to(func_name!(), want())?;
    common::test_load_old(func_name!(), bytes.as_slice(), 115, want())
}

src/meta/proto-conv/src/udf_from_to_protobuf_impl.rs line 179 at r1 (raw file):

            .arg_types
            .into_iter()
            .map(|arg_type| Ok(DataType::from(&TableDataType::from_pb(arg_type)?)))

Could be simplified like this?

Suggestion:

            .map(|arg_type| DataType::from(&TableDataType::from_pb(arg_type)))

src/meta/proto-conv/src/udf_from_to_protobuf_impl.rs line 191 at r1 (raw file):

                    DataType::from(&TableDataType::from_pb(data_type)?),
                ))
            })

Same here:

Code quote:

            .map(|(name, data_type)| {
                Ok(DataField::new(
                    name,
                    DataType::from(&TableDataType::from_pb(data_type)?),
                ))
            })

src/meta/app/src/principal/user_defined_function.rs line 56 at r1 (raw file):

    pub return_type: DataType,
    pub runtime_version: String,
}

Please add comment explaining these essential fields.

Code quote:

pub struct UDAFScript {
    pub code: String,
    pub language: String,
    pub arg_types: Vec<DataType>,
    pub state_fields: Vec<DataField>,
    pub return_type: DataType,
    pub runtime_version: String,
}

src/meta/protos/proto/udf.proto line 62 at r1 (raw file):

  repeated DataType arg_types = 5;
  repeated string state_names = 6;
  repeated DataType state_types = 7;

Why not using a map of <state_name, state_type>? Does the order matter?

Code quote:

  repeated string state_names = 6;
  repeated DataType state_types = 7;

src/meta/protos/proto/udf.proto line 75 at r1 (raw file):

    UDFServer udf_server = 4;
    UDFScript udf_script = 6;
    // UDAFServer udaf_server = 7;

If this field is intentionally reserved, add a comment explaining the purpose of reserving it.

Copy link
Collaborator Author

@forsaken628 forsaken628 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 52 files reviewed, 7 unresolved discussions (waiting on @drmingdrmer and @sundy-li)


src/meta/proto-conv/tests/it/v115_add_udaf_script.rs line 100 at r1 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Is this case related to the new type UDAFScript? It looks like a duplicate of the UDF test.

Tests for databend_common_meta_app::principal::UDFScript are missing


src/meta/protos/proto/udf.proto line 62 at r1 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

Why not using a map of <state_name, state_type>? Does the order matter?

Compared to [(a,b)], ([a],[b]) will have better compatibility, and smaller size. The cost is not so intuitive.

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 52 files at r1.
Reviewable status: 8 of 52 files reviewed, 7 unresolved discussions (waiting on @forsaken628 and @sundy-li)


src/meta/proto-conv/tests/it/v115_add_udaf_script.rs line 100 at r1 (raw file):

Previously, forsaken628 (coldWater) wrote…

Tests for databend_common_meta_app::principal::UDFScript are missing

Oh. Thank you!

UDFScript seems to be introduced in commit 6777c17, in #14799.
Thus it belongs to v081_udf_script.rs. Please move it there. So that when a v081 is no longer supported, just to remove v081_udf_script.rs would completely remove all the associated tests.


src/meta/protos/proto/udf.proto line 62 at r1 (raw file):

Previously, forsaken628 (coldWater) wrote…

Compared to [(a,b)], ([a],[b]) will have better compatibility, and smaller size. The cost is not so intuitive.

The cost might be unclear, but my main concern is the relationship between the name and value. If there is a one-to-one mapping between state_name and state_type, it would be more appropriate to use a map instead of two separate vectors. Otherwise, the higher-level logic would need to constantly validate their correspondence before using them.

Copy link
Member

@sundy-li sundy-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sundy-li
Copy link
Member

documents about UDAF @soyeric128

@sundy-li sundy-li added this pull request to the merge queue Dec 24, 2024
Merged via the queue into databendlabs:main with commit 6770592 Dec 24, 2024
72 of 73 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants