Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Signature concept #7580

Open
izveigor opened this issue Sep 16, 2023 · 2 comments
Open

New Signature concept #7580

izveigor opened this issue Sep 16, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@izveigor
Copy link
Contributor

Is your feature request related to a problem or challenge?

Follow on #6559.

Argument quality:

  • definite (def) and undefined (undef);
  • equal (eq) and unequal (uneq);

Combine qualities:

  • eq-def (equal definite)
  • eq-undef (equal undefined)
  • uneq-def (unequal definite)
  • uneq-undef (unequal undefined)

Algebra

Kleene algebra {+, ·, *}

TypeSignature Kleene algebra
Variadic (uneq-def1 + uneq-def2 + ...)*
VariadicEqual (eq-undef)*
VariadicAny (uneq-undef)*
Uniform (uneq-def1 + uneq-def2 + ...)^n
Exact (uneq-def1 · uneq-def2 · ...)
Any (uneq-undef)^n

(+): present in the current version;

(-): not present in the current version;

Undefined

Kleene algebra/Quality eq-undef uneq-undef
(arg)* VariadicEqual (+) VariadicAny (+)
(arg)^n Equal (-) Any (+)

Definite

Kleene algebra/Quality eq-def uneq-def
(arg1 + arg2 + ...)* Variadic with single argument (+) Variadic with multiple arguments (+)
(arg1 + arg2 + ...)^n Uniform with single argument (+) Uniform with multiple arguments (+)
(arg1 · arg2 · ...) Exact with the same data type (+) Exact with different data types (+)

Meta Algebra

Kleene algebra {+, ·, *}

TypeSignature Kleene algebra Usage
OneOf (+) expr1 + expr2 + ... used with any TypeSignature if it makes sense
Concat (-) expr1 · expr2 · ... Exact, Uniform, Equal, Any (without Kleene closure)

Argument expansion

/// A function's nested argument.
pub enum TypeNestedArgument {
    /// `DataType::List`
    List,
    /// `DataType::Struct`
    Struct,
}
/// A function's type argument, which defines the function's valid data types.
pub enum TypeArgument {
    /// Supports arbitrarily (with any dimensions) nested data types (like `DataType::List`)
    /// `base_types`: base data types for nested type
    /// `include_nested`: include inner nested datatypes (all defined nested data types are arbitrary)
    ///
    /// # Examples:
    /// `TypeArgument::List { arg: TypeNestedArgument::List, base_types: vec![DataType::Int8, DataType::UInt8], include_nested: vec![TypeNestedArgument::Struct] }` can accept
    /// data types: `DataType::List(DataType::UInt8)`, `DataType::List(DataType::Int8)`,
    /// `DataType::List(DataType::Struct(DataType::Int8))`, `DataType::List(DataType::Struct(DataType::Struct(DataType::UInt8)))` and so on.
    Nested {
        arg: TypeNestedArgument,
        base_types: Vec<DataType>,
        include_nested: Vec<TypeNestedArgument>,
    },
    /// Supports non nested data types
    Common(Vec<DataType>),
}

Variadic

Input:

Variadic(
    vec![
        TypeArgument::Nested {
            arg: TypeNestedArgument::List,
            base_types: vec![DataType::Int8, DataType::UInt8],
            include_nested: vec![],
        },
        TypeArgument::Common(vec![DataType::Int8, DataType::UInt8]),
        TypeArgument::Nested {
            arg: TypeNestedArgument::Struct,
            base_types: vec![DataType::Int8, DataType::UInt8],
            include_nested: vec![],
        }
    ]
)

Output:

(DataType::List(DataType::Int8 + DataType::UInt8 + DataType::List(recursive))
+ DataType::Int8 + DataType::UInt8 + DataType::Struct(DataType::Int8 + DataType::UInt8 + DataType::Struct(recursive)))*

Uniform

Input:

Uniform(
    n,
    vec![
        TypeArgument::Nested {
            arg: TypeNestedArgument::List,
            base_types: vec![DataType::Int8, DataType::UInt8],
            include_nested: vec![],
        },
        TypeArgument::Common(vec![DataType::Int8, DataType::UInt8]),
        TypeArgument::Struct{
            arg: TypeNestedArgument::Struct,
            base_types: vec![DataType::Int8, DataType::UInt8],
            include_nested: vec![],
        }
    ]
)

Output:

(DataType::List(DataType::Int8 + DataType::UInt8 + DataType::List(recursive)) + DataType::Int8 + DataType::UInt8 + DataType::Struct(DataType::Int8 + DataType::UInt8 + DataType::Struct(recursive))) ^ n

Exact

Input:

Exact(
    vec![
        TypeArgument::Nested{
            arg: TypeNestedArgument::List,
            base_types: vec![DataType::Int8, DataType::UInt8],
            include_nested: vec![],
        },
        TypeArgument::Common(vec![DataType::Int8, DataType::UInt8]),
        TypeArgument::Nested{
            arg: TypeNestedArgument::Struct,
            base_types: vec![DataType::Int8, DataType::UInt8],
            include_nested: vec![]
        }
    ]
)

Output:

DataType::List(DataType::Int8 + DataType::UInt8 + DataType::List(recursive)) |
DataType::Int8 |
DataType::UInt8 |
DataType::Struct(DataType::Int8 + DataType::UInt8 + DataType::Struct(recursive))

Proposed code for future features:

            BuiltinScalarFunction::ArrayAppend
            | BuiltinScalarFunction::ArrayPositions
            | BuiltinScalarFunction::ArrayRemove
            | BuiltinScalarFunction::ArrayRemoveAll
            | BuiltinScalarFunction::ArrayHas => Signature::concat(
                vec![
                    Exact(vec![TypeArgument::Nested {
                        arg: TypeNestedArgument::List,
                        base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        include_nested: vec![],
                    }]),
                    Uniform(1, vec![
                        TypeArgument::Common(
                            array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        ),
                        TypeArgument::Nested {
                            arg: TypeNestedArgument::List,
                            base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                            include_nested: vec![],
                        },
                    ]),
                ],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArrayPopBack
            | BuiltinScalarFunction::ArrayDims
            | BuiltinScalarFunction::ArrayEmpty
            | BuiltinScalarFunction::Flatten
            | BuiltinScalarFunction::ArrayNdims
            | BuiltinScalarFunction::Cardinality => Signature::exact(
                vec![TypeArgument::Nested {
                    arg: TypeNestedArgument::List,
                    base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                    include_nested: vec![],
                }],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArrayConcat => Signature::variadic(
                vec![TypeArgument::Nested {
                    arg: TypeNestedArgument::List,
                    base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                    include_nested: vec![],
                }],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArrayHasAll | BuiltinScalarFunction::ArrayHasAny => {
                Signature::exact(
                    vec![
                        TypeArgument::Nested {
                            arg: TypeNestedArgument::List,
                            base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                            include_nested: vec![],
                        },
                        TypeArgument::Nested {
                            arg: TypeNestedArgument::List,
                            base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                            include_nested: vec![],
                        },
                    ],
                    self.volatility(),
                )
            }
            BuiltinScalarFunction::ArrayElement => Signature::exact(
                vec![
                    TypeArgument::Nested {
                        arg: TypeNestedArgument::List,
                        base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        include_nested: vec![],
                    },
                    TypeArgument::Common(vec![DataType::Int64]),
                ],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArraySlice => Signature::exact(
                vec![
                    TypeArgument::Nested {
                        arg: TypeNestedArgument::List,
                        base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        include_nested: vec![],
                    },
                    TypeArgument::Common(vec![DataType::Int64, DataType::Int64]),
                ],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArrayLength => Signature::one_of(
                vec![
                    Exact(vec![TypeArgument::Nested {
                        arg: TypeNestedArgument::List,
                        base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        include_nested: vec![],
                    }]),
                    Exact(vec![
                        TypeArgument::Nested {
                            arg: TypeNestedArgument::List,
                            base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                            include_nested: vec![],
                        },
                        TypeArgument::Common(vec![DataType::Int64]),
                    ]),
                ],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArrayPosition => Signature::one_of(
                vec![Exact(vec![
                    TypeArgument::Nested {
                        arg: TypeNestedArgument::List,
                        base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        include_nested: vec![],
                    },
                    TypeArgument::Common(vec![DataType::Int64, DataType::Int64]),
                ])],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArrayPrepend => Signature::concat(
                vec![
                    Uniform(1, vec![
                        TypeArgument::Common(
                            array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        ),
                        TypeArgument::Nested {
                            arg: TypeNestedArgument::List,
                            base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                            include_nested: vec![],
                        },
                    ]),
                    Exact(TypeArgument::Nested {
                        arg: TypeNestedArgument::List,
                        base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        include_nested: vec![],
                    }),
                ],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArrayRepeat => Signature::concat(
                vec![
                    Uniform(1, vec![
                        TypeArgument::Common(
                            array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        ),
                        TypeArgument::Nested {
                            arg: TypeNestedArgument::List,
                            base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                            include_nested: vec![],
                        },
                    ]),
                    Exact(TypeArgument::Common(vec![DataType::Int64])),
                ],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArrayRemoveN => Signature::concat(vec![
                Exact(vec![TypeArgument::Nested {
                    arg: TypeNestedArgument::List,
                    base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                    include_nested: vec![],
                }]),
                Uniform(
                1,
                    vec![
                    TypeArgument::Common(
                        array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                    ),
                    TypeArgument::Nested {
                        arg: TypeNestedArgument::List,
                        base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        include_nested: vec![],
                    },
                ]),
                Exact(vec![TypeArgument::Common(vec![DataType::Int64])]),
            ]),
            BuiltinScalarFunction::ArrayReplace
            | BuiltinScalarFunction::ArrayReplaceAll => Signature::concat(
                vec![
                    Exact(vec![TypeArgument::Nested {
                        arg: TypeNestedArgument::List,
                        base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        include_nested: vec![],
                    }]),
                    Uniform(
                    2,
                    vec![
                        TypeArgument::Common(
                            array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        ),
                        TypeArgument::Nested {
                            arg: TypeNestedArgument::List,
                            base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                            include_nested: vec![],
                        },
                    ]),
                ],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArrayReplaceN => Signature::concat(
                vec![
                    Exact(vec![TypeArgument::Nested {
                        arg: TypeNestedArgument::List,
                        base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        include_nested: vec![],
                    }]),
                    Uniform(2, vec![
                        TypeArgument::Common(
                            array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        ),
                        TypeArgument::Nested {
                            arg: TypeNestedArgument::List,
                            base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                            include_nested: vec![],
                        },
                    ]),
                    Exact(vec![TypeArgument::Common(vec![DataType::Int64])]),
                ],
                self.volatility(),
            ),
            BuiltinScalarFunction::ArrayToString => Signature::concat(
                Exact(vec![TypeArgument::Nested {
                    arg: TypeNestedArgument::List,
                    base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                    include_nested: vec![],
                }]),
                OneOf(vec![
                    Exact(vec![TypeArgument::Common(vec![DataType::Utf8])]),
                    Exact(vec![TypeArgument::Common(vec![
                        DataType::Utf8,
                        DataType::Utf8,
                    ])]),
                ]),
            ),
            BuiltinScalarFunction::MakeArray => Signature::concat(
                vec![Variadic(vec![
                    TypeArgument::Nested {
                        arg: TypeNestedArgument::List,
                        base_types: array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                        include_nested: vec![],
                    },
                    TypeArgument::Common(
                        array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
                    ),
                ])],
                self.volatility(),
            ),

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@crepererum
Copy link
Contributor

I agree that the signature system needs some work and I guess that you came up w/ a quite solid approach. However I am having some trouble following it 😅 . Some questions / comments:

  • argument quality: Could you add a short explanation for dev/undef and eq/uneq, i.e. what do they mean? Maybe an example could help here.
  • kleen algebra: that word alone doesn't say much, it just says that you follow the Kleen algebra properties. However it doesn't say what the components are, namely:
    • the set elements (i.e. is this the argument quality, the argument type, the argument value)
    • the two operators
    • the * function
  • meta algebra: same comment as for the normal algebra

@izveigor
Copy link
Contributor Author

Hello, @crepererum!
I'll try to explain as best I can.

About the categories:

There is a finite set of data types (which we denote by DataType)
Any of the possible types (options) of Signature can be described into 4 categories:


definite (def) and undefined (undef);



Definition: a definite data type means that only a specific data type from the DataType set is suitable.
(Example Exact(DataType::Int8) accepts (DataType::Int8), but not (DataType::UInt8)).


Definition: An undefined data type means that any data type from the set of DataTypes is suitable.
(Example: Any() accepts (DataType::Int8), and also (DataType::UInt8)).



equal (eq) and unequal (uneq);


Definition of Equality: the category of equality means that all elements are equal to each other.
(Example: VariadicEqual() can accept (DataType::Int8, DataType::Int8) or (DataType::UInt8, DataType::UInt8, DataType::UInt8), but not (DataType::Int8, DataType::UInt8)).


Definition of Inequality: The category of inequality means that the elements can be arbitrary.
(Example: VariadicAny() can accept (DataType::Int8, DataType::Int8), and also (DataType::Int8, DataType::UInt8))


Now, combine categories:

  • Definite-Equality (means that it should be a specific equal data type (sure, it should be only 1));
  • Definite-Unequality (means that it should be a specific set of data types);
  • Undefinite-Equality (means that it can accept any equal data type));
  • Undefinite-Unequality (means that it can accept any data types from DataType);

About Kleene algebra:

I choosed Kleene algebra (which is used for regular expressions). So if we create the algorithm for Signature (i. e. for boolean function, which can accept either input data set or not).
For out case it is sufficient to apply only regular language (like can accept Deterministic finite automaton (DFA)).

So, each type of signature represents a seperate DFA.

About meta algebra:

As exist situations, which a signature can check not only by one DFA, but many. So, we create the same regular meta language.
For example, if we want to use two DFAs, and if one of it returns the positive answer, than input data set suits us (OneOf case).

For full compatibility, it is worth adding two new signature types (Equal и Concat).


Separately, it is worth mentioning the function Concat. Concat can accept only input data set (without Kleene star (only Equal, Any, Uniform and Exact)).
Concat takes a set of data, divides them according to the size of each type of signature and uses the already specific DFA.

English is not the author’s native language, so there may be some difficulties in understanding.
I hope you understood me and my idea seemed reasonable to you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants