[CALCITE-5105] Add MEASURE type and AGGREGATE aggregate function #2965
The MEASURE type is internal. A RexNode expression that contains a measure and evaluates to an INTEGER will have the type MEASURE<INTEGER>, using a parameterized SQL type similar to ARRAY<INTEGER>. But the same measure column, as seen from SQL, will just have type INTEGER. The parameterized type helps us keep things straight in the relational algebra as we apply planner rules.

The AGGREGATE function belongs to the new CALCITE function library. To use it, add 'lib=calcite' to your connect string.

Add a new validator configuration parameter, `boolean SqlValidator.Config.nakedMeasures()`. The query

    SELECT deptno, AGGREGATE(avg_sal)
    FROM emp
    GROUP BY deptno

is valid if `avg_sal` is a measure. If `nakedMeasures` is true, then the following query is a valid shorthand for it:

    SELECT deptno, avg_sal
    FROM emp
    GROUP BY deptno

In the long term, we would like people to feel comfortable using the latter form. Measures are not necessarily aggregate functions, but are just expressions whose value depends on their context (the current GROUP BY key in an aggregate query, or the current row in a regular query). And we will generalize measures to analytic expressions, which are not necessarily just references to measure columns.

But in the short term, setting the `nakedMeasures` flag to false provides a level of comfort to people (and tools that generate SQL) who think of measures as aggregate functions, and think that measures should only be used in `GROUP BY` queries.

Extend mock catalog with a table that has measure columns.

Add a new Quidem test, measure.iq. It is disabled because we don't yet have the means to create measure columns in queries (or views). That is to come in [CALCITE-4496].

Close apache#2965
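To make the `nakedMeasures` shorthand concrete, here is a minimal, self-contained sketch of the rewrite the description implies: a bare reference to a measure column in the select list is treated as `AGGREGATE(column)`. This is a toy model over strings, not Calcite's validator. The class `NakedMeasureSketch` and its `expand` method are hypothetical names, and rejecting a bare measure when the flag is false is an assumption drawn from the wording "valid shorthand".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model of the naked-measures shorthand. Hypothetical; not Calcite code. */
final class NakedMeasureSketch {
  private NakedMeasureSketch() {}

  /**
   * Rewrites each bare measure reference in the select list to
   * AGGREGATE(name). Items that are not measure names pass through unchanged.
   *
   * @throws IllegalArgumentException if a bare measure is used while
   *     nakedMeasures is false (assumed behavior, for illustration)
   */
  static List<String> expand(List<String> selectItems, Set<String> measures,
      boolean nakedMeasures) {
    List<String> out = new ArrayList<>();
    for (String item : selectItems) {
      if (!measures.contains(item)) {
        out.add(item); // ordinary column, or already an expression
      } else if (nakedMeasures) {
        out.add("AGGREGATE(" + item + ")"); // expand the shorthand
      } else {
        throw new IllegalArgumentException(
            "measure '" + item + "' must be wrapped in AGGREGATE");
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // SELECT deptno, avg_sal FROM emp GROUP BY deptno, with nakedMeasures = true
    System.out.println(
        expand(List.of("deptno", "avg_sal"), Set.of("avg_sal"), true));
  }
}
```

With the flag set to false, the same call throws, which models tools that expect every measure to appear inside an explicit `AGGREGATE` call.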
eccde6a to 9c0b37d
site/_docs/reference.md
| C | Function | Reason not documented |
|:--|:-------------- |:--------------------- |
| c | AGGREGATE(m) | TODO: document; also AS MEASURE |
I guess when this gets "officially" documented, we will also need to add 'c' as a new value for the compatibility column in "Dialect-specific Operators", but there is no point in adding it now, right?
Good point. I've added the code 'c' now, but not much explanation of measures at this point.
ApplySqlType(SqlTypeName typeName, boolean isNullable,
    Iterable<? extends RelDataType> types) {
  super(typeName, isNullable, null);
  this.types = ImmutableList.copyOf(types);
Would it make sense to check that `types` is not empty?
It's possible that 0 type arguments make sense, so I don't think we should add the check.
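As an illustration of the point under discussion, the following self-contained sketch models a parameterized type that copies its type arguments defensively and accepts an empty list. It also shows the distinction from the PR description between the internal type `MEASURE<INTEGER>` and the type seen from SQL. `ParamTypeSketch`, `digest`, and `sqlVisibleType` are hypothetical names; the sketch uses `List.copyOf` from the JDK in place of Guava's `ImmutableList.copyOf` and omits nullability for brevity.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Toy parameterized type. Hypothetical; not Calcite's ApplySqlType. */
final class ParamTypeSketch {
  private final String typeName;
  private final List<ParamTypeSketch> typeArgs;

  ParamTypeSketch(String typeName, List<ParamTypeSketch> typeArgs) {
    this.typeName = typeName;
    // Defensive immutable copy; an empty argument list is allowed.
    this.typeArgs = List.copyOf(typeArgs);
  }

  /** Renders the type as NAME or NAME<ARG, ...>. */
  String digest() {
    if (typeArgs.isEmpty()) {
      return typeName;
    }
    return typeName + typeArgs.stream()
        .map(ParamTypeSketch::digest)
        .collect(Collectors.joining(", ", "<", ">"));
  }

  /** The type as seen from SQL: MEASURE<T> is exposed as plain T. */
  ParamTypeSketch sqlVisibleType() {
    if (typeName.equals("MEASURE") && typeArgs.size() == 1) {
      return typeArgs.get(0);
    }
    return this;
  }

  public static void main(String[] args) {
    ParamTypeSketch integer = new ParamTypeSketch("INTEGER", List.of());
    ParamTypeSketch measure = new ParamTypeSketch("MEASURE", List.of(integer));
    System.out.println(measure.digest());
    System.out.println(measure.sqlVisibleType().digest());
  }
}
```

Leaving out an emptiness check keeps the constructor usable for any type that legitimately takes no arguments, at the cost of not catching a forgotten argument early.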
 * <p>For a measure, {@code selectItem} will have the form
 * {@code AS(MEASURE(exp), alias)} and this method returns {@code exp}. */
public static @Nullable SqlNode getMeasure(SqlNode selectItem) {
  return null;
This method seems unfinished. Will it be completed as part of future work? If so, maybe add a TODO or a comment.
Yes, it will be extended in CALCITE-4496. I added a note.
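For readers wondering what the finished method might look like, here is a rough sketch of the behavior the Javadoc describes, written against a toy parse-tree node instead of Calcite's `SqlNode`. Everything here (`GetMeasureSketch`, `Node`, `isCall`) is hypothetical and only illustrates the `AS(MEASURE(exp), alias)` pattern match; the real extension is deferred to CALCITE-4496 and may differ.

```java
import java.util.Arrays;
import java.util.List;

/** Toy version of the pattern match in the Javadoc. Not Calcite code. */
final class GetMeasureSketch {
  private GetMeasureSketch() {}

  /** Minimal stand-in for a parse-tree node: an identifier or a call. */
  static final class Node {
    final String name;         // operator name for calls, text for leaves
    final List<Node> operands; // empty for leaves

    Node(String name, Node... operands) {
      this.name = name;
      this.operands = List.copyOf(Arrays.asList(operands));
    }

    boolean isCall(String operator, int arity) {
      return name.equals(operator) && operands.size() == arity;
    }
  }

  /**
   * If selectItem has the form AS(MEASURE(exp), alias), returns exp;
   * otherwise returns null.
   */
  static Node getMeasure(Node selectItem) {
    if (selectItem.isCall("AS", 2)) {
      Node aliased = selectItem.operands.get(0);
      if (aliased.isCall("MEASURE", 1)) {
        return aliased.operands.get(0);
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Node exp = new Node("AVG", new Node("sal"));
    Node item = new Node("AS", new Node("MEASURE", exp), new Node("avg_sal"));
    System.out.println(getMeasure(item) == exp);
  }
}
```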
@julianhyde Thanks for this contribution. Looks good. I have taken a look and left some minor comments (I'm not very familiar with the SQL parser and validator parts, though, so I'm probably not the best-suited reviewer there).
…TE-5155) (#9)

* [CALCITE-5197] Bump gradle to 7.4.2

  Add a Gradle task to automatically update the checksum in the Gradle Wrapper. In HOWTO guide, add a section 'How to upgrade Gradle and the Gradle Wrapper', and update the Gradle version. Close apache#2841

* [CALCITE-5344] Migrate Travis CI and AppVeyor configuration to GitHub Actions

  The ASF is discontinuing Travis CI for testing, and it will no longer be available after 31 December 2022.

* [CALCITE-5340] Tests should fail when actual and expected XML reference files are not identical

  File check added to the following test suites:
  - HepPlannerTest
  - RelOptRulesTest
  - RuleMatchVisualizerTest
  - SqlHintsConverterTest
  - SqlToRelConverterTest
  - SqlPrettyWriterTest
  - TypeCoercionConverterTest
  - TopDownOptTest

  Updated XML test reference files:
  - HepPlannerTest.xml
  - RelOptRulesTest.xml
  - SqlHintsConverterTest.xml
  - SqlToRelConverterTest.xml
  - TypeCoercionConverterTest.xml

* [CALCITE-5314] Prune empty parts of a query by exploiting stats/metadata

  Close apache#2935

* [CALCITE-4804] Support Snapshot operator serialization and deserialization

  This closes apache#2955

* [CALCITE-5351] Upgrade jackson databind to 2.13.4.2 and jackson to 2.13.4

* [CALCITE-5355] Use the Presto SQL dialect for AWS Athena

* [CALCITE-5252] JDBC adapter sometimes misses parentheses around Query in WITH_ITEM body

  This closes apache#2952

* [CALCITE-4982] Do not push 'cast to not null' through Join in ProjectJoinTransposeRule

  We should not push dangerous expressions down through Join; see CALCITE-5315 for more. As a quick fix, this change stops pushing 'cast to not null' through Join. This closes 2686

* Use JDK 17 as default javadoc root

* Refactor tests to allow testing custom type systems

* Refactor RexImpTable

  Break up the constructor into a builder pattern, and make RexImpTable immutable after construction. The constructor was getting too long (approaching 400 lines), so this change introduces a private static class Builder and moves initialization into chained populate() and populate2() methods. Usually we put inner classes towards the end of the file. In this case, we put the Builder inner class after the constructor, in order to reduce diff noise.

* Refactor: Deprecate SqlValidatorUtil.getAlias

  Whether `SqlValidatorUtil.getAlias` returns null depends on whether the ordinal argument is less than zero, which makes it difficult to reason about nullability. Replace with two methods: `alias(SqlNode)`, which is nullable, and `alias(SqlNode, int)`, which is not nullable.

* Various improvements to OperandTypes

  In SqlSingleOperandTypeChecker.java add default method implementations so that subtypes don't need to. Add OperandTypes.interval(SqlTypeName) to match an interval operand. Add CompositeOperandTypeChecker.withGenerator, so that a checker can generate signatures given a function name, as opposed to hard-coding the function name in the signatures as today. Add SqlOperandTypeChecker.and, SqlSingleOperandTypeChecker.or, and similar methods, to combine checkers using a binary operation. Flatten composite checkers, so that 'or(or(a, b), c)' becomes 'or(a, b, c)'.

* Remove Nullable from RelBuilder.alias

* Refactor SqlValidatorNamespace.fieldExists

  Add method SqlValidatorNamespace.field

* Improve digest for Window

* [CALCITE-5348] When translating ORDER BY in OVER, use the session's default null collation (e.g. NULLS LAST)

  This repeats the fix that was made in [CALCITE-2323] and that I accidentally undid in [CALCITE-4497]. Deprecate method SqlToRelConverter.convertSortExpression; makeOver now goes directly to RexNode rather than using an intermediate RexFieldCollation.

* Add class MonotonicSupplier

* Add "mssql" (Microsoft SQL Server) function library

  Add functions DATEADD, DATEDIFF, DATEPART, DATE_PART as aliases of TIMESTAMPADD, TIMESTAMPDIFF, EXTRACT. (Not fully implemented yet.)

* [CALCITE-5372] Upgrade vlsi-release-plugins to 1.84

* [CALCITE-5155] Custom time frames

  Before this change, you can use the ISO SQL time units (SECOND, HOUR, DAY, MONTH, YEAR, etc.) to perform datetime arithmetic (FLOOR, CEIL, EXTRACT) and also when defining materialized views and using them in queries. But applications would like to be able to define their own time frames, such as "MINUTE15" (a 15 minute period aligned with the hour) or "MONTH4" (a 4 month period aligned with the year), or "WEEK(THURSDAY)" (a 7 day week that starts on a Thursday).

  After this change, applications can define their own time frames. We add a `class TimeFrameSet`, and in the `interface RelDataTypeSystem` we add a method `TimeFrameSet deriveTimeFrameSet(TimeFrameSet frameSet)`. This method is called during query preparation, and the application has the opportunity to define a set that contains custom and existing time frames.

  Time frames can be defined that are multiples of and multiply to built-in time frames (as, for example, MINUTE is a multiple of SECOND and MILLISECOND multiplies to SECOND). You can also define that a time frame is aligned with another (as, for example, DAY is aligned with MONTH even though the multiplier is not constant).

  The following functions allow time frame arguments:
  * DATEADD (Postgres, MSSql)
  * DATEDIFF (Postgres, MSSql)
  * DATEPART (MSSql)
  * DATE_PART (Postgres)
  * EXTRACT (Calcite built-in, also SQL standard)
  * CEIL (Calcite built-in)
  * FLOOR (Calcite built-in)
  * TIMESTAMPADD (Calcite built-in, also JDBC standard)
  * TIMESTAMPDIFF (Calcite built-in, also JDBC standard)
  * TIMESTAMP_TRUNC (BigQuery)
  * TIME_TRUNC (BigQuery)

  Calls to the above functions with invalid time units would previously be a parse error and are now detected during validation.

  The SQL_TSI_xxx (e.g. SQL_TSI_HOUR) arguments are treated as time frames, and the parser passes them as identifiers. They are no longer reserved keywords.

  Previously, NANOSECOND and MILLISECOND were allowed in EXTRACT but no other functions. Now all functions that accept time frames accept the same time frames (built-in time intervals, identifiers for user-defined time frames, and SQL_TSI_xxx which are defined in the JDBC standard but are treated as identifiers until validation).

  The representation of calls to the above functions has changed. Previously the operand was a time unit, now it is an identifier.

  Deprecate SqlAbstractParserImpl.setTimeUnitCodes() and SqlParser.Config.timeUnitCodes(), because you can now create aliases for time units by creating custom time frames using TimeFrameSet.Builder.alias().

  Add commons-math3 as a dependency because TimeFrame uses BigFraction.

  Currently ISO_YEAR is not handled by DATEADD, TIMESTAMPADD, DATEDIFF, TIMESTAMPDIFF, etc. Adding or subtracting an ISO_YEAR will no-op. I don't know what the behavior should be.

  Close apache#2960

* [CALCITE-5356] Update junit4 to 4.13.2 and junit5 to 5.9.1

  Close apache#2958

* Site: Add instructions to consult/update the JIRA release dashboard

  Close apache#2963

* Site: Add Dmitry Sysolyatin as committer

* Site: Add Bertil Chapuis as committer

* [CALCITE-5353] Document new procedure for requesting JIRA accounts and becoming a contributor

* [CALCITE-5383] Add CONCAT to BIG_QUERY dialect

  Close apache#2970

* [CALCITE-5310] JSON_OBJECT in scalar sub-query throws AssertionError

  Close apache#2929

* Quidem: Allow CREATE VIEW in 'scott' connection

* [CALCITE-5105] Add MEASURE type and AGGREGATE aggregate function

  Close apache#2965

* Workaround

* Looker instructions

* Temporary workaround: change SqlInternalOperator's syntax back to FUNCTION

* [CALCITE-4998] Undo 4b34903

* Make sure the test kit can be used externally

* [CALCITE-5052] Allow Source based on a URL with jar: protocol

  This allows dependent projects to run tests using Bazel. (Previously, DiffRepository would give errors because Bazel has packaged the .xml files it needs inside JAR files.) Close apache#2750

* [CALCITE-5349] RelJson deserialization should support SqlLibraryOperators

* Support UDT declarations from root of schema model. (#7)

  Related to CALCITE-5346. Previously we relied on the parser to map unrecognized datatypes to a known type using SqlAlienSystemTypeNameSpec. This worked but made it difficult to change or add new types as necessary. One would have to update at least 3 different parsers (babel, core, server) to make a change.

  This change allows for declaring user-defined types at the root of a schema model and allows for easy type alias mapping. These data types are shared by all schemas in the model, so cast and DDL expressions do not need to scope data type references to a particular sub-schema. For example:

  ```
  inline: {
    version: '1.0',
    types: [
      { name: 'BOOL', type: 'BOOLEAN' },
      { name: 'BYTES', type: 'VARBINARY' },
      ...
    ],
  ```

  Allows for `CAST("true" as BOOL)`

* Rebase and clean up git history

Co-authored-by: Sergey Nuyanzin <snuyanzin@gmail.com>
Co-authored-by: Francis Chuang <francischuang@apache.org>
Co-authored-by: Alessandro Solimando <alessandro.solimando@gmail.com>
Co-authored-by: Hanumath Maduri <hanu.ncr@gmail.com>
Co-authored-by: xiejiajun <jiajunbernoulli@foxmail.com>
Co-authored-by: James Turton <james@somecomputer.xyz>
Co-authored-by: wumou.wm <wumou.wm@alibaba-inc.com>
Co-authored-by: xurenhe <xurenhe19910131@gmail.com>
Co-authored-by: Julian Hyde <jhyde@apache.org>
Co-authored-by: Stamatis Zampetakis <zabetak@gmail.com>
Co-authored-by: dssysolyatin <dm.sysolyatin@gmail.com>
Co-authored-by: Bertil Chapuis <bchapuis@gmail.com>
Co-authored-by: Oliver Lee <oliverlee@google.com>
Co-authored-by: Benchao Li <libenchao@gmail.com>
Co-authored-by: Will Noble <wnoble@google.com>
Co-authored-by: Marieke Gueye <mariekes@google.com>
Co-authored-by: TJ Banghart <tjbanghart@google.com>