[CALCITE-5105] Add MEASURE type and AGGREGATE aggregate function #2965

julianhyde · 2022-11-10T01:13:50Z

No description provided.

The MEASURE type is internal. A RexNode expression that contains a measure and evaluates to an INTEGER will have the type MEASURE<INTEGER>, using a parameterized SQL type similar to ARRAY<INTEGER>. But the same measure column, as seen from SQL, will just have type INTEGER. The parameterized type helps us keep things straight in the relational algebra as we apply planner rules.

The AGGREGATE function belongs to the new CALCITE function library. To use it, add 'lib=calcite' to your connect string.

Add a new validator configuration parameter, `boolean SqlValidator.Config.nakedMeasures()`. The query

  SELECT deptno, AGGREGATE(avg_sal)
  FROM emp
  GROUP BY deptno

is valid if `avg_sal` is a measure. If `nakedMeasures` is true, then the following query is a valid shorthand for it:

  SELECT deptno, avg_sal
  FROM emp
  GROUP BY deptno

In the long term, we would like people to feel comfortable using the latter form. Measures are not necessarily aggregate functions, but are just expressions whose value depends on their context (the current GROUP BY key in an aggregate query, or the current row in a regular query). And we will generalize measures to analytic expressions, which are not necessarily just references to measure columns. But in the short term, setting the `nakedMeasures` flag to false provides a level of comfort to people (and tools that generate SQL) who think of measures as aggregate functions, and think that measures should only be used in `GROUP BY` queries.

Extend mock catalog with a table that has measure columns.

Add a new Quidem test, measure.iq. It is disabled because we don't yet have the means to create measure columns in queries (or views). That is to come in [CALCITE-4496].

Close apache#2965
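As a rough illustration of the two ideas in the description, here is a minimal Java sketch. It is a toy model, not Calcite code: the class `MeasureSketch` and the helpers `sqlVisibleType` and `measureRefAllowed` are hypothetical names invented for this example, and types are modeled as plain strings. It assumes only what the description states: the internal type `MEASURE<T>` appears as `T` from SQL, and a bare reference to a measure column in a `GROUP BY` query is accepted only when `nakedMeasures` is true.

```java
public class MeasureSketch {
  /** Hypothetical helper: unwraps the internal MEASURE<T> to the type T that SQL sees. */
  static String sqlVisibleType(String internalType) {
    final String prefix = "MEASURE<";
    if (internalType.startsWith(prefix) && internalType.endsWith(">")) {
      return internalType.substring(prefix.length(), internalType.length() - 1);
    }
    return internalType; // not a measure type: returned unchanged
  }

  /**
   * Hypothetical check for a reference to a measure column in a GROUP BY query.
   * AGGREGATE(m) is always accepted; a bare m only if nakedMeasures is true.
   */
  static boolean measureRefAllowed(boolean wrappedInAggregate, boolean nakedMeasures) {
    return wrappedInAggregate || nakedMeasures;
  }

  public static void main(String[] args) {
    System.out.println(sqlVisibleType("MEASURE<INTEGER>")); // INTEGER
    System.out.println(measureRefAllowed(true, false));     // true:  AGGREGATE(avg_sal)
    System.out.println(measureRefAllowed(false, false));    // false: bare avg_sal, flag off
    System.out.println(measureRefAllowed(false, true));     // true:  bare avg_sal, flag on
  }
}
```

The real validator works on `SqlNode` trees and `RelDataType` objects rather than strings and booleans; the sketch only mirrors the decision described in the text.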

rubenada · 2022-11-11T09:40:45Z

site/_docs/reference.md

+
+| C | Function       | Reason not documented
+|:--|:-------------- |:---------------------
+| c | AGGREGATE(m)   | TODO: document; also AS MEASURE


I guess when this gets "officially" documented, we will also need to add 'c' as a new value for the compatibility column in the "Dialect-specific Operators" section, but there is no point in adding it now, right?

julianhyde · 2022-11-13

Good point. I've added the code 'c' now, but not much explanation of measures at this point.

rubenada · 2022-11-11T10:02:22Z

core/src/main/java/org/apache/calcite/sql/type/ApplySqlType.java

+  ApplySqlType(SqlTypeName typeName, boolean isNullable,
+      Iterable<? extends RelDataType> types) {
+    super(typeName, isNullable, null);
+    this.types = ImmutableList.copyOf(types);


Would it make sense to check that types is not empty?

julianhyde · 2022-11-13

It's possible that zero type arguments make sense, so I don't think we should add the check.
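A small sketch of the design choice discussed here: a parameterized type that copies its type arguments defensively and accepts an empty list. This is an illustration only. `ApplyTypeSketch` and its `digest()` rendering are assumptions made up for this example and are not taken from `ApplySqlType`; type arguments are modeled as strings rather than `RelDataType`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ApplyTypeSketch {
  final String typeName;
  final List<String> typeArgs;

  ApplyTypeSketch(String typeName, List<String> typeArgs) {
    this.typeName = typeName;
    // Defensive immutable copy; deliberately no check that the list is non-empty.
    this.typeArgs = Collections.unmodifiableList(new ArrayList<>(typeArgs));
  }

  /** Assumed rendering: NAME<A, B>, or the bare NAME when there are no type arguments. */
  String digest() {
    return typeArgs.isEmpty()
        ? typeName
        : typeName + "<" + String.join(", ", typeArgs) + ">";
  }

  public static void main(String[] args) {
    System.out.println(new ApplyTypeSketch("MEASURE", Arrays.asList("INTEGER")).digest());
    System.out.println(new ApplyTypeSketch("MEASURE", Collections.<String>emptyList()).digest());
  }
}
```

Leaving out the emptiness check keeps the constructor usable for any future type constructor that takes no arguments, at the cost of not catching an accidental empty list early.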

rubenada · 2022-11-11T10:10:30Z

core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorUtil.java

+   * <p>For a measure, {@code selectItem} will have the form
+   * {@code AS(MEASURE(exp), alias)} and this method returns {@code exp}. */
+  public static @Nullable SqlNode getMeasure(SqlNode selectItem) {
+    return null;


This method seems unfinished. Will it be part of future work? If so, maybe add a TODO or a comment?

julianhyde · 2022-11-13

Yes, it will be extended in CALCITE-4496. I added a note.
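For readers following the Javadoc contract quoted above (`AS(MEASURE(exp), alias)` yields `exp`), here is a minimal sketch of what such a method could look like on a toy expression tree. It is not the Calcite implementation, which currently returns null and is to be extended in CALCITE-4496. The `Call` class and this `getMeasure` are hypothetical stand-ins for `SqlCall` and the real method, and operands are plain `Object`s (either a nested `Call` or a `String` leaf).

```java
import java.util.Arrays;
import java.util.List;

public class GetMeasureSketch {
  /** Toy call node: an operator name plus operands (a nested Call or a String leaf). */
  static final class Call {
    final String op;
    final List<Object> operands;

    Call(String op, Object... operands) {
      this.op = op;
      this.operands = Arrays.asList(operands);
    }
  }

  /** Returns exp if selectItem has the form AS(MEASURE(exp), alias); otherwise null. */
  static Object getMeasure(Object selectItem) {
    if (selectItem instanceof Call) {
      Call as = (Call) selectItem;
      if (as.op.equals("AS") && as.operands.size() == 2
          && as.operands.get(0) instanceof Call) {
        Call measure = (Call) as.operands.get(0);
        if (measure.op.equals("MEASURE") && measure.operands.size() == 1) {
          return measure.operands.get(0);
        }
      }
    }
    return null; // not a measure select item
  }

  public static void main(String[] args) {
    Object item = new Call("AS", new Call("MEASURE", "AVG(sal)"), "avg_sal");
    System.out.println(getMeasure(item));                       // AVG(sal)
    System.out.println(getMeasure(new Call("AS", "sal", "s"))); // null
  }
}
```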

rubenada · 2022-11-11T10:17:41Z

@julianhyde Thanks for this contribution. Looks good. I have taken a look and left some minor comments (I'm not very familiar with the SQL parser and validator, though, so I'm probably not the best-suited reviewer for those parts).

…TE-5155) (#9)

* [CALCITE-5197] Bump gradle to 7.4.2

  Add a Gradle task to automatically update the checksum in the Gradle Wrapper. In HOWTO guide, add a section 'How to ugrade Gradle and the Gradle Wrapper', and update the Gradle version.

  Close apache#2841

* [CALCITE-5344] Migrate Travis CI and AppVeyor configuration to Github Actions

  The ASF is discontinuing Travis CI for testing, and it will no longer be available after 31 December 2022.

* [CALCITE-5340] Tests should fail when actual and expected XML reference files are not identical

  File check added to the following test suites:
  - HepPlannerTest
  - RelOptRulesTest
  - RuleMatchVisualizerTest
  - SqlHintsConverterTest
  - SqlToRelConverterTest
  - SqlPrettyWriterTest
  - TypeCoercionConverterTest
  - TopDownOptTest

  Updated XML test reference files:
  - HepPlannerTest.xml
  - RelOptRulesTest.xml
  - SqlHintsConverterTest.xml
  - SqlToRelConverterTest.xml
  - TypeCoercionConverterTest.xml

* [CALCITE-5314] Prune empty parts of a query by exploiting stats/metadata

  Close apache#2935

* [CALCITE-4804] Support Snapshot operator serialization and deserizalization

  This closes apache#2955

* [CALCITE-5351] Upgrade jackson databind to 2.13.4.2 and jackson to 2.13.4

* [CALCITE-5355] Use the Presto SQL dialect for AWS Athena

* [CALCITE-5252] JDBC adapter sometimes misses parentheses around Query in WITH_ITEM body

  This closes apache#2952

* [CALCITE-4982] Do not push 'cast to not null' through Join in ProjectJoinTransposeRule

  We should not push down dangerous expressions through Join at last, see more in CALCITE-5315. In this issue, we fixed this by stop pushing down 'cast to not null' through Join for a quick fix.

  This closes 2686

* Use JDK 17 as default javadoc root

* Refactor tests to allow testing custom type systems

* Refactor RexImpTable

  Break up the constructor into a builder pattern, and make RexImpTable immutable after construction.

  The constructor was getting too long (approaching 400 lines), so this change introduces a private static class Builder and moves initialization into chained populate() and populate2() methods.

  Usually we put inner classes towards the end of the file. In this case, we put the Builder inner class after the constructor, in order to reduce diff noise.

* Refactor: Deprecate SqlValidatorUtil.getAlias

  Whether `SqlValidatorUtil.getAlias` returns null depends on whether the ordinal argument is less than zero, which makes it difficult to reason about nullability. Replace with two methods `alias(SqlNode)` that is nullable and `alias(SqlNode, int)` that is not nullable.

* Various improvements to OperandTypes

  In SqlSingleOperandTypeChecker.java add default method implementations so that subtypes don't need to.

  Add OperandTypes.interval(SqlTypeName) to match an interval operand.

  Add CompositeOperandTypeChecker.withGenerator, so that a checker can generate signatures given a function name, as opposed to hard-coding the function name in the signatures as today.

  Add SqlOperandTypeChecker.and, SqlSingleOperandTypeChecker.or, and similar methods, to combine checkers using a binary operation.

  Flatten composite checkers, so that 'or(or(a, b), c)' becomes 'or(a, b,c)'.

* Remove Nullable from RelBuilder.alias

* Refactor SqlValidatorNamespace.fieldExists

  Add method SqlValidatorNamespace.field

* Improve digest for Window

* [CALCITE-5348] When translating ORDER BY in OVER, use the session's default null collation (e.g. NULLS LAST)

  This repeats the fix that was made in [CALCITE-2323] and that I accidentally undid in [CALCITE-4497].

  Deprecate method SqlToRelConverter.convertSortExpression; makeOver now goes directly to RexNode rather than using an intermediate RexFieldCollation.

* Add class MonotonicSupplier

* Add "mssql" (Microsoft SQL Server) function library

  Add functions DATEADD, DATEDIFF, DATEPART, DATE_PART as aliases of TIMESTAMPADD, TIMESTAMPDIFF, EXTRACT. (Not fully implemented yet.)

* [CALCITE-5372] Upgrade vlsi-release-plugins to 1.84

* [CALCITE-5155] Custom time frames

  Before this change, you can use the ISO SQL time units (SECOND, HOUR, DAY, MONTH, YEAR, etc.) to perform datetime arithmetic (FLOOR, CEIL, EXTRACT) and also when defining materialized views and using them in queries. But applications would like to be able to define their own time frames, such as "MINUTE15" (a 15 minute period aligned with the hour) or "MONTH4" (a 4 month period aligned with the year), or "WEEK(THURSDAY)" (a 7 day week that starts on a Thursday).

  After this change, applications can define their own time frames. We add a `class TimeFrameSet`, and in the `interface RelDataTypeSystem` we add a method `TimeFrameSet deriveTimeFrameSet(TimeFrameSet frameSet)`. This method is called during query preparation, and the application has the opportunity to define a set that contains custom and existing time frames.

  Time frames can be defined that are multiples of and multiply to built-in time frames (as, for example, MINUTE is a multiple of SECOND and MILLISECOND multiplies to SECOND). You can also define that a time frame is aligned with another (as, for example, DAY is aligned with MONTH even though the multiplier is not constant).

  The following functions allow time frame arguments:
  * DATEADD (Postgres, MSSql)
  * DATEDIFF (Postgres, MSSql)
  * DATEPART (MSSql)
  * DATE_PART (Postgres)
  * EXTRACT (Calcite built-in, also SQL standard)
  * CEIL (Calcite built-in)
  * FLOOR (Calcite built-in)
  * TIMESTAMPADD (Calcite built-in, also JDBC standard)
  * TIMESTAMPDIFF (Calcite-builtin, also JDBC standard)
  * TIMESTAMP_TRUNC (BigQuery)
  * TIME_TRUNC (BigQuery)

  Calls to the above functions with invalid time units would previously be a parse error and are now detected during validation.

  The SQL_TSI_xxx (e.g. SQL_TSI_HOUR) arguments are treated as time frames, and the parser passes them as identifiers. They are no longer reserved keywords.

  Previously, NANOSECOND and MILLISECOND were allowed in EXTRACT but no other functions. Now all functions that accept time frames accept the same time frames (built-in time intervals, identifiers for user-defined time frames, and SQL_TSI_xxx which are defined in the JDBC standard but are treated as identifiers until validation).

  The representation of calls to the above functions has changed. Previously the operand was a time unit, now it is an identifier.

  Deprecate SqlAbstractParserImpl.setTimeUnitCodes() and SqlParser.Config.timeUnitCodes(), because you can now create aliases for time units by creating custom time frames using TimeFrameSet.Builder.alias().

  Add commons-math3 as a dependency because TimeFrame uses BigFraction.

  Currently ISO_YEAR is not handled by DATEADD, TIMESTAMPADD, DATEDIFF, TIMESTAMPDIFF, etc. Adding or subtracting an ISO_YEAR will no-op. I don't know what the behavior should be.

  Close apache#2960

* [CALCITE-5356] Update junit4 to 4.13.2 and junit5 to 5.9.1

  Close apache#2958

* Site: Add instructions to consult/update the JIRA release dashboard

  Close apache#2963

* Site: Add Dmitry Sysolyatin as committer

* Site: Add Bertil Chapuis as committer

* [CALCITE-5353] Document new procedure for requesting JIRA accounts and becoming a contributor

* [CALCITE-5383] Add CONCAT to BIG_QUERY dialect

  Close apache#2970

* [CALCITE-5310] JSON_OBJECT in scalar sub-query throws AssertionError

  Close apache#2929

* Quidem: Allow CREATE VIEW in 'scott' connection

* [CALCITE-5105] Add MEASURE type and AGGREGATE aggregate function

  Close apache#2965

* Workaround

* Looker instructions

* Temporary workaround: change SqlInternalOperator's syntax back to FUNCTION

* [CALCITE-4998] Undo 4b34903

* Make sure the test kit can be used externally

* [CALCITE-5052] Allow Source based on a URL with jar: protocol

  This allows dependent projects to run tests using Bazel. (Previously, DiffRepository would give errors because Bazel has packaged the .xml files it needs inside JAR files.)

  Close apache#2750

* [CALCITE-5349] RelJson deserialization should support SqlLibraryOperators

* Support UDT declarations from root of schema model. (#7)

  Related to CALCITE-5346.

  Previously we relied on the parser to map unrecognized datatypes to a known type using SqlAlienSystemTypeNameSpec. This worked but made it difficult to change or add new types as necessary. One would have to update at least 3 different parsers (babel, core, server) to make a change.

  This change allows for declaring user-defined types at the root of a schema model and allows for easy type alias mapping. These data types are shared by all schema in the model so cast and DDL expressions do not need to scope data type references to a particular sub-schema.

  For example:

  ```
  inline: {
    version: '1.0',
    types: [
      { name: 'BOOL', type: 'BOOLEAN' },
      { name: 'BYTES', type: 'VARBINARY' },
      ...
    ],
  ```

  Allows for `CAST("true" as BOOL)`

* Rebase and clean up git history

Co-authored-by: Sergey Nuyanzin <snuyanzin@gmail.com>
Co-authored-by: Francis Chuang <francischuang@apache.org>
Co-authored-by: Alessandro Solimando <alessandro.solimando@gmail.com>
Co-authored-by: Hanumath Maduri <hanu.ncr@gmail.com>
Co-authored-by: xiejiajun <jiajunbernoulli@foxmail.com>
Co-authored-by: James Turton <james@somecomputer.xyz>
Co-authored-by: wumou.wm <wumou.wm@alibaba-inc.com>
Co-authored-by: xurenhe <xurenhe19910131@gmail.com>
Co-authored-by: Julian Hyde <jhyde@apache.org>
Co-authored-by: Stamatis Zampetakis <zabetak@gmail.com>
Co-authored-by: dssysolyatin <dm.sysolyatin@gmail.com>
Co-authored-by: Bertil Chapuis <bchapuis@gmail.com>
Co-authored-by: Oliver Lee <oliverlee@google.com>
Co-authored-by: Benchao Li <libenchao@gmail.com>
Co-authored-by: Will Noble <wnoble@google.com>
Co-authored-by: Marieke Gueye <mariekes@google.com>
Co-authored-by: TJ Banghart <tjbanghart@google.com>

julianhyde added 2 commits November 9, 2022 16:52

Quidem: Allow CREATE VIEW in 'scott' connection

2473bf3

julianhyde force-pushed the 5105-measure-ref branch from eccde6a to 9c0b37d on November 10, 2022 04:58

rubenada reviewed Nov 11, 2022

Address comments from @rubenada

88e49ae

julianhyde closed this in 406c913 Nov 16, 2022
