[CORE] Add InputIteratorTransformer to decouple ReadRel and iterator index #3854

ulysses-you · 2023-11-27T09:36:03Z

What changes were proposed in this pull request?

This PR adds a new transformer, InputIteratorTransformer, to connect SparkPlan and TransformSupport. It replaces the original ColumnarInputAdapter to provide the Substrait plan ReadRel for the child columnar iterator, so that a TransformSupport always has an input. On the native Velox side it is transformed to a ValueStreamNode.

The reasons are:

Decouple from the scan transformer: the scan transformer no longer needs an iterator index.
Make the whole-stage transform framework clearer: a TransformSupport no longer needs to check whether its child is a TransformSupport and build a ReadRel by itself.
Make the metrics framework clearer: the metrics should belong to the input iterator rather than to the operators.
Align with the native side, e.g., the Velox backend transforms it to a ValueStreamNode.
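
As a rough illustration of the points above, here is a toy model of the wrapping step. It is only a sketch under simplifying assumptions: `Plan`, `RowSource`, `Transformer`, `InputIterator`, `insertInputIterators`, and `describe` are hypothetical stand-ins and not the classes or methods in this patch. It shows that once every non-transform child is wrapped, each transformer only ever sees transformer-like children, and the wrapper is the single place that contributes a ReadRel.

```scala
// Toy model only: simplified stand-ins, not Gluten's real classes.
object InputIteratorSketch {
  sealed trait Plan { def children: Seq[Plan] }

  // Stand-in for a vanilla SparkPlan that yields a columnar iterator.
  final case class RowSource(name: String) extends Plan {
    val children: Seq[Plan] = Nil
  }

  // Stand-in for an operator implementing TransformSupport.
  final case class Transformer(name: String, children: Seq[Plan]) extends Plan

  // Stand-in for InputIteratorTransformer: wraps a non-transform child.
  final case class InputIterator(child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
  }

  // Wrap every non-transform child of a transformer, recursing into
  // transformer children. Plans rooted at a non-transformer are returned as-is.
  def insertInputIterators(plan: Plan): Plan = plan match {
    case Transformer(name, kids) =>
      Transformer(name, kids.map {
        case t: Transformer   => insertInputIterators(t)
        case i: InputIterator => i
        case other            => InputIterator(other)
      })
    case other => other
  }

  // Render what each node would contribute to the plan sent to native code.
  // Only the wrapper emits a ReadRel; operators never build one themselves.
  def describe(plan: Plan): String = plan match {
    case Transformer(name, kids) => s"$name(${kids.map(describe).mkString(", ")})"
    case InputIterator(_)        => "ReadRel[iterator]"
    case RowSource(name)         => name
  }
}
```

In this model a `Project` over a `Filter` over a shuffle read would be rendered with the shuffle read replaced by a single input-iterator ReadRel, while a leaf transformer (such as a scan) is left untouched and needs no iterator index.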

How was this patch tested?

Passed CI and tested manually.

github-actions · 2023-11-27T09:36:21Z

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Other pull requests

github-actions · 2023-11-27T09:36:37Z

Run Gluten Clickhouse CI

github-actions · 2023-11-27T11:35:27Z

Run Gluten Clickhouse CI

github-actions · 2023-11-27T12:42:44Z

Run Gluten Clickhouse CI

github-actions · 2023-11-28T03:20:35Z

Run Gluten Clickhouse CI

github-actions · 2023-11-28T03:29:27Z

Run Gluten Clickhouse CI

github-actions · 2023-11-28T04:03:16Z

Run Gluten Clickhouse CI

backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/MetricsApiImpl.scala

github-actions · 2023-11-28T06:11:46Z

Run Gluten Clickhouse CI

github-actions · 2023-11-28T07:16:34Z

Run Gluten Clickhouse CI

github-actions · 2023-11-28T07:33:10Z

Run Gluten Clickhouse CI

github-actions · 2023-11-28T11:28:36Z

Run Gluten Clickhouse CI

github-actions · 2023-11-29T02:19:52Z

Run Gluten Clickhouse CI

ulysses-you · 2023-11-29T02:30:55Z

/Benchmark Velox

ulysses-you · 2023-11-29T02:56:58Z

Run Gluten Clickhouse CI

github-actions · 2023-11-29T03:19:17Z

Run Gluten Clickhouse CI

GlutenPerfBot · 2023-11-29T03:39:55Z

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query	log/native_3854_time.csv	log/native_master_11_28_2023_e0f197fc2_time.csv	difference	percentage
q1	35.12	33.03	-2.093	94.04%
q2	24.95	24.97	0.021	100.08%
q3	37.93	37.84	-0.096	99.75%
q4	36.93	38.37	1.437	103.89%
q5	71.93	71.84	-0.089	99.88%
q6	7.12	7.08	-0.036	99.49%
q7	83.88	83.42	-0.464	99.45%
q8	85.25	86.34	1.097	101.29%
q9	123.15	124.39	1.247	101.01%
q10	45.45	45.34	-0.110	99.76%
q11	19.56	20.27	0.719	103.67%
q12	27.02	26.06	-0.962	96.44%
q13	46.75	47.23	0.483	101.03%
q14	16.09	13.83	-2.263	85.94%
q15	27.90	29.53	1.634	105.86%
q16	15.47	15.40	-0.076	99.51%
q17	102.94	102.38	-0.565	99.45%
q18	148.26	150.59	2.325	101.57%
q19	14.51	13.04	-1.469	89.88%
q20	27.96	27.68	-0.279	99.00%
q21	227.71	223.37	-4.342	98.09%
q22	13.29	14.21	0.924	106.95%
total	1239.16	1236.20	-2.957	99.76%
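
The last two columns are not explained in the report. Judging from the rows above, the difference appears to be the baseline time minus the PR time, and the percentage the baseline time divided by the PR time, so a value above 100% would mean the PR run was faster. A small sketch of that reading, with hypothetical names (`PerfReportSketch`, `difference`, `percentage`) and no claim about how the bot actually computes it:

```scala
// Assumed derivation of the report columns, inferred from the rows above.
object PerfReportSketch {
  // prTime: second column (this PR); baseTime: third column (master baseline).
  def difference(prTime: Double, baseTime: Double): Double =
    baseTime - prTime

  def percentage(prTime: Double, baseTime: Double): Double =
    baseTime / prTime * 100.0
}
```

For q1, this gives roughly -2.09 and 94.05%, which agrees with the reported -2.093 and 94.04% up to the rounding of the displayed times.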

github-actions · 2023-11-29T04:30:32Z

Run Gluten Clickhouse CI

github-actions · 2023-11-29T06:38:53Z

Run Gluten Clickhouse CI

github-actions · 2023-11-29T08:14:17Z

Run Gluten Clickhouse CI

ulysses-you · 2023-11-29T12:15:35Z

cc @Yohahaha @zhztheplayer @zhouyuan @zzcclp thank you

zhztheplayer · 2023-11-30T01:43:51Z

cc @rui-mo

zhztheplayer · 2023-11-30T02:29:42Z

backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/MetricsApiImpl.scala

+      sparkContext: SparkContext): Map[String, SQLMetric] = {
+    Map(
+      "cpuCount" -> SQLMetrics.createMetric(sparkContext, "cpu wall time count"),
+      "wallNanos" -> SQLMetrics.createNanoTimingMetric(sparkContext, "totaltime of batch scan"),


Is it accurate to use "totaltime of batch scan"? I thought the transformer is common to both scan and other inputs.

ulysses-you · Nov 30, 2023

Good catch, I think it was copied from the wrong code path.

zhztheplayer · 2023-11-30T02:36:15Z

gluten-core/src/main/scala/org/apache/spark/sql/execution/ColumnarCollapseTransformStages.scala

+ * `ReadRel` for the child columnar iterator, so that the [[TransformSupport]] always has input. It
+ * would be transformed to `ValueStreamNode` at native side.
+ */
+case class InputIteratorTransformer(child: SparkPlan) extends UnaryTransformSupport {


Should we combine InputIteratorTransformer and ColumnarInputAdapter later on?

ulysses-you · Nov 30, 2023

It seems we do not need ColumnarInputAdapter any more.

github-actions · 2023-11-30T03:28:49Z

Run Gluten Clickhouse CI

Yohahaha · 2023-11-30T11:49:59Z

LGTM! Thank you for this great patch.

zhztheplayer

Thanks!

GlutenPerfBot · 2023-11-30T13:04:16Z

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query	log/native_3854_time.csv	log/native_master_11_29_2023_0ac1dc42d_time.csv	difference	percentage
q1	33.98	34.61	0.625	101.84%
q2	25.13	24.87	-0.258	98.97%
q3	37.79	36.61	-1.186	96.86%
q4	36.87	36.91	0.036	100.10%
q5	70.21	71.43	1.218	101.74%
q6	5.42	7.22	1.805	133.29%
q7	86.89	85.50	-1.384	98.41%
q8	85.81	88.07	2.258	102.63%
q9	127.83	127.33	-0.505	99.61%
q10	46.12	44.31	-1.806	96.08%
q11	20.13	20.45	0.315	101.56%
q12	24.65	27.02	2.367	109.60%
q13	45.90	46.05	0.145	100.32%
q14	18.00	18.98	0.980	105.44%
q15	27.25	29.21	1.961	107.20%
q16	15.50	15.65	0.152	100.98%
q17	101.31	101.75	0.442	100.44%
q18	149.21	150.75	1.541	101.03%
q19	14.15	13.94	-0.207	98.54%
q20	27.12	30.32	3.206	111.82%
q21	224.64	225.21	0.564	100.25%
q22	13.44	13.20	-0.240	98.21%
total	1237.36	1249.39	12.029	100.97%

GlutenPerfBot · 2023-11-30T21:51:40Z

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query	log/native_master_11_30_2023_time.csv	log/native_master_11_29_2023_0ac1dc42d_time.csv	difference	percentage
q1	32.81	34.61	1.793	105.46%
q2	26.03	24.87	-1.160	95.54%
q3	37.62	36.61	-1.010	97.32%
q4	37.38	36.91	-0.468	98.75%
q5	72.32	71.43	-0.888	98.77%
q6	6.56	7.22	0.667	110.17%
q7	86.62	85.50	-1.116	98.71%
q8	87.07	88.07	1.001	101.15%
q9	126.83	127.33	0.496	100.39%
q10	47.00	44.31	-2.689	94.28%
q11	20.00	20.45	0.453	102.26%
q12	25.82	27.02	1.199	104.64%
q13	46.37	46.05	-0.318	99.31%
q14	16.00	18.98	2.976	118.60%
q15	27.70	29.21	1.502	105.42%
q16	15.45	15.65	0.200	101.30%
q17	102.12	101.75	-0.378	99.63%
q18	150.61	150.75	0.137	100.09%
q19	12.85	13.94	1.090	108.48%
q20	27.48	30.32	2.842	110.34%
q21	226.90	225.21	-1.689	99.26%
q22	13.04	13.20	0.165	101.27%
total	1244.58	1249.39	4.805	100.39%

zzcclp · 2023-12-01T01:54:13Z

Good patch. Sorry for the late review. If there is any issue with the CH backend, I will fix it later.

ulysses-you force-pushed the input-transformer branch from a31f6d8 to 5348287 Compare November 28, 2023 03:20

Yohahaha reviewed Nov 28, 2023


backends-velox/src/main/scala/io/glutenproject/backendsapi/velox/MetricsApiImpl.scala (outdated)

ulysses-you force-pushed the input-transformer branch from 7565873 to 28f4e87 Compare November 28, 2023 07:16

Add InputIteratorTransformer to decouple ReadRel and iterator index

3b665d3

ulysses-you force-pushed the input-transformer branch from 9899965 to 3b665d3 Compare November 28, 2023 11:28

fix ch ci

2fbe3d5

fix ci

11c9add

fix ch ci

7ae51e8

fix ch ci

ccdb82c

fix ch ci

ad82c75

zhztheplayer reviewed Nov 30, 2023


ulysses-you added 2 commits November 30, 2023 11:13

address comments

bf0da0a

address comments

e4e68fb

ulysses-you requested a review from zhztheplayer November 30, 2023 09:00

zhztheplayer approved these changes Nov 30, 2023


ulysses-you merged commit d2980b7 into apache:main Nov 30, 2023
18 checks passed

ulysses-you deleted the input-transformer branch November 30, 2023 12:27

ulysses-you mentioned this pull request Dec 5, 2023

[GLUTEN-3854][CORE][FOLLOWUP] Add ColumnarInputAdapter back to recover UI graph #3933

Merged
