fix: ignore dropped column statistics by column id when reducing block statistics #10051

Merged
merged 4 commits into from
Feb 15, 2023

Conversation

lichuang
Contributor

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

  1. fix: ignore dropped column statistics by column id when reducing block statistics
  2. add a test case.

Closes #10020
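
As a rough illustration of the idea in the summary, the sketch below merges per-block column statistics keyed by column id and skips any id that is no longer in the table schema. It is not Databend's implementation (which is written in Rust). The names `ColStats`, `reduce_block_stats`, and `live_column_ids` are hypothetical, and the sketch assumes that column ids are never reused after a column is dropped.

```python
from dataclasses import dataclass


@dataclass
class ColStats:
    """Hypothetical per-column statistics of a single block."""
    min_value: int
    max_value: int
    null_count: int


def reduce_block_stats(blocks, live_column_ids):
    """Merge block-level statistics into one summary, keyed by column id.

    blocks: list of dicts mapping column id -> ColStats.
    live_column_ids: ids of columns present in the current schema.
    Statistics stored under any other id (a dropped column) are ignored.
    """
    merged = {}
    for block in blocks:
        for col_id, stats in block.items():
            if col_id not in live_column_ids:
                continue  # column was dropped: do not carry its stats over
            current = merged.get(col_id)
            if current is None:
                merged[col_id] = ColStats(
                    stats.min_value, stats.max_value, stats.null_count
                )
            else:
                merged[col_id] = ColStats(
                    min(current.min_value, stats.min_value),
                    max(current.max_value, stats.max_value),
                    current.null_count + stats.null_count,
                )
    return merged


# Assumed scenario: column id 1 was dropped, and a later ADD COLUMN got id 2.
old_block = {0: ColStats(1, 2, 0), 1: ColStats(10, 20, 0)}
new_block = {0: ColStats(3, 4, 0), 2: ColStats(5, 6, 0)}
summary = reduce_block_stats([old_block, new_block], live_column_ids={0, 2})
```

Matching by id rather than by position means a column added after a drop cannot inherit the dropped column's range, provided the id assumption above holds.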

@lichuang lichuang requested a review from zhyass February 15, 2023 03:49
@lichuang lichuang changed the title from "Update add column bug" to "fix: ignore dropped column statistics by column id when reducing block statistics" Feb 15, 2023
@mergify mergify bot added the pr-bugfix label (this PR patches a bug in codebase) Feb 15, 2023
@BohuTANG BohuTANG merged commit 7329f4b into databendlabs:main Feb 15, 2023
@zhyass
Member

zhyass commented Feb 16, 2023

Thank you very much!
However, there may be another problem: the case of an added column is not handled here.
For example:

mysql> create table t(a int);
Query OK, 0 rows affected (0.07 sec)

mysql> insert into t values(1),(2);
Query OK, 2 rows affected (0.12 sec)

mysql> alter table t add column b int default 0;
Query OK, 0 rows affected (0.04 sec)

mysql> insert into t values(3,3),(4,4);
Query OK, 2 rows affected (0.03 sec)

mysql> select * from t;
+------+------+
| a    | b    |
+------+------+
|    1 |    0 |
|    2 |    0 |
|    3 |    3 |
|    4 |    4 |
+------+------+
4 rows in set (0.04 sec)
Read 4 rows, 32.00 B in 0.012 sec., 337.97 rows/sec., 2.64 KiB/sec.

mysql> select * from t where b=0;
+------+------+
| a    | b    |
+------+------+
|    1 |    0 |
|    2 |    0 |
+------+------+
2 rows in set (0.03 sec)
Read 2 rows, 16.00 B in 0.010 sec., 193.03 rows/sec., 1.51 KiB/sec.

mysql> optimize table t compact ;
Query OK, 4 rows affected (0.04 sec)

mysql> select * from t where b=0;
Empty set (0.04 sec)
Read 0 rows, 0.00 B in 0.010 sec., 0 rows/sec., 0.00 B/sec.


mysql> explain select * from t where b=0;
+--------------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                                  |
+--------------------------------------------------------------------------------------------------------------------------+
| Filter                                                                                                                   |
| ├── filters: [eq(t.b (#1), to_int32(0))]                                                                                 |
| ├── estimated rows: 0.00                                                                                                 |
| └── TableScan                                                                                                            |
|     ├── table: default.default.t                                                                                         |
|     ├── read rows: 0                                                                                                     |
|     ├── read bytes: 0                                                                                                    |
|     ├── partitions total: 1                                                                                              |
|     ├── partitions scanned: 0                                                                                            |
|     ├── pruning stats: [segments: <range pruning: 1 to 0>, blocks: <range pruning: 0 to 0, bloom pruning: 0 to 0>]       |
|     ├── push downs: [filters: [eq(t.b (#1), 0)], limit: NONE]                                                            |
|     └── estimated rows: 4.00                                                                                             |
+--------------------------------------------------------------------------------------------------------------------------+
12 rows in set (0.04 sec)
Read 0 rows, 0.00 B in 0.005 sec., 0 rows/sec., 0.00 B/sec.

The column statistics in the segment after compaction:

"col_stats": {
	"0": {
		"min": {
			"Number": {
				"Int32": 1
			}
		},
		"max": {
			"Number": {
				"Int32": 4
			}
		},
		"null_count": 0,
		"in_memory_size": 16,
		"distinct_of_values": 4
	},
	"1": {
		"min": {
			"Number": {
				"Int32": 3
			}
		},
		"max": {
			"Number": {
				"Int32": 4
			}
		},
		"null_count": 0,
		"in_memory_size": 8,
		"distinct_of_values": 3
	}
}

The min/max of column b should be [0, 4], not [3, 4]: the two rows written before the column was added take the default value 0.
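
To make the failure concrete, here is a minimal sketch of min/max range pruning and of one possible way to account for blocks written before the column existed. It is an illustration only, not the fix Databend adopted. `may_contain` and `merge_range_with_default` are hypothetical names, and the sketch assumes that a block lacking the column reads back as the column's default value for every row.

```python
def may_contain(min_value, max_value, target):
    """Range pruning check: keep a segment only if target lies in [min, max]."""
    return min_value <= target <= max_value


def merge_range_with_default(ranges, default):
    """Merge per-block (min, max) ranges for a single column.

    ranges: list of (min, max) tuples, or None for a block written before
    the column was added. Such a block is treated as holding only the
    default value, so it contributes the range (default, default).
    """
    lo = hi = None
    for block_range in ranges:
        cur_lo, cur_hi = (default, default) if block_range is None else block_range
        lo = cur_lo if lo is None else min(lo, cur_lo)
        hi = cur_hi if hi is None else max(hi, cur_hi)
    return lo, hi


# Column b from the example: the first block predates ADD COLUMN,
# the second block holds the values 3 and 4.
buggy_range = (3, 4)                                       # old block skipped
fixed_range = merge_range_with_default([None, (3, 4)], 0)  # default is 0

pruned_by_mistake = not may_contain(*buggy_range, 0)  # b = 0 is pruned away
kept_after_merge = may_contain(*fixed_range, 0)       # b = 0 survives pruning
```

With the range [3, 4], the filter `b = 0` falls outside the segment's min/max, which matches the `range pruning: 1 to 0` line in the plan above and the empty result after compaction.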

@lichuang lichuang deleted the update_add_column_bug branch February 21, 2023 02:59
Successfully merging this pull request may close these issues.

bug: compact not work as expected with alter table
3 participants