large overestimation when where conditions contain OR and matches several indexes with different selectivity #54323
Labels
affects-5.4
This bug affects 5.4.x versions.
affects-6.1
affects-6.5
affects-7.1
affects-7.5
affects-8.1
sig/planner
SIG: Planner
type/enhancement
The issue or PR belongs to an enhancement.
Comments
time-and-fate added the type/enhancement label on Jun 28, 2024
This corresponds to two places that need to be improved:
ti-chi-bot added the affects-5.4, affects-6.1, affects-6.5, affects-7.1, affects-7.5, affects-8.1 labels on Jul 5, 2024
A simple Rust script to reproduce it:

#!/usr/bin/env -S cargo +nightly -Zscript
---cargo
[dependencies]
sqlx = { version = "0.7", features = ["mysql", "runtime-tokio-native-tls"] }
tokio = { version = "1", features = ["full"] }
rand = "0.8"
---
use sqlx::mysql::MySqlPool;
use rand::Rng;
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Replace with your MySQL connection string
    let pool = MySqlPool::connect("mysql://root@localhost:4000/test").await?;

    // Create table
    sqlx::query(
        "CREATE TABLE IF NOT EXISTS t(
            a INT,
            b INT,
            c INT,
            d INT,
            INDEX iabc(a,b,c),
            INDEX ib(b)
        )"
    )
    .execute(&pool)
    .await?;

    // Function to generate random data
    fn generate_data() -> (i32, i32, i32, i32) {
        let mut rng = rand::thread_rng();
        (
            rng.gen_range(0..100000),
            rng.gen_range(0..10),
            rng.gen_range(0..1000),
            rng.gen_range(0..1000),
        )
    }

    // Insert initial data
    for _ in 0..200 {
        let data: Vec<_> = (0..20).map(|_| generate_data()).collect();
        for (a, b, c, d) in data {
            sqlx::query("INSERT INTO t (a, b, c, d) VALUES (?, ?, ?, ?)")
                .bind(a)
                .bind(b)
                .bind(c)
                .bind(d)
                .execute(&pool)
                .await?;
        }
    }

    // Double the data multiple times
    for _ in 0..5 { // Adjust this number based on how much data you want
        sqlx::query("INSERT INTO t SELECT * FROM t")
            .execute(&pool)
            .await?;
    }

    // Analyze table
    sqlx::query("ANALYZE TABLE t")
        .execute(&pool)
        .await?;

    println!("Script completed successfully.");
    Ok(())
}
// We greedy select the stats info based on:
// (1): The stats type, always prefer the primary key or index.
// (2): The number of expression that it covers, the more the better.
// (3): The number of columns that it contains, the less the better.
// (4): The selectivity of the covered conditions, the less the better.
// The rationale behind is that lower selectivity tends to reflect more functional dependencies
// between columns. It's hard to decide the priority of this rule against rule 2 and 3, in order
// to avoid massive plan changes between tidb-server versions, I adopt this conservative strategy
// to impose this rule after rule 2 and 3.
if (bestTp == ColType && set.Tp != ColType) ||
	bestCount < bits ||
	(bestCount == bits && bestNumCols > set.numCols) ||
	(bestCount == bits && bestNumCols == set.numCols && bestSel > set.Selectivity) {
	bestID, bestCount, bestTp, bestNumCols, bestMask, bestSel = i, bits, set.Tp, set.numCols, curMask, set.Selectivity
}

This happens because the stats on (ib) contain fewer columns than the stats on (iabc), so under rule (3) the stats on (ib) are chosen first, before the selectivity comparison in rule (4) is ever reached.
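The ordering of the rules quoted above can be illustrated with a small standalone sketch. It is not the planner's actual code: the `statsSet` type, the `pickBest` helper, and the numbers (one covered expression each, selectivities of 0.00001 and 0.1) are assumptions made up for illustration. Only the comparison itself mirrors the quoted `if` condition.

```go
package main

import "fmt"

// statsType is a hypothetical stand-in for the planner's stats type tag.
type statsType int

const (
	indexType statsType = iota
	colType
)

// statsSet is a hypothetical, simplified view of one candidate stats object.
type statsSet struct {
	name        string
	tp          statsType
	covered     int     // number of filter expressions this stats object covers
	numCols     int     // number of columns in the stats object
	selectivity float64 // estimated selectivity of the covered expressions
}

// pickBest returns the index of the candidate preferred by the quoted
// rules (1) to (4), applied in that order, or -1 if nothing covers an expression.
func pickBest(sets []statsSet) int {
	best := -1
	for i, s := range sets {
		if s.covered == 0 {
			continue
		}
		if best == -1 {
			best = i
			continue
		}
		b := sets[best]
		if (b.tp == colType && s.tp != colType) ||
			b.covered < s.covered ||
			(b.covered == s.covered && b.numCols > s.numCols) ||
			(b.covered == s.covered && b.numCols == s.numCols && b.selectivity > s.selectivity) {
			best = i
		}
	}
	return best
}

func main() {
	// Illustrative numbers only: both indexes cover one expression,
	// but iabc is far more selective than ib.
	sets := []statsSet{
		{name: "iabc", tp: indexType, covered: 1, numCols: 3, selectivity: 0.00001},
		{name: "ib", tp: indexType, covered: 1, numCols: 1, selectivity: 0.1},
	}
	fmt.Println("chosen:", sets[pickBest(sets)].name) // prints "chosen: ib"
}
```

In this sketch the candidate with fewer columns wins even though the other one is four orders of magnitude more selective, because rule (4) only breaks ties that survive rules (2) and (3).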
Reproduce
There is a big overestimation