statistics: fix wrong column stats loading after analyze twice (#42076) #42097

Merged
13 changes: 12 additions & 1 deletion statistics/handle/handle.go
@@ -1335,9 +1335,20 @@ func (h *Handle) columnStatsFromStorage(reader *statsReader, row chunk.Row, tabl
// 2. this column is not handle, and:
// 3. the column doesn't have any statistics before, and:
// 4. loadAll is false.
//
// Here is the explanation of the condition `!col.IsStatsInitialized() || col.IsAllEvicted()`.
// For one column:
// 1. If there are no stats for it in storage (i.e., analyze has never been executed), its stats status
// is `!col.IsStatsInitialized()`. In this case we take the `notNeedLoad` path.
// 2. If stats for it exist in storage but its stats status is `col.IsAllEvicted()`, there are two
// sub-cases. Either the column stats have never been used/needed by the optimizer, so they have
// never been loaded, or the column stats were loaded and then evicted. In both sub-cases
// we take the `notNeedLoad` path.
// 3. If some parts (Histogram/TopN/CMSketch) of its stats currently exist in TiDB memory, we load all of
// its new stats once we find that the stats version has been updated.
notNeedLoad := h.Lease() > 0 &&
!isHandle &&
- (col == nil || !col.IsStatsInitialized() && col.LastUpdateVersion < histVer) &&
+ (col == nil || ((!col.IsStatsInitialized() || col.IsAllEvicted()) && col.LastUpdateVersion < histVer)) &&
!loadAll
if notNeedLoad {
count, err := h.columnCountFromStorage(reader, table.PhysicalID, histID, statsVer)
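The effect of the changed condition can be sketched outside TiDB as a pair of pure predicates. This is a minimal illustration, not the code in `handle.go`: `colState`, `notNeedLoadOld`, and `notNeedLoadNew` are hypothetical names, and the sketch assumes that a column whose stats exist in storage but were never loaded reports both "initialized" and "all evicted", as case 2 of the comment in the hunk above suggests.

```go
package main

import "fmt"

// colState is a hypothetical stand-in for the in-memory column stats entry.
type colState struct {
	present           bool   // false models col == nil
	initialized       bool   // models col.IsStatsInitialized()
	allEvicted        bool   // models col.IsAllEvicted()
	lastUpdateVersion uint64 // models col.LastUpdateVersion
}

// notNeedLoadOld mirrors the condition before the change.
func notNeedLoadOld(leaseOn, isHandle, loadAll bool, c colState, histVer uint64) bool {
	return leaseOn && !isHandle &&
		(!c.present || (!c.initialized && c.lastUpdateVersion < histVer)) &&
		!loadAll
}

// notNeedLoadNew mirrors the condition after the change: a column whose
// stats are all evicted also stays on the lazy path.
func notNeedLoadNew(leaseOn, isHandle, loadAll bool, c colState, histVer uint64) bool {
	return leaseOn && !isHandle &&
		(!c.present || ((!c.initialized || c.allEvicted) && c.lastUpdateVersion < histVer)) &&
		!loadAll
}

func main() {
	const histVer = 2 // version written by the second analyze (illustrative value)
	cases := []struct {
		name string
		col  colState
	}{
		{"no entry in memory", colState{}},
		{"never analyzed", colState{present: true}},
		{"analyzed, never loaded", colState{present: true, initialized: true, allEvicted: true, lastUpdateVersion: 1}},
		{"partly loaded", colState{present: true, initialized: true, lastUpdateVersion: 1}},
	}
	for _, tc := range cases {
		fmt.Printf("%-24s old=%-5v new=%v\n", tc.name,
			notNeedLoadOld(true, false, false, tc.col, histVer),
			notNeedLoadNew(true, false, false, tc.col, histVer))
	}
}
```

Under these assumptions the two predicates disagree only for the "analyzed, never loaded" state. There the old condition returns false and forces a full load on the second analyze, while the new one keeps the column on the `notNeedLoad` path.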
26 changes: 26 additions & 0 deletions statistics/integration_test.go
@@ -658,3 +658,29 @@ func TestShowHistogramsLoadStatus(t *testing.T) {
}
}
}

func TestColumnStatsLazyLoad(t *testing.T) {
store, dom := testkit.CreateMockStoreAndDomain(t)
tk := testkit.NewTestKit(t, store)
h := dom.StatsHandle()
originLease := h.Lease()
defer h.SetLease(originLease)
// Set `Lease` to `Millisecond` to enable column stats lazy load.
h.SetLease(time.Millisecond)
tk.MustExec("use test")
tk.MustExec("create table t(a int, b int)")
tk.MustExec("insert into t values (1,2), (3,4), (5,6), (7,8)")
require.NoError(t, h.HandleDDLEvent(<-h.DDLEventCh()))
tk.MustExec("analyze table t")
is := dom.InfoSchema()
tbl, err := is.TableByName(model.NewCIStr("test"), model.NewCIStr("t"))
require.NoError(t, err)
tblInfo := tbl.Meta()
c1 := tblInfo.Columns[0]
c2 := tblInfo.Columns[1]
require.True(t, h.GetTableStats(tblInfo).Columns[c1.ID].IsAllEvicted())
require.True(t, h.GetTableStats(tblInfo).Columns[c2.ID].IsAllEvicted())
tk.MustExec("analyze table t")
require.True(t, h.GetTableStats(tblInfo).Columns[c1.ID].IsAllEvicted())
require.True(t, h.GetTableStats(tblInfo).Columns[c2.ID].IsAllEvicted())
}