Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: dynamically update the average column size #6170

Merged
merged 19 commits into from
Apr 13, 2018

Conversation

fipped
Copy link
Contributor

@fipped fipped commented Mar 28, 2018

This PR is about the work of updating the tot_col_size of histograms when executing insert, delete and update statement. Therefore, a correct average column size can be computed when we use show stats_histograms statement.
The tot_col_size we dynamically maintain will be a little different from that computed by analyze statement since we get the size from the value directly instead of computing from the coded bytes, that is we leave out the length byte.
PTAL @lamxTyler @winoros


ctx.GetSessionVars().TxnCtx.UpdateDeltaForTable(t.Meta().ID, 0, 1)
colSize := make(map[int64]int64)
for id, col := range t.WritableCols() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also consider the column state?

ctx.GetSessionVars().TxnCtx.UpdateDeltaForTable(t.Meta().ID, 0, 1)
colSize := make(map[int64]int64)
for id, col := range t.WritableCols() {
colSize[col.ID] = int64(len(newData[id].GetBytes()) - len(oldData[id].GetBytes()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to record the col size for fix length type.

hist.TotColSize = temp

c.Assert(hist.CMSketch, NotNil)
c.Assert(newStatsTbl.Columns[id].CMSketch, NotNil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already checked at CMSketch.Equal.

}
updated := h.ctx.GetSessionVars().StmtCtx.AffectedRows() > 0
_, err = h.ctx.(sqlexec.SQLExecutor).Execute(ctx, "commit")
return updated, errors.Trace(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the error is not nil, then updated will be always true.

version := h.ctx.Txn().StartTS()

for key, val := range delta.ColSize {
sql := fmt.Sprintf("update mysql.stats_histograms set version = %d, tot_col_size = tot_col_size + %d where hist_id = %d and table_id = %d", version, val, key, id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be two histograms with the same hist_id and table_id.

ctx.GetSessionVars().TxnCtx.UpdateDeltaForTable(t.Meta().ID, 0, 1)
colSize := make(map[int64]int64)
for id, col := range t.WritableCols() {
colSize[col.ID] = int64(len(newData[id].GetBytes()) - len(oldData[id].GetBytes()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetBytes is only valid when the Kind of the Datum is KindString or KindBytes.

Maybe we need to add a method for Datum to return the size?

colSize := make(map[int64]int64)
for id, col := range t.WritableCols() {
if col.State == model.StatePublic && newData[id].Size() != 0 {
colSize[col.ID] = int64(newData[id].Size() - oldData[id].Size())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not right? If the new data happens to be null, but the old data is not null.

colSize := make(map[int64]int64)
for id, col := range t.WritableCols() {
if col.State == model.StatePublic {
colSize[col.ID] = -int64(data[id].Size())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Size is 0, it is not needed to update it?

colSize := make(map[int64]int64)
for id, col := range t.WritableCols() {
if col.State == model.StatePublic {
colSize[col.ID] = int64(r[id].Size())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

types/datum.go Outdated
@@ -122,6 +122,11 @@ func (d *Datum) IsNull() bool {
return d.k == KindNull
}

// Size gets the length of byte of the datum.
func (d *Datum) Size() int {
return len(d.b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to switch on d.k to determine the size of the datum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if we only care about varlen size, then we don't need this method, use len(d.GetBytes()) will do.

@shenli
Copy link
Member

shenli commented Apr 2, 2018

@fipped Please resolve the conflicts.

@shenli
Copy link
Member

shenli commented Apr 2, 2018

@lamxTyler PTAL

if err = h.dumpTableStatColSizeToKV(id, item); err != nil {
return errors.Trace(err)
}
m := h.globalMap[id]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to update the map if the updated is true, just delete it.

Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alivxxx
Copy link
Contributor

alivxxx commented Apr 4, 2018

/run-all-tests

@alivxxx alivxxx added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 4, 2018
@fipped
Copy link
Contributor Author

fipped commented Apr 11, 2018

PTAL @coocood

@@ -101,13 +101,21 @@ type TransactionContext struct {
}

// UpdateDeltaForTable updates the delta info for some table.
func (tc *TransactionContext) UpdateDeltaForTable(tableID int64, delta int64, count int64) {
func (tc *TransactionContext) UpdateDeltaForTable(tableID int64, delta int64, count int64, colSize *map[int64]int64) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to use pointer for a map.


ctx.GetSessionVars().TxnCtx.UpdateDeltaForTable(t.Meta().ID, 0, 1)
colSize := make(map[int64]int64)
for id, col := range t.WritableCols() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just range over t.Cols(), it will return public columns only.

@@ -392,7 +400,16 @@ func (e *DeleteExec) removeRow(ctx sessionctx.Context, t table.Table, h int64, d
}
ctx.StmtAddDirtyTableOP(DirtyTableDeleteRow, t.Meta().ID, h, nil)
ctx.GetSessionVars().StmtCtx.AddAffectedRows(1)
ctx.GetSessionVars().TxnCtx.UpdateDeltaForTable(t.Meta().ID, -1, 1)
colSize := make(map[int64]int64)
for id, col := range t.WritableCols() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@coocood
Copy link
Member

coocood commented Apr 13, 2018

LGTM

@winoros winoros added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Apr 13, 2018
@zz-jason zz-jason merged commit 9fbf8a6 into pingcap:master Apr 13, 2018
@sre-bot sre-bot added the contribution This PR is from a community contributor. label Dec 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/statistics contribution This PR is from a community contributor. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants