Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(query): fix table stats incorrect when create table with exists snapshot #10424

Merged
merged 3 commits into from
Mar 9, 2023

Conversation

TCeason
Copy link
Collaborator

@TCeason TCeason commented Mar 8, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

fix table stats incorrect when create table with exists snapshot

When create table with options SNAPSHOT_LOCATION,

after create table , will judge whether the SNAPSHOT_LOCATION actually exists.

  • If exists, will regenerate the table statistic and recreate table.

  • if not exists, will drop the table and return err.

Closes #10406

@vercel
Copy link

vercel bot commented Mar 8, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated
databend ⬜️ Ignored (Inspect) Mar 8, 2023 at 11:32AM (UTC)

@mergify mergify bot added the pr-bugfix this PR patches a bug in codebase label Mar 8, 2023
@TCeason TCeason force-pushed the ISSUE-10406 branch 4 times, most recently from 6f104a9 to 84ef1b7 Compare March 8, 2023 05:58
@BohuTANG BohuTANG requested a review from dantengsky March 8, 2023 08:04
@dantengsky
Copy link
Member

https://github.com/datafuselabs/databend/blob/84ef1b7bd63310a6b905f2e4448b59efb52c6b34/src/query/service/src/interpreters/interpreter_table_create.rs#L193-L200

            let snapshot = snapshot_reader.read(&params).await;
            match snapshot {
                Ok(snapshot) => {
                     .... 
                }
                Err(e) => { 
                    catalog
                        .drop_table_by_id(DropTableByIdReq {
                            if_exists: true,
                            tb_id: table.get_table_info().ident.table_id,
                        })
                        .await?;
                    return Err(e);
                }
            }
        }

will a new table be left there with incorrect statistics , if catalog.drop_table_by_id failed, let's say, caused by a network error?


How about this?

@dantengsky
Copy link
Member

dantengsky commented Mar 8, 2023

How about this?
....

O... the operator is not there, how about using the ctx.get_data_operator

https://github.com/datafuselabs/databend/blob/829e236b12bac80157588919feef63938c0b4ce9/src/query/catalog/src/table_context.rs#L125

technically, it is not correct, but

  • it seems to be able to meet the "current requirement"
  • cloning a table by using magic option is more like a tmp workaround

@TCeason
Copy link
Collaborator Author

TCeason commented Mar 8, 2023

https://github.com/datafuselabs/databend/blob/84ef1b7bd63310a6b905f2e4448b59efb52c6b34/src/query/service/src/interpreters/interpreter_table_create.rs#L193-L200

            let snapshot = snapshot_reader.read(&params).await;
            match snapshot {
                Ok(snapshot) => {
                     .... 
                }
                Err(e) => { 
                    catalog
                        .drop_table_by_id(DropTableByIdReq {
                            if_exists: true,
                            tb_id: table.get_table_info().ident.table_id,
                        })
                        .await?;
                    return Err(e);
                }
            }
        }

will a new table be left there with incorrect statistics , if catalog.drop_table_by_id failed, let's say, caused by a network error?
How about this?

O... the operator

We can do this modify

    async fn table_statistics(&self) -> Result<Option<TableStatistics>> {
        let s = &self.table_info.meta.statistics;
        if s.number_of_rows != 0 {
            Ok(Some(TableStatistics {
                num_rows: Some(s.number_of_rows),
                data_size: Some(s.data_bytes),
                data_size_compressed: Some(s.compressed_data_bytes),
                index_size: Some(s.index_data_bytes),
            }))
        } else {
            // add read snapshot in there
        }
    }

If num_rows is 0 means the new table or not stat is incorrect, we can read snapshot.

@dantengsky
Copy link
Member

If num_rows is 0 means the new table or not stat is incorrect, we can read snapshot.

not sure if num_rows is 0 indicates that the stat is incorrect. (seems we should not conclude like this)

besides: there is an implicit constraint for the Table::table_statistics,
this method is used in management mode , which can not access object storage .

@dantengsky
Copy link
Member

If so , how about triggering a backoff: #10424 (comment)

not sure about this, @sundy-li we need your help

@TCeason
Copy link
Collaborator Author

TCeason commented Mar 8, 2023

If so , how about triggering a backoff: #10424 (comment)

not sure about this, @sundy-li we need your help

We can not do this. Because the root cause is stats incorrect, in there we can backoff ,but join or other use stats will also generate err result.

@TCeason TCeason requested review from sundy-li and BohuTANG March 8, 2023 11:35
Copy link
Member

@sundy-li sundy-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation is enough to fix the issue without too much effort.

@TCeason
Copy link
Collaborator Author

TCeason commented Mar 8, 2023

The current implementation is enough to fix the issue without too much effort.

A hack usage generates a hack modify pr.. How about supporting https://docs.snowflake.com/en/sql-reference/sql/create-clone in a later version?

@sundy-li
Copy link
Member

sundy-li commented Mar 8, 2023

The current implementation is enough to fix the issue without too much effort.

A hack usage generates a hack modify pr.. How about supporting docs.snowflake.com/en/sql-reference/sql/create-clone in a later version?

That could be great.

set hide_options_in_show_create_table = 0; is hard to use.

@BohuTANG BohuTANG merged commit 486944d into databendlabs:main Mar 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-bugfix this PR patches a bug in codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: incorrect snapshot stats after clone to a new table
4 participants