Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TiFlash crash after dropping a column #5154

Closed
JaySon-Huang opened this issue Jun 15, 2022 · 9 comments · Fixed by #5166
Closed

TiFlash crash after dropping a column #5154

JaySon-Huang opened this issue Jun 15, 2022 · 9 comments · Fixed by #5166

Comments

@JaySon-Huang
Copy link
Contributor

JaySon-Huang commented Jun 15, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

mysql> create table test (A int, B varchar(20), C int, D int, PRIMARY KEY(A,C) CLUSTERED);
# drop column before the column composing clustered index
mysql> insert into test values (1,'1',1,1),(2,'2',2,2);
mysql> alter table test drop column B;
# insert some rows
mysql> insert into test values (3,3,3),(4,4,4);
mysql> insert into test values (5,5,5),(6,6,6);
mysql> insert into test values (7,7,7),(8,8,8);
mysql> alter table test set tiflash replica 1;

2. What did you expect to see? (Required)

TiFlash run normally

3. What did you see instead (Required)

[2022/06/15 17:42:56.574 +08:00] [ERROR] [Exception.cpp:85] ["DB::EngineStoreApplyRes DB::HandleWriteRaftCmd(const DB::EngineStoreServerWrap *, DB::WriteCmdsView, DB::RaftCmdHeader):Code: 12, e.displayText() = DB::Exception:
Parameters start = 0, length = 1 are out of bound in ColumnVector<T>::insertRangeFrom method (data.size() = 0)., e.what() = DB::Exception, Stack trace:


       0x1d272d3    StackTrace::StackTrace() [tiflash+30569171]
                    dbms/src/Common/StackTrace.cpp:23
       0x1d248d6    DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+30558422]
                    dbms/src/Common/Exception.h:41
       0x768eafa    DB::ColumnVector<unsigned long>::insertRangeFrom(DB::IColumn const&, unsigned long, unsigned long) [tiflash+124316410]
                    dbms/src/Columns/ColumnVector.cpp:199
       0x789403b    DB::DM::ColumnFileInMemory::append(DB::DM::DMContext&, DB::Block const&, unsigned long, unsigned long, unsigned long) [tiflash+126435387]
                    dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileInMemory.cpp:82
       0x78bce78    DB::DM::MemTableSet::appendToCache(DB::DM::DMContext&, DB::Block const&, unsigned long, unsigned long) [tiflash+126602872]
                    dbms/src/Storages/DeltaMerge/Delta/MemTableSet.cpp:149
       0x78aa731    DB::DM::DeltaValueSpace::appendToCache(DB::DM::DMContext&, DB::Block const&, unsigned long, unsigned long) [tiflash+126527281]
                    dbms/src/Storages/DeltaMerge/Delta/DeltaValueSpace.cpp:118
       0x77c3aac    DB::DM::Segment::writeToCache(DB::DM::DMContext&, DB::Block const&, unsigned long, unsigned long) [tiflash+125581996]
                    dbms/src/Storages/DeltaMerge/Segment.cpp:306
       0x778ec09    DB::DM::DeltaMergeStore::write(DB::Context const&, DB::Settings const&, DB::Block&) [tiflash+125365257]
                    dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp:653
       0x79170f2    DB::writeRegionDataToStorage(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*)::$_0::operator()(bool) const [tiflash+126972146]
                    dbms/src/Storages/Transaction/PartitionStreams.cpp:145
       0x7912fa1    DB::writeRegionDataToStorage(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*) [tiflash+126955425]
                    dbms/src/Storages/Transaction/PartitionStreams.cpp:178
       0x7912c9c    DB::RegionTable::writeBlockByRegion(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*, bool) [tiflash+126954652]
                    dbms/src/Storages/Transaction/PartitionStreams.cpp:348
       0x79351ff    DB::Region::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, DB::TMTContext&) [tiflash+127095295]
                    dbms/src/Storages/Transaction/Region.cpp:712
       0x7901910    DB::KVStore::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, unsigned long, DB::TMTContext&) [tiflash+126884112]
                    dbms/src/Storages/Transaction/KVStore.cpp:285
       0x791b822    HandleWriteRaftCmd [tiflash+126990370]
                    dbms/src/Storages/Transaction/ProxyFFI.cpp:92
  0x7f0223e736e0    raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::process_raft_cmd::h4c8bba241d5ee4d4 [libtiflash_proxy.so+28567264]
  0x7f0223e70d6c    raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::handle_raft_committed_entries::h478c4bad834ddec7 [libtiflash_proxy.so+28556652]
  0x7f0223e90f91    raftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_apply::he4c0b1137ae7cad9 [libtiflash_proxy.so+28688273]
  0x7f0223e93bfe    raftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_tasks::h5b138e5889c69e66 [libtiflash_proxy.so+28699646]
  0x7f0223e96510    _$LT$raftstore..store..fsm..apply..ApplyPoller$LT$EK$GT$$u20$as$u20$batch_system..batch..PollHandler$LT$raftstore..store..fsm..apply..ApplyFsm$LT$EK$GT$$C$raftstore..store..fsm..apply..ControlFsm$GT$$GT$::handle_normal::he20c24833a37424f [libtiflash_proxy.so+28710160]
  0x7f0223b0925f    batch_system::batch::Poller$LT$N$C$C$C$Handler$GT$::poll::hec57674a22fdcc30 [libtiflash_proxy.so+24986207]
  0x7f02241b2cd8    std::sys_common::backtrace::__rust_begin_short_backtrace::h189c0b2868fa537c [libtiflash_proxy.so+31972568]
  0x7f02239eaec1    core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h7c53021c905641b6 [libtiflash_proxy.so+23813825]
  0x7f02243c8fca    std::sys::unix::thread::Thread::new::thread_start::hd39c5f08bdcda277 [libtiflash_proxy.so+34160586]
  0x7f0221a12ea5    start_thread [libpthread.so.0+32421]
                    /usr/src/debug/glibc-2.17-c758a686/nptl/pthread_create.c:307
  0x7f0221525b0d    <unknown symbol> [libc.so.6+1043213]
                    /usr/src/debug////////glibc-2.17-c758a686/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:111"] [thread_id=14]

4. What is your TiFlash version? (Required)

v6.1.0

@JaySon-Huang
Copy link
Contributor Author

JaySon-Huang commented Jun 16, 2022

This is a decoding exception caused by dropping a column that precedes the column that makes up the common handle when the table is created with clustered_index.

This bug is caused by we treat the IndexColumnInfo::offset of TiDB::TableInfo::PrimaryIndexInfo::idx_cols as the offset of ColumnsInfo, but actually it is the offset value of ColumnInfo.

const auto & primary_index_info = table_info_.getPrimaryIndexInfo();
for (size_t i = 0; i < primary_index_info.idx_cols.size(); i++)
{
auto pk_column_id = table_info_.columns[primary_index_info.idx_cols[i].offset].id;
pk_column_ids.emplace_back(pk_column_id);
pk_pos_map.emplace(pk_column_id, reinterpret_cast<size_t>(std::numeric_limits<size_t>::max()));
}

For example, assume a table with [{column A, offset=0, id=1}, {column B, offset=1, id=2}, {column C, offset=2, id=3}, {column D, offset=3, id=4}] and the primary key is composed by [column A, column B] (represented by [offset=0, offset=2] in the PrimaryIndexInfo).
After dropping column B, we unexpected parse and get an incorrect result [column A, columnD] as the primary key. That makes trouble in decoding the value of column C from row and makes this bug happen.

It seems first introduced by #3431 since 5.4.0

@JaySon-Huang
Copy link
Contributor Author

JaySon-Huang commented Jun 16, 2022

Here is a (complicated) unit test along with some debug loggings to reproduce this case: RegionBlockReaderTestFixture.NullJSON in dbms/src/Storages/Transaction/tests/gtest_region_block_reader.cpp of this branch https://github.com/pingcap/tiflash/compare/master...JaySon-Huang:debug_5154?expand=1

/cc @hongyunyan

@flowbehappy
Copy link
Contributor

Thanks, @JaySon-Huang , great investigation! Let's further follow up and fix this issue @hongyunyan .

@flowbehappy
Copy link
Contributor

/assign hongyunyan

@ti-chi-bot
Copy link
Member

@flowbehappy: GitHub didn't allow me to assign the following users: hongyunyan.

Note that only pingcap members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign hongyunyan

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Mini256
Copy link
Member

Mini256 commented Jun 20, 2022

/assign hongyunyan

@ti-chi-bot
Copy link
Member

@Mini256: GitHub didn't allow me to assign the following users: hongyunyan.

Note that only pingcap members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign hongyunyan

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hongyunyan
Copy link
Contributor

test

@Mini256
Copy link
Member

Mini256 commented Jun 20, 2022

/assign hongyunyan

Lloyd-Pottiger pushed a commit to Lloyd-Pottiger/tiflash that referenced this issue Jul 12, 2022
…s in README (pingcap#5182)

close pingcap#5172, ref pingcap#5178

Enhancement: add a integrated test on DDL module (pingcap#5130)

ref pingcap#5129

Revert "Revise default background threads size" (pingcap#5176)

close pingcap#5177

chore: remove extra dyn cast (pingcap#5186)

close pingcap#5185

Add MPPReceiverSet, which includes ExchangeReceiver and CoprocessorReader (pingcap#5175)

ref pingcap#5095

DDL: Use Column Name Instead of Offset to Find the common handle cluster index (pingcap#5166)

close pingcap#5154

Add random failpoint in critical paths (pingcap#4876)

close pingcap#4807

Segment test framework (pingcap#5150)

close pingcap#5151

optimize ps v3 restore (pingcap#5163)

ref pingcap#4914

Fix build failed (pingcap#5196)

close pingcap#5195

feat: delta tree dispatching (pingcap#5199)

close pingcap#5200

feat: introduce specialized API to write fixed length data rapidly (pingcap#5181)

close pingcap#5183

Add gtest for Limit, TopN, Projection (pingcap#5187) (pingcap#5188)

close pingcap#5187

add `MPPTask::handleError()` (pingcap#5202)

ref pingcap#5095

Check result of starting grpc server (pingcap#5257)

close pingcap#5255

feat: add optimized routines for aarch64 (pingcap#5231)

close pingcap#5240

fix: aarch64-quick-fix (pingcap#5259)

close pingcap#5260

Update client-c to support ipv6 (pingcap#5270)

close pingcap#5247

upgrade prometheus-cpp to v1.0.1 (pingcap#5279)

ref pingcap#2103, close pingcap#5278

Fix README type error (pingcap#5273)

ref pingcap#5178

fix(cmake): make sure libc++ is utilized by tiflash-proxy (pingcap#5281)

close pingcap#5282

fix the wrong order of execution summary for list based executors (pingcap#5242)

close pingcap#5241

Schema: allow loading empty schema diff when the version grows up. (pingcap#5245)

close pingcap#5244

Optimize apply speed under heavy write pressure (pingcap#4883)

ref pingcap#4728

update proxy to raftstore-proxy-6.2 (pingcap#5287)

ref pingcap#4982

Flush segment cache when doing the compaction (pingcap#5284)

close pingcap#5179

metrics: Fix incorrect metrics for delta_merge tasks (pingcap#5061)

close pingcap#5055

dep: upgrade jemalloc (pingcap#5197)

close pingcap#5258

*: TiFlash pagectl/dttool use only-decryption mode (pingcap#5271)

close pingcap#5122

suppresion false positive report from tsan (pingcap#5303)

close pingcap#5088

Refine test framework code and tests (pingcap#5261)

close pingcap#5262

feat: add logical cpu cores and memory into grafana (pingcap#5124)

close pingcap#3821

Implement TimeToSec function push down (pingcap#5235)

close pingcap#5116

feat: implement shiftRight function push down (pingcap#5156)

close pingcap#5100

schema : make update to partition tables when 'set tiflash replica' (pingcap#5267)

close pingcap#5266

Replace initializer_list with vector for planner test framework (pingcap#5307)

close pingcap#5295

KVStore: decouple flush region and CompactLog with a new FFI fn_try_flush_data (pingcap#5283)

ref pingcap#5170

refine error message in mpptask (pingcap#5304)

ref pingcap#5095

Implement ReverseUTF8/Reverse function push down (pingcap#5233)

close pingcap#5111

Optimize comparision for collation `UTF8_BIN` and `UTF8MB4_BIN` (pingcap#5299)

ref pingcap#5294

feat : support set tiflash mode ddl action (pingcap#5256)

ref pingcap#5252

Add non-blocking functions for MPMCQueue (pingcap#5311)

close pingcap#5310

add random segment test for CI weekly (pingcap#5300)

close pingcap#5301

*: tidy FunctionString.cpp (pingcap#5312)

close pingcap#5313

ci: fix check-license github action (pingcap#5318)

close pingcap#5317

update proxy to raftstore-proxy-6.2 (pingcap#5316)

ref pingcap#4982

Change one `additional_input_at_end` to many streams in `ParallelInputsProcessor`  (pingcap#5274)

close pingcap#4856, close pingcap#5263

support fine grained shuffle for window function (pingcap#5048)

close pingcap#5142

feat: pushdown get_format into TiFlash (pingcap#5269)

close pingcap#5115

fix: format throw data truncated error (pingcap#5272)

close pingcap#4891

Print content of columns for gtest (pingcap#5243)

close pingcap#5203

*: also enable O3 for aarch64 (pingcap#5338)

close pingcap#5342

Add debug image build target for CentOS7 (pingcap#5344)

close pingcap#5343

*: mini refactor (pingcap#5326)

close pingcap#4739

Refactor initialize of background pool (pingcap#5190)

close pingcap#5189

delete copy/move ctor of MPMCQueue explicitly (pingcap#5328)

close pingcap#5329

Introduce proxy_server and new-mock-engine-store (pingcap#5319)

ref pingcap#5170

fix: incorrect uptime in grafana panel

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Lloyd-Pottiger pushed a commit to Lloyd-Pottiger/tiflash that referenced this issue Jul 19, 2022
JaySon-Huang pushed a commit to JaySon-Huang/tiflash that referenced this issue Sep 27, 2022
JaySon-Huang added a commit that referenced this issue Oct 31, 2022
* DDL: Use Column Name Instead of Offset to Find the common handle cluster index (#5166) (#5191)

close #5154

* tests: Fix RegionBlockReaderTest helper functions (#5899) (#5924)

ref #5859

* Fix decode error when "NULL" value in the column with "primary key" flag (#5879) (#5932)

close #5859

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants