Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce PageStorage V3 to reduce CPU usage and write amplification #3594

Closed
58 of 76 tasks
JaySon-Huang opened this issue Dec 3, 2021 · 9 comments
Closed
58 of 76 tasks
Assignees
Labels
type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@JaySon-Huang
Copy link
Contributor

JaySon-Huang commented Dec 3, 2021

Feature Request

Teachability, Documentation, Adoption, Migration Strategy:
To reduce CPU usage and write amplification, we file a proposal for new version of PageStorage: Proposal: PageStorage V3 Design
This issue records the progress of PageStorage V3 development.

Development progress

  • some preparations and refactoration
  • The framework and related data structure for PageStorage V3
  • BlobStore
  • MVCC PageDirectory
    • Basic apply edits, get PageEntry
    • GC for cleaning up removed Page
    • GC with BlobStore data movement
    • GCApply needs to scan all living page ids with a read lock, which could cause high write blocking time. Find a better way for removing external pages; fixed in PageStorage: Ref page lifetime mechanism #4174
    • Stale snapshots detection
      • Record the thread id, created time while creating a snapshot
      • Log warnings when there exist long time snapshot
      • Optimization when stale snapshot(s) exist (MVCC GC, BlobStore GC)
  • WALLog
    • LogFile format
    • Meta format for persisting the PageEntry applied to MVCC PageDirectory
    • Compaction of LogFiles
    • Multi-disks rolling
  • Benchmark vs V2
  • Compatibility with other features
    • Respect WriteLimiter && ReadLimiter
    • Respect Encryption at rest
  • Global PageStorage for one TiFlash instance
    • Cleanup data in global PageStorage instance when dropping table
  • Further improvement/features
    • Tools for debugging PageStorage (page_ctl)
    • Transfering data from old storage to V3
    • Check whether use list instead of set for FilenameSet is better PageStorage: WALStore #3891 (comment)
    • Split the pipeline of persisting write batch into smaller parts and speed up the throughput for WALStore
    • Compression on serialized entries (in WAL)

Split PRs

@JaySon-Huang JaySon-Huang added the type/feature-request Categorizes issue or PR as related to a new feature. label Dec 3, 2021
@jiaqizho
Copy link
Contributor

jiaqizho commented Dec 3, 2021

Transfering part similar like : #3269

And just a reference: we need to update page_ctl at last. :)

@JaySon-Huang
Copy link
Contributor Author

JaySon-Huang commented Dec 3, 2021

And just a reference: we need to update page_ctl at last. :)

Added to the list.

@jiaqizho
Copy link
Contributor

2022/1/27 found some bugs when running stress test.

When i running page storage stress(need change code to enable restore) test with this cmd :

./dbms/src/Storages/Page/page_stress_testing -M 16 -V 3

every reader got error :


[2022/01/27 15:04:42.608 +08:00] [ERROR] [<unknown>] ["root:Code: 10015, e.displayText() = DB::Exception: Entry [id=161] at [idx=4] not exist!, e.what() = DB::Exception, Stack trace:\n\n0. 0   page_stress_testing
0x000000010281662c _ZN10StackTraceC2Ev + 36\n1. 1   page_stress_testing
0x000000010281666c _ZN10StackTraceC1Ev + 32\n2. 2   page_stress_testing
0x0000000102476e88 _ZN2DB9ExceptionC2ERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEi + 88\n3. 3   page_stress_testing
0x0000000102472ff4 _ZN2DB9ExceptionC1ERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEi + 48\n4. 4   page_stress_testing
0x000000010272612c _ZNK2DB2PS2V313PageDirectory3getERKNSt3__16vectorIyNS3_9allocatorIyEEEERKNS3_10shared_ptrINS1_21PageDirectorySnapshotEEE + 684\n5. 5   page_stress_testing
0x0000000102725e50 _ZNK2DB2PS2V313PageDirectory3getERKNSt3__16vectorIyNS3_9allocatorIyEEEERKNS3_10shared_ptrINS_19PageStorageSnapshotEEE + 92\n6. 6   page_stress_testing
0x00000001027526f0 _ZN2DB2PS2V315PageStorageImpl4readERKNSt3__16vectorIyNS3_9allocatorIyEEEERKNS3_8functionIFvyRKNS_4PageEEEERKNS3_10shared_ptrINS_11ReadLimiterEEENSI_INS_19PageStorageSnapshotEEE + 140\n7. 7   page_stress_testing
0x00000001024750d4 _ZN8PSReader7runImplEv + 180\n8. 8   page_stress_testing
0x0000000102472594 _ZN10PSRunnable3runEv + 192\n9. 9   page_stress_testing
0x0000000102af4edc _ZN4Poco12PooledThread3runEv + 108\n10. 10  page_stress_testing
0x0000000102aeaca4 _ZN4Poco12_GLOBAL__N_114RunnableHolder3runEv + 40\n11. 11  page_stress_testing
0x0000000102ae91a0 _ZN4Poco10ThreadImpl13runnableEntryEPv + 120\n12. 12  page_stress_testing
0x0000000102aed448 _ZNSt3__1L8__invokeIPFPvS1_EJPN4Poco10ThreadImplEEEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOS7_DpOS8_ + 56\n13. 13  page_stress_testing
0x0000000102aed38c _ZNSt3__1L16__thread_executeINS_10unique_ptrINS_15__thread_structENS_14default_deleteIS2_EEEEPFPvS6_EJPN4Poco10ThreadImplEEJLm2EEEEvRNS_5tupleIJT_T0_DpT1_EEENS_15__tuple_indicesIJXspT2_EEEE + 68\n14. 14  page_stress_testing
0x0000000102aec9f8 _ZNSt3__1L14__thread_proxyINS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEEPFPvS7_EPN4Poco10ThreadImplEEEEEES7_S7_ + 116\n15. 15  libsystem_pthread.dylib
0x000000018604efd4 _pthread_start + 320\n16. 16  libsystem_pthread.dylib
0x0000000186049d3c thread_start + 8\n"] [thread_id=9]

But before walstore merged. i won't get this error.

When i running page storage stress(need change code to enable restore) test with this cmd :

./dbms/src/Storages/Page/page_stress_testing -M 0 -I true -V 3

after GC always got error:

[2022/01/27 15:08:30.924 +08:00] [ERROR] [<unknown>] ["void PSGc::doGcOnce():Code: 49, e.displayText() = DB::Exception: Write failed, FD is closed which [path=./stress/blobfile_3], BlobFile should also be closed, e.what() = DB::Exception, Stack trace:\n\n0. 0   page_stress_testing
0x0000000100e5262c _ZN10StackTraceC2Ev + 36\n1. 1   page_stress_testing
0x0000000100e5266c _ZN10StackTraceC1Ev + 32\n2. 2   page_stress_testing
0x0000000100ab2e88 _ZN2DB9ExceptionC2ERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEi + 88\n3. 3   page_stress_testing
0x0000000100aaeff4 _ZN2DB9ExceptionC1ERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEi + 48\n4. 4   page_stress_testing
0x0000000100d25ec0 _ZN2DB2PS2V38BlobFile5writeEPcmmRKNSt3__110shared_ptrINS_12WriteLimiterEEE + 408\n5. 5   page_stress_testing
0x0000000100d313b4 _ZZN2DB2PS2V39BlobStore2gcERNSt3__13mapIjNS3_6vectorINS3_4pairIyNS5_INS6_INS1_15PageVersionTypeENS1_11PageEntryV3EEENS3_9allocatorIS9_EEEEEENSA_ISD_EEEENS3_4lessIjEENSA_INS6_IKjSF_EEEEEERKyRKNS3_10shared_ptrINS_12WriteLimiterEEERKNSP_INS_11ReadLimiterEEEENK3$_7clERSI_PcSO_SO_ + 548\n6. 6   page_stress_testing
0x0000000100d30964 _ZN2DB2PS2V39BlobStore2gcERNSt3__13mapIjNS3_6vectorINS3_4pairIyNS5_INS6_INS1_15PageVersionTypeENS1_11PageEntryV3EEENS3_9allocatorIS9_EEEEEENSA_ISD_EEEENS3_4lessIjEENSA_INS6_IKjSF_EEEEEERKyRKNS3_10shared_ptrINS_12WriteLimiterEEERKNSP_INS_11ReadLimiterEEE + 1532\n7. 7   page_stress_testing
0x0000000100d8f380 _ZN2DB2PS2V315PageStorageImpl2gcEbRKNSt3__110shared_ptrINS_12WriteLimiterEEERKNS4_INS_11ReadLimiterEEE + 892\n8. 8   page_stress_testing
0x0000000100aa69b8 _ZN4PSGc8doGcOnceEv + 220\n9. 9   page_stress_testing
0x0000000100aa6d70 _ZN4PSGc6onTimeERN4Poco5TimerE + 28\n10. 10  page_stress_testing
0x0000000100aab5b4 _ZNK4Poco13TimerCallbackI4PSGcE6invokeERNS_5TimerE + 100\n11. 11  page_stress_testing
0x0000000101137ed4 _ZN4Poco5Timer3runEv + 332\n12. 12  page_stress_testing
0x0000000101130edc _ZN4Poco12PooledThread3runEv + 108\n13. 13  page_stress_testing
0x0000000101126ca4 _ZN4Poco12_GLOBAL__N_114RunnableHolder3runEv + 40\n14. 14  page_stress_testing
0x00000001011251a0 _ZN4Poco10ThreadImpl13runnableEntryEPv + 120\n15. 15  page_stress_testing
0x0000000101129448 _ZNSt3__1L8__invokeIPFPvS1_EJPN4Poco10ThreadImplEEEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOS7_DpOS8_ + 56\n16. 16  page_stress_testing
0x000000010112938c _ZNSt3__1L16__thread_executeINS_10unique_ptrINS_15__thread_structENS_14default_deleteIS2_EEEEPFPvS6_EJPN4Poco10ThreadImplEEJLm2EEEEvRNS_5tupleIJT_T0_DpT1_EEENS_15__tuple_indicesIJXspT2_EEEE + 68\n17. 17  page_stress_testing
0x00000001011289f8 _ZNSt3__1L14__thread_proxyINS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEEPFPvS7_EPN4Poco10ThreadImplEEEEEES7_S7_ + 116\n18. 18  libsystem_pthread.dylib
0x000000018604efd4 _pthread_start + 320\n19. 19  libsystem_pthread.dylib
0x0000000186049d3c thread_start + 8\n"] [thread_id=20]

or truncate error.

I'm guessing it's because when I removed the blob file, the cache wasn't removed.

@jiaqizho
Copy link
Contributor

jiaqizho commented Feb 9, 2022

About Multi-path iops balancing

There is a small problem with the current test version: BlobStore && WALStore used the same default path.Then every time WALStore does GC scan will get a lot of ignore message.

So, we should make sure that the path is not selected by both components at the same time?

@jiaqizho
Copy link
Contributor

[2022/02/10 16:59:30.000 +08:00] [INFO] [DeltaMergeStore.cpp:195] ["DeltaMergeStore[db_3.t_11]:Restore DeltaMerge Store start [db_3.t_11]"] [thread_id=8]
[2022/02/10 16:59:30.001 +08:00] [INFO] [WALStore.cpp:124] ["WALStore:Creating log file for writing [fullname=/data20/jiaqizho/tidb-data-jiaqizho/tiflash-50117/data/t_11/log/log_6_0]"] [thread_id=8]
[2022/02/10 16:59:30.001 +08:00] [INFO] [WALStore.cpp:124] ["WALStore:Creating log file for writing [fullname=/data20/jiaqizho/tidb-data-jiaqizho/tiflash-50117/data/t_11/data/log_6_0]"] [thread_id=8]
[2022/02/10 16:59:30.024 +08:00] [ERROR] [<unknown>] ["DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context&, bool, const String&, const String&, const ColumnDefines&, const DB::DM::ColumnDefine&, bool, size_t, const DB::DM::DeltaMergeStore::Settings&):Code: 49, e.displayText() = DB::Exception: Mark space out of the limit space.[type=STD Map] [block=0], [size=0], e.what() = DB::Exception, Stack trace:\n\n0. bin/tiflash/tiflash(StackTrace::StackTrace()+0x16) [0x36d32b6]\n1. bin/tiflash/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x26) [0x36c6e36]\n2. bin/tiflash/tiflash(DB::PS::V3::SpaceMap::markUsed(unsigned long, unsigned long)+0x171) [0x7bee271]\n3. bin/tiflash/tiflash(DB::PS::V3::BlobStore::BlobStats::BlobStat::restoreSpaceMap(unsigned long, unsigned long)+0x22) [0x7bd8d02]\n4. bin/tiflash/tiflash(DB::PS::V3::BlobStore::BlobStats::restore(DB::PS::V3::CollapsingPageDirectory const&)+0x4c) [0x7bda02c]\n5. bin/tiflash/tiflash(DB::PS::V3::PageStorageImpl::restore()+0x5ae) [0x7a708ce]\n6. bin/tiflash/tiflash(DB::DM::StoragePool::restore()+0x1f) [0x768044f]\n7. bin/tiflash/tiflash(DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<DB::DM::ColumnDefine, std::allocator<DB::DM::ColumnDefine> > const&, DB::DM::ColumnDefine const&, bool, unsigned long, DB::DM::DeltaMergeStore::Settings const&)+0x970) [0x7628580]\n8. bin/tiflash/tiflash(DB::StorageDeltaMerge::getAndMaybeInitStore()+0x137) [0x75e20c7]\n9. bin/tiflash/tiflash(DB::AsynchronousMetrics::update()+0x517) [0x7475147]\n10. bin/tiflash/tiflash(DB::AsynchronousMetrics::run()+0x6a) [0x747616a]\n11. bin/tiflash/tiflash() [0x8972c1f]\n12. /lib64/libpthread.so.0(+0x6e63) [0x7f3e23292e63]\n13. /lib64/libc.so.6(clone+0x6d) [0x7f3e22cc755d]\n"] [thread_id=8] 

After I enable v3 into a cluster. Then I clean up all data. And begin use br to restore TPCH 100.

Then got the information.also the WAL log will got 0 bytes and keep rolling..

[root@172-16-5-85 temp]$ll /xxxx/tiflash/data/t_11/log/
total 0
-rw-r--r-- 1 root root 0 Feb 10 16:55 log_1_0
-rw-r--r-- 1 root root 0 Feb 10 17:03 log_10_0
-rw-r--r-- 1 root root 0 Feb 10 17:04 log_11_0
-rw-r--r-- 1 root root 0 Feb 10 17:05 log_12_0
-rw-r--r-- 1 root root 0 Feb 10 16:55 log_2_0
-rw-r--r-- 1 root root 0 Feb 10 16:56 log_3_0
-rw-r--r-- 1 root root 0 Feb 10 16:57 log_4_0
-rw-r--r-- 1 root root 0 Feb 10 16:58 log_5_0
-rw-r--r-- 1 root root 0 Feb 10 16:59 log_6_0
-rw-r--r-- 1 root root 0 Feb 10 17:00 log_7_0
-rw-r--r-- 1 root root 0 Feb 10 17:01 log_8_0
-rw-r--r-- 1 root root 0 Feb 10 17:02 log_9_0
[root@172-16-5-85 temp]$ll /xxxxxxxx/tiflash/data/t_11/data/
total 4
-rw-r--r-- 1 root root 80 Feb 10 16:55 log_1_0
-rw-r--r-- 1 root root  0 Feb 10 17:03 log_10_0
-rw-r--r-- 1 root root  0 Feb 10 17:04 log_11_0
-rw-r--r-- 1 root root  0 Feb 10 17:05 log_12_0
-rw-r--r-- 1 root root  0 Feb 10 16:55 log_2_0
-rw-r--r-- 1 root root  0 Feb 10 16:56 log_3_0
-rw-r--r-- 1 root root  0 Feb 10 16:57 log_4_0
-rw-r--r-- 1 root root  0 Feb 10 16:58 log_5_0
-rw-r--r-- 1 root root  0 Feb 10 16:59 log_6_0
-rw-r--r-- 1 root root  0 Feb 10 17:00 log_7_0
-rw-r--r-- 1 root root  0 Feb 10 17:01 log_8_0
-rw-r--r-- 1 root root  0 Feb 10 17:02 log_9_0

@JaySon-Huang

@jiaqizho
Copy link
Contributor

[2022/02/10 17:23:30.001 +08:00] [ERROR] [<unknown>] [BaseDaemon:########################################] [thread_id=11]
[2022/02/10 17:23:30.001 +08:00] [ERROR] [<unknown>] ["BaseDaemon:(from thread 10) Received signal Segmentation fault (11)."] [thread_id=11]
[2022/02/10 17:23:30.001 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Address: NULL pointer."] [thread_id=11]
[2022/02/10 17:23:30.001 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Access: read."] [thread_id=11]
[2022/02/10 17:23:30.001 +08:00] [ERROR] [<unknown>] ["BaseDaemon:Unknown si_code."] [thread_id=11]
[2022/02/10 17:23:30.016 +08:00] [ERROR] [<unknown>] ["BaseDaemon:0. bin/tiflash/tiflash() [0x37de991]"] [thread_id=11]
[2022/02/10 17:23:30.016 +08:00] [ERROR] [<unknown>] ["BaseDaemon:1. /lib64/libpthread.so.0(+0xf590) [0x7ff2ee63b590]"] [thread_id=11]
[2022/02/10 17:23:30.016 +08:00] [ERROR] [<unknown>] ["BaseDaemon:2. bin/tiflash/tiflash(DB::PS::V2::PageEntriesView::numPages() const+0x216) [0x77b87c6]"] [thread_id=11]
[2022/02/10 17:23:30.016 +08:00] [ERROR] [<unknown>] ["BaseDaemon:3. bin/tiflash/tiflash(DB::DM::DeltaMergeStore::getStat()+0x6f7) [0x7614637]"] [thread_id=11]
[2022/02/10 17:23:30.016 +08:00] [ERROR] [<unknown>] ["BaseDaemon:4. bin/tiflash/tiflash(DB::AsynchronousMetrics::update()+0x522) [0x7475152]"] [thread_id=11]
[2022/02/10 17:23:30.016 +08:00] [ERROR] [<unknown>] ["BaseDaemon:5. bin/tiflash/tiflash(DB::AsynchronousMetrics::run()+0x6a) [0x747616a]"] [thread_id=11]
[2022/02/10 17:23:30.016 +08:00] [ERROR] [<unknown>] ["BaseDaemon:6. bin/tiflash/tiflash() [0x8972c1f]"] [thread_id=11]
[2022/02/10 17:23:30.016 +08:00] [ERROR] [<unknown>] ["BaseDaemon:7. /lib64/libpthread.so.0(+0x6e63) [0x7ff2ee632e63]"] [thread_id=11]

after simply fixed #3594 (comment)

Then I meet nullptr problem.

it was caused by force used V2 snapshotptr in DeltaMergeStore.cpp. Or we can't get some stats in DT.

ti-chi-bot pushed a commit that referenced this issue May 13, 2022
ti-chi-bot pushed a commit that referenced this issue May 18, 2022
@flowbehappy flowbehappy changed the title New version of PageStorage to reduce CPU usage and write amplification Introduce PageStorage V3 to reduce CPU usage and write amplification May 24, 2022
ti-chi-bot pushed a commit that referenced this issue May 24, 2022
@JaySon-Huang
Copy link
Contributor Author

JaySon-Huang commented May 24, 2022

Close as we reach a milestone for it. Further improvement will be tracked in #4989

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature-request Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

2 participants