The storaged service keeps sending snapshots and appears to be stuck in an infinite loop. #5347

Open
flymysql opened this issue Feb 16, 2023 · 4 comments
Assignees
Labels
affects/none PR/issue: this bug affects none version. need info Solution: need more information (ex. can't reproduce) severity/none Severity of bug type/bug Type: something is unexpected

Comments

@flymysql
Contributor

flymysql commented Feb 16, 2023

Please check the FAQ documentation before raising an issue

Describe the bug (required)
NebulaGraph version: v3.2.1
The storaged log shows that the storage service keeps trying to send snapshots, but the transfer fails. For a given partition, it keeps retrying with the same commitLogId.

This looks like the same bug as the one reported in https://discuss.nebula-graph.com.cn/t/topic/11085

 to 10485760, batch size is 1048576
I20230216 10:52:45.735262 3002064 NebulaSnapshotManager.cpp:67] Space 10 Part 41 start send snapshot of commitLogId 72743 commitLogTerm 9, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.735381 3002050 NebulaSnapshotManager.cpp:67] Space 6 Part 94 start send snapshot of commitLogId 67428 commitLogTerm 58, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.736137 3002053 NebulaSnapshotManager.cpp:67] Space 6 Part 96 start send snapshot of commitLogId 66841 commitLogTerm 97, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.736299 3002063 NebulaSnapshotManager.cpp:67] Space 77 Part 61 start send snapshot of commitLogId 3177856 commitLogTerm 5, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.736881 3002064 NebulaSnapshotManager.cpp:67] Space 4 Part 10 start send snapshot of commitLogId 69477 commitLogTerm 14, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.737025 3002050 NebulaSnapshotManager.cpp:67] Space 73 Part 13 start send snapshot of commitLogId 3403616 commitLogTerm 7, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.737371 3002053 NebulaSnapshotManager.cpp:67] Space 76 Part 14 start send snapshot of commitLogId 3489696 commitLogTerm 6, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.737464 3002063 NebulaSnapshotManager.cpp:67] Space 6 Part 98 start send snapshot of commitLogId 63176 commitLogTerm 14, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.738029 3002064 NebulaSnapshotManager.cpp:67] Space 75 Part 92 start send snapshot of commitLogId 2041251 commitLogTerm 12, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.738067 3002050 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.738116 3002053 NebulaSnapshotManager.cpp:67] Space 3 Part 41 start send snapshot of commitLogId 67615 commitLogTerm 16, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.738318 3002064 NebulaSnapshotManager.cpp:67] Space 6 Part 81 start send snapshot of commitLogId 62274 commitLogTerm 27, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.738319 3002063 NebulaSnapshotManager.cpp:67] Space 2 Part 32 start send snapshot of commitLogId 68144 commitLogTerm 16, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.738350 3002063 NebulaSnapshotManager.cpp:67] Space 74 Part 20 start send snapshot of commitLogId 3312870 commitLogTerm 105, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.738353 3002050 NebulaSnapshotManager.cpp:67] Space 72 Part 95 start send snapshot of commitLogId 2172354 commitLogTerm 4, rate limited to 10485760, batch size is 1048576
I20230216 10:52:45.738364 3002053 NebulaSnapshotManager.cpp:67] Space 4 Part 48 start send snapshot of commitLogId 67348 commitLogTerm 29, rate limited to 10485760, batch size is 1048576

[2023-02-16 11:04:46] (10.97.162.200@tysearch)> tail -f logs/nebula-storaged.INFO | grep 67170
I20230216 11:06:36.106393 3002050 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 11:06:36.144593 3002064 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 11:06:36.185752 3002064 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 11:06:36.227294 3002063 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 11:06:36.274462 3002063 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 11:06:36.316789 3002050 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 11:06:40.240387 3002053 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 11:06:40.255889 3002053 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 11:06:40.297564 3002050 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 11:06:40.337299 3002053 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
I20230216 11:06:40.378110 3002050 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576
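
To quantify the repetition rather than eyeballing `grep` output, the log can be tallied per partition. The following is a minimal sketch, assuming the line format is exactly the one shown in the excerpt above; the function name `count_snapshot_sends` and the `SAMPLE_LINES` list are illustrative and not part of NebulaGraph.

```python
import re
from collections import Counter

# Matches the "start send snapshot" message as it appears in the excerpt above.
SNAPSHOT_RE = re.compile(
    r"Space (\d+) Part (\d+) start send snapshot of "
    r"commitLogId (\d+) commitLogTerm (\d+)"
)


def count_snapshot_sends(lines):
    """Count snapshot-start events per (space, part, commitLogId, commitLogTerm)."""
    counts = Counter()
    for line in lines:
        match = SNAPSHOT_RE.search(line)
        if match:
            key = tuple(int(group) for group in match.groups())
            counts[key] += 1
    return counts


# Three lines copied from the log excerpt in this issue.
SAMPLE_LINES = [
    "I20230216 11:06:36.106393 3002050 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576",
    "I20230216 11:06:36.144593 3002064 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576",
    "I20230216 10:52:45.735381 3002050 NebulaSnapshotManager.cpp:67] Space 6 Part 94 start send snapshot of commitLogId 67428 commitLogTerm 58, rate limited to 10485760, batch size is 1048576",
]

for (space, part, log_id, term), sends in count_snapshot_sends(SAMPLE_LINES).most_common():
    print(f"space={space} part={part} commitLogId={log_id} term={term} sends={sends}")
```

To run it against a real log, pass an open file object (for example `open("logs/nebula-storaged.INFO")`) instead of `SAMPLE_LINES`. A partition whose count keeps growing while its commitLogId stays the same matches the behavior described here.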

Your Environments (required)

  • os
Linux host-7-219-10-133 3.10.0-862.14.1.1.h224.eulerosv2r7.x86_64 #1 SMP Tue Feb 12 00:00:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • g++
g++ (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
  • lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
Stepping:              7
CPU MHz:               3000.000
BogoMIPS:              6000.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              30976K
NUMA node0 CPU(s):     0-15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
  • commit id
https://github.com/vesoft-inc/nebula/commit/2e938c767cce4507ee0ea767e8dd2bf7bd1711ca


@flymysql flymysql added the type/bug Type: something is unexpected label Feb 16, 2023
@github-actions github-actions bot added affects/none PR/issue: this bug affects none version. severity/none Severity of bug labels Feb 16, 2023
@QingZ11
Contributor

QingZ11 commented Feb 16, 2023

What's your NebulaGraph core version?

@flymysql
Contributor Author

What's your NebulaGraph core version?

v3.2.1

@github-actions github-actions bot added the process/fixed Process of bug label Feb 16, 2023
@flymysql flymysql reopened this Feb 16, 2023
@github-actions github-actions bot removed the process/fixed Process of bug label Feb 16, 2023
@Sophie-Xie Sophie-Xie added this to the v3.5.0 milestone Feb 27, 2023
@Sophie-Xie Sophie-Xie added the duplicate Solution: this issue or pull request already exists label Mar 6, 2023
@critical27
Contributor

It seems that some of your nodes are lagging behind the others, which triggers the snapshot? Catching up may take quite a while, since all the data of a partition needs to be transferred.
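
For readers unfamiliar with why a lagging node receives a snapshot at all, here is a toy model of the usual Raft rule: a follower is caught up by log replay when the leader still holds the entries it is missing, and by a full snapshot otherwise. This is a sketch of generic Raft behavior under that assumption, not NebulaGraph's actual code; `LeaderLog` and `catch_up_action` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LeaderLog:
    """Hypothetical view of the log range a leader still holds."""
    first_log_id: int  # oldest entry still kept in the leader's log
    last_log_id: int   # newest entry on the leader


def catch_up_action(leader: LeaderLog, follower_last_log_id: int) -> str:
    """Decide how a follower catches up (generic Raft rule, assumed here)."""
    if follower_last_log_id >= leader.last_log_id:
        return "up_to_date"
    # The follower's next entry is still available: replay the log.
    if follower_last_log_id + 1 >= leader.first_log_id:
        return "replay_log"
    # The entries the follower needs were already cleaned up on the leader,
    # so the whole partition has to be transferred.
    return "send_snapshot"


leader = LeaderLog(first_log_id=60000, last_log_id=67170)
print(catch_up_action(leader, 67170))
print(catch_up_action(leader, 65000))
print(catch_up_action(leader, 100))
```

Under this model a single long transfer per lagging partition would be expected, which is why a rapid series of restarts for the same commitLogId is worth investigating separately.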

@critical27 critical27 added need info Solution: need more information (ex. can't reproduce) and removed duplicate Solution: this issue or pull request already exists labels Mar 20, 2023
@Sophie-Xie Sophie-Xie removed this from the v3.5.0 milestone Mar 30, 2023
@flymysql
Contributor Author

This is triggered when storaged is started. It looks like an infinite loop, because the same commitLogId is submitted each time.
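
One way to tell a slow but progressing transfer from a retry loop is to measure how quickly the same partition restarts its snapshot. The sketch below computes the gaps between consecutive "start send snapshot" lines for one partition, assuming the glog timestamp format seen in the excerpts of this issue; `send_intervals` and `GREP_LINES` are illustrative names.

```python
import re
from datetime import datetime

# Timestamp plus space/part fields, in the format shown in the log excerpts.
LINE_RE = re.compile(
    r"^I(\d{8} \d{2}:\d{2}:\d{2}\.\d{6}) .*?"
    r"Space (\d+) Part (\d+) start send snapshot of commitLogId (\d+)"
)


def send_intervals(lines, space, part):
    """Return the seconds between consecutive snapshot starts for one partition."""
    stamps = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and int(match.group(2)) == space and int(match.group(3)) == part:
            stamps.append(datetime.strptime(match.group(1), "%Y%m%d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


# Three consecutive lines from the `grep 67170` output in this issue.
GREP_LINES = [
    "I20230216 11:06:36.106393 3002050 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576",
    "I20230216 11:06:36.144593 3002064 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576",
    "I20230216 11:06:36.185752 3002064 NebulaSnapshotManager.cpp:67] Space 8 Part 32 start send snapshot of commitLogId 67170 commitLogTerm 8, rate limited to 10485760, batch size is 1048576",
]

print(send_intervals(GREP_LINES, space=8, part=32))
```

In the excerpt, Space 8 Part 32 restarts roughly every 40 ms with an unchanged commitLogId. That pattern is consistent with the send being retried immediately rather than with one long-running transfer, although the log alone does not show why each attempt ends.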
