Graphd Crashed down with single big query #5615

Closed
VincentSleepless opened this issue Jun 30, 2023 · 0 comments
Labels: affects/none, process/fixed, severity/none, type/bug

Please check the FAQ documentation before raising an issue

Describe the bug (required)

Your Environments (required)

  • OS: Linux data-cdh6-test02 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Compiler: none
  • CPU: x86_64
  • Commit id: 3.5.0 released

How To Reproduce(required)

Steps to reproduce the behavior:

  1. Change the graphd memory watermark configuration:
    --system_memory_high_watermark_ratio=0.5
    --enable_space_level_metrics=false
    --memory_tracker_limit_ratio=0.5
    --memory_tracker_untracked_reserved_memory_mb=1024
    --memory_tracker_detail_log=true
    --memory_tracker_detail_log_interval_ms=3000

  2. Execute the query:
    MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888] RETURN DISTINCT p LIMIT 300

  3. graphd crashed. /var/log/messages shows the kernel OOM killer terminating the process:
    /var/log/messages-Jun 30 13:42:23 kernel: [ 7847] 0 7847 53675 278 58 0 0 sudo
    /var/log/messages-Jun 30 13:42:23 kernel: [ 7848] 0 7848 28919 159 13 0 0 bash
    /var/log/messages-Jun 30 13:42:23 kernel: [ 8263] 0 8263 4194482 3629679 7166 0 0 nebula-graphd
    /var/log/messages-Jun 30 13:42:23 kernel: [ 8376] 0 8376 27024 29 9 0 0 tail
    /var/log/messages-Jun 30 13:42:23 kernel: Out of memory: Kill process 8263 (nebula-graphd) score 882 or sacrifice child
    /var/log/messages:Jun 30 13:42:23 kernel: Killed process 8263 (nebula-graphd) total-vm:16777928kB, anon-rss:14518716kB, file-rss:0kB, shmem-rss:0kB
    /var/log/messages-Jun 30 13:42:23 kernel: AliYunDunUpdate invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
    /var/log/messages-Jun 30 13:42:23 kernel: AliYunDunUpdate cpuset=/ mems_allowed=0

The kernel OOM killer killed the nebula-graphd process.
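As a rough cross-check of the kernel message above, the following Python sketch parses the `Killed process` line and converts its counters to GiB. The helper name `parse_oom_kill` is hypothetical, and the conversion assumes the kernel's `kB` figures are units of 1024 bytes.

```python
import re

# The "Killed process" line copied from /var/log/messages in this report.
OOM_LINE = (
    "Jun 30 13:42:23 kernel: Killed process 8263 (nebula-graphd) "
    "total-vm:16777928kB, anon-rss:14518716kB, file-rss:0kB, shmem-rss:0kB"
)

# Assumption: the kernel reports "kB" in units of 1024 bytes.
KIB_PER_GIB = 1024 * 1024


def parse_oom_kill(line):
    """Return pid, process name, and memory counters (in kB) from an OOM kill line."""
    head = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if head is None:
        raise ValueError("not an OOM kill line")
    # Matches pairs such as "anon-rss:14518716kB".
    counters = {key: int(value) for key, value in re.findall(r"([\w-]+):(\d+)kB", line)}
    return {"pid": int(head.group(1)), "name": head.group(2), **counters}


info = parse_oom_kill(OOM_LINE)
anon_rss_gib = info["anon-rss"] / KIB_PER_GIB
print(f"{info['name']} (pid {info['pid']}): anon-rss = {anon_rss_gib:.2f} GiB")
# prints: nebula-graphd (pid 8263): anon-rss = 13.85 GiB
```

The resident size at kill time is close to the 15.250GiB of system memory that graphd reports in its own log, which fits an out-of-memory kill rather than a crash inside graphd.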

Expected behavior

When a query needs a large amount of memory during execution, graphd should stop the query instead of being killed.

Additional context

We did not find any relevant information in the error log.
The graphd INFO log details are below:
20230630 13:41:51.127751 8315 MemoryUtils.cpp:227] sys:1.434GiB/15.250GiB 9.41% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:41:54.127897 8315 MemoryUtils.cpp:227] sys:1.434GiB/15.250GiB 9.41% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:41:57.129904 8315 MemoryUtils.cpp:227] sys:1.434GiB/15.250GiB 9.40% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:42:00.128515 8315 MemoryUtils.cpp:227] sys:1.437GiB/15.250GiB 9.42% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:42:00.460376 8313 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230630 13:42:00.460444 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:00.460459 8292 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:00.461441 8292 MetaClient.cpp:2680] Metad last update time: 1688024786386
I20230630 13:42:02.843515 8282 GraphService.cpp:77] Authenticating user root from 192.168.28.30:53815
I20230630 13:42:02.843633 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:02.843653 8312 MetaClient.cpp:730] Send request to meta 172.20.221.5:9559
I20230630 13:42:02.844628 8282 GraphSessionManager.cpp:139] Create session id: 1688103841596307, for user: root
I20230630 13:42:02.844677 8282 GraphService.cpp:111] Create session doFinish
I20230630 13:42:02.856063 8282 GraphSessionManager.cpp:40] Find session from cache: 1688103841596307
I20230630 13:42:02.856137 8283 ClientSession.cpp:43] Add query: USE data_asset_10022 epId: 0
I20230630 13:42:02.856153 8283 QueryInstance.cpp:80] Parsing query: USE data_asset_10022;
I20230630 13:42:02.856284 8283 Symbols.cpp:48] New variable for: __Start_0
I20230630 13:42:02.856295 8283 PlanNode.cpp:27] New variable: __Start_0
I20230630 13:42:02.856319 8283 Symbols.cpp:48] New variable for: __RegisterSpaceToSession_1
I20230630 13:42:02.856325 8283 PlanNode.cpp:27] New variable: __RegisterSpaceToSession_1
I20230630 13:42:02.856338 8283 Validator.cpp:409] root: RegisterSpaceToSession tail: Start
I20230630 13:42:02.856627 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:02.856642 8292 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:02.856972 8283 SwitchSpaceExecutor.cpp:45] Graph switched to data_asset_10022, space id: 127
I20230630 13:42:02.856995 8283 QueryInstance.cpp:128] Finish query: USE data_asset_10022;
I20230630 13:42:02.857013 8283 ClientSession.cpp:52] Delete query, epId: 0
I20230630 13:42:02.868083 8283 GraphSessionManager.cpp:40] Find session from cache: 1688103841596307
I20230630 13:42:02.868115 8283 ClientSession.cpp:43] Add query: MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888] RETURN DISTINCT p LIMIT 300, epId: 1
I20230630 13:42:02.868131 8283 QueryInstance.cpp:80] Parsing query: MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888] RETURN DISTINCT p LIMIT 300
I20230630 13:42:02.868371 8283 Symbols.cpp:48] New variable for: __Start_0
I20230630 13:42:02.868379 8283 PlanNode.cpp:27] New variable: __Start_0
I20230630 13:42:02.868388 8283 Validator.cpp:350] Space chosen, name: data_asset_10022 id: 127
I20230630 13:42:02.868533 8283 Symbols.cpp:48] New variable for: __VAR_0
I20230630 13:42:02.868541 8283 AnonVarGenerator.h:28] Build anon var: __VAR_0
I20230630 13:42:02.868553 8283 Symbols.cpp:48] New variable for: __PassThrough_1
I20230630 13:42:02.868558 8283 PlanNode.cpp:27] New variable: __PassThrough_1
I20230630 13:42:02.868566 8283 Symbols.cpp:48] New variable for: __Dedup_2
I20230630 13:42:02.868570 8283 PlanNode.cpp:27] New variable: __Dedup_2
I20230630 13:42:02.868579 8283 MatchPathPlanner.cpp:126] Find starts: 0, Pattern has 1 edges, root: __Dedup_2, colNames: _vid
I20230630 13:42:02.868587 8283 Symbols.cpp:48] New variable for: __Start_3
I20230630 13:42:02.868590 8283 PlanNode.cpp:27] New variable: __Start_3
I20230630 13:42:02.868599 8283 Symbols.cpp:48] New variable for: __Traverse_4
I20230630 13:42:02.868604 8283 PlanNode.cpp:27] New variable: __Traverse_4
I20230630 13:42:02.868779 8283 Symbols.cpp:48] New variable for: __AppendVertices_5
I20230630 13:42:02.868788 8283 PlanNode.cpp:27] New variable: __AppendVertices_5
I20230630 13:42:02.868871 8283 Symbols.cpp:48] New variable for: __Project_6
I20230630 13:42:02.868876 8283 PlanNode.cpp:27] New variable: __Project_6
I20230630 13:42:02.868893 8283 Symbols.cpp:48] New variable for: __Project_7
I20230630 13:42:02.868897 8283 PlanNode.cpp:27] New variable: __Project_7
I20230630 13:42:02.868913 8283 Symbols.cpp:48] New variable for: __Dedup_8
I20230630 13:42:02.868917 8283 PlanNode.cpp:27] New variable: __Dedup_8
I20230630 13:42:02.868925 8283 Symbols.cpp:48] New variable for: __Limit_9
I20230630 13:42:02.868929 8283 PlanNode.cpp:27] New variable: __Limit_9
I20230630 13:42:02.868935 8283 ReturnClausePlanner.cpp:52] return root: __Limit_9 colNames: p
I20230630 13:42:02.868942 8283 MatchPlanner.cpp:172] root(Limit_9): __Limit_9, tail(Start_3): __Start_3
I20230630 13:42:02.868948 8283 Validator.cpp:409] root: Limit tail: Start
I20230630 13:42:02.868955 8283 Validator.cpp:409] root: Limit tail: Start
I20230630 13:42:02.869010 8283 Symbols.cpp:48] New variable for: __Project_10
I20230630 13:42:02.869016 8283 PlanNode.cpp:27] New variable: __Project_10
I20230630 13:42:02.869575 8312 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.57":9779, trying to create one
I20230630 13:42:02.869596 8312 ThriftClientManager-inl.h:74] Connecting to "172.20.221.57":9779 for 1 times
I20230630 13:42:02.873762 8292 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.59":9779, trying to create one
I20230630 13:42:02.873785 8292 ThriftClientManager-inl.h:74] Connecting to "172.20.221.59":9779 for 1 times
I20230630 13:42:02.874086 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.874369 8292 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.58":9779, trying to create one
I20230630 13:42:02.874384 8292 ThriftClientManager-inl.h:74] Connecting to "172.20.221.58":9779 for 2 times
I20230630 13:42:02.913040 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.913278 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.913506 8312 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.58":9779, trying to create one
I20230630 13:42:02.913524 8312 ThriftClientManager-inl.h:74] Connecting to "172.20.221.58":9779 for 2 times
I20230630 13:42:02.937062 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:02.937290 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.937518 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.961143 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.961340 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:02.961562 8312 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.59":9779, trying to create one
I20230630 13:42:02.961585 8312 ThriftClientManager-inl.h:74] Connecting to "172.20.221.59":9779 for 3 times
I20230630 13:42:02.975271 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.975541 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:02.975596 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.979934 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.980085 8292 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.57":9779, trying to create one
I20230630 13:42:02.980099 8292 ThriftClientManager-inl.h:74] Connecting to "172.20.221.57":9779 for 3 times
I20230630 13:42:02.980264 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:03.128459 8315 MemoryUtils.cpp:227] sys:1.539GiB/15.250GiB 10.09% usr:129.000MiB/7.125GiB 1.77%
I20230630 13:42:06.127883 8315 MemoryUtils.cpp:227] sys:3.634GiB/15.250GiB 23.83% usr:2.183GiB/7.125GiB 30.63%
I20230630 13:42:09.127878 8315 MemoryUtils.cpp:227] sys:5.809GiB/15.250GiB 38.09% usr:4.315GiB/7.125GiB 60.57%
I20230630 13:42:10.471643 8313 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230630 13:42:10.471740 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:10.471755 8292 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:10.472738 8292 MetaClient.cpp:2680] Metad last update time: 1688024786386
I20230630 13:42:12.128513 8315 MemoryUtils.cpp:227] sys:8.024GiB/15.250GiB 52.62% usr:6.502GiB/7.125GiB 91.25%
I20230630 13:42:15.128669 8315 MemoryUtils.cpp:227] sys:10.238GiB/15.250GiB 67.13% usr:8.684GiB/7.125GiB 121.87%
I20230630 13:42:18.129509 8315 MemoryUtils.cpp:227] sys:12.467GiB/15.250GiB 81.75% usr:10.882GiB/7.125GiB 152.72%
I20230630 13:42:20.483639 8313 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230630 13:42:20.483723 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:20.483740 8312 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:20.485126 8312 MetaClient.cpp:2680] Metad last update time: 1688024786386
I20230630 13:42:21.128391 8315 MemoryUtils.cpp:227] sys:14.712GiB/15.250GiB 96.47% usr:13.093GiB/7.125GiB 183.76%
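To make the trend in the `MemoryUtils.cpp` lines easier to see, here is a small Python sketch that extracts the tracked usage (`usr`) and its limit from a subset of those lines and reports when usage first exceeded the limit. The parsing pattern and the function name `parse_memory_lines` are illustrative. The formula `(total - reserved) * ratio` is an assumption: it is not confirmed anywhere in this report, it only happens to reproduce the 7.125GiB limit from the configured `--memory_tracker_untracked_reserved_memory_mb=1024` and `--memory_tracker_limit_ratio=0.5`.

```python
import re

# MemoryUtils.cpp lines copied from the graphd INFO log in this report (subset).
LOG = """\
I20230630 13:42:03.128459 8315 MemoryUtils.cpp:227] sys:1.539GiB/15.250GiB 10.09% usr:129.000MiB/7.125GiB 1.77%
I20230630 13:42:06.127883 8315 MemoryUtils.cpp:227] sys:3.634GiB/15.250GiB 23.83% usr:2.183GiB/7.125GiB 30.63%
I20230630 13:42:09.127878 8315 MemoryUtils.cpp:227] sys:5.809GiB/15.250GiB 38.09% usr:4.315GiB/7.125GiB 60.57%
I20230630 13:42:12.128513 8315 MemoryUtils.cpp:227] sys:8.024GiB/15.250GiB 52.62% usr:6.502GiB/7.125GiB 91.25%
I20230630 13:42:15.128669 8315 MemoryUtils.cpp:227] sys:10.238GiB/15.250GiB 67.13% usr:8.684GiB/7.125GiB 121.87%
I20230630 13:42:18.129509 8315 MemoryUtils.cpp:227] sys:12.467GiB/15.250GiB 81.75% usr:10.882GiB/7.125GiB 152.72%
I20230630 13:42:21.128391 8315 MemoryUtils.cpp:227] sys:14.712GiB/15.250GiB 96.47% usr:13.093GiB/7.125GiB 183.76%
"""

UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}
PATTERN = re.compile(
    r"(\d{2}:\d{2}:\d{2})\.\d+ \d+ MemoryUtils\.cpp:\d+\] "
    r"sys:[\d.]+\w+/([\d.]+)GiB [\d.]+% "
    r"usr:([\d.]+)(MiB|GiB)/([\d.]+)GiB"
)


def parse_memory_lines(log):
    """Return (clock, tracked_gib, limit_gib, system_total_gib) per matching line."""
    rows = []
    for match in PATTERN.finditer(log):
        clock, sys_total, used, unit, limit = match.groups()
        rows.append((clock, float(used) * UNIT_TO_GIB[unit], float(limit), float(sys_total)))
    return rows


rows = parse_memory_lines(LOG)

# Assumed formula (not confirmed by the report): limit = (total - reserved) * ratio.
reserved_gib = 1024 / 1024  # --memory_tracker_untracked_reserved_memory_mb=1024
ratio = 0.5                 # --memory_tracker_limit_ratio=0.5
expected_limit = (rows[0][3] - reserved_gib) * ratio

# First sample whose tracked usage is above the logged limit.
first_over = next(row for row in rows if row[1] > row[2])

print(f"limit from assumed formula: {expected_limit} GiB, logged limit: {rows[0][2]} GiB")
print(f"tracked usage first exceeds the limit at {first_over[0]} ({first_over[1]} GiB)")
```

In this log the tracked usage passes the limit at the 13:42:15 sample and keeps growing for two more samples, and the kernel kills the process at 13:42:23.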
Dashboard memory usage screenshot:
image

Nebula discussion forum thread:
https://discuss.nebula-graph.com.cn/t/topic/13424/15

VincentSleepless added the type/bug label on Jun 30, 2023
github-actions bot added the affects/none and severity/none labels on Jun 30, 2023
Sophie-Xie added this to the v3.6.0 milestone on Jul 3, 2023
github-actions bot added the process/fixed label on Jul 3, 2023