Fix update sessions when leader change happens #5225

Merged
merged 4 commits into from
Jan 11, 2023

Aiee commented Jan 9, 2023

What type of PR is this?

  • bug
  • feature
  • enhancement

What problem(s) does this PR solve?

Issue(s) number:

Closes https://github.com/vesoft-inc/nebula-ent/issues/2152; may be related to https://github.com/vesoft-inc/nebula-ent/issues/2176

Description:

UpdateSessions() should handle errors such as a leader change.

How do you solve it?

Special notes for your reviewer, ex. impact of this fix, design document, etc:

Checklist:

Tests:

  • Unit test(positive and negative cases)
  • Function test
  • Performance test
  • N/A

Affects:

  • Documentation affected (Please add the label if documentation needs to be modified.)
  • Incompatibility (If it breaks the compatibility, please describe it and add the label.)
  • If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Release notes:

Please confirm whether to be reflected in release notes and how to describe:

ex. Fixed the bug .....

HarrisChu previously approved these changes Jan 9, 2023

shanlai commented Jan 10, 2023

The graph service crashed while I was checking the issue.
Graph log:

I20230110 03:31:20.485074 331532 MetaClient.cpp:3131] Load leader of "192.168.8.202":10015 in 2 space
I20230110 03:31:20.485116 331532 MetaClient.cpp:3131] Load leader of "192.168.8.202":10020 in 2 space
I20230110 03:31:20.485142 331532 MetaClient.cpp:3131] Load leader of "192.168.8.202":10025 in 2 space
I20230110 03:31:20.485155 331532 MetaClient.cpp:3137] Load leader ok
E20230110 03:31:24.075230 331047 MetaClient.cpp:758] Send request to "192.168.8.202":10005, exceed retry limit
E20230110 03:31:24.075510 331047 MetaClient.cpp:759] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20230110 03:31:24.075748 331533 GraphSessionManager.cpp:291] Update sessions failed: 0
E20230110 03:31:24.075820 331533 GraphSessionManager.cpp:260] Update sessions failed: RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
F20230110 03:31:24.075868 331533 StatusOr.h:253] Check failed: ok()
E20230110 03:31:24.498812 331531 MetaClient.cpp:758] Send request to "192.168.8.202":10005, exceed retry limit
E20230110 03:31:24.498855 331531 MetaClient.cpp:759] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20230110 03:31:24.498910 331532 MetaClient.cpp:192] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect

graph core: [image]
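
The fatal line `StatusOr.h:253] Check failed: ok()` in the log above suggests that the value of a failed result was accessed right after the RPC error. The following is only a minimal sketch of that failure mode and of a guarded access. `MiniStatusOr` and `killedQueryCountOr` are hypothetical names invented for illustration; they are not nebula's actual `StatusOr` or any function from this PR.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a StatusOr-like type; NOT nebula's real StatusOr.
template <typename T>
class MiniStatusOr {
 public:
  // Successful result holding a value.
  MiniStatusOr(T value) : value_(std::move(value)) {}

  // Failed result holding only an error message.
  static MiniStatusOr Error(std::string msg) {
    MiniStatusOr s;
    s.error_ = std::move(msg);
    return s;
  }

  bool ok() const { return value_.has_value(); }
  const std::string& error() const { return error_; }

  // Accessing the value of a failed result aborts in builds where
  // assertions are enabled, similar in spirit to "Check failed: ok()".
  const T& value() const {
    assert(ok() && "Check failed: ok()");
    return *value_;
  }

 private:
  MiniStatusOr() = default;
  std::optional<T> value_;
  std::string error_;
};

// Guarded access: never touches value() unless ok() is true.
inline int killedQueryCountOr(const MiniStatusOr<int>& resp, int fallback) {
  if (!resp.ok()) {
    return fallback;  // e.g. RPC failure or leader change: report, do not crash
  }
  return resp.value();
}
```

With this sketch, `killedQueryCountOr(MiniStatusOr<int>::Error("RPC failure"), -1)` yields the fallback instead of aborting, whereas calling `value()` directly on the failed result would trip the assertion.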

critical27 previously approved these changes Jan 10, 2023
@Aiee Aiee dismissed stale reviews from critical27 and HarrisChu via 68f75b8 January 10, 2023 04:51
@Aiee Aiee force-pushed the fix-update-session branch from 6bac386 to 68f75b8 Compare January 10, 2023 04:51
LOG(ERROR) << "Update sessions failed: " << resp.status();
return Status::Error("Update sessions failed: %s", resp.status().toString().c_str());
}
DCHECK(resp.ok());

Contributor

Why are we convinced that resp is always valid?

Contributor Author

https://github.com/vesoft-inc/nebula/pull/5225/files#diff-a9174d4e2b477c826e5e4e23a7bf43ffe74a5853249f4de3aeebf3d95cf70364R286-R288

Here we check the response from the meta service, which guarantees that handleKilledQueries() receives a valid response.

Contributor

An implicit precondition, where one piece of code depends on a check made in another, is not good design, although it does work. Self-contained code is the better choice.

Contributor Author

Makes sense to me. I'll move the check into the lambda functions.
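
As a rough illustration of the "self-contained" approach discussed in this thread, the sketch below shows a callback that validates its own input instead of relying on a check made elsewhere. `FakeResp` and `makeKilledQueryHandler` are hypothetical names made up for this example; this is not the code merged in this PR.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical response type, standing in for a meta service reply.
struct FakeResp {
  bool ok;                          // false on RPC failure, leader change, etc.
  std::string error;                // error description when !ok
  std::vector<int> killedQueryIds;  // payload, only meaningful when ok
};

// Builds a handler that checks the response itself, so its correctness does
// not depend on a precondition enforced in some other piece of code.
// The caller must keep `killedOut` alive for as long as the handler is used.
inline std::function<int(const FakeResp&)> makeKilledQueryHandler(
    std::vector<int>& killedOut) {
  return [&killedOut](const FakeResp& resp) -> int {
    if (!resp.ok) {
      return -1;  // report the failure; never touch the payload
    }
    killedOut.insert(killedOut.end(),
                     resp.killedQueryIds.begin(),
                     resp.killedQueryIds.end());
    return static_cast<int>(resp.killedQueryIds.size());
  };
}
```

The trade-off is a small amount of repeated checking in exchange for a handler that stays safe even if the call site changes later.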

@codecov-commenter

Codecov Report

Base: 76.77% // Head: 78.63% // Increases project coverage by +1.86% 🎉

Coverage data is based on head (68f75b8) compared to base (ff4cb8c).
Patch coverage: 70.67% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5225      +/-   ##
==========================================
+ Coverage   76.77%   78.63%   +1.86%     
==========================================
  Files        1109     1110       +1     
  Lines       82965    82984      +19     
==========================================
+ Hits        63694    65258    +1564     
+ Misses      19271    17726    -1545     
Impacted Files                                  Coverage Δ
src/clients/meta/MetaClient.h                   92.30% <ø> (ø)
src/codec/RowReaderWrapper.cpp                  79.38% <0.00%> (+1.60%) ⬆️
src/codec/RowWriterV2.cpp                       84.90% <0.00%> (+0.64%) ⬆️
src/common/base/Arena.cpp                       95.00% <0.00%> (+4.52%) ⬆️
src/common/base/Status.cpp                      57.83% <0.00%> (-2.17%) ⬇️
src/common/base/Status.h                        89.09% <0.00%> (-3.37%) ⬇️
src/common/datatypes/Value.cpp                  74.91% <0.00%> (+0.50%) ⬆️
src/common/datatypes/ValueOps-inl.h             61.09% <ø> (+0.18%) ⬆️
src/common/expression/ArithmeticExpression.cpp  91.66% <0.00%> (+3.66%) ⬆️
src/common/expression/Expression.cpp            52.94% <0.00%> (+0.46%) ⬆️
... and 342 more


yixinglu previously approved these changes Jan 10, 2023

Aiee commented Jan 10, 2023

Please hold off on reviewing; there are some issues with the log.

Update:
The SessionManagerProcessor keeps printing logs, but this has nothing to do with this PR, so it is ready to merge.
This problem is present only in the community version.

I20230110 18:47:13.540433 3401657 HBProcessor.cpp:33] Receive heartbeat from "127.0.0.1":39735, role = STORAGE
I20230110 18:47:13.546798 3401657 SessionManagerProcessor.cpp:134] resp list session: 1673344969031695
I20230110 18:47:13.546813 3401657 SessionManagerProcessor.cpp:134] resp list session: 1672741538884805
I20230110 18:47:13.546815 3401657 SessionManagerProcessor.cpp:134] resp list session: 1672741459642317
I20230110 18:47:13.546818 3401657 SessionManagerProcessor.cpp:134] resp list session: 1672754053596151

@Aiee Aiee requested review from HarrisChu and critical27 January 10, 2023 10:48
@Sophie-Xie Sophie-Xie merged commit ea0155c into master Jan 11, 2023
@Sophie-Xie Sophie-Xie deleted the fix-update-session branch January 11, 2023 03:33
Sophie-Xie pushed a commit that referenced this pull request Jan 28, 2023
* Fix udpate sessions when leader change happens

* Handle errors on the graph side

* Address comments

* Address comments
Sophie-Xie added a commit that referenced this pull request Jan 29, 2023
* optimize match node label (#5176)

* revert strange return (#5183)

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* fix stderr  save error log (#5188)

* fix processor_test timeout (#5180)

* fix processor_test timeout

* ...

Co-authored-by: Yee <2520865+yixinglu@users.noreply.github.com>
Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* fix error code (#5186)

* rename the "test" space to "ngdata". (#5197)

* rename the "test" space to "ngdata".

* add ngdata

* Revise the usages of FATAL, DFATAL, LOG, DLOG. (#5181)

* Revise the usages of FATAL, DFATAL, LOG, DLOG.

* fix.

* revise dfatal.

* Meta upgrade (#5174)

* Meta upgrade
remove all fulltext index when upgrade from V3 to V3_4 because of refacting of
fulltext index

* fix bug

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Fix pattern expression with same edge variable (#5192)

* Fix pattern expression with same edge variable

add tck

fmt

* add tck

* Fix memory leak, remove toss gflag (#5204)

* remove toss gflag

* fix memory leak

* loose wait job finish time

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Add max_sessions_per_ip_per_user to default config file (#5207)

* minor bug for adminTaskManager (#5195)

* modify jobmanager ut (#5175)

* modify jobmanager ut

* add expired ut

* avoid recover expired job

* add ut

* address review

* move status

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Add more match test cases on paths. (#5189)

* improve memtracker, add missed check & remove unnecessary thenError&tryCatch check (#5199)

* [memtracker] check code run with memoery check on all works

refine code

all code memory checked

fix lint

refine code & fix build with gcc+sanitize

* fix build break

* fix lint

* refine code

* remove debug code

* fix test fail build with debug

* fix test fail build with debug

* restore commented test

* minor

* minor

* fix bug (#5214)

* fix bug

* fix bug

Co-authored-by: Doodle <13706157+critical27@users.noreply.github.com>

* handle rpc error task status (#5212)

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* chore: community badges refined (#5202)

* chore: community badges refined

* Update README-CN.md

* Update README-CN.md

* Update README-CN.md

remove sifou and zhihu as aligned with the team

* update linkedin URL

Co-authored-by: Yee <2520865+yixinglu@users.noreply.github.com>

* Fix extend whtie space char. (#5213)

* Fix extend whtie space char.

* Format.

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Add lack tests of no role user. (#5196)

Co-authored-by: Yee <2520865+yixinglu@users.noreply.github.com>

* remove memtracker DLOG (#5224)

* Add tck cases for DDL (#5220)

* more TCK tests for variable pattern match clause (#5215)

* cleanup

* same src/dst for variable length pattern

* variable pattern in where clause

* variable scope tests in path pattern

* More tests

* More tests

Co-authored-by: jimingquan <mingquan.ji@vesoft.com>

* Resumed the evaluation fo vertices in AttributeExpression (UTs included) (#5229)

* add memtracker flags to conf (#5231)

* add memtracker flags to conf

* typo

* refine

* refine

* add balance job type to filter when create backup (#5228)

* add more job type to filter when create backup

* log add job

* add log before acquire snapshot lock

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Fix update sessions when leader change happens (#5225)

* Fix udpate sessions when leader change happens

* Handle errors on the graph side

* Address comments

* Address comments

* fix match step range (#5216)

* use smart pointer change raw pointer

* fix error

* fix test error

* address comment

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Update response message when adding schema historically existed (#5227)

* update the error code and message for checking history schemas

* update tck

* update comment

* change to log error

* fix ddl tck

* increase wait time in schema.feature

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* fix error code (#5233)

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* print memory stats default to false (#5234)

* print memory stats default to false

* update conf

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* fix bug of extract prop expr visitor (#5238)

* forbid invalid prop expr used in cypher (#5242)

* Fix mistake push down limit with skip. (#5241)

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* fix delete fulltext index (#5239)

* fix delete fulltext index

* fix es delete error
1. remove get Rowreader if op is delete
2. delete es data when value is null

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Change the default value of session_reclaim_interval_secs to 60 seconds (#5246)

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Enhance attribute-accessing expression to ensure self-consistency (#5230)

* Revert "Remove all UNKNOWN_PROP as a type of null. (#4907)" (#5149)

This reverts commit aa62416.

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Enhance attribute-accessing expression to ensure self-consistency

Fix tck

Fix parser

small delete

Fix tck

tck fmt

fix ut

fix ut

Fix ut

Fix tck

Delete v.tag.prop check

Fix tck

Skip some tck cases related ngdata

add test case

Co-authored-by: Cheng Xuntao <7731943+xtcyclist@users.noreply.github.com>
Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* fix ft index of fixed string (#5251)

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Add tck test (#5253)

* add allpath test

* add shortest path test case

* add subgraph test case

* add go test case

* add go test case

* Add more session tests (#5256)

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Revert "do not check term for leader info by default para" (#5266)

This reverts commit 593bffc.

* modify ft index default limit size (#5260)

* modify ft index default limit size

* fix test

Co-authored-by: Doodle <13706157+critical27@users.noreply.github.com>

* Test/yield (#5267)

* Add some tests about yield.

* Add more tests.

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Add another cert to test CA don't match. (#5247)

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* fix baton miss reset in StorageJobExecutor (#5269)

* Report errors on where clauses in optional match queries. (#5273)

* [test case] Check DML cases (#5264)

* Check DML cases

* Add chinses char tests

Add more tests

Add mero delete edge tests

* Revert cases

* fix third party version in package.sh (#5281)

The dump_syms tool path should be match with third party version.

* Test/user (#5139)

* Add some tests about user management.

* Add tests about user roles.

* Format.

* Fix tck fixture name.

* Fix step name.

* Change step name.

---------

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* fix https (#5283)

* fix memtracker bugs during stress test on graphd and storaged (#5276)

* fix memtracker bugs during stress test on graphd and storaged

* fix lint

* fix RocksEngine memory leak of raw pointer iter

* add ENABLE_MEMORY_TRACKER build option & support adaptive limit for MemoryTracker

* delete debug log

* refine log

* refine log

* fix build

* refine error log

* print warning if memtracker is off

* fix rocksdb leak by turn off memcheck

* refine synamic-self-adaptive

* fix cmake check

* minor

* minor

* minor

* minor

* minor

* refine double equel compare

---------

Co-authored-by: jimingquan <mingquan.ji@vesoft.com>
Co-authored-by: jie.wang <38901892+jievince@users.noreply.github.com>
Co-authored-by: Harris.Chu <1726587+HarrisChu@users.noreply.github.com>
Co-authored-by: Doodle <13706157+critical27@users.noreply.github.com>
Co-authored-by: Yee <2520865+yixinglu@users.noreply.github.com>
Co-authored-by: canon <87342612+caton-hpg@users.noreply.github.com>
Co-authored-by: Cheng Xuntao <7731943+xtcyclist@users.noreply.github.com>
Co-authored-by: hs.zhang <22708345+cangfengzhs@users.noreply.github.com>
Co-authored-by: kyle.cao <kyle.cao@vesoft.com>
Co-authored-by: Yichen Wang <18348405+Aiee@users.noreply.github.com>
Co-authored-by: liwenhui-soul <38217397+liwenhui-soul@users.noreply.github.com>
Co-authored-by: Alex Xing <90179377+SuperYoko@users.noreply.github.com>
Co-authored-by: codesigner <codesigner.huang@vesoft.com>
Co-authored-by: Wey Gu <weyl.gu@gmail.com>
Co-authored-by: shylock <33566796+Shylock-Hg@users.noreply.github.com>
Co-authored-by: pengwei.song <90180021+pengweisong@users.noreply.github.com>
Co-authored-by: haowen <19355821+wenhaocs@users.noreply.github.com>
Co-authored-by: George <58841610+Shinji-IkariG@users.noreply.github.com>