debug print errors #1559

drmingdrmer · 2021-08-21T15:56:35Z

I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

Summary

debug print errors

add log of error for install_snapshot

add more logs to sled tree get

[store] fix: fix flaky test: generic-kv: get-unexpired

[store] refactor: remove span for tree.flush(), record source and backtrace when converting anyhow::Error to ErrorCode

[ci] on failure of test, upload the log and state data file for debugging

[store] refactor: refine tracing log

[test]: refactor: run unittest with RUST_BACKTRACE=full

[store] refactor: elaborate error log

[query] fix dep revision

[store] test: fix the RaftError::Shutdown issue

When running unit tests under heavy CPU load, a heartbeat message may be
delayed for too long.
When this happens, a follower starts an election for itself, and the
current leader reverts to the follower state.
This causes the former leader to discard every in-progress request it
has received, and the client receives an inappropriate error:
RaftError::Shuttingdown.

This is actually not a bug.

In practice, a client should re-fetch the latest leader in such a case
and retry.

For unit tests, we just extend the timeout so that tests pass reliably,
even on a slow CI VM.
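The client-side advice above (re-fetch the leader, then retry) can be sketched as follows. This is a minimal, std-only sketch under stated assumptions, not the async-raft or databend client API: `ClientError`, `Cluster`, `write_with_retry` and the test double `FlakyCluster` are hypothetical names, and `ClientError::ShuttingDown` merely stands in for the `RaftError::Shuttingdown` error described above.

```rust
/// Hypothetical client-side error; `ShuttingDown` stands in for the
/// `RaftError::Shuttingdown` error described in the PR (not a real API).
#[derive(Debug, Clone, PartialEq)]
pub enum ClientError {
    /// The node we sent the request to stepped down before handling it.
    ShuttingDown,
    /// Any other failure; treated as non-retryable in this sketch.
    Other(String),
}

/// Hypothetical view of the cluster from the client's side.
pub trait Cluster {
    /// Returns the id of the node currently believed to be the leader.
    fn leader(&mut self) -> u64;
    /// Sends a write request to `node`.
    fn write(&mut self, node: u64, req: &str) -> Result<String, ClientError>;
}

/// Re-fetches the leader and retries on `ShuttingDown`, at most `max_attempts` times.
pub fn write_with_retry<C: Cluster>(
    cluster: &mut C,
    req: &str,
    max_attempts: usize,
) -> Result<String, ClientError> {
    let mut last_err = ClientError::Other("no attempt made".to_string());
    for _ in 0..max_attempts {
        // Ask for the leader on every attempt: an election may have changed it.
        let leader = cluster.leader();
        match cluster.write(leader, req) {
            Ok(resp) => return Ok(resp),
            // The old leader stepped down; remember the error and try again.
            Err(ClientError::ShuttingDown) => last_err = ClientError::ShuttingDown,
            // Any other error is returned to the caller immediately.
            Err(e) => return Err(e),
        }
    }
    Err(last_err)
}

/// Hypothetical test double: node 1 is a stale leader that always answers
/// `ShuttingDown`; after the first request, node 2 is reported as the leader.
pub struct FlakyCluster {
    pub calls: usize,
}

impl Cluster for FlakyCluster {
    fn leader(&mut self) -> u64 {
        if self.calls == 0 { 1 } else { 2 }
    }

    fn write(&mut self, node: u64, req: &str) -> Result<String, ClientError> {
        self.calls += 1;
        if node == 1 {
            Err(ClientError::ShuttingDown)
        } else {
            Ok(format!("{} applied by node {}", req, node))
        }
    }
}
```

With `max_attempts = 3`, the first write hits the stale leader and the second one succeeds on node 2; with a single attempt the caller sees `ClientError::ShuttingDown`, which is roughly what a test without retries observes. Bounding the attempts keeps a persistently unavailable cluster from looping forever.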

[store] refactor: refine tracing

Every test case creates its own span as a root.
This way, the logs/traces belonging to a single test are easy to
grep.
Upgrade to async-raft 0.6.2-alpha.11, with enhanced tracing:
a span is sent along with a message through channels.
Thus a complete workflow can be tracked.
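The idea of sending a span along with each channel message can be illustrated with a std-only sketch. It is not the `tracing` crate or the async-raft implementation: `SpanCtx`, `Msg` and `run_traced` are hypothetical names, and a span is reduced to the name of the root span (for example, the test case). With the real `tracing` crate, the receiving side would typically enter the carried `tracing::Span` (for example via `Span::enter` or `Instrument::instrument`) so that its events are recorded under it.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical, minimal stand-in for a tracing span: only the name of the
/// root span (e.g. the test case) that a message belongs to.
#[derive(Debug, Clone, PartialEq)]
pub struct SpanCtx {
    pub name: String,
}

/// A channel message that carries the sender's span along with the payload.
pub struct Msg {
    pub span: SpanCtx,
    pub payload: u64,
}

/// Sends payloads from a worker thread and returns the receiver's log lines,
/// each tagged with the span that travelled with the message.
pub fn run_traced(test_name: &str, payloads: Vec<u64>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Msg>();
    let root = SpanCtx { name: test_name.to_string() };

    let sender = thread::spawn(move || {
        for p in payloads {
            // Attach the current (root) span to every message sent.
            tx.send(Msg { span: root.clone(), payload: p }).unwrap();
        }
        // `tx` is dropped here, which ends the receiving loop below.
    });

    let mut logs = Vec::new();
    for msg in rx {
        // The receiver logs under the sender's span, so every line of this
        // workflow can be grepped by the test name.
        logs.push(format!("[{}] handled payload {}", msg.span.name, msg.payload));
    }
    sender.join().unwrap();
    logs
}
```

Because each message carries its own span, the receiver does not need to know which test or request produced it; that is what makes a complete workflow traceable across channel hops.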

Changelog

Bug Fix
Improvement
Build/Testing/CI

Related Issues

databend-bot · 2021-08-21T15:56:38Z

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

codecov-commenter · 2021-08-21T16:21:02Z

Codecov Report

Merging #1559 (78baf22) into master (8606f17) will increase coverage by 0%.
The diff coverage is 94%.

```diff
@@           Coverage Diff           @@
##           master   #1559    +/-   ##
=======================================
  Coverage      73%     73%
=======================================
  Files         535     535
  Lines       32862   32997   +135
=======================================
+ Hits        24076   24308   +232
+ Misses       8786    8689    -97
```
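The rounded "0%" hides a small change. Assuming coverage is hits divided by tracked lines (no partial lines are listed, and the table is consistent with this: 24076 + 8786 = 32862 and 24308 + 8689 = 32997), the numbers work out as:

$$
\text{coverage} = \frac{\text{hits}}{\text{lines}}, \qquad
\frac{24076}{32862} \approx 73.26\%, \qquad
\frac{24308}{32997} \approx 73.67\%
$$

That is an increase of roughly 0.4 percentage points, which rounds to 0% at the report's integer precision.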

| Impacted Files | Coverage Δ | |
|---|---|---|
| store/src/meta_service/sled_tree.rs | `91% <47%> (-4%)` | ⬇️ |
| store/src/meta_service/raftmeta.rs | `93% <85%> (+1%)` | ⬆️ |
| common/stoppable/src/stoppable_test.rs | `95% <100%> (ø)` | |
| ...c/interpreters/interpreter_database_create_test.rs | `87% <100%> (ø)` | |
| store/src/api/rpc/flight_service_test.rs | `96% <100%> (+<1%)` | ⬆️ |
| store/src/api/rpc/tls_flight_service_test.rs | `93% <100%> (+<1%)` | ⬆️ |
| store/src/api/rpc_service.rs | `93% <100%> (+5%)` | ⬆️ |
| store/src/executor/action_handler_test.rs | `94% <100%> (+<1%)` | ⬆️ |
| store/src/meta_service/meta_service_impl.rs | `95% <100%> (+20%)` | ⬆️ |
| store/src/meta_service/meta_service_impl_test.rs | `88% <100%> (+<1%)` | ⬆️ |

... and 25 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8606f17...78baf22.

drmingdrmer · 2021-08-21T20:25:15Z

Just for testing. Closed.

drmingdrmer added this to the v0.5 milestone Aug 21, 2021

drmingdrmer added the fusestore label Aug 21, 2021

databend-bot added pr-bugfix (this PR patches a bug in codebase), pr-build (this PR changes build/testing/ci steps) and pr-improvement labels Aug 21, 2021

drmingdrmer requested review from ariesdevil and dantengsky August 21, 2021 15:56

drmingdrmer added 13 commits August 22, 2021 00:17

[query] fix dep revision

4a739bb

[store] refactor: elaborate error log

4120feb

[test]: refactor: run unittest with RUST_BACKTRACE=full

3bba954

[store] refactor: refine tracing log

4ee53a7

[ci] on failure of test, upload the log and state data file for debugging

ec80ea9

[store] refactor: remove span for tree.flush(), record source and backtrace when converting anyhow::Error to ErrorCode

aac42f7

[store] fix: fix flaky test: generic-kv: get-unexpired

e9b0c82

add more logs to sled tree get

134886a

add log of error for install_snapshot

4501539

debug print errors

047f853

run 10 times

f707491

fix logging deps

78baf22

drmingdrmer force-pushed the f3 branch from 6b15c52 to 78baf22 August 21, 2021 16:31

drmingdrmer closed this Aug 21, 2021

drmingdrmer deleted the f3 branch August 21, 2021 20:25
