debug print errors #1559

Closed
wants to merge 14 commits into from

drmingdrmer
Member

I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

Summary

debug print errors
add log of error for install_snapshot
add more logs to sled tree get
[store] fix: fix flaky test: generic-kv: get-unexpired
[store] refactor: remove span for tree.flush(), record source and backtrace when converting anyhow::Error to ErrorCode
[ci] on failure of test, upload the log and state data file for debugging
[store] refactor: refine tracing log
[test]: refactor: run unittest with RUST_BACKTRACE=full
[store] refactor: elaborate error log
[query] fix dep revision
[store] test: fix the RaftError::Shutdown issue

When unit tests run under heavy CPU load, a heartbeat message can be
delayed long enough for a follower to time out.
When this happens, the follower starts an election and forces the
current leader back to the follower state.
The old leader then discards every in-progress request it has
received, and the client receives an inappropriate error,
RaftError::Shuttingdown.

This is actually not a bug.

In practice, a client in this situation should re-fetch the latest
leader and retry.

For unit tests, we simply extend the timeout so that tests pass
even on a slow CI VM.
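
The client-side behaviour described above (re-fetch the leader, then retry) can be sketched roughly as follows. This is a minimal, std-only illustration and not code from this PR: `ClientError`, `retry_on_leader_change`, `refresh_leader`, and the use of a `u64` as a leader id are all hypothetical names chosen for the example.

```rust
/// Hypothetical error type standing in for what a client may observe.
#[derive(Debug, PartialEq)]
enum ClientError {
    /// The contacted node lost leadership and dropped the request.
    LeaderChanged,
    /// Any other failure; retrying would not help.
    Fatal(String),
}

/// Run `op` against the current leader, up to `max_attempts` times.
/// After every `LeaderChanged` error the leader is looked up again
/// via `refresh_leader` before the next attempt.
fn retry_on_leader_change<T>(
    max_attempts: usize,
    mut refresh_leader: impl FnMut() -> u64,
    mut op: impl FnMut(u64) -> Result<T, ClientError>,
) -> Result<T, ClientError> {
    let mut leader = refresh_leader();
    // Returned only if `max_attempts` is zero and `op` never runs.
    let mut last_err = ClientError::Fatal("max_attempts must be > 0".to_string());
    for _ in 0..max_attempts {
        match op(leader) {
            Ok(v) => return Ok(v),
            Err(ClientError::LeaderChanged) => {
                // Leadership moved: remember the error and look up the new leader.
                last_err = ClientError::LeaderChanged;
                leader = refresh_leader();
            }
            // Non-retryable errors are returned to the caller immediately.
            Err(e) => return Err(e),
        }
    }
    Err(last_err)
}
```

A real client would likely also bound the total time spent and back off between attempts; the sketch omits both to keep the control flow visible.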

[store] refactor: refine tracing
  • Every test case creates its own span as a root.
    This way the logging/tracing belonging to a single test is easy to
    grep.

  • Upgrade to async-raft 0.6.2-alpha.11, with enhanced tracing:
    a span is sent along with each message through channels,
    so a complete workflow can be tracked.
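
The idea of sending a span along with a message can be illustrated with a small std-only sketch. It is an assumption-laden stand-in, not the async-raft implementation: `SpanCtx` replaces a real tracing span with a plain string naming the test case, and `Traced`, `run_worker`, and `process_in_background` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a tracing span: the name of the root span
/// (for example, one per test case).
#[derive(Clone, Debug)]
struct SpanCtx {
    root: String,
}

/// A message that carries the sender's span context with its payload.
struct Traced<T> {
    span: SpanCtx,
    payload: T,
}

/// Worker side: tag every log line with the span carried by the message,
/// so lines belonging to one test are easy to grep.
fn run_worker(rx: mpsc::Receiver<Traced<u32>>) -> Vec<String> {
    rx.iter()
        .map(|m| format!("[{}] handled payload {}", m.span.root, m.payload))
        .collect()
}

/// Sender side: attach the current span to each message, send it to a
/// background worker, and collect the worker's log lines.
fn process_in_background(span: &SpanCtx, payloads: &[u32]) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || run_worker(rx));
    for &p in payloads {
        tx.send(Traced { span: span.clone(), payload: p })
            .expect("worker is alive");
    }
    // Dropping the sender closes the channel so the worker can finish.
    drop(tx);
    worker.join().expect("worker panicked")
}
```

Because the context travels inside the message, the receiving thread needs no shared state to know which test case a request belongs to.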

Changelog

  • Bug Fix

  • Improvement

  • Build/Testing/CI

Related Issues

@drmingdrmer drmingdrmer added this to the v0.5 milestone Aug 21, 2021
@databend-bot databend-bot added pr-bugfix this PR patches a bug in codebase pr-build this PR changes build/testing/ci steps pr-improvement labels Aug 21, 2021
@databend-bot
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@codecov-commenter

codecov-commenter commented Aug 21, 2021

Codecov Report

Merging #1559 (78baf22) into master (8606f17) will increase coverage by 0%.
The diff coverage is 94%.

@@           Coverage Diff           @@
##           master   #1559    +/-   ##
=======================================
  Coverage      73%     73%            
=======================================
  Files         535     535            
  Lines       32862   32997   +135     
=======================================
+ Hits        24076   24308   +232     
+ Misses       8786    8689    -97     

| Impacted Files | Coverage Δ |
|---|---|
| store/src/meta_service/sled_tree.rs | 91% <47%> (-4%) ⬇️ |
| store/src/meta_service/raftmeta.rs | 93% <85%> (+1%) ⬆️ |
| common/stoppable/src/stoppable_test.rs | 95% <100%> (ø) |
| ...c/interpreters/interpreter_database_create_test.rs | 87% <100%> (ø) |
| store/src/api/rpc/flight_service_test.rs | 96% <100%> (+<1%) ⬆️ |
| store/src/api/rpc/tls_flight_service_test.rs | 93% <100%> (+<1%) ⬆️ |
| store/src/api/rpc_service.rs | 93% <100%> (+5%) ⬆️ |
| store/src/executor/action_handler_test.rs | 94% <100%> (+<1%) ⬆️ |
| store/src/meta_service/meta_service_impl.rs | 95% <100%> (+20%) ⬆️ |
| store/src/meta_service/meta_service_impl_test.rs | 88% <100%> (+<1%) ⬆️ |
| ... and 25 more | |

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8606f17...78baf22. Read the comment docs.

@drmingdrmer
Member Author

Just for test. Closed

@drmingdrmer drmingdrmer deleted the f3 branch August 21, 2021 20:25