After a worker node restart, the cluster started behaving improperly. #15579
Comments
Hi @Tejaswini5327, I will need some more information to help with this issue.
From my past experience, the MVCC revisions of the 3 etcd members could have diverged before the issue happened, and the compaction request with the higher revision was then sent to the "slow" member.
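One way to test the divergence hypothesis is to compare the members' revisions at the same Raft applied index. Members that have applied the same log entries should report the same revision, while members at different applied indexes are only lagging. The sketch below assumes the JSON layout of `etcdctl endpoint status --write-out=json` (a list of objects with `Endpoint`, `Status.header.revision` and `Status.raftAppliedIndex`); please check the field names on your etcdctl version. The function names are hypothetical. Because the per-member statuses are not captured atomically, it is best run while writes are quiet, or repeated a few times.

```python
from __future__ import annotations

import json
from collections import defaultdict


def revisions_by_applied_index(status_json: str) -> dict[int, dict[str, int]]:
    """Group each member's MVCC revision by its Raft applied index.

    Assumes the layout printed by `etcdctl endpoint status --write-out=json`:
    a list of {"Endpoint": ..., "Status": {"header": {"revision": ...},
    "raftAppliedIndex": ...}} objects. Field names are an assumption.
    """
    groups: dict[int, dict[str, int]] = defaultdict(dict)
    for entry in json.loads(status_json):
        status = entry["Status"]
        groups[status["raftAppliedIndex"]][entry["Endpoint"]] = status["header"]["revision"]
    return dict(groups)


def diverged_members(status_json: str) -> dict[int, dict[str, int]]:
    """Return only the applied indexes at which members report different revisions.

    Members that applied the same Raft entries should hold the same revision,
    so a mismatch at an equal applied index hints at diverged MVCC stores.
    Members at different applied indexes are merely lagging and are not flagged.
    """
    return {
        index: revs
        for index, revs in revisions_by_applied_index(status_json).items()
        if len(set(revs.values())) > 1
    }
```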
Hi @chaochn47,
Our service provides the following configurable parameter: env.service_name.DEFRAGMENT_PERIODIC_INTERVAL. Compaction is performed every 5 minutes, and defragmentation runs periodically at the interval set through env.service_name.DEFRAGMENT_PERIODIC_INTERVAL.
Thanks and regards
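To picture how the two maintenance tasks interleave, here is a small model of such a loop. It assumes the 5-minute compaction interval from the comment above and an hourly defragmentation interval (the "per hour defrag activity" mentioned later in the thread); the actual unit and default of DEFRAGMENT_PERIODIC_INTERVAL are not stated, and `maintenance_events` is a hypothetical name, not part of the service or of etcd.

```python
from __future__ import annotations


def maintenance_events(window_s: int, compact_every_s: int, defrag_every_s: int) -> list[tuple[int, str]]:
    """List (offset_seconds, task) pairs for a hypothetical maintenance loop.

    Compaction fires every compact_every_s seconds and defragmentation every
    defrag_every_s seconds, both starting one interval after t=0.
    """
    events = [(t, "compact") for t in range(compact_every_s, window_s + 1, compact_every_s)]
    events += [(t, "defrag") for t in range(defrag_every_s, window_s + 1, defrag_every_s)]
    # Sorting orders by time; on a tie "compact" sorts before "defrag", so the
    # defrag can reclaim the space that the compaction just freed.
    return sorted(events)
```

With a one-hour window, a 300-second compaction interval and a 3600-second defrag interval, this yields 12 compactions and 1 defragmentation, and the defragmentation lands on the same tick as a compaction. Running compaction first on such ties is a design choice of the sketch, made because defragmentation only reclaims space that compaction has already released.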
Hi @chaochn47,
ubuntu@sioccesltdir001:
# TYPE etcd_cluster_version gauge
etcd_cluster_version{cluster_version="3.5"} 1
# HELP etcd_debugging_auth_revision The current revision of auth store.
# TYPE etcd_debugging_auth_revision gauge
etcd_debugging_auth_revision 21
# HELP etcd_debugging_disk_backend_commit_rebalance_duration_seconds The latency distributions of commit.rebalance called by bboltdb backend.
# TYPE etcd_debugging_disk_backend_commit_rebalance_duration_seconds histogram
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket{le="0.001"} 41715
# HELP etcd_debugging_disk_backend_commit_spill_duration_seconds The latency distributions of commit.spill called by bboltdb backend.
# TYPE etcd_debugging_disk_backend_commit_spill_duration_seconds histogram
etcd_debugging_disk_backend_commit_spill_duration_seconds_bucket{le="0.001"} 41709
# HELP etcd_debugging_disk_backend_commit_write_duration_seconds The latency distributions of commit.write called by bboltdb backend.
# TYPE etcd_debugging_disk_backend_commit_write_duration_seconds histogram
etcd_debugging_disk_backend_commit_write_duration_seconds_bucket{le="0.001"} 0
# HELP etcd_debugging_lease_granted_total The total number of granted leases.
# TYPE etcd_debugging_lease_granted_total counter
etcd_debugging_lease_granted_total 6
# HELP etcd_debugging_lease_renewed_total The number of renewed leases seen by the leader.
# TYPE etcd_debugging_lease_renewed_total counter
etcd_debugging_lease_renewed_total 203685
# HELP etcd_debugging_lease_revoked_total The total number of revoked leases.
# TYPE etcd_debugging_lease_revoked_total counter
etcd_debugging_lease_revoked_total 6
# HELP etcd_debugging_lease_ttl_total Bucketed histogram of lease TTLs.
# TYPE etcd_debugging_lease_ttl_total histogram
etcd_debugging_lease_ttl_total_bucket{le="1"} 0
# HELP etcd_debugging_mvcc_compact_revision The revision of the last compaction in store.
# TYPE etcd_debugging_mvcc_compact_revision gauge
etcd_debugging_mvcc_compact_revision 71559
# HELP etcd_debugging_mvcc_current_revision The current revision of store.
# TYPE etcd_debugging_mvcc_current_revision gauge
etcd_debugging_mvcc_current_revision 71668
# HELP etcd_debugging_mvcc_db_compaction_keys_total Total number of db keys compacted.
# TYPE etcd_debugging_mvcc_db_compaction_keys_total counter
etcd_debugging_mvcc_db_compaction_keys_total 42209
# HELP etcd_debugging_mvcc_db_compaction_last The unix time of the last db compaction. Resets to 0 on start.
# TYPE etcd_debugging_mvcc_db_compaction_last gauge
etcd_debugging_mvcc_db_compaction_last 1.680832793e+09
# HELP etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds Bucketed histogram of db compaction pause duration.
# TYPE etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds histogram
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="1"} 0
# HELP etcd_debugging_mvcc_db_compaction_total_duration_milliseconds Bucketed histogram of db compaction total duration.
# TYPE etcd_debugging_mvcc_db_compaction_total_duration_milliseconds histogram
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="100"} 4009
# HELP etcd_debugging_mvcc_events_total Total number of events sent by this member.
# TYPE etcd_debugging_mvcc_events_total counter
etcd_debugging_mvcc_events_total 0
# HELP etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds Bucketed histogram of index compaction pause duration.
# TYPE etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds histogram
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="0.5"} 4002
# HELP etcd_debugging_mvcc_keys_total Total number of keys.
# TYPE etcd_debugging_mvcc_keys_total gauge
etcd_debugging_mvcc_keys_total 491
# HELP etcd_debugging_mvcc_pending_events_total Total number of pending events to be sent.
# TYPE etcd_debugging_mvcc_pending_events_total gauge
etcd_debugging_mvcc_pending_events_total 0
# HELP etcd_debugging_mvcc_range_total Total number of ranges seen by this member.
# TYPE etcd_debugging_mvcc_range_total counter
etcd_debugging_mvcc_range_total 2.16009e+06
# HELP etcd_debugging_mvcc_slow_watcher_total Total number of unsynced slow watchers.
# TYPE etcd_debugging_mvcc_slow_watcher_total gauge
etcd_debugging_mvcc_slow_watcher_total 0
# HELP etcd_debugging_mvcc_total_put_size_in_bytes The total size of put kv pairs seen by this member.
# TYPE etcd_debugging_mvcc_total_put_size_in_bytes gauge
etcd_debugging_mvcc_total_put_size_in_bytes 4.7429957e+07
# HELP etcd_debugging_mvcc_watch_stream_total Total number of watch streams.
# TYPE etcd_debugging_mvcc_watch_stream_total gauge
etcd_debugging_mvcc_watch_stream_total 2
# HELP etcd_debugging_mvcc_watcher_total Total number of watchers.
# TYPE etcd_debugging_mvcc_watcher_total gauge
etcd_debugging_mvcc_watcher_total 2
# HELP etcd_debugging_server_lease_expired_total The total number of expired leases.
# TYPE etcd_debugging_server_lease_expired_total counter
etcd_debugging_server_lease_expired_total 0
# HELP etcd_debugging_snap_save_marshalling_duration_seconds The marshalling cost distributions of save called by snapshot.
# TYPE etcd_debugging_snap_save_marshalling_duration_seconds histogram
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.001"} 11
# HELP etcd_debugging_snap_save_total_duration_seconds The total latency distributions of save called by snapshot.
# TYPE etcd_debugging_snap_save_total_duration_seconds histogram
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.001"} 0
# HELP etcd_debugging_store_expires_total Total number of expired keys.
# TYPE etcd_debugging_store_expires_total counter
etcd_debugging_store_expires_total 0
# HELP etcd_debugging_store_reads_total Total number of reads action by (get/getRecursive), local to this member.
# TYPE etcd_debugging_store_reads_total counter
etcd_debugging_store_reads_total{action="get"} 1
# HELP etcd_debugging_store_watch_requests_total Total number of incoming watch requests (new or reestablished).
# TYPE etcd_debugging_store_watch_requests_total counter
etcd_debugging_store_watch_requests_total 0
# HELP etcd_debugging_store_watchers Count of currently active watchers.
# TYPE etcd_debugging_store_watchers gauge
etcd_debugging_store_watchers 0
# HELP etcd_debugging_store_writes_total Total number of writes (e.g. set/compareAndDelete) seen by this member.
# TYPE etcd_debugging_store_writes_total counter
etcd_debugging_store_writes_total{action="set"} 7
# HELP etcd_disk_backend_commit_duration_seconds The latency distributions of commit called by backend.
# TYPE etcd_disk_backend_commit_duration_seconds histogram
etcd_disk_backend_commit_duration_seconds_bucket{le="0.001"} 0
# HELP etcd_disk_backend_defrag_duration_seconds The latency distribution of backend defragmentation.
# TYPE etcd_disk_backend_defrag_duration_seconds histogram
etcd_disk_backend_defrag_duration_seconds_bucket{le="0.1"} 328
# HELP etcd_disk_backend_snapshot_duration_seconds The latency distribution of backend snapshots.
# TYPE etcd_disk_backend_snapshot_duration_seconds histogram
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.01"} 0
# HELP etcd_disk_defrag_inflight Whether or not defrag is active on the member. 1 means active, 0 means not.
# TYPE etcd_disk_defrag_inflight gauge
etcd_disk_defrag_inflight 0
# HELP etcd_disk_wal_fsync_duration_seconds The latency distributions of fsync called by WAL.
# TYPE etcd_disk_wal_fsync_duration_seconds histogram
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 1042
# HELP etcd_disk_wal_write_bytes_total Total number of bytes written in WAL.
# TYPE etcd_disk_wal_write_bytes_total gauge
etcd_disk_wal_write_bytes_total 1.08027696e+08
# HELP etcd_grpc_proxy_cache_hits_total Total number of cache hits
# TYPE etcd_grpc_proxy_cache_hits_total gauge
etcd_grpc_proxy_cache_hits_total 0
# HELP etcd_grpc_proxy_cache_keys_total Total number of keys/ranges cached
# TYPE etcd_grpc_proxy_cache_keys_total gauge
etcd_grpc_proxy_cache_keys_total 0
# HELP etcd_grpc_proxy_cache_misses_total Total number of cache misses
# TYPE etcd_grpc_proxy_cache_misses_total gauge
etcd_grpc_proxy_cache_misses_total 0
# HELP etcd_grpc_proxy_events_coalescing_total Total number of events coalescing
# TYPE etcd_grpc_proxy_events_coalescing_total counter
etcd_grpc_proxy_events_coalescing_total 0
# HELP etcd_grpc_proxy_watchers_coalescing_total Total number of current watchers coalescing
# TYPE etcd_grpc_proxy_watchers_coalescing_total gauge
etcd_grpc_proxy_watchers_coalescing_total 0
# HELP etcd_mvcc_db_open_read_transactions The number of currently open read transactions
# TYPE etcd_mvcc_db_open_read_transactions gauge
etcd_mvcc_db_open_read_transactions 1
# HELP etcd_mvcc_db_total_size_in_bytes Total size of the underlying database physically allocated in bytes.
# TYPE etcd_mvcc_db_total_size_in_bytes gauge
etcd_mvcc_db_total_size_in_bytes 794624
# HELP etcd_mvcc_db_total_size_in_use_in_bytes Total size of the underlying database logically in use in bytes.
# TYPE etcd_mvcc_db_total_size_in_use_in_bytes gauge
etcd_mvcc_db_total_size_in_use_in_bytes 794624
# HELP etcd_mvcc_delete_total Total number of deletes seen by this member.
# TYPE etcd_mvcc_delete_total counter
etcd_mvcc_delete_total 188
# HELP etcd_mvcc_hash_duration_seconds The latency distribution of storage hash operation.
# TYPE etcd_mvcc_hash_duration_seconds histogram
etcd_mvcc_hash_duration_seconds_bucket{le="0.01"} 0
# HELP etcd_mvcc_hash_rev_duration_seconds The latency distribution of storage hash by revision operation.
# TYPE etcd_mvcc_hash_rev_duration_seconds histogram
etcd_mvcc_hash_rev_duration_seconds_bucket{le="0.01"} 0
# HELP etcd_mvcc_put_total Total number of puts seen by this member.
# TYPE etcd_mvcc_put_total counter
etcd_mvcc_put_total 42048
# HELP etcd_mvcc_range_total Total number of ranges seen by this member.
# TYPE etcd_mvcc_range_total counter
etcd_mvcc_range_total 2.16009e+06
# HELP etcd_mvcc_txn_total Total number of txns seen by this member.
# TYPE etcd_mvcc_txn_total counter
etcd_mvcc_txn_total 3
# HELP etcd_network_active_peers The current number of active peer connections.
# TYPE etcd_network_active_peers gauge
etcd_network_active_peers{Local="c654b4887e9b4d78",Remote="f2c6156a7274ff64"} 1
# HELP etcd_network_client_grpc_received_bytes_total The total number of bytes received from grpc clients.
# TYPE etcd_network_client_grpc_received_bytes_total counter
etcd_network_client_grpc_received_bytes_total 1.03267599e+08
# HELP etcd_network_client_grpc_sent_bytes_total The total number of bytes sent to grpc clients.
# TYPE etcd_network_client_grpc_sent_bytes_total counter
etcd_network_client_grpc_sent_bytes_total 3.63795452e+08
# HELP etcd_network_disconnected_peers_total The total number of disconnected peers.
# TYPE etcd_network_disconnected_peers_total counter
etcd_network_disconnected_peers_total{Local="c654b4887e9b4d78",Remote="f2c6156a7274ff64"} 1
# HELP etcd_network_peer_received_bytes_total The total number of bytes received from peers.
# TYPE etcd_network_peer_received_bytes_total counter
etcd_network_peer_received_bytes_total{From="0"} 8.659704e+07
# HELP etcd_network_peer_round_trip_time_seconds Round-Trip-Time histogram between peers
# TYPE etcd_network_peer_round_trip_time_seconds histogram
etcd_network_peer_round_trip_time_seconds_bucket{To="f2c6156a7274ff64",le="0.0001"} 1
# HELP etcd_network_peer_sent_bytes_total The total number of bytes sent to peers.
# TYPE etcd_network_peer_sent_bytes_total counter
etcd_network_peer_sent_bytes_total{To="f2c6156a7274ff64"} 7.86397649e+08
# HELP etcd_network_peer_sent_failures_total The total number of send failures from peers.
# TYPE etcd_network_peer_sent_failures_total counter
etcd_network_peer_sent_failures_total{To="f2c6156a7274ff64"} 1
# HELP etcd_server_apply_duration_seconds The latency distributions of v2 apply called by backend.
# TYPE etcd_server_apply_duration_seconds histogram
etcd_server_apply_duration_seconds_bucket{op="Alarm",success="true",version="v3",le="0.0001"} 12024
# HELP etcd_server_client_requests_total The total number of client requests per client version.
# TYPE etcd_server_client_requests_total counter
etcd_server_client_requests_total{client_api_version="3.5",type="stream"} 8
# HELP etcd_server_go_version Which Go version server is running with. 1 for 'server_go_version' label with current version.
# TYPE etcd_server_go_version gauge
etcd_server_go_version{server_go_version="go1.16.15"} 1
# HELP etcd_server_has_leader Whether or not a leader exists. 1 is existence, 0 is not.
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
# HELP etcd_server_health_failures The total number of failed health checks
# TYPE etcd_server_health_failures counter
etcd_server_health_failures 0
# HELP etcd_server_health_success The total number of successful health checks
# TYPE etcd_server_health_success counter
etcd_server_health_success 1
# HELP etcd_server_heartbeat_send_failures_total The total number of leader heartbeat send failures (likely overloaded from slow disk).
# TYPE etcd_server_heartbeat_send_failures_total counter
etcd_server_heartbeat_send_failures_total 12
# HELP etcd_server_id Server or member ID in hexadecimal format. 1 for 'server_id' label with current ID.
# TYPE etcd_server_id gauge
etcd_server_id{server_id="c654b4887e9b4d78"} 1
# HELP etcd_server_is_leader Whether or not this member is a leader. 1 if is, 0 otherwise.
# TYPE etcd_server_is_leader gauge
etcd_server_is_leader 0
# HELP etcd_server_is_learner Whether or not this member is a learner. 1 if is, 0 otherwise.
# TYPE etcd_server_is_learner gauge
etcd_server_is_learner 0
# HELP etcd_server_leader_changes_seen_total The number of leader changes seen.
# TYPE etcd_server_leader_changes_seen_total counter
etcd_server_leader_changes_seen_total 3
# HELP etcd_server_learner_promote_successes The total number of successful learner promotions while this member is leader.
# TYPE etcd_server_learner_promote_successes counter
etcd_server_learner_promote_successes 0
# HELP etcd_server_proposals_applied_total The total number of consensus proposals applied.
# TYPE etcd_server_proposals_applied_total gauge
etcd_server_proposals_applied_total 98662
# HELP etcd_server_proposals_committed_total The total number of consensus proposals committed.
# TYPE etcd_server_proposals_committed_total gauge
etcd_server_proposals_committed_total 98662
# HELP etcd_server_proposals_failed_total The total number of failed proposals seen.
# TYPE etcd_server_proposals_failed_total counter
etcd_server_proposals_failed_total 1
# HELP etcd_server_proposals_pending The current number of pending proposals to commit.
# TYPE etcd_server_proposals_pending gauge
etcd_server_proposals_pending 0
# HELP etcd_server_quota_backend_bytes Current backend storage quota size in bytes.
# TYPE etcd_server_quota_backend_bytes gauge
etcd_server_quota_backend_bytes 2.68435456e+08
# HELP etcd_server_read_indexes_failed_total The total number of failed read indexes seen.
# TYPE etcd_server_read_indexes_failed_total counter
etcd_server_read_indexes_failed_total 2
# HELP etcd_server_slow_apply_total The total number of slow apply requests (likely overloaded from slow disk).
# TYPE etcd_server_slow_apply_total counter
etcd_server_slow_apply_total 303
# HELP etcd_server_slow_read_indexes_total The total number of pending read indexes not in sync with leader's or timed out read index requests.
# TYPE etcd_server_slow_read_indexes_total counter
etcd_server_slow_read_indexes_total 3
# HELP etcd_server_snapshot_apply_in_progress_total 1 if the server is applying the incoming snapshot. 0 if none.
# TYPE etcd_server_snapshot_apply_in_progress_total gauge
etcd_server_snapshot_apply_in_progress_total 0
# HELP etcd_server_version Which version is running. 1 for 'server_version' label with current version.
# TYPE etcd_server_version gauge
etcd_server_version{server_version="3.5.5"} 1
# HELP etcd_snap_db_fsync_duration_seconds The latency distributions of fsyncing .snap.db file
# TYPE etcd_snap_db_fsync_duration_seconds histogram
etcd_snap_db_fsync_duration_seconds_bucket{le="0.001"} 0
# HELP etcd_snap_db_save_total_duration_seconds The total latency distributions of v3 snapshot save
# TYPE etcd_snap_db_save_total_duration_seconds histogram
etcd_snap_db_save_total_duration_seconds_bucket{le="0.1"} 0
# HELP etcd_snap_fsync_duration_seconds The latency distributions of fsync called by snap.
# TYPE etcd_snap_fsync_duration_seconds histogram
etcd_snap_fsync_duration_seconds_bucket{le="0.001"} 0
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.2073e-05
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 135
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.16.15"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 3.703412e+07
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 8.29085368824e+12
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 2.53702e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 5.3985721311e+10
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0.0001143697751305955
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 8.436832e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 3.703412e+07
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 8.953856e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 4.1402368e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 97905
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.5961984e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.30940928e+08
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6808329412625122e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 5.3985819216e+10
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 43200
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 49152
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 583984
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 1.081344e+06
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 5.8364256e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 5.87302e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 3.2768e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 3.2768e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.52195096e+08
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 40
# HELP grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="Aborted",grpc_method="Alarm",grpc_service="etcdserverpb.Maintenance",grpc_type="unary"} 0
# HELP grpc_server_msg_received_total Total number of RPC stream messages received on the server.
# TYPE grpc_server_msg_received_total counter
grpc_server_msg_received_total{grpc_method="Alarm",grpc_service="etcdserverpb.Maintenance",grpc_type="unary"} 4006
# HELP grpc_server_msg_sent_total Total number of gRPC stream messages sent by the server.
# TYPE grpc_server_msg_sent_total counter
grpc_server_msg_sent_total{grpc_method="Alarm",grpc_service="etcdserverpb.Maintenance",grpc_type="unary"} 4006
# HELP grpc_server_started_total Total number of RPCs started on the server.
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="Alarm",grpc_service="etcdserverpb.Maintenance",grpc_type="unary"} 4006
# HELP os_fd_limit The file descriptor limit.
# TYPE os_fd_limit gauge
os_fd_limit 1.048576e+06
# HELP os_fd_used The number of used file descriptors.
# TYPE os_fd_used gauge
os_fd_used 37
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 47557.93
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 37
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 9.5535104e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.67963017501e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.360265216e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 235970

Please find the metrics attached above, and please take a look at our ticket.
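When comparing metric dumps from several members, it can help to pull out a few gauges programmatically instead of reading the dump by eye. The sketch below is a deliberately minimal parser for the Prometheus text exposition format: it skips `# HELP` / `# TYPE` lines and assumes no timestamp follows a sample value, and it ignores escaping and histogram semantics. For real use, the text parser in the `prometheus_client` package is the safer choice. The function names are hypothetical.

```python
from __future__ import annotations


def parse_exposition(text: str) -> dict[str, float]:
    """Parse simple Prometheus text-format samples into {series: value}."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token (timestamps are assumed absent).
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def revisions_since_compaction(samples: dict[str, float]) -> int:
    """Revisions written on this member since its last successful compaction."""
    current = samples["etcd_debugging_mvcc_current_revision"]
    compacted = samples["etcd_debugging_mvcc_compact_revision"]
    return int(current - compacted)
```

On the dump above, `etcd_debugging_mvcc_current_revision` is 71668 and `etcd_debugging_mvcc_compact_revision` is 71559, so this member is 109 revisions past its last compaction. Running the same check against every member's `/metrics` endpoint and comparing `etcd_debugging_mvcc_current_revision` across them is one quick way to spot a member whose revision trails the others.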
Thanks @Tejaswini5327 for providing the information. Could you try upgrading etcd to the latest 3.5 release? I suspect this is related to the revision inconsistency caused by a panic during defrag, given the hourly defrag activity in your cluster. The fix is included in v3.5.6 and later.
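A quick way to check whether a member falls in the affected range is to parse its `etcd --version` output (the reporter's shows `etcd Version: 3.5.5`) and compare it with v3.5.6, the release named in the comment above. This is a sketch that only considers the 3.5 line, since that is the only line the comment covers; the function names and the `FIXED_IN` constant are hypothetical.

```python
from __future__ import annotations

import re

# Release that, per the comment above, contains the defrag-panic revision fix.
FIXED_IN = (3, 5, 6)


def parse_etcd_version(output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from `etcd --version` output."""
    match = re.search(r"etcd Version:\s*(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'etcd Version:' line found")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def needs_upgrade(version: tuple[int, int, int]) -> bool:
    """True for 3.5.x releases older than the fix; other lines are not judged."""
    return version[:2] == FIXED_IN[:2] and version < FIXED_IN
```

The regular expression matches the server's `etcd Version:` line only, so the `etcdctl version:` line from `etcdctl version` is not picked up by mistake.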
What happened?
We have a service of 3 pods, each running on a different worker node. One of the worker nodes was restarted. Pod-0 was the leader at that time, but once pod-1 became unavailable, the cluster stopped behaving properly. The following logs are seen in pod-2, where the revision keeps changing:
{"severity":"warning","timestamp":"2023-02-27T11:12:25.069Z","caller":"etcdserver/util.go:123","message":"failed to apply request","took":"13.502µs","request":"header:<ID:18402981799651370292 > compaction:<revision:91082 > ","response":"","error":"mvcc: required revision is a future revision"}
{"severity":"warning","timestamp":"2023-02-27T11:17:25.079Z","caller":"etcdserver/util.go:123","message":"failed to apply request","took":"11.687µs","request":"header:<ID:18402981799651370294 > compaction:<revision:91092 > ","response":"","error":"mvcc: required revision is a future revision"}
Finished defragmenting etcd member[:2379]
102 > ","response":"","error":"mvcc: required revision is a future revision"}
{"severity":"warning","timestamp":"2023-02-27T11:27:25.112Z","caller":"etcdserver/util.go:123","message":"failed to apply request","took":"11.278µs","request":"header:<ID:18402981799651370303 > compaction:<revision:91112 > ","response":"","error":"mvcc: required revision is a future revision"}
{"severity":"warning","timestamp":"2023-02-27T11:32:25.125Z","caller":"etcdserver/util.go:123","message":"failed to apply request","took":"11.614µs","request":"header:<ID:18402981799651370309 > compaction:<revision:91122 > ","response":"","error":"mvcc: required revision is a future revision"}
{"severity":"warning","timestamp":"2023-02-27T11:37:25.139Z","caller":"etcdserver/util.go:123","message":"failed to apply request","took":"9.236µs","request":"header:<ID:18402981799651370312 > compaction:<revision:91132 > ","response":"","error":"mvcc: required revision is a future revision"}
{"severity":"warning","timestamp":"2023-02-27T11:42:25.149Z","caller":"etcdserver/util.go:123","message":"failed to apply request","took":"20.507µs","request":"header:<ID:18402981799651370317 > compaction:<revision:91143 > ","response":"","error":"mvcc: required revision is a future revision"}
{"severity":"warning","timestamp":"2023-02-27T11:47:25.165Z","caller":"etcdserver/util.go:123","message":"failed to apply request","took":"11.72µs","request":"header:<ID:18402981799651370319 > compaction:<revision:91153 > ","response":"","error":"mvcc: required revision is a future revision"}
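To follow the failed compactions over a longer log, the warnings can be extracted programmatically. The sketch below assumes one JSON object per line with the `timestamp`, `request` and `error` fields seen in the excerpt above, pulls the revision out of the `compaction:<revision:N >` text with a regular expression, and skips lines that are not JSON objects (such as the "Finished defragmenting" message or truncated entries). The function names are hypothetical.

```python
from __future__ import annotations

import json
import re

COMPACTION_REV = re.compile(r"compaction:<revision:(\d+) >")
FUTURE_REV_ERROR = "mvcc: required revision is a future revision"


def future_revision_compactions(log_lines: list[str]) -> list[tuple[str, int]]:
    """Return (timestamp, requested revision) for each compaction rejected as a future revision."""
    failures = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text or truncated line
        if not isinstance(entry, dict) or entry.get("error") != FUTURE_REV_ERROR:
            continue
        match = COMPACTION_REV.search(entry.get("request", ""))
        if match:
            failures.append((entry["timestamp"], int(match.group(1))))
    return failures


def revision_steps(failures: list[tuple[str, int]]) -> list[int]:
    """Difference between consecutive requested compaction revisions."""
    return [later[1] - earlier[1] for earlier, later in zip(failures, failures[1:])]
```

Applied to the excerpt, the requested revision rises by about 10 every 5 minutes (91082, 91092, ..., 91143, 91153), which suggests the rest of the cluster keeps accepting writes while pod-2 rejects every compaction as a future revision.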
What did you expect to happen?
The cluster should not misbehave after a worker node restart, and the revision should not keep changing in the logs.
How can we reproduce it (as minimally and precisely as possible)?
Restart a worker node, then check the cluster's behavior and the pod logs.
Anything else we need to know?
No response
Etcd version (please run commands below)
bash-4.4$ etcd --version
etcd Version: 3.5.5
Git SHA: 19002cf
Go Version: go1.16.15
Go OS/Arch: linux/amd64
bash-4.4$ etcdctl version
etcdctl version: 3.5.5
API version: 3.5
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output