[Thanos Receive] --receive.replication-factor=2 leads to remote write unavailability #7274
Hi everyone!
Config of Ingestor receives:
hashrings.json:
When I set --receive.replication-factor=2 every time I delete any Ingesting receive pod I get a lot of "backing off forward request for endpoint" errors and the successful remote write to this Thanos Receive instance drops to 0 until the pod is ready again.
I saw a discussion on the same topic, but it seems there is no answer yet.
Replies: 1 comment 6 replies
Hey, quorum in the code is currently this: https://github.com/thanos-io/thanos/blob/8227108dba098a6cf4aa7c00c13ed1ae42c2d088/pkg/receive/handler.go#L990C1-L994C1 (rf/2 + 1), which would be 2 for rf=2, so rf=2 cannot tolerate one node going away.
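To make the arithmetic concrete, here is a minimal sketch of the formula quoted above (integer division, then plus one). The function names `write_quorum` and `tolerated_failures` are made up for illustration and are not Thanos identifiers; only the `rf/2 + 1` formula comes from the linked code.

```python
def write_quorum(replication_factor: int) -> int:
    """Quorum as described in the reply: floor(rf / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """How many replicas may be unavailable while a write still reaches quorum."""
    return replication_factor - write_quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"rf={rf} quorum={write_quorum(rf)} tolerated={tolerated_failures(rf)}")
```

Under this formula rf=2 needs both replicas to acknowledge, while rf=3 needs only two of three, so rf=3 is the smallest setting that survives the loss of a single node.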
Ah, if you send a write, it will get chopped up and fanned out to all other nodes if it's big enough (we hash every series and route it to the node that owns that hash according to the hashring). So any request that contains enough series will probably touch the node that is going away, and that particular series will fail to reach quorum.
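A rough simulation of that effect, as a sketch only: it assumes a plain hash-modulo placement with replicas on the following nodes, which is a simplification and does not reproduce the hashring algorithms Thanos actually uses. The node names, series names, and helper functions are all hypothetical.

```python
import hashlib


def owner_index(series: str, node_count: int) -> int:
    """Map a series to a node index with a simple hash-modulo (assumed scheme)."""
    digest = hashlib.sha256(series.encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def replicas(series: str, nodes: list[str], rf: int) -> list[str]:
    """Owner node plus the next rf-1 nodes in ring order."""
    start = owner_index(series, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


def request_succeeds(batch: list[str], nodes: list[str], rf: int, down: set[str]) -> bool:
    """A request succeeds only if every series in it reaches quorum (rf/2 + 1)."""
    quorum = rf // 2 + 1
    for series in batch:
        acks = sum(1 for node in replicas(series, nodes, rf) if node not in down)
        if acks < quorum:
            return False
    return True


if __name__ == "__main__":
    nodes = ["receive-0", "receive-1", "receive-2"]  # hypothetical pod names
    batch = [f'metric_{i}{{job="demo"}}' for i in range(200)]
    down = {"receive-1"}
    for rf in (2, 3):
        print(f"rf={rf} succeeds={request_succeeds(batch, nodes, rf, down)}")
```

With three nodes and rf=2, roughly two thirds of all series have the missing node among their replicas, so a batch of any realistic size fails as a whole. With rf=3 every series still gets two acknowledgements, which meets the quorum of 2.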