
fix(core) make sure that cassandra handles cluster event consistency #5032

Closed
wants to merge 2 commits into from

Conversation

scaat
Contributor

@scaat scaat commented Sep 13, 2019

Summary

The default consistency for cluster events is LOCAL_ONE, which does not match the stronger consistency that users configure for Cassandra and may result in inaccurate data.

This PR reads the consistency setting from Kong's configuration so that cluster event queries match the user's expectations.
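
For illustration, here is a minimal sketch of the idea behind the patch, assuming lua-cassandra's `consistency` query option and Kong's `cassandra_consistency` configuration property; it is not the actual diff:

```lua
-- Sketch only: illustrative names, not the code changed by this PR.
local cassandra = require "cassandra"

-- Map the kong.conf value (e.g. "LOCAL_QUORUM") to a lua-cassandra constant.
local function consistency_from_config(kong_config)
  local name = (kong_config.cassandra_consistency or "LOCAL_ONE"):lower()
  return cassandra.consistencies[name]
end

-- Instead of a hard-coded LOCAL_ONE, the cluster events insert/poll queries
-- would pass the user's setting along, e.g.:
-- connector:query(cql, args, { consistency = consistency_from_config(kong_config) })
```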

@bungle bungle requested a review from thibaultcha September 13, 2019 09:08
@thibaultcha
Member

Hello,

Thank you for proposing the patch. The decision to not use the user-specified Cassandra consistency in the cluster events module is a deliberate one.

Putting aside the fact that the Cassandra consistency settings need a lot more granularity than today's configuration properties offer (cc @bungle, with whom we are working on this), the cluster events module deserves its own consistency settings. In order to avoid over-configuration and, most of all, misconfiguration from users unfamiliar with Cassandra, the consistency settings were hard-coded to the most appropriate default: a datacenter-aware, high-availability consistency setting. Given the chattiness of the cluster events insertion and polling, we believe the value is appropriate, and we decided to overcome eventual consistency scenarios by introducing the db_update_propagation configuration property. What this property does is query the cluster_events table with a "grace period" (or backwards window), so that if an event pre-dating the current window was received late by a replica node, it can still be picked up and processed by polling nodes (more details in the original PR introducing the current caching and invalidation mechanisms: #2561).

As such, I don't think that we should merge such a patch. For now, you can configure db_update_propagation to a value higher than 0, and a "safety" grace period should, imho, be introduced in order to always give eventually consistent past events a chance to be polled and processed. In fact, I assumed this was already the case, but I must have been mistaken; it was probably a plan I had in mind and never got a chance to address.
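
For example, in kong.conf (the value below is illustrative; pick one that reflects your cluster's actual replication lag):

```
# Time, in seconds, that polling nodes look back to catch late-replicated
# cluster events. The default of 0 means no grace period.
db_update_propagation = 1
```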

@scaat
Contributor Author

scaat commented Sep 18, 2019

Thank you for your reply, @thibaultcha. I have two questions:

  1. The default setting is LOCAL_ONE. Even with db_update_propagation set, is it still possible for some events to be lost when a node polls for cluster events?

  2. In my scenario there are multiple data centers, each with 3 nodes, a repl_factor of 3 per data center, and LOCAL_QUORUM consistency. Can the default LOCAL_ONE for cluster events still guarantee strong consistency and no loss of cluster events?

@thibaultcha
Member

  1. The default setting is LOCAL_ONE. Even with db_update_propagation set, is it still possible for some events to be lost when a node polls for cluster events?

By using LOCAL_ONE, the cluster events polling will only contact a single peer, so if the peer has not received the event (i.e. the replica row) between the instants t and t + db_update_propagation + 0.001 (see https://github.com/Kong/kong/blob/master/kong/cluster_events.lua#L220), it may be lost. However, using a higher consistency setting may be too expensive for a cluster at scale handling a lot of CRUD operations, so this change is somewhat risky. A lower consistency with a grace period is a more efficient alternative imho, and it also helps cover cross-datacenter latency issues (without having to rely on a non-local consistency setting). Besides, note that when an invalidation cluster event is received, the associated updated entity will be queried as well. Until we can offer more granularity between proxy server and Admin API queries (the former requiring availability and the latter requiring consistency), the db_update_propagation property is a helpful solution.
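
A rough sketch of the "backwards window" idea described above (names are illustrative, not Kong's actual cluster_events.lua internals; see the linked source for the real implementation):

```lua
-- Sketch only: how a grace period lets a LOCAL_ONE poll tolerate late replicas.
local function poll_window(poll_cursor, db_update_propagation)
  -- Look back an extra propagation delay (plus a small epsilon) so that a
  -- row that reached this replica late can still be picked up.
  local min_at = poll_cursor - db_update_propagation - 0.001
  local max_at = ngx.now()  -- assumes the OpenResty ngx API is available
  return min_at, max_at
end

-- Re-seeing an event that was already processed is harmless here, since
-- invalidating the same cache key twice is idempotent.
```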

  2. In my scenario there are multiple data centers, each with 3 nodes, a repl_factor of 3 per data center, and LOCAL_QUORUM consistency. Can the default LOCAL_ONE for cluster events still guarantee strong consistency and no loss of cluster events?

With db_update_propagation, yes, but not with guarantees as strong as using a higher consistency setting. As I explained above, we may still prefer using LOCAL_ONE for the cluster events queries. Increasing the grace period (and always having one by default) is very much needed imho. And again, granular consistency settings would allow for a healthy consistency setting for proxy server queries, Admin API queries, and cluster events.

If you are running a large C*-backed production cluster, we'd be interested in hearing your feedback if you chose to run your cluster events queries with a higher consistency setting.

@scaat
Contributor Author

scaat commented Sep 20, 2019

Thank you for your reply, @thibaultcha.

If event loss cannot be avoided, what should be done when an event is lost?

@thibaultcha
Member

@scaat The /cache Admin API endpoint was implemented for this purpose. I don't believe that it has yet been properly documented, but you can browse the source here: https://github.com/Kong/kong/blob/master/kong/api/routes/cache.lua
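
For reference, a quick usage sketch against the Admin API; the paths below follow the routes in that file, but double-check them against your Kong version:

```sh
# Inspect a cached entity by its cache key (404 if it is not in this node's cache)
curl -i http://localhost:8001/cache/{cache_key}

# Evict a single cached entity
curl -i -X DELETE http://localhost:8001/cache/{cache_key}

# Evict this node's entire entity cache (use with care on busy nodes)
curl -i -X DELETE http://localhost:8001/cache
```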

@scaat
Contributor Author

scaat commented Sep 24, 2019

Thank you for your reply, @thibaultcha.

In addition to manually using the /cache API, can I directly set db_cache_ttl to automatically clean up the cache?

In short, is the effect of using the /cache API the same as setting db_cache_ttl?

@thibaultcha
Member

In addition to manually using the /cache API, can I directly set db_cache_ttl to automatically clean up the cache?

Yes, this is also the use case behind db_cache_ttl (a safety measure). Beware of possible spikes in DB traffic when a significant portion of your cache reaches its TTL at the same time; it can happen.
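
For example, in kong.conf (illustrative value; the default of 0 means cached entities never expire):

```
# Time-to-live, in seconds, of entities in the node-level cache.
db_cache_ttl = 3600
```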

@kikito
Member

kikito commented Oct 7, 2019

Hi, since the conversation seems to have ended and we decided not to merge this PR, I'm going to close it down. If anyone feels the need to reopen, please do so.

@kikito kikito closed this Oct 7, 2019