Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LWTRetryPolicy interface #213

Merged
merged 1 commit into from
Jul 8, 2024

Conversation

Lorak-mmk
Copy link

@Lorak-mmk Lorak-mmk commented Jul 7, 2024

RetryPolicy may want to return different decisions from GetRetryType depending on whether query is LWT or not - to prevent executing LWT on non-primary replica.
This is not possible with current interface, because GetRetryType doesn't know the query type.

This PR introduces new interface, LWTRetryPolicy. If configured retry policy also implements LWTRetryPolicy, then methods from LWTRetryPolicy will be called for LWT statements instead of methods from RetryPolicy.

There are alternative approaches to this issue:

  • Introduce LWTRetryPolicy as another field in Session and Query: requires changes in user code.
  • For LWT queries ignore GetRetryType and retry on the same host: it's less configurable

@Lorak-mmk Lorak-mmk requested a review from dkropachev July 7, 2024 18:33
// This is a similar interface to RetryPolicy
// If a query is recognized as LWT query it, and its RetryPolicy satisfies this
// interface, then this interface will be used instead of RetryPolicy.
type LWTRetryPolicy interface {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since RetryableQuery and ExecutableQuery are tightly connected to each other, I would suggest to add IsLWT() bool to RetryableQuery and change RetryPolicy.GetRetryType to GetRetryType(RetryableQuery, error) RetryType

Yes, it is going to be a breaking change, but very light in a sence of fixing it and if user is using built-in retrypolicy it is going to be autoresolved.

WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about doing breaking changes. It is a high priority issue and we want to be able to merge it and release as quickly as possible

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what is the policy about breaking changes in gocql, because I don't usually work on this driver and I'm not a maintainer. If you want me to I can implement proposed changes.
My thinking with avoiding breaking changes was:

  • Afaik Golang itself maintains backwards compatibility, so I assume its expected in the libraries too
  • Even without breaking changes customers often stay on older version. If updating requires changes it will be harder to move users to current version
  • It's not just incompatibility with our old version, but with upstream gocql too, so it makes migration harder (and possibly merges with upstream too)
  • If it is acceptable to introduce a breaking change, then we can do that in the future too, removing LWTRetryPolicy and applying your proposed change to RetryPolicy.

@wprzytula wprzytula self-requested a review July 8, 2024 07:38
Comment on lines +194 to +200
func (s *SimpleRetryPolicy) GetRetryTypeLWT(err error) RetryType {
return Retry
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really do this unconditionally, even if error returned by the DB signifies unability to serve requests? E.g., ServerError or Overloaded.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that as long as we have open connections to the node, it may (at some point) perform the request, so the policy can keep trying.
If we loose connection, then the driver will move to the next node.
It seemed safer to avoid LWT contention at all cost.

cc @vladzcloudius

Comment on lines +238 to +248
func (e *ExponentialBackoffRetryPolicy) GetRetryTypeLWT(err error) RetryType {
return Retry
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

query_executor.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@dkropachev dkropachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one more policy left to cover - DowngradingConsistencyRetryPolicy

policies.go Outdated Show resolved Hide resolved
policies.go Show resolved Hide resolved
@Lorak-mmk
Copy link
Author

There is one more policy left to cover - DowngradingConsistencyRetryPolicy

There is. I'm not sure how it should be modify, I'd really like some advice

@wprzytula
Copy link
Collaborator

There is one more policy left to cover - DowngradingConsistencyRetryPolicy

There is. I'm not sure how it should be modify, I'd really like some advice

IMO the only change needed for LWT case is replacing the only occurence of RetryNextHost in GetRetryType with Retry.

@dkropachev
Copy link
Collaborator

dkropachev commented Jul 8, 2024

There is one more policy left to cover - DowngradingConsistencyRetryPolicy

There is. I'm not sure how it should be modify, I'd really like some advice

IMO the only change needed for LWT case is replacing the only occurence of RetryNextHost in GetRetryType with Retry.

AFAIK LWT Queries are run in SERIAL or LOCAL_SERIAL CL, so AttemptLWT should avoid q.SetConsistency( for LWT
And as @wprzytula suggested GetRetryType should not go for Retry where it does RetryNextHost

@wprzytula
Copy link
Collaborator

There is one more policy left to cover - DowngradingConsistencyRetryPolicy

There is. I'm not sure how it should be modify, I'd really like some advice

IMO the only change needed for LWT case is replacing the only occurence of RetryNextHost in GetRetryType with Retry.

AFAIK LWT Queries are run in SERIAL or LOCAL_SERIAL CL, so AttemptLWT should avoid q.SetConsistency( for LWT And as @wprzytula suggested GetRetryType should not go for Retry where it does RetryNextHost

Good catch about this q.SetConsistency.

@Lorak-mmk
Copy link
Author

There is one more policy left to cover - DowngradingConsistencyRetryPolicy

There is. I'm not sure how it should be modify, I'd really like some advice

IMO the only change needed for LWT case is replacing the only occurence of RetryNextHost in GetRetryType with Retry.

AFAIK LWT Queries are run in SERIAL or LOCAL_SERIAL CL, so AttemptLWT should avoid q.SetConsistency( for LWT And as @wprzytula suggested GetRetryType should not go for Retry where it does RetryNextHost

Good catch about this q.SetConsistency.

But in that case it's no longer DowngradingConsistencyRetryPolicy, that's why I thought it would be best to skip it for now.
Btw what is the use case for this policy? If someone is fine with using lower consistency level, then why use the higher one at all?

@Lorak-mmk
Copy link
Author

@wprzytula failed test is flaky according to @sylwiaszunejko , could you rerun CI?

@wprzytula
Copy link
Collaborator

wprzytula commented Jul 8, 2024

There is one more policy left to cover - DowngradingConsistencyRetryPolicy

There is. I'm not sure how it should be modify, I'd really like some advice

IMO the only change needed for LWT case is replacing the only occurence of RetryNextHost in GetRetryType with Retry.

AFAIK LWT Queries are run in SERIAL or LOCAL_SERIAL CL, so AttemptLWT should avoid q.SetConsistency( for LWT And as @wprzytula suggested GetRetryType should not go for Retry where it does RetryNextHost

Good catch about this q.SetConsistency.

But in that case it's no longer DowngradingConsistencyRetryPolicy, that's why I thought it would be best to skip it for now. Btw what is the use case for this policy? If someone is fine with using lower consistency level, then why use the higher one at all?

The idea is to use a higher consistency level normally, but as an exception allow lower consistency when higher cannot be immediately obtained because of write/read timeout. This is to cope with spikes of load over short time that would otherwise result in timeouts due to requested consistency not reached in time.

Datastax recommends in their drivers using DowngradingConsistencyRetryPolicy always with logging its decisions turned on, to debug issues caused by its too consistency downgrades.

@Lorak-mmk
Copy link
Author

There is one more policy left to cover - DowngradingConsistencyRetryPolicy

There is. I'm not sure how it should be modify, I'd really like some advice

IMO the only change needed for LWT case is replacing the only occurence of RetryNextHost in GetRetryType with Retry.

AFAIK LWT Queries are run in SERIAL or LOCAL_SERIAL CL, so AttemptLWT should avoid q.SetConsistency( for LWT And as @wprzytula suggested GetRetryType should not go for Retry where it does RetryNextHost

Good catch about this q.SetConsistency.

But in that case it's no longer DowngradingConsistencyRetryPolicy, that's why I thought it would be best to skip it for now. Btw what is the use case for this policy? If someone is fine with using lower consistency level, then why use the higher one at all?

The idea is to use a higher consistency level normally, but as an exception allow lower consistency when higher cannot be immediately obtained because of write/read timeout. This is to cope with spikes of load over short time that would otherwise result in timeouts due to requested consistency not reached in time.

Applications choose a consistency level used that is required for their correctness. If the application is still correct with lower consistency level, then it can use this lower level. If it is not, then downgrading consistency will make the application not correct.
Anyway, I don't see a good semantics for LWT for this policy. Imo we should skip this, it's not important to solve now.

@wprzytula
Copy link
Collaborator

There is one more policy left to cover - DowngradingConsistencyRetryPolicy

There is. I'm not sure how it should be modify, I'd really like some advice

IMO the only change needed for LWT case is replacing the only occurence of RetryNextHost in GetRetryType with Retry.

AFAIK LWT Queries are run in SERIAL or LOCAL_SERIAL CL, so AttemptLWT should avoid q.SetConsistency( for LWT And as @wprzytula suggested GetRetryType should not go for Retry where it does RetryNextHost

Good catch about this q.SetConsistency.

But in that case it's no longer DowngradingConsistencyRetryPolicy, that's why I thought it would be best to skip it for now. Btw what is the use case for this policy? If someone is fine with using lower consistency level, then why use the higher one at all?

The idea is to use a higher consistency level normally, but as an exception allow lower consistency when higher cannot be immediately obtained because of write/read timeout. This is to cope with spikes of load over short time that would otherwise result in timeouts due to requested consistency not reached in time.

Applications choose a consistency level used that is required for their correctness. If the application is still correct with lower consistency level, then it can use this lower level. If it is not, then downgrading consistency will make the application not correct. Anyway, I don't see a good semantics for LWT for this policy. Imo we should skip this, it's not important to solve now.

See java driver documentation for reference.

sylwiaszunejko
sylwiaszunejko previously approved these changes Jul 8, 2024
policies.go Outdated Show resolved Hide resolved
policies.go Outdated Show resolved Hide resolved
RetryPolicy may want to return different decisions from `GetRetryType`
depending on whether query is LWT or not - to prevent executing LWT
on non-primary replica.
This is not possible with current interface, because `GetRetryType`
doesn't know the query type.

This commit introduces new interface, `LWTRetryPolicy`.
If configured retry policy also implements `LWTRetryPolicy`, then methods
from `LWTRetryPolicy` will be called fro LWT statements instead of
methods from `RetryPolicy`.
@sylwiaszunejko sylwiaszunejko merged commit a00403c into scylladb:master Jul 8, 2024
1 check passed
@Lorak-mmk Lorak-mmk self-assigned this Jul 9, 2024
@wprzytula wprzytula self-assigned this Jul 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants