-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: log DirectPath misconfiguration #2105
Conversation
@blakeli0 This log will happen in every channel creation. I am worried this PR will make too many logs, especially if the customer has a large channel pool size. Does Java open source has something similar to LOG_EVERY_N_SEC as C++ absl? |
[java_showcase_unit_tests] SonarCloud Quality Gate failed. 0 Bugs 46.7% Coverage Catch issues before they fail your Quality Gate with our IDE extension SonarLint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulling in a colleague from Dataproc team to weigh in; Thanks for tackling this @mohanli-ml;
// Case 2: just enable DirectPath xDS | ||
if (!isDirectPathOptionSet && (isDirectPathXdsEnvSet || isDirectPathXdsOptionSet)) { | ||
LOG.log( | ||
Level.WARNING, "DirectPath is misconfigured. Please set the attemptDirectPath option."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@singhravidutt this PR is w.r.t R.1 for observability.
Would hadoop connector customers be able to see these warnings emitted when running the connector or would this require a change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should be able to surface these logs.
// Case 2: just enable DirectPath xDS | ||
if (!isDirectPathOptionSet && (isDirectPathXdsEnvSet || isDirectPathXdsOptionSet)) { | ||
LOG.log( | ||
Level.WARNING, "DirectPath is misconfigured. Please set the attemptDirectPath option."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add values of isDirectPathOptionSet
, isDirectPathXdsEnvSet
and isDirectPathXdsOptionSet
as part of log statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case user enables DirectPathXds either by env or option, but DirectPath itself is not enabled. I think the log is already clear that the user needs to set attemptDirectPath
and there is no need to log the values for judging the case. WDYT?
@@ -266,6 +270,40 @@ boolean isDirectPathXdsEnabled() { | |||
return false; | |||
} | |||
|
|||
private void logDirectPathMisconfig() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this logic is added specifically for log statements. How do we make sure it is in parity with directPath decision making?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the bit that actually handles connecting to DirectPath in gRPC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DirectPath decision making is L363:
if (isDirectPathEnabled() && isNonDefaultServiceAccountAllowed() && isOnComputeEngine()) {
And I came up with the 4 cases based on it. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was that this logic will be invoked only when directPath is misconfigured. But this is not the logic which decides if directPath is misconfigured. This code snippet could become stale if there is change in logic which is actually deciding when directpath is misconfigured.
Just as an e.g. this is the code snippet which decides directPath misconfiguration
if (isDirectPathEnabled() && isNonDefaultServiceAccountAllowed() && isOnComputeEngine()) {
Here we check for isNonDefaultServiceAccountAllowed
. Hypothetically in future if isNonDefaultServiceAccountAllowed
check is removed there is no easy way to know that fun:logDirectPathMisconfig
needs to be updated as well.
I guess there is no easy way to enforce it and this is fine for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your inputs! isNonDefaultServiceAccountAllowed()
and isOnComputeEngine()
should be kept even after DirectPath rolls out, and then we will need to cleanup env/options, and this warning logs should also be cleaned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same concern as @singhravidutt, the logic in logDirectPathMisconfig()
could evolve to be different than (isDirectPathEnabled() && isNonDefaultServiceAccountAllowed() && isOnComputeEngine())
. Ideally, we should check if we should use DirectPath or not during the initialization of this class, set the value to a field useDirectPath
, and use it to check if we should create a DirectPath channel afterwards.
See the following code that demonstrates my idea:
public InstantiatingGrpcChannelProvider build() {
if (shouldUseDirectPath()) {
setUseDirectPath(true);
}
return new InstantiatingGrpcChannelProvider(this);
}
private boolean shouldUseDirectPath() {
if (isDirectPathEnabled()) {
return isOnComputeWithValidCredential();
} else {
if (isDirectPathXdsEnabled()) {
//log
isOnComputeWithValidCredential();
}
}
return false;
}
private boolean isOnComputeWithValidCredential() {
if (!isNonDefaultServiceAccountAllowed()) {
//log
return false;
}
if (!isOnComputeEngine()) {
//log
return false;
}
return true;
}
This also makes the new code much more testable as SonarCheck is failing due to lacking test coverage of the new code.
@mohanli-ml Let me know if it makes sense, if you think it might be too much at this moment, our team can take it as a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! Moving the DirectPath enabling logic to the builder instead of checking it every time when construct a channel sounds good to me. However, pay attention to the logic here:
private boolean shouldUseDirectPath() {
if (isDirectPathEnabled()) { <- This should be `if (isDirectPathEnabled() && !isDirectPathXdsEnabled())`
return isOnComputeWithValidCredential();
} else {
if (isDirectPathXdsEnabled()) {
//log
isOnComputeWithValidCredential();
}
}
return false;
}
In short I feel the logic is already complex and I would suggest to do this with the cleanup of removing DirectPath gRPCLB, which is blocked by CBT and is expected to be done in Q4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In short I feel the logic is already complex and I would suggest to do this with the cleanup of removing DirectPath gRPCLB, which is blocked by CBT and is expected to be done in Q4.
I think that's reasonable. For now, can you please add some unit tests for these logging statement? So that the test coverage for new code meets our least standard and gives us more confidence when we refactor it. For example, one way to test it is to create a FakeLogger and assert the log message captured in the FakeLogger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks!
It's a valid concern, I would suggest refactor the code so that we don't have to log it on channel creation in |
} | ||
|
||
// The following WARNING logs are GCS only. | ||
if (serviceAddress.contains("storage.googleapis.com") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is absolutely not allowed. Gax must not be aware of anything related to a specific service. Looks like we need two logging strategies for opt-in and opt-out, can we make it a flag and Storage can pass the flag into gax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this generally useful though? This log helps call out that attempting directpath wasn't configured correctly to developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed your comment above, about adding this into Constructor which sounds like a good path for this; just calling out this is generally useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree this log is generally useful. But hardcoding the storage specific endpoint storage.googleapis.com
in Gax is not allowed, sorry if I didn't make this clear in previous comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a new idea. Let’s only make the WARNING log if DirectPath xDS env/option is set. DirectPath xDS is opt-in for GCS/CBT/CS, so we do not need to have two strategies. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these warnings should be emitted for both cases; a user is attempting DirectPath and it would be helpful to know they're doing something wrong in configuring it. Otherwise, it's a blackbox and users are left to weeks of debugging what went wrong. The goal of this is to make it easier to understand a user did something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Your concern is that what if a customer set attemptDirectPath
but not attemptDirectPathXds
option/env, there will be no warnings.
Let me provide some context here. attemptDirectPath
is originally created for DirectPath gRPCLB (gRPCLB is the name resolver). Then DirectPath xDS is developed (TD is the name resolver) and we decide to migrate DirectPath gRPCLB to DirectPath xDS. To differentiate the two DirectPath infrastructures, attemptDirectPathXds
option/env are created:
- If
attemptDirectPath
is NOT set, CloudPath will be used; - If
attemptDirectPath
is set ANDattemptDirectPathXds
option/env is NOT set, DirectPath gRPCLB is used; - If
attemptDirectPath
is set ANDattemptDirectPathXds
option/env is set, DirectPath xDS is used.
Currently the migration is still ongoing as CBT still has DirectPath gRPCLB traffic (CBT's timeline is Q4). After that, I will cleanup DirectPath gRPCLB code in the gax library. For example, attemptDirectPath
will be removed and we just need to keep attemptDirectPathXds
option/env.
With that being said, I can change the DirectPath enabling logic to:
- If
attemptDirectPath
is NOT set ANDattemptDirectPathXds
option/env is NOT set, CloudPath will be used; - If
attemptDirectPath
is set ANDattemptDirectPathXds
option/env is NOT set, DirectPath gRPCLB is used; - If
attemptDirectPathXds
option/env is set, DirectPath xDS is used.
In this way, GCS customers just need to set attemptDirectPathXds
option/env, and they do not need to care about attemptDirectPath
. Then the current log logic should be good. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the way the options work sounds like a reasonable thing to do after migration is complete.
In java-storage, the attemptDirectPath and attemptDirectPathXds issue has been addressed by enabling both, but these warnings would have helped save the team time. Moreover, we don't expose configuration of each flag attemptDirectPath and attemptDirectPathXds directly to users instead only expose 1 flag that enables both.
My original goal is to add warnings to let GCS users know they have misconfigured the client to use DirectPath. This also lends to supporting future warnings if new conditions are added.
The three main cases surfaced in this logic (ignoring gRPCLB case because it's not ready yet):
- attemptDirectPath vs. attemptDirectPathXds is one way;
- non GCE credentials; this is likely to bite a customer
- Not using GCE; not as much a concern, but we don't document DirectPath in public documentation so a user could do something silly and use the our clients outside GCE without understanding that DP is not available.
I think the warnings would be generally useful outside of GCS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover, we don't expose configuration of each flag attemptDirectPath and attemptDirectPathXds directly to users instead only expose 1 flag that enables both.
Sorry I am confused. What are the two cases in the previous comment I think these warnings should be emitted for both cases
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we are speaking past each other?
I have a new idea. Let’s only make the WARNING log if DirectPath xDS env/option is set. DirectPath xDS is opt-in for GCS/CBT/CS, so we do not need to have two strategies. WDYT?
The point i was driving at is that there's nothing telling a user they configured DirectPath incorrectly when they use attemptDirectPath or attemptDirectPathXds; and selecting the cases only when attemptDirectPathXds is set to emit these warnings misses this point.
@mohanli-ml Thanks for opening this PR! The logic to determine different logs seems pretty complicated and people have different opinions. Can you please create an issue and/or an internal doc, list all the scenarios there, and make sure we all agree on the approach first? |
Yes I moved the method to the channel pool creation, so no matter how many channels customers want, the log should only be logged once. Can you double check if this is the right place? |
Before we do this, can you take a look at the latest code? |
LOG.log( | ||
Level.WARNING, | ||
"DirectPath is misconfigured. Please make sure the credential is an instance of" | ||
+ " ComputeEngineCredentials."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use fully qualified class name ComputeEngineCredentials.class.getName()
TIL you need GCE credentials for DirectPath; what is a customer is using Workload Identity would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
GKE workload identity is compatible with DirectPath.
// Case 1: does not enable DirectPath | ||
if (!isDirectPathOptionSet) { | ||
LOG.log( | ||
Level.WARNING, "DirectPath is misconfigured. Please set the attemptDirectPath option."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend:
DirectPath is misconfigured. Please enable the attemptDirectPath option along with the attemptDirectPathXds option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// Case 3: not running on GCE | ||
if (!isOnComputeEngine()) { | ||
LOG.log( | ||
Level.WARNING, "DirectPath is misconfigured. Please run in the GCE environment. "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
DirectPath is misconfigured. DirectPath is only available in a GCE environment.
You also have whitespace at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
+ " attemptDirectPathXds option."); | ||
} else { | ||
// Case 2: credential is not correctly set | ||
if (!isNonDefaultServiceAccountAllowed()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have a ComputeEngineCredentials
but not on ComputeEngine? I though auth would only return ComputeEngineCredentials
if on ComputeEngine, so isOnComputeEngine()
may not be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is possible, this is the original fix for this problem, googleapis/gax-java#1250.
@@ -233,6 +237,7 @@ public TransportChannel getTransportChannel() throws IOException { | |||
} | |||
|
|||
private TransportChannel createChannel() throws IOException { | |||
logDirectPathMisconfig(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createChannel()
could still be called multiple times, I would suggest to log the info in the builder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not do this because logDirectPathMisconfig()
is a non-static function so it can not be used before the construction. Can you suggest more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move logDirectPathMisconfig()
into the Builder class then it should be fine, ideally we should do it but then we would need to move all related logics into it as well. The quickest solution would be add it to the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
boolean isDirectPathXdsEnvSet = | ||
Boolean.parseBoolean(envProvider.getenv(DIRECT_PATH_ENV_ENABLE_XDS)); | ||
boolean isDirectPathXdsOptionSet = Boolean.TRUE.equals(attemptDirectPathXds); | ||
if (isDirectPathXdsEnvSet || isDirectPathXdsOptionSet) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already have isDirectPathXdsEnabled() for the same purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
boolean isDirectPathXdsOptionSet = Boolean.TRUE.equals(attemptDirectPathXds); | ||
if (isDirectPathXdsEnvSet || isDirectPathXdsOptionSet) { | ||
// Case 1: does not enable DirectPath | ||
if (!isDirectPathOptionSet) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use isDirectPathEnabled()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -266,6 +270,40 @@ boolean isDirectPathXdsEnabled() { | |||
return false; | |||
} | |||
|
|||
private void logDirectPathMisconfig() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same concern as @singhravidutt, the logic in logDirectPathMisconfig()
could evolve to be different than (isDirectPathEnabled() && isNonDefaultServiceAccountAllowed() && isOnComputeEngine())
. Ideally, we should check if we should use DirectPath or not during the initialization of this class, set the value to a field useDirectPath
, and use it to check if we should create a DirectPath channel afterwards.
See the following code that demonstrates my idea:
public InstantiatingGrpcChannelProvider build() {
if (shouldUseDirectPath()) {
setUseDirectPath(true);
}
return new InstantiatingGrpcChannelProvider(this);
}
private boolean shouldUseDirectPath() {
if (isDirectPathEnabled()) {
return isOnComputeWithValidCredential();
} else {
if (isDirectPathXdsEnabled()) {
//log
isOnComputeWithValidCredential();
}
}
return false;
}
private boolean isOnComputeWithValidCredential() {
if (!isNonDefaultServiceAccountAllowed()) {
//log
return false;
}
if (!isOnComputeEngine()) {
//log
return false;
}
return true;
}
This also makes the new code much more testable as SonarCheck is failing due to lacking test coverage of the new code.
@mohanli-ml Let me know if it makes sense, if you think it might be too much at this moment, our team can take it as a follow up PR.
[gapic-generator-java-root] Kudos, SonarCloud Quality Gate passed! |
[java_showcase_integration_tests] SonarCloud Quality Gate failed. 0 Bugs 26.3% Coverage Catch issues before they fail your Quality Gate with our IDE extension SonarLint |
@blakeli0 Hi Blake, I just realized that this PR could be misleading. For example, a customer's application may contains a GCS client and a Bigquery client. The customer use cc @frankyn |
Warnings started being emitted in recent version of java-storage w.r.t "incorrect" GCE credentials; however requests are still using GCE ADC, was there a change recently to credentials that could trigger this incorrectly? |
@mohanli-ml I see, the problem is that @frankyn Thanks for reporting this issue. #2281 might caused it, looking into it. |
We met issues where customers wanted to use DirectPath, but DirectPath was misconfigured and it took 2 weeks for debugging. So we want to add WARNING logs if DirectPath flag/option is set but DirectPath is not used.
Issue: #2164.
@blakeli0 @frankyn