-
Notifications
You must be signed in to change notification settings - Fork 417
[CELEBORN-1257] Adds a secured port in Celeborn Master for secure communication with LifecycleManager #2281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #2281 +/- ##
==========================================
+ Coverage 47.84% 48.61% +0.77%
==========================================
Files 200 208 +8
Lines 12449 12825 +376
Branches 1088 1104 +16
==========================================
+ Hits 5955 6233 +278
- Misses 6102 6190 +88
- Partials 392 402 +10 ☔ View full report in Codecov by Sentry. |
client/src/main/scala/org/apache/celeborn/client/LifecycleManager.scala
Outdated
Show resolved
Hide resolved
| val internalPortEnabled = get(INTERNAL_PORT_ENABLED) | ||
| if (authEnabled && !internalPortEnabled) { | ||
| throw new IllegalArgumentException( | ||
| s"${AUTH_ENABLED.key} is true, but ${INTERNAL_PORT_ENABLED.key} is false") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are all valid combinations of AUTH_ENABLED and INTERNAL_PORT_ENABLED?
- true, true
- false, false
- and others?
what if we eliminate INTERNAL_PORT_ENABLED and just respect AUTH_ENABLED?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another valid combination is auth_enabled = false and internal_port_enabled =true.
Having Masters and workers communicate on a separate port is a distinct feature from authentication. In a prior discussion with @waitinfuture, they were considering adding a separate port for internal communication for different reasons. However, it's important to note that this separate internal port is a prerequisite for authentication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for explanation
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
Outdated
Show resolved
Hide resolved
master/src/main/scala/org/apache/celeborn/service/deploy/master/MasterArguments.scala
Outdated
Show resolved
Hide resolved
pan3793
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code makes sense to me, thanks
| /** | ||
| * Secured RPC endpoint used by the Master to communicate with the Clients. | ||
| */ | ||
| private[celeborn] class SecuredRpcEndpoint( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When auth is enabled, LifecycleManager should connect to this endpoint instead of the unsecured one, right? I guess it's needed to change how masterClient is created in LifecycleManager, will add in future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I missed making that change because in our internal POC we didn't use a dedicated secure port. I'll add that change with this PR. Thanks for pointing it out.
| logDebug( | ||
| s"Received ApplicationLost request $requestId, $appId from ${context.senderAddress}.") | ||
| executeWithLeaderChecker(context, handleApplicationLost(context, appId, requestId)) | ||
| if (checkAuthStatus(appId, context)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ApplicationLost is triggered by both LifecycleManager (when unregister shuffle) and Master (when timeout dead applications). I guess we don't need to check for the later one? We can check whether senderAddress equals to rpcEnv's address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! Merging to main(v0.5.0)
…munication with LifecycleManager ### What changes were proposed in this pull request? This adds a secured port to Celeborn Master which is used for secure communication with LifecycleManager. This is part of adding authentication support in Celeborn (see CELEBORN-1011). This change targets just adding the secured port to Master. The following items from the proposal are still pending: 1. Persisting the app secrets in Ratis. 2. Forwarding secrets to Workers and having ability for the workers to pull registration info from the Master. 3. Secured and internal port in Workers. 4. Secured communication between workers and clients. In addition, since we are supporting both secured and unsecured communication for backward compatibility and seamless rolling upgrades, there is an additional change needed. An app which registers with the Master can try to talk to the workers on unsecured ports which is a security breach. So, the workers need to know whether an app registered with Master or not and for that Master has to propagate list of un-secured apps to Celeborn workers as well. We can discuss this more with https://issues.apache.org/jira/browse/CELEBORN-1261 ### Why are the changes needed? It is needed for adding authentication support to Celeborn (CELEBORN-1011) ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? Added a simple UT. Closes apache#2281 from otterc/CELEBORN-1257. Authored-by: Chandni Singh <singh.chandni@gmail.com> Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
### What changes were proposed in this pull request? Fix MasterClient construct method use in MasterClientSuiteJ. ### Why are the changes needed? MasterClient's construct method has changed by #2281 on main. It's a feature to support authentication on branch-0.5. #2466 's backport on branch-0.4 here caused a conflict in MasterClientSuiteJ.java:319. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Local compile test. Closes #2475 from onebox-li/branch-0.4-fix-compile. Authored-by: onebox-li <lyh-36@163.com> Signed-off-by: SteNicholas <programgeek@163.com>
### What changes were proposed in this pull request? Fix MasterClient construct method use in MasterClientSuiteJ. ### Why are the changes needed? MasterClient's construct method has changed by apache#2281 on main. It's a feature to support authentication on branch-0.5. apache#2466 's backport on branch-0.4 here caused a conflict in MasterClientSuiteJ.java:319. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Local compile test. Closes apache#2475 from onebox-li/branch-0.4-fix-compile. Authored-by: onebox-li <lyh-36@163.com> Signed-off-by: SteNicholas <programgeek@163.com>
What changes were proposed in this pull request?
This adds a secured port to Celeborn Master which is used for secure communication with LifecycleManager.
This is part of adding authentication support in Celeborn (see CELEBORN-1011).
This change targets just adding the secured port to Master. The following items from the proposal are still pending:
In addition, since we are supporting both secured and unsecured communication for backward compatibility and seamless rolling upgrades, there is an additional change needed. An app which registers with the Master can try to talk to the workers on unsecured ports which is a security breach. So, the workers need to know whether an app registered with Master or not and for that Master has to propagate list of un-secured apps to Celeborn workers as well. We can discuss this more with https://issues.apache.org/jira/browse/CELEBORN-1261
Why are the changes needed?
It is needed for adding authentication support to Celeborn (CELEBORN-1011)
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Added a simple UT.