Contributor

@otterc otterc commented Feb 2, 2024

What changes were proposed in this pull request?

This adds a secured port to Celeborn Master which is used for secure communication with LifecycleManager.
This is part of adding authentication support in Celeborn (see CELEBORN-1011).

This change targets just adding the secured port to Master. The following items from the proposal are still pending:

  1. Persisting the app secrets in Ratis.
  2. Forwarding secrets to Workers and giving the workers the ability to pull registration info from the Master.
  3. Secured and internal port in Workers.
  4. Secured communication between workers and clients.

In addition, because we support both secured and unsecured communication for backward compatibility and seamless rolling upgrades, one more change is needed. An app that registers with the Master can still try to talk to the workers on unsecured ports, which is a security breach. The workers therefore need to know whether an app registered with the Master, and for that the Master has to propagate the list of unsecured apps to the Celeborn workers as well. We can discuss this further in https://issues.apache.org/jira/browse/CELEBORN-1261

Why are the changes needed?

It is needed to add authentication support to Celeborn (CELEBORN-1011).

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Added a simple unit test.

codecov bot commented Feb 2, 2024

Codecov Report

Attention: 24 lines in your changes are missing coverage. Please review.

Comparison is base (277d060) 47.84% compared to head (fa0b02d) 48.61%.
Report is 3 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| ...cala/org/apache/celeborn/common/CelebornConf.scala | 76.93% | 14 Missing and 1 partial ⚠️ |
| ...rg/apache/celeborn/common/client/MasterClient.java | 43.75% | 7 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2281      +/-   ##
==========================================
+ Coverage   47.84%   48.61%   +0.77%     
==========================================
  Files         200      208       +8     
  Lines       12449    12825     +376     
  Branches     1088     1104      +16     
==========================================
+ Hits         5955     6233     +278     
- Misses       6102     6190      +88     
- Partials      392      402      +10     

val internalPortEnabled = get(INTERNAL_PORT_ENABLED)
if (authEnabled && !internalPortEnabled) {
  throw new IllegalArgumentException(
    s"${AUTH_ENABLED.key} is true, but ${INTERNAL_PORT_ENABLED.key} is false")
Member

@pan3793 pan3793 Feb 4, 2024

What are all the valid combinations of AUTH_ENABLED and INTERNAL_PORT_ENABLED?

  • true, true
  • false, false
  • and others?

What if we eliminate INTERNAL_PORT_ENABLED and just respect AUTH_ENABLED?

Contributor Author

Another valid combination is auth_enabled = false and internal_port_enabled = true.
Having Masters and Workers communicate on a separate port is a feature distinct from authentication. In a prior discussion, @waitinfuture was considering adding a separate port for internal communication for other reasons. Note, however, that this separate internal port is a prerequisite for authentication.
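
To make the combinations discussed in this thread concrete, here is a minimal sketch of the rule the check under review enforces: three of the four pairs are accepted and only auth enabled with the internal port disabled is rejected. The object `AuthPortCombinations` and its methods are hypothetical names used for illustration; they are not part of CelebornConf.

```scala
// Illustrative sketch only: AuthPortCombinations and its methods are
// hypothetical names, not Celeborn APIs.
object AuthPortCombinations {

  /** Returns true when the (authEnabled, internalPortEnabled) pair is allowed. */
  def isValid(authEnabled: Boolean, internalPortEnabled: Boolean): Boolean =
    !authEnabled || internalPortEnabled

  /** Fails fast on the single invalid pair, like the check under review. */
  def validate(authEnabled: Boolean, internalPortEnabled: Boolean): Unit =
    if (!isValid(authEnabled, internalPortEnabled)) {
      throw new IllegalArgumentException(
        "auth is enabled, but the internal port is disabled")
    }

  /** All four pairs together with their validity, for quick inspection. */
  def table: Seq[((Boolean, Boolean), Boolean)] =
    for {
      auth <- Seq(false, true)
      internal <- Seq(false, true)
    } yield ((auth, internal), isValid(auth, internal))
}
```

Keeping the two flags separate, rather than deriving one from the other, is what lets the internal port be enabled without authentication.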

Member

Thanks for the explanation.

Member

@pan3793 pan3793 left a comment

The code makes sense to me, thanks.

/**
 * Secured RPC endpoint used by the Master to communicate with the Clients.
 */
private[celeborn] class SecuredRpcEndpoint(
Contributor

When auth is enabled, LifecycleManager should connect to this endpoint instead of the unsecured one, right? I guess the way masterClient is created in LifecycleManager needs to change. Will that be added in the future?

Contributor Author

@otterc otterc Feb 5, 2024

Yes, I missed making that change because in our internal POC we didn't use a dedicated secure port. I'll add that change with this PR. Thanks for pointing it out.
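
The client-side change discussed here amounts to choosing which Master port to connect to based on whether auth is enabled. A rough sketch under that assumption follows; `MasterPorts` and `clientPort` are hypothetical names, and the actual wiring of masterClient in LifecycleManager is not shown in this conversation.

```scala
// Illustrative sketch only: MasterPorts and clientPort are hypothetical
// stand-ins, not the actual LifecycleManager or MasterClient code.
object MasterEndpointSelection {

  /** The two Master ports a client could connect to. */
  final case class MasterPorts(unsecuredPort: Int, securedPort: Int)

  /** With auth enabled the client uses the secured port, otherwise the unsecured one. */
  def clientPort(authEnabled: Boolean, ports: MasterPorts): Int =
    if (authEnabled) ports.securedPort else ports.unsecuredPort
}
```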

logDebug(
  s"Received ApplicationLost request $requestId, $appId from ${context.senderAddress}.")
executeWithLeaderChecker(context, handleApplicationLost(context, appId, requestId))
if (checkAuthStatus(appId, context)) {
Contributor

ApplicationLost is triggered by both LifecycleManager (when unregistering a shuffle) and the Master (when timing out dead applications). I guess we don't need the check for the latter? We can check whether senderAddress equals rpcEnv's address.
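
The suggestion above can be sketched as follows: skip the auth check when the request originates from the Master itself, and require it otherwise. `Address` and `shouldProcess` are hypothetical stand-ins, and treating address equality as proof that the Master sent the request is an assumption of this sketch, not something confirmed by the PR.

```scala
// Illustrative sketch only: Address and shouldProcess are hypothetical
// stand-ins for senderAddress, rpcEnv's address and checkAuthStatus.
object ApplicationLostAuthSketch {

  final case class Address(host: String, port: Int)

  /**
   * Decides whether an ApplicationLost request should be handled.
   * Requests sent by the Master itself (for example, when it times out a
   * dead application) bypass the auth check; all others must pass it.
   */
  def shouldProcess(
      sender: Address,
      self: Address,
      appId: String,
      isAuthenticated: String => Boolean): Boolean = {
    val sentByMasterItself = sender == self
    sentByMasterItself || isAuthenticated(appId)
  }
}
```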

Contributor

@waitinfuture waitinfuture left a comment

LGTM, thanks! Merging to main (v0.5.0).

tiny-dust pushed a commit to tiny-dust/incubator-celeborn that referenced this pull request Feb 20, 2024
…munication with LifecycleManager

Closes apache#2281 from otterc/CELEBORN-1257.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
SteNicholas pushed a commit that referenced this pull request Apr 23, 2024
### What changes were proposed in this pull request?
Fix the use of the MasterClient constructor in MasterClientSuiteJ.

### Why are the changes needed?
The MasterClient constructor was changed on main by #2281, a feature that supports authentication on branch-0.5.

The backport of #2466 to branch-0.4 caused a conflict at MasterClientSuiteJ.java:319.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Local compile test.

Closes #2475 from onebox-li/branch-0.4-fix-compile.

Authored-by: onebox-li <lyh-36@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
cfmcgrady pushed a commit to cfmcgrady/incubator-celeborn that referenced this pull request Aug 21, 2025