Multiple CC Authorizer support CCManager #2396

Merged: 45 commits, merged on Mar 19, 2024
Changes from 37 commits
Commits
14d6b69  WIP: tdx_cc integration.  (yhwen, Feb 21, 2024)
bc6b21e  fixed toke_file read.  (yhwen, Feb 21, 2024)
9f6b715  WIP: added info for CC add client tokens.  (yhwen, Feb 21, 2024)
dd281f5  Fixed an error when client does not have CC token reported.  (yhwen, Feb 21, 2024)
4d87722  Added handle for client does not have CC_INFO.  (yhwen, Feb 21, 2024)
8830069  Added CLIENT_QUIT event for CCManager to remove client token.  (yhwen, Feb 22, 2024)
9fbdde3  Added _add_client_token client token logging info.  (yhwen, Feb 22, 2024)
dd9fe74  Added peer_ctx for client quit.  (yhwen, Feb 22, 2024)
2f44cef  set_peer_context for client quit.  (yhwen, Feb 22, 2024)
0f87eb3  Changed the AUTHORIZATION_REASON set_prop sticky to False.  (yhwen, Feb 26, 2024)
6ec9ae6  WIP: TokenPundit interface change.  (yhwen, Feb 26, 2024)
044201a  WIP: added cc_authorizer_ids config.  (yhwen, Feb 26, 2024)
3c593c0  Added cc_issuer_id for CCManager.  (yhwen, Feb 27, 2024)
7f74d68  renamed the TokenPundit to CCAuthorizer.  (yhwen, Feb 28, 2024)
ff55554  Added CC token adding through client heartbeat.  (yhwen, Feb 29, 2024)
2c5f3bd  Added function to stop current running job if CC verify fail.  (yhwen, Feb 29, 2024)
2f383d6  if CC failed to get token, don't allow the system to start.  (yhwen, Feb 29, 2024)
ea5ae61  Added exceptions None check.  (yhwen, Feb 29, 2024)
13e0b6b  Address the client side CC check before job scheduled.  (yhwen, Mar 1, 2024)
68d8d91  fixed the PEER_FL_CONTEXT error.  (yhwen, Mar 1, 2024)
b9942e3  Added CCManager support to have multiple cc_issuers.  (yhwen, Mar 2, 2024)
91e3f40  optimized CCManager.  (yhwen, Mar 4, 2024)
6206452  updated the _verify_participants() logic.  (yhwen, Mar 4, 2024)
51226d6  set up the proper fl_ctx for admin send_requests().  (yhwen, Mar 4, 2024)
ed770ac  Add proper fl_ctx.  (yhwen, Mar 4, 2024)
f76fa1e  Refactor the CCManager.  (yhwen, Mar 5, 2024)
74a6059  Refactor the CCManager and TDX_authorizer.  (yhwen, Mar 5, 2024)
313ed21  Added TOKEN_EXPIRATION for each cc_issuer in CCManager.  (yhwen, Mar 6, 2024)
40d70c6  Fixed CC TOKEN_EXPIRATION error.  (yhwen, Mar 6, 2024)
0c3c188  refactor the CCManager _prepare_cc_info()  (yhwen, Mar 6, 2024)
affda4b  Refactor.  (yhwen, Mar 6, 2024)
2dc7df3  refactor the cc tokens periodic verification.  (yhwen, Mar 7, 2024)
6934884  added critical_level for CCManager.  (yhwen, Mar 7, 2024)
199eb1e  codestyle fix.  (yhwen, Mar 8, 2024)
0936733  removed unused import.  (yhwen, Mar 8, 2024)
0b28480  removed unused import.  (yhwen, Mar 8, 2024)
000735c  Fixed the unit test.  (yhwen, Mar 8, 2024)
95edf8e  Added CCManager unit tests.  (yhwen, Mar 12, 2024)
ecaa7c6  Added CCTokenGenerateError and CCTokenVerifyError. Updated CCAuthoriz…  (yhwen, Mar 12, 2024)
2b05931  Merge branch 'main' into tdx_cc  (yhwen, Mar 13, 2024)
60a5c9d  Addressed some PR reviews.  (yhwen, Mar 14, 2024)
9244fbf  Added exception catch for TDXAuthorizer.  (yhwen, Mar 15, 2024)
dbc87d1  renamed some events.  (yhwen, Mar 18, 2024)
78b52e6  renamed event names.  (yhwen, Mar 19, 2024)
108a1ec  renamed event names.  (yhwen, Mar 19, 2024)
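The commits above describe a CCManager that can hold several CC issuers (authorizers), each with its own TOKEN_EXPIRATION. The sketch below is a hypothetical illustration of that idea, not the CCManager code from this PR: `SimpleCCManager`, `CountingAuthorizer`, the `expiration_secs` mapping and the injectable `clock` are names invented here. It restates the `CCAuthorizer` interface from `cc_authorizer.py` so it runs standalone, caches one token per namespace, and regenerates a token once its expiration has elapsed.

```python
import time
from typing import Callable, Dict, List, Tuple


class CCAuthorizer:
    """Restated from nvflare/app_opt/confidential_computing/cc_authorizer.py so this runs standalone."""

    def can_generate(self) -> bool: ...
    def can_verify(self) -> bool: ...
    def get_namespace(self) -> str: ...
    def generate(self) -> str: ...
    def verify(self, token: str) -> bool: ...


class SimpleCCManager:
    """Hypothetical manager: one cached token per authorizer namespace, per-namespace expiration."""

    def __init__(self, authorizers: List[CCAuthorizer], expiration_secs: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic):
        self._authorizers = {a.get_namespace(): a for a in authorizers}
        self._expiration = expiration_secs  # namespaces without an entry never expire
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # namespace -> (token, issued_at)

    def get_tokens(self) -> Dict[str, str]:
        """Return a token per namespace, regenerating any token whose expiration has elapsed."""
        now = self._clock()
        tokens = {}
        for ns, authorizer in self._authorizers.items():
            if not authorizer.can_generate():
                continue
            cached = self._cache.get(ns)
            if cached is None or now - cached[1] >= self._expiration.get(ns, float("inf")):
                cached = (authorizer.generate(), now)
                self._cache[ns] = cached
            tokens[ns] = cached[0]
        return tokens

    def verify_tokens(self, tokens: Dict[str, str]) -> Dict[str, bool]:
        """Verify a peer's tokens; a namespace the peer did not report counts as a failure."""
        results = {}
        for ns, authorizer in self._authorizers.items():
            if authorizer.can_verify():
                token = tokens.get(ns)
                results[ns] = token is not None and authorizer.verify(token)
        return results


class CountingAuthorizer(CCAuthorizer):
    """Hypothetical test double that issues numbered tokens; it performs no attestation."""

    def __init__(self, namespace: str):
        self.namespace = namespace
        self.issued = 0

    def can_generate(self) -> bool:
        return True

    def can_verify(self) -> bool:
        return True

    def get_namespace(self) -> str:
        return self.namespace

    def generate(self) -> str:
        self.issued += 1
        return f"{self.namespace}-token-{self.issued}"

    def verify(self, token: str) -> bool:
        return token.startswith(f"{self.namespace}-token-")


now = [0.0]  # fake clock so expiration can be demonstrated deterministically
manager = SimpleCCManager([CountingAuthorizer("tdx"), CountingAuthorizer("mock")],
                          expiration_secs={"tdx": 10.0}, clock=lambda: now[0])
first = manager.get_tokens()
now[0] = 11.0  # past the 10 s expiration configured for "tdx"
second = manager.get_tokens()
```

Regenerating lazily on access keeps the sketch short; the commits also mention periodic verification and a `critical_level` setting, which this sketch does not model.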
3 changes: 3 additions & 0 deletions nvflare/apis/event_type.py
@@ -72,7 +72,10 @@ class EventType(object):
    BEFORE_CLIENT_REGISTER = "_before_client_register"
    AFTER_CLIENT_REGISTER = "_after_client_register"
    CLIENT_REGISTERED = "_client_registered"
    CLIENT_QUIT = "_client_quit"
    SYSTEM_BOOTSTRAP = "_system_bootstrap"
    BEFORE_CLIENT_HEARTBEAT = "_before_client_heartbeat"
    AFTER_CLIENT_HEARTBEAT = "_after_client_heartbeat"

    AUTHORIZE_COMMAND_CHECK = "_authorize_command_check"
    BEFORE_BUILD_COMPONENT = "_before_build_component"
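Commit 8830069 adds `CLIENT_QUIT` so that CCManager can remove a departed client's token, and ff55554 adds tokens through client heartbeats. A minimal, hypothetical sketch of that bookkeeping follows. The event-name strings are copied from the diff above; `ClientTokenRegistry` and its `handle_event(event_type, client_name, token)` signature are assumptions (a real NVFlare `FLComponent.handle_event` receives `(event_type, fl_ctx)` and would read the client identity and token from the context), and treating `CLIENT_REGISTERED` as a token source is also an assumption.

```python
from typing import Dict, Optional

# Event name values copied from nvflare/apis/event_type.py in the diff above.
CLIENT_REGISTERED = "_client_registered"
CLIENT_QUIT = "_client_quit"
AFTER_CLIENT_HEARTBEAT = "_after_client_heartbeat"


class ClientTokenRegistry:
    """Hypothetical stand-in for CCManager's per-client token bookkeeping."""

    def __init__(self):
        self.tokens: Dict[str, str] = {}

    def handle_event(self, event_type: str, client_name: str, token: Optional[str] = None) -> None:
        if event_type in (CLIENT_REGISTERED, AFTER_CLIENT_HEARTBEAT):
            # A client that reports no CC token is simply not recorded.
            if token is not None:
                self.tokens[client_name] = token
        elif event_type == CLIENT_QUIT:
            # Drop the departed client's token; quitting twice is harmless.
            self.tokens.pop(client_name, None)


registry = ClientTokenRegistry()
registry.handle_event(CLIENT_REGISTERED, "site-1", "token-a")
registry.handle_event(AFTER_CLIENT_HEARTBEAT, "site-2", "token-b")
registry.handle_event(CLIENT_QUIT, "site-1")
print(registry.tokens)  # {'site-2': 'token-b'}
```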
1 change: 1 addition & 0 deletions nvflare/apis/fl_constant.py
@@ -150,6 +150,7 @@ class FLContextKey(object):
    COMMUNICATION_ERROR = "Flare_communication_error__"
    UNAUTHENTICATED = "Flare_unauthenticated__"
    CLIENT_RESOURCE_SPECS = "__client_resource_specs"
    CLIENT_RESOURCE_RESULT = "__client_resource_result"
    JOB_PARTICIPANTS = "__job_participants"
    JOB_BLOCK_REASON = "__job_block_reason"  # why the job should be blocked from scheduling
    SSID = "__ssid__"
6 changes: 4 additions & 2 deletions nvflare/apis/server_engine_spec.py
@@ -154,12 +154,13 @@ def restore_components(self, snapshot: RunSnapshot, fl_ctx: FLContext):
         pass

     @abstractmethod
-    def start_client_job(self, job_id, client_sites):
+    def start_client_job(self, job_id, client_sites, fl_ctx: FLContext):
         """To send the start client run commands to the clients

         Args:
             client_sites: client sites
             job_id: job_id
+            fl_ctx: FLContext

         Returns:

@@ -187,14 +188,15 @@ def check_client_resources(

     @abstractmethod
     def cancel_client_resources(
-        self, resource_check_results: Dict[str, Tuple[bool, str]], resource_reqs: Dict[str, dict]
+        self, resource_check_results: Dict[str, Tuple[bool, str]], resource_reqs: Dict[str, dict], fl_ctx: FLContext
     ):
         """Cancels the request resources for the job.

         Args:
             resource_check_results: A dict of {client_name: client_check_result}
                 where client_check_result is a tuple of (is_resource_enough, resource reserve token if any)
             resource_reqs: A dict of {client_name: resource requirements dict}
+            fl_ctx: FLContext
         """
         pass

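Both abstract methods now take `fl_ctx`, so an existing `ServerEngineSpec` implementation written against the old signatures will fail once callers such as `_cancel_resources` in job_scheduler.py pass the extra argument. The standalone sketch below shows that failure mode; `EngineSpec`, `UpdatedEngine` and `LegacyEngine` are hypothetical trimmed stand-ins, and a plain `dict` plays the role of `FLContext`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class EngineSpec(ABC):
    """Hypothetical trimmed mirror of the two ServerEngineSpec methods changed in this PR."""

    @abstractmethod
    def start_client_job(self, job_id, client_sites, fl_ctx: dict):
        pass

    @abstractmethod
    def cancel_client_resources(
        self, resource_check_results: Dict[str, Tuple[bool, str]], resource_reqs: Dict[str, dict], fl_ctx: dict
    ):
        pass


class UpdatedEngine(EngineSpec):
    """Follows the new signatures and can read request-scoped data from fl_ctx."""

    def __init__(self):
        self.calls: List[tuple] = []

    def start_client_job(self, job_id, client_sites, fl_ctx):
        self.calls.append(("start", job_id, tuple(client_sites), fl_ctx.get("requester")))

    def cancel_client_resources(self, resource_check_results, resource_reqs, fl_ctx):
        self.calls.append(("cancel", sorted(resource_check_results), fl_ctx.get("requester")))


class LegacyEngine(EngineSpec):
    """Still uses the pre-PR signatures (no fl_ctx parameter)."""

    def start_client_job(self, job_id, client_sites):
        return "started"

    def cancel_client_resources(self, resource_check_results, resource_reqs):
        return None


fl_ctx = {"requester": "admin@example.com"}  # hypothetical context contents
engine = UpdatedEngine()
engine.start_client_job("job-1", ["site-1", "site-2"], fl_ctx)
engine.cancel_client_resources({"site-1": (True, "tok")}, {"site-1": {}}, fl_ctx)

try:
    LegacyEngine().cancel_client_resources({}, {}, fl_ctx)
    legacy_error = None
except TypeError as exc:  # the legacy method does not accept the extra fl_ctx argument
    legacy_error = exc
```

Passing `fl_ctx` explicitly, rather than having the engine create a fresh context, lets the callee see properties the caller has already set on the same request.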
6 changes: 5 additions & 1 deletion nvflare/app_common/job_schedulers/job_scheduler.py
@@ -93,7 +93,7 @@ def _cancel_resources(
         if not isinstance(engine, ServerEngineSpec):
             raise RuntimeError(f"engine inside fl_ctx should be of type ServerEngineSpec, but got {type(engine)}.")

-        engine.cancel_client_resources(resource_check_results, resource_reqs)
+        engine.cancel_client_resources(resource_check_results, resource_reqs, fl_ctx)
         self.log_debug(fl_ctx, f"cancel client resources using check results: {resource_check_results}")
         return False, None

@@ -164,7 +164,11 @@ def _try_job(self, job: Job, fl_ctx: FLContext) -> (int, Optional[Dict[str, Disp
             self.log_info(fl_ctx, f"Job {job.job_id} can't be scheduled: {block_reason}")
             return SCHEDULE_RESULT_NO_RESOURCE, None, block_reason

+        PEER_CTX_CC_TOKEN = "_peer_ctx_cc_token"
+        cc_peer_ctx = fl_ctx.get_prop(key=PEER_CTX_CC_TOKEN)
+        self.logger.info(f"++++++++++ {cc_peer_ctx}")
         resource_check_results = self._check_client_resources(job=job, resource_reqs=resource_reqs, fl_ctx=fl_ctx)
+        fl_ctx.set_prop(FLContextKey.CLIENT_RESOURCE_RESULT, resource_check_results, private=True, sticky=False)
         self.fire_event(EventType.AFTER_CHECK_CLIENT_RESOURCES, fl_ctx)

         if not resource_check_results:
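In `_try_job`, the scheduler logs a `block_reason` and returns `SCHEDULE_RESULT_NO_RESOURCE`, and `FLContextKey.JOB_BLOCK_REASON` is documented as "why the job should be blocked from scheduling". Commit 13e0b6b describes a client-side CC check before a job is scheduled. The function below is a hypothetical sketch of such a check that produces the `(ok, block_reason)` pair a component could use to block the job; `check_participants_cc`, the per-namespace token layout and the verifier callables are assumptions, not this PR's `_verify_participants()`.

```python
from typing import Callable, Dict, List, Optional, Tuple


def check_participants_cc(
    participants: List[str],
    client_tokens: Dict[str, Dict[str, str]],
    verifiers: Dict[str, Callable[[str], bool]],
) -> Tuple[bool, Optional[str]]:
    """Hypothetical pre-scheduling check.

    Every participant must report a token for every namespace in `verifiers`, and each
    token must pass that namespace's verifier. Returns (ok, block_reason).
    """
    for site in participants:
        site_tokens = client_tokens.get(site, {})
        for namespace, verify in verifiers.items():
            token = site_tokens.get(namespace)
            if token is None:
                return False, f"{site} reported no CC token for namespace '{namespace}'"
            if not verify(token):
                return False, f"{site} failed CC verification for namespace '{namespace}'"
    return True, None


verifiers = {"tdx": lambda token: token == "valid"}  # hypothetical verifier
tokens = {"site-1": {"tdx": "valid"}, "site-2": {"tdx": "tampered"}}
print(check_participants_cc(["site-1"], tokens, verifiers))  # (True, None)
print(check_participants_cc(["site-1", "site-2"], tokens, verifiers))
# (False, "site-2 failed CC verification for namespace 'tdx'")
```

Failing fast on the first bad participant keeps the block reason short; an implementation could instead collect every failing site.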
59 changes: 59 additions & 0 deletions nvflare/app_opt/confidential_computing/cc_authorizer.py
@@ -0,0 +1,59 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# import os.path


class CCAuthorizer:
    def can_generate(self) -> bool:
        """This indicates if the authorizer can generate a CC token or not.

        Returns: bool

        """
        pass

    def can_verify(self) -> bool:
        """This indicates if the authorizer can verify a CC token or not.

        Returns: bool

        """
        pass

    def get_namespace(self) -> str:
        """This returns the namespace of the CCAuthorizer.

        Returns: namespace string

        """
        pass

    def generate(self) -> str:
        """To generate and return the active CCAuthorizer token.

        Returns: token string

        """
        pass

    def verify(self, token: str) -> bool:
        """To return the token verification result.

        Args:
            token: the CC token string to verify

        Returns: bool

        """
        pass
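`CCAuthorizer` defines the contract only; as written, every method returns `None`. A hypothetical test double that satisfies the contract is sketched below using only the standard library. `HMACMockAuthorizer`, its shared-secret scheme and the `mock_hmac` namespace are inventions for illustration: signing a fixed claim with HMAC-SHA256 proves nothing about the execution environment, so this is suitable only for exercising code paths that call `generate()` and `verify()`, never as a substitute for hardware attestation such as the PR's TDX authorizer.

```python
import hashlib
import hmac


class CCAuthorizer:
    """Restated from the diff above so this sketch runs standalone."""

    def can_generate(self) -> bool: ...
    def can_verify(self) -> bool: ...
    def get_namespace(self) -> str: ...
    def generate(self) -> str: ...
    def verify(self, token: str) -> bool: ...


class HMACMockAuthorizer(CCAuthorizer):
    """Hypothetical test double: signs a fixed claim with a shared secret. Provides no attestation."""

    def __init__(self, secret: bytes, claim: str = "mock-cc-claim", namespace: str = "mock_hmac"):
        self._secret = secret
        self._claim = claim
        self._namespace = namespace

    def can_generate(self) -> bool:
        return True

    def can_verify(self) -> bool:
        return True

    def get_namespace(self) -> str:
        return self._namespace

    def generate(self) -> str:
        # Token format: "<claim>.<hex HMAC-SHA256 of claim>"
        sig = hmac.new(self._secret, self._claim.encode(), hashlib.sha256).hexdigest()
        return f"{self._claim}.{sig}"

    def verify(self, token: str) -> bool:
        claim, sep, sig = token.rpartition(".")
        if not sep:  # no separator: not a token this authorizer produced
            return False
        expected = hmac.new(self._secret, claim.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(sig, expected)  # constant-time comparison


authorizer = HMACMockAuthorizer(b"shared-test-secret")
token = authorizer.generate()
print(authorizer.verify(token))  # True
```

A verifier holding a different secret rejects the token, which is enough to drive both the success and failure branches of code that consumes a `CCAuthorizer`.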