kuscia: a job run with a Domain that has no actual compute node shows as awaiting approval, but the other party cannot see the job or approve it by ID #603

Open
Spark-Liang opened this issue Mar 5, 2025 · 10 comments

@Spark-Liang

Issue Type

Api Usage

Search for existing issues similar to yours

Yes

Kuscia Version

0.14.0b0

Link to Relevant Documentation

https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.14.0b0/reference/concepts/domain_cn

Question Details

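The purpose of "creating a Domain without an actual compute node" is to logically isolate different users within the same organization.

Test procedure:
1. Deploy a "centralized x centralized networking mode" cluster by following https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.14.0b0/getting_started/quickstart_cn#x
2. Enter the `root-kuscia-master-cxc-alice` container and manually create a new Domain to simulate a user whose resources need to be isolated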

apiVersion: kuscia.secretflow/v1alpha1
kind: Domain
metadata:
  name: alice-internal-domain
spec:
  resourceQuota:
    podMaxCount: 100

3. Enter the `root-kuscia-master-cxc-alice` container, create the DomainData, and create a data grant for the other party, `bob`

apiVersion: kuscia.secretflow/v1alpha1
kind: DomainData
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kuscia.secretflow/v1alpha1","kind":"DomainData","metadata":{"annotations":{},"labels":{"kuscia.secretflow/domaindata-type":"table","kuscia.secretflow/domaindata-vendor":"manual","kuscia.secretflow/initiator":"alice-internal-domain","kuscia.secretflow/interconn-protocol-type":"kuscia"},"name":"alice-table","namespace":"alice-internal-domain"},"spec":{"attributes":{"description":"alice-internal-domain demo data"},"author":"alice-internal-domain","columns":[{"comment":"","name":"id1","type":"str"},{"comment":"","name":"age","type":"float"},{"comment":"","name":"education","type":"float"},{"comment":"","name":"default","type":"float"},{"comment":"","name":"balance","type":"float"},{"comment":"","name":"housing","type":"float"},{"comment":"","name":"loan","type":"float"},{"comment":"","name":"day","type":"float"},{"comment":"","name":"duration","type":"float"},{"comment":"","name":"campaign","type":"float"},{"comment":"","name":"pdays","type":"float"},{"comment":"","name":"previous","type":"float"},{"comment":"","name":"job_blue-collar","type":"float"},{"comment":"","name":"job_entrepreneur","type":"float"},{"comment":"","name":"job_housemaid","type":"float"},{"comment":"","name":"job_management","type":"float"},{"comment":"","name":"job_retired","type":"float"},{"comment":"","name":"job_self-employed","type":"float"},{"comment":"","name":"job_services","type":"float"},{"comment":"","name":"job_student","type":"float"},{"comment":"","name":"job_technician","type":"float"},{"comment":"","name":"job_unemployed","type":"float"},{"comment":"","name":"marital_divorced","type":"float"},{"comment":"","name":"marital_married","type":"float"},{"comment":"","name":"marital_single","type":"float"}],"dataSource":"default-data-source","name":"alice.csv","relativeURI":"alice.csv","type":"table","vendor":"manual"}}
  creationTimestamp: "2025-03-05T02:30:02Z"
  generation: 1
  labels:
    kuscia.secretflow/domaindata-type: table
    kuscia.secretflow/domaindata-vendor: manual
    kuscia.secretflow/initiator: alice-internal-domain
    kuscia.secretflow/interconn-protocol-type: kuscia
  name: alice-table
  namespace: alice-internal-domain
  resourceVersion: "115675"
  uid: 644218ff-f812-4399-8822-bd715397698f
spec:
  attributes:
    description: alice-internal-domain demo data
  author: alice-internal-domain
  columns:
  - comment: ""
    name: id1
    type: str
  - comment: ""
    name: age
    type: float
  - comment: ""
    name: education
    type: float
  - comment: ""
    name: default
    type: float
  - comment: ""
    name: balance
    type: float
  - comment: ""
    name: housing
    type: float
  - comment: ""
    name: loan
    type: float
  - comment: ""
    name: day
    type: float
  - comment: ""
    name: duration
    type: float
  - comment: ""
    name: campaign
    type: float
  - comment: ""
    name: pdays
    type: float
  - comment: ""
    name: previous
    type: float
  - comment: ""
    name: job_blue-collar
    type: float
  - comment: ""
    name: job_entrepreneur
    type: float
  - comment: ""
    name: job_housemaid
    type: float
  - comment: ""
    name: job_management
    type: float
  - comment: ""
    name: job_retired
    type: float
  - comment: ""
    name: job_self-employed
    type: float
  - comment: ""
    name: job_services
    type: float
  - comment: ""
    name: job_student
    type: float
  - comment: ""
    name: job_technician
    type: float
  - comment: ""
    name: job_unemployed
    type: float
  - comment: ""
    name: marital_divorced
    type: float
  - comment: ""
    name: marital_married
    type: float
  - comment: ""
    name: marital_single
    type: float
  dataSource: default-data-source
  name: alice.csv
  relativeURI: alice.csv
  type: table
  vendor: manual
---
apiVersion: kuscia.secretflow/v1alpha1
kind: DomainDataGrant
metadata:
  name: to-bob-for-alice-table
  namespace: alice-internal-domain
spec:
  author: alice-internal-domain
  domainDataID: alice-table
  grantDomain: bob

4. Enter the `root-kuscia-master-cxc-bob` container and grant `alice-internal-domain` access to `bob`'s data table

apiVersion: kuscia.secretflow/v1alpha1
kind: DomainDataGrant
metadata:
  name: to-alice-internal-domain-for-bob-table
  namespace: bob
spec:
  author: bob
  domainDataID: bob-table
  grantDomain: alice-internal-domain

5. Run the job in the `root-kuscia-master-cxc-alice` container

apiVersion: kuscia.secretflow/v1alpha1
kind: KusciaJob
metadata:
  name: test-job-for-alice-internal-domain
  namespace: cross-domain
spec:
  initiator: alice-internal-domain
  scheduleMode: BestEffort
  maxParallelism: 2
  tasks:
    - taskID: job-psi
      alias: job-psi
      priority: 100
      taskInputConfig: '{"sf_datasource_config":{"alice-internal-domain":{"id":"default-data-source"},"bob":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice-internal-domain","bob"],"devices":[{"name":"spu","type":"spu","parties":["alice-internal-domain","bob"],"config":"{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"},{"name":"heu","type":"heu","parties":["alice-internal-domain","bob"],"config":"{\"mode\": \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"}],"ray_fed_config":{"cross_silo_comm_backend":"brpc_link"}},"sf_node_eval_param":{"domain":"data_prep","name":"psi","version":"0.0.5","attr_paths":["protocol","sort_result","allow_duplicate_keys","allow_duplicate_keys/yes/join_type","allow_duplicate_keys/yes/join_type/left_join/left_side","input/receiver_input/key","input/sender_input/key"],"attrs":[{"s":"PROTOCOL_RR22"},{"b":true},{"s":"yes"},{"s":"left_join"},{"ss":["alice-internal-domain"]},{"ss":["id1"]},{"ss":["id2"]}]},"sf_input_ids":["alice-table","bob-table"],"sf_output_ids":["psi-output"],"sf_output_uris":["psi-output.csv"]}'
      appImage: secretflow-image
      parties:
        - domainID: alice-internal-domain
        - domainID: bob
    - taskID: job-split
      alias: job-split
      priority: 100
      dependencies: ['job-psi']
      taskInputConfig: '{"sf_datasource_config":{"alice-internal-domain":{"id":"default-data-source"},"bob":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice-internal-domain","bob"],"devices":[{"name":"spu","type":"spu","parties":["alice-internal-domain","bob"],"config":"{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"},{"name":"heu","type":"heu","parties":["alice-internal-domain","bob"],"config":"{\"mode\": \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"}],"ray_fed_config":{"cross_silo_comm_backend":"brpc_link"}},"sf_node_eval_param":{"domain":"data_prep","name":"train_test_split","version":"0.0.1","attr_paths":["train_size","test_size","random_state","shuffle"],"attrs":[{"f":0.75},{"f":0.25},{"i64":1234},{"b":true}]},"sf_output_uris":["train-dataset.csv","test-dataset.csv"],"sf_output_ids":["train-dataset","test-dataset"],"sf_input_ids":["psi-output"]}'
      appImage: secretflow-image
      parties:
        - domainID: alice-internal-domain
        - domainID: bob


6. In the `root-kuscia-master-cxc-alice` container, the job shows that it requires approval

apiVersion: kuscia.secretflow/v1alpha1
kind: KusciaJob
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kuscia.secretflow/v1alpha1","kind":"KusciaJob","metadata":{"annotations":{},"name":"test-job-for-alice-internal-domain","namespace":"cross-domain"},"spec":{"initiator":"alice-internal-domain","maxParallelism":2,"scheduleMode":"BestEffort","tasks":[{"alias":"job-psi","appImage":"secretflow-image","parties":[{"domainID":"alice-internal-domain"},{"domainID":"bob"}],"priority":100,"taskID":"job-psi","taskInputConfig":"{\"sf_datasource_config\":{\"alice-internal-domain\":{\"id\":\"default-data-source\"},\"bob\":{\"id\":\"default-data-source\"}},\"sf_cluster_desc\":{\"parties\":[\"alice-internal-domain\",\"bob\"],\"devices\":[{\"name\":\"spu\",\"type\":\"spu\",\"parties\":[\"alice-internal-domain\",\"bob\"],\"config\":\"{\\\"runtime_config\\\":{\\\"protocol\\\":\\\"REF2K\\\",\\\"field\\\":\\\"FM64\\\"},\\\"link_desc\\\":{\\\"connect_retry_times\\\":60,\\\"connect_retry_interval_ms\\\":1000,\\\"brpc_channel_protocol\\\":\\\"http\\\",\\\"brpc_channel_connection_type\\\":\\\"pooled\\\",\\\"recv_timeout_ms\\\":1200000,\\\"http_timeout_ms\\\":1200000}}\"},{\"name\":\"heu\",\"type\":\"heu\",\"parties\":[\"alice-internal-domain\",\"bob\"],\"config\":\"{\\\"mode\\\": \\\"PHEU\\\", \\\"schema\\\": \\\"paillier\\\", \\\"key_size\\\": 
2048}\"}],\"ray_fed_config\":{\"cross_silo_comm_backend\":\"brpc_link\"}},\"sf_node_eval_param\":{\"domain\":\"data_prep\",\"name\":\"psi\",\"version\":\"0.0.5\",\"attr_paths\":[\"protocol\",\"sort_result\",\"allow_duplicate_keys\",\"allow_duplicate_keys/yes/join_type\",\"allow_duplicate_keys/yes/join_type/left_join/left_side\",\"input/receiver_input/key\",\"input/sender_input/key\"],\"attrs\":[{\"s\":\"PROTOCOL_RR22\"},{\"b\":true},{\"s\":\"yes\"},{\"s\":\"left_join\"},{\"ss\":[\"alice-internal-domain\"]},{\"ss\":[\"id1\"]},{\"ss\":[\"id2\"]}]},\"sf_input_ids\":[\"alice-table\",\"bob-table\"],\"sf_output_ids\":[\"psi-output\"],\"sf_output_uris\":[\"psi-output.csv\"]}"},{"alias":"job-split","appImage":"secretflow-image","dependencies":["job-psi"],"parties":[{"domainID":"alice-internal-domain"},{"domainID":"bob"}],"priority":100,"taskID":"job-split","taskInputConfig":"{\"sf_datasource_config\":{\"alice-internal-domain\":{\"id\":\"default-data-source\"},\"bob\":{\"id\":\"default-data-source\"}},\"sf_cluster_desc\":{\"parties\":[\"alice-internal-domain\",\"bob\"],\"devices\":[{\"name\":\"spu\",\"type\":\"spu\",\"parties\":[\"alice-internal-domain\",\"bob\"],\"config\":\"{\\\"runtime_config\\\":{\\\"protocol\\\":\\\"REF2K\\\",\\\"field\\\":\\\"FM64\\\"},\\\"link_desc\\\":{\\\"connect_retry_times\\\":60,\\\"connect_retry_interval_ms\\\":1000,\\\"brpc_channel_protocol\\\":\\\"http\\\",\\\"brpc_channel_connection_type\\\":\\\"pooled\\\",\\\"recv_timeout_ms\\\":1200000,\\\"http_timeout_ms\\\":1200000}}\"},{\"name\":\"heu\",\"type\":\"heu\",\"parties\":[\"alice-internal-domain\",\"bob\"],\"config\":\"{\\\"mode\\\": \\\"PHEU\\\", \\\"schema\\\": \\\"paillier\\\", \\\"key_size\\\": 
2048}\"}],\"ray_fed_config\":{\"cross_silo_comm_backend\":\"brpc_link\"}},\"sf_node_eval_param\":{\"domain\":\"data_prep\",\"name\":\"train_test_split\",\"version\":\"0.0.1\",\"attr_paths\":[\"train_size\",\"test_size\",\"random_state\",\"shuffle\"],\"attrs\":[{\"f\":0.75},{\"f\":0.25},{\"i64\":1234},{\"b\":true}]},\"sf_output_uris\":[\"train-dataset.csv\",\"test-dataset.csv\"],\"sf_output_ids\":[\"train-dataset\",\"test-dataset\"],\"sf_input_ids\":[\"psi-output\"]}"}]}}
    kuscia.secretflow/initiator: alice-internal-domain
    kuscia.secretflow/interconn-kuscia-parties: bob
    kuscia.secretflow/interconn-self-parties: alice-internal-domain
    kuscia.secretflow/self-cluster-as-initiator: "true"
  creationTimestamp: "2025-03-05T03:00:52Z"
  generation: 1
  name: test-job-for-alice-internal-domain
  namespace: cross-domain
  resourceVersion: "119426"
  uid: 685b2d92-9c91-4439-bcd2-1255c6a4805e
spec:
  initiator: alice-internal-domain
  maxParallelism: 2
  scheduleMode: BestEffort
  tasks:
  - alias: job-psi
    appImage: secretflow-image
    parties:
    - domainID: alice-internal-domain
    - domainID: bob
    priority: 100
    taskID: job-psi
    taskInputConfig: '{"sf_datasource_config":{"alice-internal-domain":{"id":"default-data-source"},"bob":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice-internal-domain","bob"],"devices":[{"name":"spu","type":"spu","parties":["alice-internal-domain","bob"],"config":"{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"},{"name":"heu","type":"heu","parties":["alice-internal-domain","bob"],"config":"{\"mode\":
      \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"}],"ray_fed_config":{"cross_silo_comm_backend":"brpc_link"}},"sf_node_eval_param":{"domain":"data_prep","name":"psi","version":"0.0.5","attr_paths":["protocol","sort_result","allow_duplicate_keys","allow_duplicate_keys/yes/join_type","allow_duplicate_keys/yes/join_type/left_join/left_side","input/receiver_input/key","input/sender_input/key"],"attrs":[{"s":"PROTOCOL_RR22"},{"b":true},{"s":"yes"},{"s":"left_join"},{"ss":["alice-internal-domain"]},{"ss":["id1"]},{"ss":["id2"]}]},"sf_input_ids":["alice-table","bob-table"],"sf_output_ids":["psi-output"],"sf_output_uris":["psi-output.csv"]}'
    tolerable: false
  - alias: job-split
    appImage: secretflow-image
    dependencies:
    - job-psi
    parties:
    - domainID: alice-internal-domain
    - domainID: bob
    priority: 100
    taskID: job-split
    taskInputConfig: '{"sf_datasource_config":{"alice-internal-domain":{"id":"default-data-source"},"bob":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice-internal-domain","bob"],"devices":[{"name":"spu","type":"spu","parties":["alice-internal-domain","bob"],"config":"{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"},{"name":"heu","type":"heu","parties":["alice-internal-domain","bob"],"config":"{\"mode\":
      \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"}],"ray_fed_config":{"cross_silo_comm_backend":"brpc_link"}},"sf_node_eval_param":{"domain":"data_prep","name":"train_test_split","version":"0.0.1","attr_paths":["train_size","test_size","random_state","shuffle"],"attrs":[{"f":0.75},{"f":0.25},{"i64":1234},{"b":true}]},"sf_output_uris":["train-dataset.csv","test-dataset.csv"],"sf_output_ids":["train-dataset","test-dataset"],"sf_input_ids":["psi-output"]}'
    tolerable: false
status:
  approveStatus:
    alice-internal-domain: JobAccepted
  conditions:
  - lastTransitionTime: "2025-03-05T03:00:52Z"
    status: "True"
    type: JobValidated
  lastReconcileTime: "2025-03-05T03:00:52Z"
  phase: AwaitingApproval
  stageStatus:
    alice-internal-domain: JobCreateStageSucceeded
  startTime: "2025-03-05T03:00:52Z"

7. However, the job is not visible in the `root-kuscia-master-cxc-bob` container

bash-5.2# kubectl get kj -A
NAMESPACE      NAME                             STARTTIME   COMPLETIONTIME   LASTRECONCILETIME   PHASE
cross-domain   secretflow-task-20250304194503   15h         15h              15h                 Succeeded
cross-domain   secretflow-task-20250304194703   15h         15h              15h                 Succeeded
cross-domain   secretflow-task-20250304194804   15h         15h              15h                 Succeeded
cross-domain   secretflow-task-20250304194835   15h         15h              15h                 Succeeded

8. The job cannot be approved by ID either (querying it by ID returns "not found")

bash-5.2# export CTR_CERTS_ROOT=/home/kuscia/var/certs
bash-5.2# curl -k -X POST 'https://localhost:8082/api/v1/job/query' \
>  --header "Token: $(cat ${CTR_CERTS_ROOT}/token)" \
>  --header 'Content-Type: application/json' \
>  --cert ${CTR_CERTS_ROOT}/kusciaapi-server.crt \
>  --key ${CTR_CERTS_ROOT}/kusciaapi-server.key \
>  --cacert ${CTR_CERTS_ROOT}/ca.crt \
>  -d '{
>   "job_id": "test-job-for-alice-internal-domain"
> }'
{"status":{"code":11202,"message":"kusciajobs.kuscia.secretflow \"test-job-for-alice-internal-domain\" not found","details":[]},"data":null}bash-5.2#
bash-5.2#
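
Steps 3 and 4 above create mirror-image DomainDataGrant resources, one in each party's namespace. As a rough sketch, the two manifests could be generated from a single helper so the names stay consistent. The helper name `data_grant` is hypothetical; the field names and values are copied from the YAML in those steps, and nothing here changes the outcome reported in this issue.

```python
import json


def data_grant(name, owner, domaindata_id, grant_domain):
    """Build a DomainDataGrant manifest using the fields shown in steps 3 and 4."""
    return {
        "apiVersion": "kuscia.secretflow/v1alpha1",
        "kind": "DomainDataGrant",
        # In the manifests above, the namespace and the author are both the owning Domain.
        "metadata": {"name": name, "namespace": owner},
        "spec": {
            "author": owner,
            "domainDataID": domaindata_id,
            "grantDomain": grant_domain,
        },
    }


grants = [
    # Step 3: alice-internal-domain grants its table to bob.
    data_grant("to-bob-for-alice-table", "alice-internal-domain", "alice-table", "bob"),
    # Step 4: bob grants its table to alice-internal-domain.
    data_grant("to-alice-internal-domain-for-bob-table", "bob", "bob-table", "alice-internal-domain"),
]

for grant in grants:
    print(json.dumps(grant, indent=2))
```

Each printed document has the same fields as the corresponding YAML manifest above, expressed as JSON.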
@gaoyonglong

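Have you tested creating a two-party job with approval disabled, to see whether the corresponding kj (KusciaJob) is visible on the participant's side?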

@Spark-Liang
Author

Spark-Liang commented Mar 5, 2025

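> Have you tested creating a two-party job with approval disabled, to see whether the corresponding kj (KusciaJob) is visible on the participant's side?

Approval is now disabled on all nodes, but the job still requires approval: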

# grep enableWorkloadApprove */kuscia.yaml
root-kuscia-lite-cxc-alice/kuscia.yaml:enableWorkloadApprove: false
root-kuscia-lite-cxc-bob/kuscia.yaml:enableWorkloadApprove: false
root-kuscia-master-cxc-alice/kuscia.yaml:enableWorkloadApprove: false
root-kuscia-master-cxc-bob/kuscia.yaml:enableWorkloadApprove: false

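Another finding: in the root-kuscia-master-cxc-alice container I can see two jobs with the same name, one in cross-domain and the other in master-cxc-bob. The job can be approved from a node on the alice side, but the approval has no effect: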

bash-5.2# kubectl get kj -A
NAMESPACE        NAME                             STARTTIME   COMPLETIONTIME   LASTRECONCILETIME   PHASE
master-cxc-bob   secretflow-task-20250304194503
cross-domain     secretflow-task-20250304194503   18h         18h              18h                 Succeeded
master-cxc-bob   secretflow-task-20250304194703
cross-domain     secretflow-task-20250304194703   18h         18h              18h                 Succeeded
master-cxc-bob   secretflow-task-20250304194804
cross-domain     secretflow-task-20250304194804   18h         18h              18h                 Succeeded
master-cxc-bob   secretflow-task-20250304194835
cross-domain     secretflow-task-20250304194835   18h         18h              18h                 Succeeded
bash-5.2#
bash-5.2#
bash-5.2# kubectl apply -f test-job-for-alice-internal-domain.yaml
kusciajob.kuscia.secretflow/test-job-for-alice-internal-domain created
bash-5.2# kubectl get kj -A
NAMESPACE        NAME                                 STARTTIME   COMPLETIONTIME   LASTRECONCILETIME   PHASE
master-cxc-bob   secretflow-task-20250304194503
cross-domain     secretflow-task-20250304194503       18h         18h              18h                 Succeeded
master-cxc-bob   secretflow-task-20250304194703
cross-domain     secretflow-task-20250304194703       18h         18h              18h                 Succeeded
master-cxc-bob   secretflow-task-20250304194804
cross-domain     secretflow-task-20250304194804       18h         18h              18h                 Succeeded
master-cxc-bob   secretflow-task-20250304194835
cross-domain     secretflow-task-20250304194835       18h         18h              18h                 Succeeded
cross-domain     test-job-for-alice-internal-domain   2s                           2s                  AwaitingApproval
master-cxc-bob   test-job-for-alice-internal-domain
bash-5.2# export CTR_CERTS_ROOT=/home/kuscia/var/certs
bash-5.2# curl -k -X POST 'https://localhost:8082/api/v1/job/query' \
>  --header "Token: $(cat ${CTR_CERTS_ROOT}/token)" \
>  --header 'Content-Type: application/json' \
>  --cert ${CTR_CERTS_ROOT}/kusciaapi-server.crt \
>  --key ${CTR_CERTS_ROOT}/kusciaapi-server.key \
>  --cacert ${CTR_CERTS_ROOT}/ca.crt \
>  -d '{
>   "job_id": "test-job-for-alice-internal-domain"
> }'
{"status":{"code":0,"message":"success","details":[]},"data":{"job_id":"test-job-for-alice-internal-domain","initiator":"alice-internal-domain","max_parallelism":2,"tasks":[{"app_image":"secretflow-image","parties":[{"domain_id":"alice-internal-domain","role":"","resources":null,"bandwidth_limits":[]},{"domain_id":"bob","role":"","resources":null,"bandwidth_limits":[]}],"alias":"job-psi","task_id":"job-psi","dependencies":[],"task_input_config":"{\"sf_datasource_config\":{\"alice-internal-domain\":{\"id\":\"default-data-source\"},\"bob\":{\"id\":\"default-data-source\"}},\"sf_cluster_desc\":{\"parties\":[\"alice-internal-domain\",\"bob\"],\"devices\":[{\"name\":\"spu\",\"type\":\"spu\",\"parties\":[\"alice-internal-domain\",\"bob\"],\"config\":\"{\\\"runtime_config\\\":{\\\"protocol\\\":\\\"REF2K\\\",\\\"field\\\":\\\"FM64\\\"},\\\"link_desc\\\":{\\\"connect_retry_times\\\":60,\\\"connect_retry_interval_ms\\\":1000,\\\"brpc_channel_protocol\\\":\\\"http\\\",\\\"brpc_channel_connection_type\\\":\\\"pooled\\\",\\\"recv_timeout_ms\\\":1200000,\\\"http_timeout_ms\\\":1200000}}\"},{\"name\":\"heu\",\"type\":\"heu\",\"parties\":[\"alice-internal-domain\",\"bob\"],\"config\":\"{\\\"mode\\\": \\\"PHEU\\\", \\\"schema\\\": \\\"paillier\\\", \\\"key_size\\\": 
2048}\"}],\"ray_fed_config\":{\"cross_silo_comm_backend\":\"brpc_link\"}},\"sf_node_eval_param\":{\"domain\":\"data_prep\",\"name\":\"psi\",\"version\":\"0.0.5\",\"attr_paths\":[\"protocol\",\"sort_result\",\"allow_duplicate_keys\",\"allow_duplicate_keys/yes/join_type\",\"allow_duplicate_keys/yes/join_type/left_join/left_side\",\"input/receiver_input/key\",\"input/sender_input/key\"],\"attrs\":[{\"s\":\"PROTOCOL_RR22\"},{\"b\":true},{\"s\":\"yes\"},{\"s\":\"left_join\"},{\"ss\":[\"alice-internal-domain\"]},{\"ss\":[\"id1\"]},{\"ss\":[\"id2\"]}]},\"sf_input_ids\":[\"alice-table\",\"bob-table\"],\"sf_output_ids\":[\"psi-output\"],\"sf_output_uris\":[\"psi-output.csv\"]}","priority":100,"schedule_config":null},{"app_image":"secretflow-image","parties":[{"domain_id":"alice-internal-domain","role":"","resources":null,"bandwidth_limits":[]},{"domain_id":"bob","role":"","resources":null,"bandwidth_limits":[]}],"alias":"job-split","task_id":"job-split","dependencies":["job-psi"],"task_input_config":"{\"sf_datasource_config\":{\"alice-internal-domain\":{\"id\":\"default-data-source\"},\"bob\":{\"id\":\"default-data-source\"}},\"sf_cluster_desc\":{\"parties\":[\"alice-internal-domain\",\"bob\"],\"devices\":[{\"name\":\"spu\",\"type\":\"spu\",\"parties\":[\"alice-internal-domain\",\"bob\"],\"config\":\"{\\\"runtime_config\\\":{\\\"protocol\\\":\\\"REF2K\\\",\\\"field\\\":\\\"FM64\\\"},\\\"link_desc\\\":{\\\"connect_retry_times\\\":60,\\\"connect_retry_interval_ms\\\":1000,\\\"brpc_channel_protocol\\\":\\\"http\\\",\\\"brpc_channel_connection_type\\\":\\\"pooled\\\",\\\"recv_timeout_ms\\\":1200000,\\\"http_timeout_ms\\\":1200000}}\"},{\"name\":\"heu\",\"type\":\"heu\",\"parties\":[\"alice-internal-domain\",\"bob\"],\"config\":\"{\\\"mode\\\": \\\"PHEU\\\", \\\"schema\\\": \\\"paillier\\\", \\\"key_size\\\": 
2048}\"}],\"ray_fed_config\":{\"cross_silo_comm_backend\":\"brpc_link\"}},\"sf_node_eval_param\":{\"domain\":\"data_prep\",\"name\":\"train_test_split\",\"version\":\"0.0.1\",\"attr_paths\":[\"train_size\",\"test_size\",\"random_state\",\"shuffle\"],\"attrs\":[{\"f\":0.75},{\"f\":0.25},{\"i64\":1234},{\"b\":true}]},\"sf_output_uris\":[\"train-dataset.csv\",\"test-dataset.csv\"],\"sf_output_ids\":[\"train-dataset\",\"test-dataset\"],\"sf_input_ids\":[\"psi-output\"]}","priority":100,"schedule_config":null}],"status":{"state":"AwaitingApproval","err_msg":"","create_time":"2025-03-05T06:25:59Z","start_time":"2025-03-05T06:25:59Z","end_time":"","tasks":[{"task_id":"job-psi","state":"Pending","err_msg":"","create_time":"","start_time":"","end_time":"","parties":[],"alias":"job-psi","progress":0},{"task_id":"job-split","state":"Pending","err_msg":"","create_time":"","start_time":"","end_time":"","parties":[],"alias":"job-split","progress":0}],"stage_status_list":[{"domain_id":"alice-internal-domain","state":"JobCreateStageSucceeded"}],"approve_status_list":[{"domain_id":"alice-internal-domain","state":"JobAccepted"}]},"custom_fields":{}}}
bash-5.2# kubectl get kj -A
NAMESPACE        NAME                                 STARTTIME   COMPLETIONTIME   LASTRECONCILETIME   PHASE
master-cxc-bob   secretflow-task-20250304194503
cross-domain     secretflow-task-20250304194503       18h         18h              18h                 Succeeded
master-cxc-bob   secretflow-task-20250304194703
cross-domain     secretflow-task-20250304194703       18h         18h              18h                 Succeeded
master-cxc-bob   secretflow-task-20250304194804
cross-domain     secretflow-task-20250304194804       18h         18h              18h                 Succeeded
master-cxc-bob   secretflow-task-20250304194835
cross-domain     secretflow-task-20250304194835       18h         18h              18h                 Succeeded
cross-domain     test-job-for-alice-internal-domain   2m25s                        2m25s               AwaitingApproval
master-cxc-bob   test-job-for-alice-internal-domain
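
The job query response above names `bob` as a party in both tasks, yet `approve_status_list` contains only `alice-internal-domain`. A small sketch like the following can make that gap explicit when reading such a response. The function name `pending_approvers` and the trimmed `sample` are illustrative assumptions; the field names (`tasks`, `parties`, `domain_id`, `approve_status_list`, `state`) are taken from the response shown above.

```python
def pending_approvers(response):
    """Return party domain IDs that have no JobAccepted entry in the response."""
    data = response.get("data") or {}  # "data" is null when the job is not found
    parties = []
    for task in data.get("tasks", []):
        for party in task.get("parties", []):
            domain_id = party.get("domain_id")
            if domain_id and domain_id not in parties:
                parties.append(domain_id)

    status = data.get("status") or {}
    accepted = {
        entry.get("domain_id")
        for entry in status.get("approve_status_list", [])
        if entry.get("state") == "JobAccepted"
    }
    return [p for p in parties if p not in accepted]


# Trimmed copy of the response above, keeping only the fields the function reads.
sample = {
    "status": {"code": 0, "message": "success"},
    "data": {
        "job_id": "test-job-for-alice-internal-domain",
        "tasks": [
            {"task_id": "job-psi",
             "parties": [{"domain_id": "alice-internal-domain"}, {"domain_id": "bob"}]},
            {"task_id": "job-split",
             "parties": [{"domain_id": "alice-internal-domain"}, {"domain_id": "bob"}]},
        ],
        "status": {
            "state": "AwaitingApproval",
            "approve_status_list": [
                {"domain_id": "alice-internal-domain", "state": "JobAccepted"}
            ],
        },
    },
}

print(pending_approvers(sample))
```

For the trimmed sample, the only party without an accepted approval is `bob`, which matches the job staying in `AwaitingApproval`.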

@gaoyonglong

Did you restart kuscia after disabling approval?

@Spark-Liang
Author

Yes. I restarted Docker entirely. How can I verify that the kuscia configuration has actually taken effect?

> Did you restart kuscia after disabling approval?

@gaoyonglong

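Why is the domainID of the participant in the job you created above bob? If you used ./kuscia.sh cxc from our documentation to create the centralized x centralized networking mode, the domainID of the master-cxc-bob node should be master-cxc-bob. Please confirm whether the bob domainID exists.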

@Spark-Liang
Author

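> Why is the domainID of the participant in the job you created above bob? If you used ./kuscia.sh cxc from our documentation to create the centralized x centralized networking mode, the domainID of the master-cxc-bob node should be master-cxc-bob. Please confirm whether the bob domainID exists.

The jobs that ran successfully also use bob as the domainID: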

bash-5.2# kubectl get kj -n cross-domain secretflow-task-20250304194804 -o yaml
apiVersion: kuscia.secretflow/v1alpha1
kind: KusciaJob
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kuscia.secretflow/v1alpha1","kind":"KusciaJob","metadata":{"annotations":{},"name":"secretflow-task-20250304194804","namespace":"cross-domain"},"spec":{"initiator":"alice","maxParallelism":2,"scheduleMode":"BestEffort","tasks":[{"alias":"single-psi","appImage":"secretflow-image","parties":[{"bandwidthLimits":[{"destinationID":"bob","limitKBps":100}],"domainID":"alice"},{"bandwidthLimits":[{"destinationID":"alice","limitKBps":100}],"domainID":"bob"}],"priority":100,"taskID":"secretflow-task-20250304194804-single-psi","taskInputConfig":"{\"sf_datasource_config\":{\"alice\":{\"id\":\"default-data-source\"},\"bob\":{\"id\":\"default-data-source\"}},\"sf_cluster_desc\":{\"parties\":[\"alice\",\"bob\"],\"devices\":[{\"name\":\"spu\",\"type\":\"spu\",\"parties\":[\"alice\",\"bob\"],\"config\":\"{\\\"runtime_config\\\":{\\\"protocol\\\":\\\"REF2K\\\",\\\"field\\\":\\\"FM64\\\"},\\\"link_desc\\\":{\\\"connect_retry_times\\\":60,\\\"connect_retry_interval_ms\\\":1000,\\\"brpc_channel_protocol\\\":\\\"http\\\",\\\"brpc_channel_connection_type\\\":\\\"pooled\\\",\\\"recv_timeout_ms\\\":1200000,\\\"http_timeout_ms\\\":1200000}}\"},{\"name\":\"heu\",\"type\":\"heu\",\"parties\":[\"alice\",\"bob\"],\"config\":\"{\\\"mode\\\": \\\"PHEU\\\", \\\"schema\\\": \\\"paillier\\\", \\\"key_size\\\": 2048}\"}],\"ray_fed_config\":{\"cross_silo_comm_backend\":\"brpc_link\"}},\"sf_node_eval_param\":{\"domain\":\"data_prep\",\"name\":\"psi\",\"version\":\"0.0.5\",\"attr_paths\":[\"protocol\",\"sort_result\",\"allow_duplicate_keys\",\"allow_duplicate_keys/yes/join_type\",\"allow_duplicate_keys/yes/join_type/left_join/left_side\",\"input/receiver_input/key\",\"input/sender_input/key\"],\"attrs\":[{\"s\":\"PROTOCOL_RR22\"},{\"b\":true},{\"s\":\"yes\"},{\"s\":\"left_join\"},{\"ss\":[\"alice\"]},{\"ss\":[\"id1\"]},{\"ss\":[\"id2\"]}]},\"sf_input_ids\":[\"alice-table\",\"bob-table\"],\"sf_output_ids\":[\"psi-output\"],\"sf_output_uris\":[\"psi-output.csv\"]}"}]}}
    kuscia.secretflow/initiator: alice
    kuscia.secretflow/interconn-kuscia-parties: bob
    kuscia.secretflow/interconn-self-parties: alice
    kuscia.secretflow/self-cluster-as-initiator: "true"
  creationTimestamp: "2025-03-04T11:48:04Z"
  generation: 1
  name: secretflow-task-20250304194804
  namespace: cross-domain
  resourceVersion: "9434"
  uid: fceef6b4-988c-48ec-a7e5-698b1e9cea7d
spec:
  initiator: alice
  maxParallelism: 2
  scheduleMode: BestEffort
  tasks:
  - alias: single-psi
    appImage: secretflow-image
    parties:
    - bandwidthLimits:
      - destinationID: bob
        limitKBps: 100
      domainID: alice
    - bandwidthLimits:
      - destinationID: alice
        limitKBps: 100
      domainID: bob
    priority: 100
    taskID: secretflow-task-20250304194804-single-psi
    taskInputConfig: '{"sf_datasource_config":{"alice":{"id":"default-data-source"},"bob":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice","bob"],"devices":[{"name":"spu","type":"spu","parties":["alice","bob"],"config":"{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"},{"name":"heu","type":"heu","parties":["alice","bob"],"config":"{\"mode\":
      \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"}],"ray_fed_config":{"cross_silo_comm_backend":"brpc_link"}},"sf_node_eval_param":{"domain":"data_prep","name":"psi","version":"0.0.5","attr_paths":["protocol","sort_result","allow_duplicate_keys","allow_duplicate_keys/yes/join_type","allow_duplicate_keys/yes/join_type/left_join/left_side","input/receiver_input/key","input/sender_input/key"],"attrs":[{"s":"PROTOCOL_RR22"},{"b":true},{"s":"yes"},{"s":"left_join"},{"ss":["alice"]},{"ss":["id1"]},{"ss":["id2"]}]},"sf_input_ids":["alice-table","bob-table"],"sf_output_ids":["psi-output"],"sf_output_uris":["psi-output.csv"]}'
    tolerable: false
status:
  approveStatus:
    alice: JobAccepted
    bob: JobAccepted
  completionTime: "2025-03-04T11:48:31Z"
  conditions:
  - lastTransitionTime: "2025-03-04T11:48:04Z"
    status: "True"
    type: JobValidated
  lastReconcileTime: "2025-03-04T11:48:31Z"
  phase: Succeeded
  stageStatus:
    alice: JobCreateStageSucceeded
    bob: JobCreateStageSucceeded
  startTime: "2025-03-04T11:48:04Z"
  taskStatus:
    secretflow-task-20250304194804-single-psi: Succeeded

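From my testing, it looks like the cdr (ClusterDomainRoute) is missing. How do I configure a cdr for a Domain like this one that has no node?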

kuscia diagnose network alice-internal-domain bob
Diagnose Config:
--Command: network
--Source: alice-internal-domain
--Destination: bob
--CRD:
--ReportFile:
--Manual: false
--TestSpeed: true, Threshold: 10
--TestRTT: true, Threshold: 50
--TestProxyTimeout: false, Threshold: 600
--TestProxyBuffer: true
--TestRequestBodySize: true, Threshold: 1
--BidrectionMode: true
diagnose <alice-internal-domain-bob> network statitsics
diagnose crd config
REPORT:
CRD CONFIG CHECK:
+---------------------------+------+--------+-----------------------------------------------+
|           NAME            | TYPE | RESULT |                  INFORMATION                  |
+---------------------------+------+--------+-----------------------------------------------+
| alice-internal-domain-bob | cdr  | [FAIL] | query cdr failed, code:11404,                 |
|                           |      |        | message:clusterdomainroutes.kuscia.secretflow |
|                           |      |        | "alice-internal-domain-bob" not found         |
| bob-alice-internal-domain | cdr  | [FAIL] | query cdr failed, code:11404,                 |
|                           |      |        | message:clusterdomainroutes.kuscia.secretflow |
|                           |      |        | "bob-alice-internal-domain" not found         |
+---------------------------+------+--------+-----------------------------------------------+
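
The report above looks up one route per direction, named `alice-internal-domain-bob` and `bob-alice-internal-domain`. Assuming that `<source>-<destination>` pattern holds in general (an inference from this report, not a statement of the kuscia specification), the following sketch lists which route names are missing for a pair of Domains. The function names and the `existing` list are hypothetical.

```python
def expected_cdr_names(source, destination):
    """Route names for both directions, following the pattern seen in the report."""
    return [f"{source}-{destination}", f"{destination}-{source}"]


def missing_cdrs(source, destination, existing):
    """Return the expected route names that are absent from `existing`."""
    present = set(existing)
    return [name for name in expected_cdr_names(source, destination) if name not in present]


# Hypothetical list of route names already present in the cluster.
existing = ["alice-bob", "bob-alice"]

print(missing_cdrs("alice-internal-domain", "bob", existing))
```

With the hypothetical `existing` list, both directions are reported missing, which is the same pair of names the diagnose report marks as `[FAIL]`.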

@gaoyonglong

I misread the above. The domainID bob belongs to the lite node. To create the corresponding cdr, see: https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.14.0b0/reference/concepts/domainroute_cn

@Spark-Liang
Author

Spark-Liang commented Mar 5, 2025

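> I misread the above. The domainID bob belongs to the lite node. To create the corresponding cdr, see: https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.14.0b0/reference/concepts/domainroute_cn

How is the "internal node" mentioned in https://www.secretflow.org.cn/zh-CN/docs/kuscia/v0.11.0b0/reference/concepts/domain_cn#id2 configured? Is there an example? I only want to create a Domain, without creating an actual physical lite node. Is this scenario supported? I just want to use different Domains to manage different data.

> .spec.role: the role of the privacy-computing node Domain; defaults to "". Two values are supported: partner and "".
> partner: an external node, used for collaborator nodes in peer-to-peer (P2P) networking mode. In P2P networking mode, the collaborator's Domain must be created in the cluster of the party that schedules the task, and role must be set to partner when that Domain is created.
> "": an internal node.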

@gaoyonglong

An "internal node" is a node inside the cluster. For kuscia-master-cxc-alice, for example, alice is its internal node.

@gaoyonglong

Currently, a Domain without an actual physical lite node is not supported.
