Is it feasible to deploy secretflow-allinone inside a Docker container? #487

Open
sabakioukenasai opened this issue Jan 6, 2025 · 6 comments

@sabakioukenasai

Issue Type

Others

Search for existing issues similar to yours

Yes

Kuscia Version

kuscia v0.10.0b0

Link to Relevant Documentation

https://www.secretflow.org.cn/zh-CN/docs/secretpad-all-in-one/v1.8.0b0/p2p_deploy/platform_installation_guidelines

Question Details

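Use case: there is a shared Ubuntu 22.04 server on which each user has their own Docker container (also running Ubuntu). Users connect to their container over SSH and do their development work inside it.

I want to deploy the secretflow-allinone environment inside my Docker container, so I downloaded the x86_64 installation package for secretflow-allinone v1.8.0 from https://www.secretflow.org.cn/zh-CN/docs/secretpad-all-in-one/v1.11.0b0/history_download, uploaded it to my container, and extracted it.

When I followed the documentation to deploy in P2P mode, running install.sh produced the following error: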
> bash install.sh autonomy -n alice -s 8080 -g 40803 -k 40802 -p 10080 -q 13081 -P mtls
KUSCIA_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia:0.10.0b0
SECRETFLOW_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/secretflow-lite-anolis8:1.8.0b0
k3s data already exists /root/kuscia/root-kuscia-autonomy-alice/k3s...
Whether to retain k3s data?(y/n): yes
ROOT=/root/szzd-proj/allinone
DOMAIN_ID=alice
DOMAIN_HOST_PORT=10080
DOMAIN_HOST_INTERNAL_PORT=13081
DOMAIN_DATA_DIR=/root/kuscia/autonomy/data/alice
DOMAIN_LOG_DIR=/root/kuscia/autonomy/alice
KUSCIA_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia:0.10.0b0
KUSCIAAPI_HTTP_PORT=40802
KUSCIAAPI_GRPC_PORT=40803
The container 'root-kuscia-autonomy-alice' already exists. Do you need to recreate it? [y/n]: yes
Remove container root-kuscia-autonomy-alice ...
root-kuscia-autonomy-alice
Starting container root-kuscia-autonomy-alice ...
root-kuscia-autonomy-alice-containerd
domain_hostname=root-kuscia-autonomy-alice-wwh
network=kuscia-exchange
442bcc84b5aa1daf00102619709ebe74b058b592228849b3e28fabe86821b0a3
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/root/szzd-proj/allinone/root-kuscia-autonomy-alice/kuscia.yaml" to rootfs at "/home/kuscia/etc/conf/kuscia.yaml": create mount destination for /home/kuscia/etc/conf/kuscia.yaml mount: cannot mkdir in /var/lib/docker/overlay2/5709cba50b20cfee9be0ee34af5c73a28a32d0b75c851dfba8d6a9ac5fd1d0b2/merged/home/kuscia/etc/conf/kuscia.yaml: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.

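I am not sure whether this error is related to deploying secretflow-allinone inside a Docker container; someone told me that deploying a cluster inside Docker can cause problems.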
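
One thing worth ruling out, stated as an assumption rather than a confirmed diagnosis: if the `docker` CLI inside the development container talks to the host's Docker daemon (for example through a mounted `/var/run/docker.sock`), bind-mount source paths are resolved on the daemon's filesystem, not the container's. With `-v`, a source path that does not exist there is created as a directory, and mounting a directory onto the file `/home/kuscia/etc/conf/kuscia.yaml` would fail with a message like the one above. Whether install.sh uses `-v` is not confirmed here. A minimal sketch for comparing the two views follows; `path_kind` and `daemon_view` are hypothetical helper names, and the availability of the `busybox` image is an assumption.

```shell
#!/usr/bin/env bash

# Report how a path looks from the current shell:
# prints one of "file", "directory", "other", or "missing".
path_kind() {
  local p="$1"
  if [ -f "$p" ]; then
    echo "file"
  elif [ -d "$p" ]; then
    echo "directory"
  elif [ -e "$p" ]; then
    echo "other"
  else
    echo "missing"
  fi
}

# List a directory as the Docker daemon sees it by bind-mounting it
# read-only into a throwaway container.
# --mount (unlike -v) fails if the source path does not exist on the
# daemon side, so this check does not create stray directories.
# Assumption: the busybox image can be pulled or is already present.
daemon_view() {
  local dir="$1"
  docker run --rm \
    --mount "type=bind,source=${dir},target=/mnt,readonly" \
    busybox ls -la /mnt
}
```

For example, run `path_kind /root/szzd-proj/allinone/root-kuscia-autonomy-alice/kuscia.yaml` and then `daemon_view /root/szzd-proj/allinone/root-kuscia-autonomy-alice`. If the first prints `file` while the second fails or lists `kuscia.yaml` as a directory, the shell and the daemon are looking at different filesystems.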
@Chrisdehe
Member

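@sabakioukenasai
Please check the following:
1. What is your Docker version? Our recommended configuration is:

Operating system: macOS, CentOS 7, CentOS 8, Ubuntu 16.04 or later, Windows (via Ubuntu on WSL2)
Resources: 8 cores / 16 GB memory / 200 GB disk
Docker: 20.10.24 or later recommended

2. Check root permissions and make sure the Docker process has sufficient permissions to create directories and files at the target location.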
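
The two checks above can be sketched as a small script. This is only an illustration: `version_ge` and `check_docker_env` are hypothetical helper names, the minimum version is the 20.10.24 recommendation quoted above, and `sort -V` assumes GNU coreutils (as on Ubuntu).

```shell
#!/usr/bin/env bash

# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU sort's version ordering (-V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the Docker server version, compare it with the recommended
# minimum, and show which user ID is running the check.
check_docker_env() {
  local min="20.10.24" ver
  ver="$(docker version --format '{{.Server.Version}}')" || return 1
  if version_ge "$ver" "$min"; then
    echo "Docker $ver meets the recommended minimum $min"
  else
    echo "Docker $ver is older than the recommended minimum $min"
  fi
  echo "Running as UID $(id -u)"
}
```

Calling `check_docker_env` in the shell that runs install.sh shows both facts at once; a UID of 0 means the script is being run as root.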

@sabakioukenasai
Author

Sorry for not providing the operating system details earlier.
The host operating system is Ubuntu 22.04, and the Docker version is:

> docker version
Client: Docker Engine - Community
 Version:           27.4.1
 API version:       1.47
 Go version:        go1.22.10
 Git commit:        b9d17ea
 Built:             Tue Dec 17 15:45:52 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.3.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.7
  Git commit:       41ca978
  Built:            Fri Sep 20 11:41:00 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.22
  GitCommit:        7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc:
  Version:          1.1.14
  GitCommit:        v1.1.14-0-g2c9f560
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

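The Docker container uses the Ubuntu 20.04 image.

It does not look like a Docker version problem, so it may be a permissions problem. Could you explain in more detail how to check whether the Docker process has the required permissions?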
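
One possible way to gather that information, offered as a sketch and not as the maintainers' procedure: report the current user, whether the shell is itself inside a container, and which daemon the `docker` CLI reaches. `is_socket` and `docker_access_report` are hypothetical helper names; `/var/run/docker.sock` and `/.dockerenv` are the usual default locations and are assumptions about this setup.

```shell
#!/usr/bin/env bash

# Succeed if the given path is a Unix socket
# (defaults to the usual Docker socket location).
is_socket() {
  [ -S "${1:-/var/run/docker.sock}" ]
}

# Print who is running Docker commands and which daemon they reach.
docker_access_report() {
  echo "uid: $(id -u)"
  echo "hostname: $(hostname)"

  # /.dockerenv is normally present inside Docker containers.
  if [ -f /.dockerenv ]; then
    echo "inside a Docker container: yes"
  else
    echo "inside a Docker container: not detected"
  fi

  # A socket here usually means the CLI talks to a local or
  # bind-mounted daemon; show its owner and mode.
  if is_socket; then
    ls -l /var/run/docker.sock
  else
    echo "no socket at /var/run/docker.sock"
  fi

  # Daemon host name and data root; a name different from `hostname`
  # suggests the daemon runs outside this container.
  docker info --format 'daemon name: {{.Name}}, root dir: {{.DockerRootDir}}'
}
```

If the daemon name differs from the container's own hostname, the daemon is running elsewhere (most likely on the shared host), and permissions and paths have to be checked there instead of inside the development container.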

@zimu-yuxi

Please confirm whether you were the root user when deploying Kuscia.

@sabakioukenasai
Author

[image]
Yes, I am the root user.

@zimu-yuxi

Could you provide the dmesg log?

@sabakioukenasai
Author

Comparing dmesg before and after running bash install.sh autonomy -n alice -s 8080 -g 40803 -k 40802 -p 10080 -q 13081 -P mtls, the new output is:

[113431.472306] docker0: port 6(vethc1a3ad3) entered blocking state
[113431.472316] docker0: port 6(vethc1a3ad3) entered disabled state
[113431.473228] vethc1a3ad3: entered allmulticast mode
[113431.481522] vethc1a3ad3: entered promiscuous mode
[113431.682949] eth0: renamed from vethd3e15e1
[113431.726118] docker0: port 6(vethc1a3ad3) entered blocking state
[113431.726129] docker0: port 6(vethc1a3ad3) entered forwarding state
[113431.926488] userif-27: sent link down event.
[113431.926501] userif-27: sent link up event.
[113431.932428] docker0: port 6(vethc1a3ad3) entered disabled state
[113431.932845] vethd3e15e1: renamed from eth0
[113432.087576] docker0: port 6(vethc1a3ad3) entered disabled state
[113432.096617] vethc1a3ad3 (unregistering): left allmulticast mode
[113432.096627] vethc1a3ad3 (unregistering): left promiscuous mode
[113432.096636] docker0: port 6(vethc1a3ad3) entered disabled state
[113432.132668] userif-27: sent link down event.
[113432.132684] userif-27: sent link up event.
[113432.643903] docker0: port 6(veth8e465a5) entered blocking state
[113432.643911] docker0: port 6(veth8e465a5) entered disabled state
[113432.643926] veth8e465a5: entered allmulticast mode
[113432.646456] veth8e465a5: entered promiscuous mode
[113432.894819] eth0: renamed from veth44f5440
[113432.943968] docker0: port 6(veth8e465a5) entered blocking state
[113432.943980] docker0: port 6(veth8e465a5) entered forwarding state
[113433.144145] userif-27: sent link down event.
[113433.144163] userif-27: sent link up event.
[113434.195072] docker0: port 6(veth8e465a5) entered disabled state
[113434.195211] veth44f5440: renamed from eth0
[113434.292716] docker0: port 6(veth8e465a5) entered disabled state
[113434.298690] veth8e465a5 (unregistering): left allmulticast mode
[113434.298700] veth8e465a5 (unregistering): left promiscuous mode
[113434.298707] docker0: port 6(veth8e465a5) entered disabled state
[113434.395141] userif-27: sent link down event.
[113434.395155] userif-27: sent link up event.
[113440.247383] br-8f95630c1f15: port 1(veth2a34b2f) entered blocking state
[113440.247390] br-8f95630c1f15: port 1(veth2a34b2f) entered disabled state
[113440.247449] veth2a34b2f: entered allmulticast mode
[113440.248333] veth2a34b2f: entered promiscuous mode
[113441.462510] br-8f95630c1f15: port 1(veth2a34b2f) entered disabled state
[113441.464977] veth2a34b2f (unregistering): left allmulticast mode
[113441.464984] veth2a34b2f (unregistering): left promiscuous mode
[113441.464991] br-8f95630c1f15: port 1(veth2a34b2f) entered disabled state
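
Every line in the excerpt above is a bridge, veth, or interface link event. To see whether dmesg contains anything else, such lines can be filtered out. This is a minimal sketch; `filter_net_noise` is a hypothetical helper name, and the patterns are taken only from the lines shown above.

```shell
#!/usr/bin/env bash

# Drop bridge/veth/link-event lines from kernel log text on stdin,
# leaving anything else (for example filesystem or mount errors).
filter_net_noise() {
  grep -Ev 'veth|docker0: |br-[0-9a-f]+: |userif-[0-9]+: |eth0: renamed' || true
}
```

Usage: `dmesg | filter_net_noise`. Empty output for the time window of the install means the kernel logged nothing beyond container network setup and teardown.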
