This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

[Feedback v0.14.0] how to use Team wise storage plugin via nfs ? #4001

Closed
apri30th opened this issue Dec 10, 2019 · 24 comments
Comments

@apri30th

I read the Team-wise storage doc and ran the following:

python storagectl.py server set nfsserver nfs 172.18.67.7 /data/nfs_data
python storagectl.py config set confignfs paigroup -s nfsserver -m /data nfsserver nfs_data -m /user nfsserver 'users/${PAI_USER_NAME}' -d
python storagectl.py groupsc add paigroup confignfs

The last command fails with this log:

storagectl.py: error: invalid choice: 'groupsc' (choose from 'server', 'config', 'user')

How can I use the NFS plugin? Any suggestions?
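The "invalid choice" message is the standard argparse response to a subcommand that was never registered. The sketch below reproduces that behaviour with a hypothetical parser that knows only the three subcommands listed in the error. It is an illustration, not the actual storagectl.py source, and the helper names `build_parser` and `is_supported` are made up for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the 0.14 storagectl.py parser: only the three
    # subcommands named in the error message are registered.
    parser = argparse.ArgumentParser(prog="storagectl.py")
    subparsers = parser.add_subparsers(dest="command")
    for name in ("server", "config", "user"):
        subparsers.add_parser(name)
    return parser


def is_supported(command: str) -> bool:
    """Return True if the parser accepts `command` as a subcommand."""
    try:
        build_parser().parse_args([command])
    except SystemExit:
        # argparse reports the invalid choice on stderr and exits with status 2.
        return False
    return True


if __name__ == "__main__":
    print(is_supported("config"))   # True
    print(is_supported("groupsc"))  # False
```

If the installed script rejects `groupsc` this way, the subcommand is most likely absent from that release rather than mistyped.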

@Binyang2014
Contributor

Which version of PAI do you use? If you use the 0.14 release, please refer to this doc: https://github.com/microsoft/pai/tree/pai-0.14.y/contrib/storage_plugin

The doc has since been updated in the master branch.

@shiyemin

> Which version of PAI do you use? If you use the 0.14 release, please refer to this doc: https://github.com/microsoft/pai/tree/pai-0.14.y/contrib/storage_plugin
>
> The doc has since been updated in the master branch.

What about 0.16? I have been trying for a few days without any progress.

@Binyang2014
Contributor

@shiyemin You can follow this doc: https://github.com/microsoft/pai/tree/master/contrib/storage_plugin, and check the secret in the pai-storage namespace to make sure it takes effect.
[image]
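One way to inspect what ended up in the pai-storage namespace is to dump the secrets with `kubectl get secrets -n pai-storage -o json` and decode the `data` fields, which Kubernetes stores base64-encoded. The sketch below decodes a single Secret object. The secret name, key, and payload in the sample are assumptions for illustration; the thread does not show the real layout.

```python
import base64
import json


def decode_secret_data(secret: dict) -> dict:
    """Decode the base64 values in the `data` field of a Kubernetes Secret object."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


if __name__ == "__main__":
    # Hypothetical Secret, shaped like one item from the kubectl JSON output;
    # the name, key, and payload are made up for this example.
    payload = json.dumps({"type": "nfs", "address": "172.18.67.7"})
    sample = {
        "metadata": {"name": "storage-server", "namespace": "pai-storage"},
        "data": {"nfsserver": base64.b64encode(payload.encode()).decode()},
    }
    print(decode_secret_data(sample))
```

If the decoded values do not contain the server and config you set with storagectl.py, the configuration has probably not been stored.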

@shiyemin

> @shiyemin You can follow this doc: https://github.com/microsoft/pai/tree/master/contrib/storage_plugin, and check the secret in the pai-storage namespace to make sure it takes effect.

Do you assume the groups are managed by winbind or AAD? I cannot find a way to create a group. If the storage is configured for the 'default' or 'admingroup' groups, then the rest-server will fail.

@shiyemin

After creating a group and adding the user to it manually, NFS works now.
However, I still do not understand whether there is a way to create and manage groups through the PAI API.

@Binyang2014
Contributor

@mzmssg @ydye Take a look when you have time?

@mzmssg
Member

mzmssg commented Dec 12, 2019

@shiyemin, @Binyang2014

  1. We have a user/group API, but have not exposed it on the webportal.

  2. In most cases, especially in "basic" mode, the admin shouldn't touch the group API; groups are "internal" storage that PAI is responsible for maintaining.
    In your case, the default and admingroup groups should be created automatically. Please provide the rest-server startup log for further debugging.

@apri30th
Author

apri30th commented Dec 13, 2019

> After creating a group and adding the user to it manually, NFS works now.
> However, I still do not understand whether there is a way to create and manage groups through the PAI API.

My OpenPAI version is 0.14.0.
@shiyemin
How did you make NFS work? I did the following:
1. python storagectl.py server set nfsserver nfs 172.18.67.7 /data/nfs/nfs_data
2. python storagectl.py config set confignfs paigroup -s nfsserver -m /data nfsserver nfs_data -d
3. python storagectl.py user set root nfsserver

But when I submit a job as root, no data shows up.

[image]

@shiyemin

> My OpenPAI version is 0.14.0.
> @shiyemin
> How did you make NFS work? I did the following:
> 1. python storagectl.py server set nfsserver nfs 172.18.67.7 /data/nfs/nfs_data
> 2. python storagectl.py config set confignfs paigroup -s nfsserver -m /data nfsserver nfs_data -d
> 3. python storagectl.py user set root nfsserver
>
> But when I submit a job as root, no data shows up.

I wrote a small program to manipulate groups and the grouplist of a user. Then "python storagectl.py groupsc set" will work.

@apri30th
Author

> Which version of PAI do you use? If you use the 0.14 release, please refer to this doc: https://github.com/microsoft/pai/tree/pai-0.14.y/contrib/storage_plugin
>
> The doc has since been updated in the master branch.

@Binyang2014 My PAI version is 0.14.0 and I read the doc at https://github.com/microsoft/pai/tree/pai-0.14.y/contrib/storage_plugin, but it still shows nothing, as above. What should I do? Is there something I missed? With NFS, should I create the user first?

@apri30th
Author

> I wrote a small program to manipulate groups and the grouplist of a user. Then "python storagectl.py groupsc set" will work.

@shiyemin Thank you, but in v0.14.0 storagectl.py does not accept groupsc as a parameter.

@shiyemin

> @shiyemin Thank you, but in v0.14.0 storagectl.py does not accept groupsc as a parameter.

In 0.14, you will have to use "default" as the config name.

@apri30th
Author

apri30th commented Dec 17, 2019

> Which version of PAI do you use? If you use the 0.14 release, please refer to this doc: https://github.com/microsoft/pai/tree/pai-0.14.y/contrib/storage_plugin
>
> The doc has since been updated in the master branch.

@Binyang2014
My PAI version is the 0.14 release and I read the doc you mentioned.
I ran the following commands:
1. python storagectl.py server set nfsserver nfs 172.18.67.7 /data/nfs/nfs_data
2. python storagectl.py config set default paigroup -s nfsserver -m /data nfsserver nfs_data -d
3. python storagectl.py user set root nfsserver
But when I submit a job as root, no data shows up.
[image]

The config already shows up in the k8s config:
[image]
Any suggestions? Is there something I missed?

@Binyang2014
Contributor

@wangdian Can you take a look?

@Binyang2014
Contributor

@apri30th Do you know which group the root user belongs to?
Also, maybe you should run:
python storagectl.py config set config default -s nfsserver -m /data nfsserver nfs_data -d
This sets the group name to default instead of paigroup.

@apri30th
Author

> @apri30th Do you know which group the root user belongs to?
> Also, maybe you should run:
> python storagectl.py config set config default -s nfsserver -m /data nfsserver nfs_data -d
> This sets the group name to default instead of paigroup.

@Binyang2014 Thank you for your reply, I can see the data now. The default group name is 'default' and the root user belongs to the default group.
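The fix above suggests that a storage config only reaches users whose group matches the group the config was set for. The sketch below models that rule with a plain dictionary. The data model and the function name `visible_configs` are assumptions for illustration and are not taken from the PAI source.

```python
def visible_configs(user_groups, config_groups):
    """Return the names of storage configs whose group is among the user's groups.

    config_groups maps config name -> group name (hypothetical data model).
    """
    return sorted(
        name for name, group in config_groups.items() if group in user_groups
    )


if __name__ == "__main__":
    root_groups = {"default"}
    # First attempt in the thread: the config was bound to "paigroup".
    print(visible_configs(root_groups, {"confignfs": "paigroup"}))  # []
    # Working setup: the config is bound to "default".
    print(visible_configs(root_groups, {"config": "default"}))      # ['config']
```

Under this reading, the earlier attempts showed no data because root did not belong to paigroup.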

@zeng-hello-world

Hi, @Binyang2014
When should Team-wise plugin server run, before k8s boot or OpenPAI service start?

@Binyang2014
Contributor

Hi @nan0755 The team-wise plugin should be run after the PAI services start.

@zeng-hello-world

zeng-hello-world commented Jan 9, 2020

@Binyang2014
I got this error in the job's stderr window when submitting a job:

debconf: delaying package configuration, since apt-utils is not installed
mount.nfs4: Protocol not supported

It seems my NFS server version is incompatible with PAI's mount command.
Do you have any suggestions?

Thanks!

@Binyang2014
Contributor

@zeyu-hello Can you show me your job config, and the full logs including stdout and stderr?

@zeng-hello-world

zeng-hello-world commented Jan 10, 2020

@Binyang2014

1. Team-wise job config

(note that the data_folder lies in /volume)

 python storagectl.py server set nfsserver nfs 10.10.30.90 /volume
 python storagectl.py config set confignfs default -s nfsserver -m /data nfsserver data_folder -d

2. Stdout log

python-crypto_2.6.1-6ubuntu0.16.04.3_amd64.deb ...
Unpacking python-crypto (2.6.1-6ubuntu0.16.04.3) ...
Selecting previously unselected package python-ldb.
Preparing to unpack .../python-ldb_2%3a1.1.24-1ubuntu3.1_amd64.deb ...
Unpacking python-ldb (2:1.1.24-1ubuntu3.1) ...
Selecting previously unselected package python-tdb.
Preparing to unpack .../python-tdb_1.3.8-2_amd64.deb ...
Unpacking python-tdb (1.3.8-2) ...
Selecting previously unselected package python-talloc.
Preparing to unpack .../python-talloc_2.1.5-2_amd64.deb ...
Unpacking python-talloc (2.1.5-2) ...
Selecting previously unselected package samba-libs:amd64.
Preparing to unpack .../samba-libs_2%3a4.3.11+dfsg-0ubuntu0.16.04.24_amd64.deb ...
Unpacking samba-libs:amd64 (2:4.3.11+dfsg-0ubuntu0.16.04.24) ...
Selecting previously unselected package python-samba.
Preparing to unpack .../python-samba_2%3a4.3.11+dfsg-0ubuntu0.16.04.24_amd64.deb ...
Unpacking python-samba (2:4.3.11+dfsg-0ubuntu0.16.04.24) ...
Selecting previously unselected package samba-common-bin.
Preparing to unpack .../samba-common-bin_2%3a4.3.11+dfsg-0ubuntu0.16.04.24_amd64.deb ...
Unpacking samba-common-bin (2:4.3.11+dfsg-0ubuntu0.16.04.24) ...
Selecting previously unselected package sshpass.
Preparing to unpack .../sshpass_1.05-1_amd64.deb ...
Unpacking sshpass (1.05-1) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Processing triggers for systemd (229-4ubuntu21.22) ...
Setting up libpopt0:amd64 (1.16-10) ...
Setting up libnfsidmap2:amd64 (0.25-5) ...
Setting up libwbclient0:amd64 (2:4.3.11+dfsg-0ubuntu0.16.04.24) ...
Setting up samba-common (2:4.3.11+dfsg-0ubuntu0.16.04.24) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline

Creating config file /etc/samba/smb.conf with new version
Setting up libtalloc2:amd64 (2.1.5-2) ...
Setting up cifs-utils (2:6.4-1ubuntu1.1) ...
Setting up keyutils (1.5.9-8ubuntu1) ...
Setting up libevent-2.0-5:amd64 (2.0.21-stable-2ubuntu0.16.04.1) ...
Setting up libtdb1:amd64 (1.3.8-2) ...
Setting up libtevent0:amd64 (0.9.28-0ubuntu0.16.04.1) ...
Setting up libldb1:amd64 (2:1.1.24-1ubuntu3.1) ...
Setting up libtirpc1:amd64 (0.2.5-1ubuntu0.1) ...
Setting up rpcbind (0.2.3-0.2) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Setting up nfs-common (1:1.2.8-9ubuntu12.2) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline

Creating config file /etc/idmapd.conf with new version
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline

Creating config file /etc/default/nfs-common with new version
Adding system user `statd' (UID 107) ...
Adding new user `statd' (UID 107) with group `nogroup' ...
Not creating home directory `/var/lib/nfs'.
invoke-rc.d: unknown initscript, /etc/init.d/gssd not found.
invoke-rc.d: could not determine current runlevel
invoke-rc.d: unknown initscript, /etc/init.d/idmapd not found.
invoke-rc.d: could not determine current runlevel
Setting up python-crypto (2.6.1-6ubuntu0.16.04.3) ...
Setting up python-ldb (2:1.1.24-1ubuntu3.1) ...
Setting up python-tdb (1.3.8-2) ...
Setting up python-talloc (2.1.5-2) ...
Setting up samba-libs:amd64 (2:4.3.11+dfsg-0ubuntu0.16.04.24) ...
Setting up python-samba (2:4.3.11+dfsg-0ubuntu0.16.04.24) ...
Setting up samba-common-bin (2:4.3.11+dfsg-0ubuntu0.16.04.24) ...
Setting up sshpass (1.05-1) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Processing triggers for systemd (229-4ubuntu21.22) ...

3. Stderr log

debconf: delaying package configuration, since apt-utils is not installed
mount.nfs4: Protocol not supported

@Binyang2014
Contributor

Which image do you use for your job? Can you provide your job config, which can be found on the webportal?
[image]

@zeng-hello-world

zeng-hello-world commented Jan 10, 2020

Hi, @Binyang2014, thanks for your reply.
I realized my self-built Docker image may have caused this problem, so I tried another test using a default image from the Docker image list instead. However, this produced a different error, which I think is also caused by the mount version issue.

1. docker image in job config

prerequisites:
  - type: dockerimage
    uri: 'ufoym/deepo:pytorch-py36-cpu'
    name: docker_image_0

2. Stdout log

line 76.)
debconf: falling back to frontend: Readline
Setting up libcap2:amd64 (1:2.25-1.2) ...
Setting up libjansson4:amd64 (2.11-1) ...
Setting up keyutils (1.5.9-9.2ubuntu2) ...
Setting up libdevmapper1.02.1:amd64 (2:1.02.145-4.1ubuntu3.18.04.2) ...
Setting up libbsd0:amd64 (0.8.7-1) ...
Setting up ucf (3.0038) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline
Setting up sshpass (1.06-1) ...
Setting up libtirpc1:amd64 (0.2.5-1.2ubuntu0.1) ...
Setting up dmsetup (2:1.02.145-4.1ubuntu3.18.04.2) ...
Setting up libtalloc2:amd64 (2.1.10-2ubuntu1) ...
Setting up libxdmcp6:amd64 (1:1.1.2-3) ...
Setting up openssh-client (1:7.6p1-4ubuntu0.3) ...
Setting up libx11-data (2:1.6.4-3ubuntu0.2) ...
Setting up libpython2.7-stdlib:amd64 (2.7.17-1~18.04) ...
Setting up libxau6:amd64 (1:1.0.8-1) ...
Setting up libwrap0:amd64 (7.6.q-27) ...
Setting up rpcbind (0.2.3-0.6) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Setting up libavahi-common-data:amd64 (0.7-3.1ubuntu1.2) ...
Setting up libwbclient0:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.14) ...
Setting up nfs-common (1:1.3.4-2.1ubuntu5.2) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline

Creating config file /etc/idmapd.conf with new version
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline
Adding system user `statd' (UID 101) ...
Adding new user `statd' (UID 101) with group `nogroup' ...
Not creating home directory `/var/lib/nfs'.
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Setting up python2.7 (2.7.17-1~18.04) ...
Setting up samba-common (2:4.7.6+dfsg~ubuntu-0ubuntu2.14) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline

Creating config file /etc/samba/smb.conf with new version
Setting up libpython-stdlib:amd64 (2.7.15~rc1-1) ...
Setting up libtevent0:amd64 (0.9.34-1) ...
Setting up libpython2.7:amd64 (2.7.17-1~18.04) ...
Setting up libavahi-common3:amd64 (0.7-3.1ubuntu1.2) ...
Setting up libxcb1:amd64 (1.13-2~ubuntu18.04) ...
Setting up python (2.7.15~rc1-1) ...
Setting up python-talloc (2.1.10-2ubuntu1) ...
Setting up cifs-utils (2:6.8-1) ...
update-alternatives: using /usr/lib/x86_64-linux-gnu/cifs-utils/idmapwb.so to provide /etc/cifs-utils/idmap-plugin (idmap-plugin) in auto mode
update-alternatives: warning: skip creation of /usr/share/man/man8/idmap-plugin.8.gz because associated file /usr/share/man/man8/idmapwb.8.gz (of link group idmap-plugin) doesn't exist
Setting up python-crypto (2.6.1-8ubuntu2) ...
Setting up python-tdb (1.3.15-2) ...
Setting up libldb1:amd64 (2:1.2.3-1ubuntu0.1) ...
Setting up libx11-6:amd64 (2:1.6.4-3ubuntu0.2) ...
Setting up libxmuu1:amd64 (2:1.1.2-2) ...
Setting up libavahi-client3:amd64 (0.7-3.1ubuntu1.2) ...
Setting up libcups2:amd64 (2.2.7-1ubuntu2.7) ...
Setting up python-ldb:amd64 (2:1.2.3-1ubuntu0.1) ...
Setting up samba-libs:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.14) ...
Setting up libxext6:amd64 (2:1.3.3-1) ...
Setting up python-samba (2:4.7.6+dfsg~ubuntu-0ubuntu2.14) ...
Setting up xauth (1:1.0.10-1) ...
Setting up samba-common-bin (2:4.7.6+dfsg~ubuntu-0ubuntu2.14) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Processing triggers for mime-support (3.60ubuntu1) ...

3. Stderr log

debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 53.)
debconf: falling back to frontend: Readline
mount.nfs4: access denied by server while mounting 10.10.30.90:/volume/

4. PAI auto-generated command is incompatible with nfs server version

I noticed the auto-generated command uses nfs4 as the mount type.

mount -t nfs4 10.10.30.90:/volume/ /tmp_nfsserver_root/

However, my NFS server version is:

Flags:	rw,noatime,vers=3, ....

So, if I cannot change my NFS server version for some reason, can PAI support this NFS version?
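For a manual check outside PAI, the protocol version can be read from the flag string and pinned explicitly with the `vers=` mount option. The sketch below does both. The helper names are made up for this example, and the thread does not confirm that the storage plugin itself can be configured to mount this way.

```python
import shlex


def nfs_version_from_flags(flags: str):
    """Return the value of the vers= (or nfsvers=) option in a flag string, or None."""
    for option in flags.split(","):
        key, _, value = option.strip().partition("=")
        if key in ("vers", "nfsvers") and value:
            return value
    return None


def mount_command(server: str, export: str, target: str, version: str) -> str:
    """Build a mount command line that pins the NFS protocol version explicitly."""
    args = ["mount", "-t", "nfs", "-o", f"vers={version}", f"{server}:{export}", target]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    version = nfs_version_from_flags("rw,noatime,vers=3")
    print(version)
    print(mount_command("10.10.30.90", "/volume/", "/tmp_nfsserver_root/", version))
```

Running the resulting command by hand from a client machine shows whether the server accepts a version 3 mount, independently of the command PAI generates.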

@zeng-hello-world

I finally changed the NFS server to support NFSv4, which solved this problem.

Thanks for your help anyway! @Binyang2014
