Minor bug when downloading files containing spaces to EC2 instance #326

csoulette opened this issue Mar 24, 2021 · 0 comments

csoulette commented Mar 24, 2021

Hello,

I'm running Snakemake from my local machine to execute a workflow on AWS. I have deployed a unicorn using Tibanna v0.18.3 and am launching jobs as a non-admin user who is part of the unicorn group. For my workflow, a number of files are uploaded to S3 and then downloaded to the EC2 instance for execution. A few of these files have spaces in their names (a poor naming scheme on my part), and they seem to break the download chain, which prevents the rest of my files from being downloaded, including the Snakefile "p.smk". I'm not sure whether this is an issue on the Tibanna side or with the download/upload methods used to move files from S3 to EC2. The error is avoidable if I rename my files, but I thought it would be worth mentioning. Please find the log trace below for my failed run:

Thanks~

Wed Mar 24 06:06:21 UTC 2021
--2021-03-24 06:06:21--  https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf//aws_decode_run_json.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /4dn-dcic/tibanna/master/awsf/aws_decode_run_json.py [following]
--2021-03-24 06:06:21--  https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf/aws_decode_run_json.py
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 12965 (13K) [text/plain]
Saving to: ‘aws_decode_run_json.py’

     0K .......... ..                                         100% 17.2M=0.001s

2021-03-24 06:06:22 (17.2 MB/s) - ‘aws_decode_run_json.py’ saved [12965/12965]

--2021-03-24 06:06:22--  https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf//aws_update_run_json.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /4dn-dcic/tibanna/master/awsf/aws_update_run_json.py [following]
--2021-03-24 06:06:22--  https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf/aws_update_run_json.py
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 699 [text/plain]
Saving to: ‘aws_update_run_json.py’

     0K                                                       100% 56.0M=0s

2021-03-24 06:06:22 (56.0 MB/s) - ‘aws_update_run_json.py’ saved [699/699]

--2021-03-24 06:06:22--  https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf//aws_upload_output_update_json.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /4dn-dcic/tibanna/master/awsf/aws_upload_output_update_json.py [following]
--2021-03-24 06:06:22--  https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf/aws_upload_output_update_json.py
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 9194 (9.0K) [text/plain]
Saving to: ‘aws_upload_output_update_json.py’

     0K ........                                              100%  103M=0s

2021-03-24 06:06:22 (103 MB/s) - ‘aws_upload_output_update_json.py’ saved [9194/9194]

--2021-03-24 06:06:22--  https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf//download_workflow.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /4dn-dcic/tibanna/master/awsf/download_workflow.py [following]
--2021-03-24 06:06:23--  https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf/download_workflow.py
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 2076 (2.0K) [text/plain]
Saving to: ‘download_workflow.py’

     0K ..                                                    100% 48.9M=0s

2021-03-24 06:06:23 (48.9 MB/s) - ‘download_workflow.py’ saved [2076/2076]

cmsdef
download: s3://cmsdef/Be9ch8zerwpa.run.json to ./Be9ch8zerwpa.run.json
NAME        MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1     259:1    0   8G  0 disk 
└─nvme0n1p1 259:2    0   8G  0 part /
nvme1n1     259:0    0  10G  0 disk 
mke2fs 1.42.13 (17-May-2015)
Creating filesystem with 2621440 4k blocks and 655360 inodes
Filesystem UUID: fc6cfcab-8491-4518-80e4-b22d03631f1f
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done 

Collecting boto3==1.15
  Downloading https://files.pythonhosted.org/packages/13/a1/d9e77245939ec608f0787545824356705f2341c395c5a37e9778bdb1cd98/boto3-1.15.0-py2.py3-none-any.whl (129kB)
Collecting awscli==1.18.152
  Downloading https://files.pythonhosted.org/packages/11/7a/1f483d74fbf9d20f5e82849031b5adb59896b8e4ffcbf9c35040200edd81/awscli-1.18.152-py2.py3-none-any.whl (3.4MB)
Requirement already satisfied (use --upgrade to upgrade): urllib3==1.22 in /usr/local/lib/python2.7/dist-packages
Collecting botocore==1.18.11
  Downloading https://files.pythonhosted.org/packages/1a/e0/11125c627b9fdd17652165f2500968a1f5984496b60cebf3280fa5528c95/botocore-1.18.11-py2.py3-none-any.whl (6.7MB)
Collecting s3transfer<0.4.0,>=0.3.0 (from boto3==1.15)
  Downloading https://files.pythonhosted.org/packages/98/14/0b4be62b65c52d6d1c442f24e02d2a9889a73d3c352002e14c70f84a679f/s3transfer-0.3.6-py2.py3-none-any.whl (73kB)
Requirement already satisfied (use --upgrade to upgrade): jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python2.7/dist-packages (from boto3==1.15)
Requirement already satisfied (use --upgrade to upgrade): rsa<=4.5.0,>=3.1.2; python_version != "3.4" in /usr/local/lib/python2.7/dist-packages (from awscli==1.18.152)
Requirement already satisfied (use --upgrade to upgrade): docutils<0.16,>=0.10 in /usr/local/lib/python2.7/dist-packages (from awscli==1.18.152)
Requirement already satisfied (use --upgrade to upgrade): PyYAML<5.4,>=3.10; python_version != "3.4" in /usr/local/lib/python2.7/dist-packages (from awscli==1.18.152)
Requirement already satisfied (use --upgrade to upgrade): colorama<0.4.4,>=0.2.5; python_version != "3.4" in /usr/local/lib/python2.7/dist-packages (from awscli==1.18.152)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python2.7/dist-packages (from botocore==1.18.11)
Requirement already satisfied (use --upgrade to upgrade): futures<4.0.0,>=2.2.0; python_version == "2.7" in /usr/local/lib/python2.7/dist-packages (from s3transfer<0.4.0,>=0.3.0->boto3==1.15)
Requirement already satisfied (use --upgrade to upgrade): pyasn1>=0.1.3 in /usr/local/lib/python2.7/dist-packages (from rsa<=4.5.0,>=3.1.2; python_version != "3.4"->awscli==1.18.152)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in /usr/local/lib/python2.7/dist-packages (from python-dateutil<3.0.0,>=2.1->botocore==1.18.11)
Installing collected packages: botocore, s3transfer, boto3, awscli
  Found existing installation: botocore 1.10.30
    Uninstalling botocore-1.10.30:
      Successfully uninstalled botocore-1.10.30
  Found existing installation: s3transfer 0.1.13
    Uninstalling s3transfer-0.1.13:
      Successfully uninstalled s3transfer-0.1.13
  Found existing installation: awscli 1.15.30
    Uninstalling awscli-1.15.30:
      Successfully uninstalled awscli-1.15.30
Successfully installed awscli-1.18.152 boto3-1.15.0 botocore-1.18.11 s3transfer-0.3.6
You are using pip version 8.1.1, however version 21.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
main workflow file: p.smk
workflow files: ['envs/scikit.yml', 'p.py', 'jnotebooks/Read', 'Lengths-Copy1.ipynb', 'scripts/2xfasta.py',  'rules/umiConesnsi.smk', 'scripts/poaSeqs.py', **'jnotebooks/Read', 'Lengths-Copy2.ipynb',** ', 'p.smk']

downloading key Be9ch8zerwpa.workflow/rules/reports.smk from bucket cmsdef to target /data1/snakemake/rules/reports.smk
downloading key Be9ch8zerwpa.workflow/envs/blat_env.yaml from bucket cmsdef to target /data1/snakemake/envs/blat_env.yaml

Broken here ...

downloading key Be9ch8zerwpa.workflow/jnotebooks/Read from bucket cmsdef to target /data1/snakemake/jnotebooks/Read
Traceback (most recent call last):
  File "./download_workflow.py", line 61, in <module>
    main()
  File "./download_workflow.py", line 57, in main
    s3.download_file(Bucket=bucket_name, Key=key, Filename=target)
  File "/usr/local/lib/python2.7/dist-packages/boto3/s3/inject.py", line 172, in download_file
    extra_args=ExtraArgs, callback=Callback)
  File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 307, in download_file
    future.result()
  File "/usr/local/lib/python2.7/dist-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/usr/local/lib/python2.7/dist-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   152  100   152    0     0   1027      0 --:--:-- --:--:-- --:--:--  1034
100   612  100   612    0     0   1483      0 --:--:-- --:--:-- --:--:--  2956
100 13.3M  100 13.3M    0     0  10.8M      0  0:00:01  0:00:01 --:--:-- 82.6M
user_allow_other
tibanna version=0.18.3
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Login Succeeded
top - 06:07:02 up 0 min,  0 users,  load average: 0.91, 0.27, 0.10
Tasks: 133 total,   4 running, 129 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.0 us, 15.4 sy,  1.9 ni, 41.0 id, 21.3 wa,  0.0 hi,  0.1 si,  1.3 st
KiB Mem :   987604 total,    68704 free,   168644 used,   750256 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   643396 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7144 root      20   0   87804  29896   7952 R  93.8  3.0   0:00.24 aws
 7152 root      20   0   41564  11784   4240 R  43.8  1.2   0:00.07 mon-put-in+
 7155 root      20   0   41292  11696   4416 R  37.5  1.2   0:00.06 mon-put-in+
    1 root      20   0   37704   5720   3976 S   0.0  0.6   0:01.74 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/0
    4 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:+
4.0K	/data1/input/
20K	/data1/out/
Wed Mar 24 06:07:03 UTC 2021
Wed Mar 24 06:07:03 UTC 2021
aws_decode_run_json.py
aws_update_run_json.py
aws_upload_output_update_json.py
Be9ch8zerwpa.run.json
cromwell
download_command_list.txt
download_workflow.py
env_command_list.txt
goofys-latest
mount_command_list.txt
Wed Mar 24 06:07:03 UTC 2021
Wed Mar 24 06:07:03 UTC 2021
aws_decode_run_json.py
aws_update_run_json.py
aws_upload_output_update_json.py
Be9ch8zerwpa.run.json
cromwell
download_command_list.txt
download_workflow.py
env_command_list.txt
goofys-latest
mount_command_list.txt
Filesystem     1K-blocks    Used Available Use% Mounted on
udev              485812       0    485812   0% /dev
tmpfs              98764    2924     95840   3% /run
/dev/nvme0n1p1   8065444 2311468   5737592  29% /
tmpfs             493800       0    493800   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs             493800       0    493800   0% /sys/fs/cgroup
/dev/nvme1n1    10190136   23136   9626328   1% /data1
/home/ubuntu
total 108K
-rwxr-xr-x   1 root   root  15K Mar 24 06:06 aws_run_workflow_generic.sh
-rw-r--r--   1 root   root    0 Mar 24 06:06 Be9ch8zerwpa.job_started
drwxr-xr-x   2 root   root 4.0K Oct 12  2018 bin
drwxr-xr-x   3 root   root 4.0K Oct 12  2018 boot
drwxr-xr-x   7 ubuntu root 4.0K Mar 24 06:06 data1
drwxr-xr-x  13 root   root 3.3K Mar 24 06:06 dev
drwxr-xr-x  99 root   root 4.0K Mar 24 06:06 etc
drwxr-xr-x   3 root   root 4.0K May 31  2018 home
lrwxrwxrwx   1 root   root   30 Oct 12  2018 initrd.img -> boot/initrd.img-4.4.0-1069-aws
lrwxrwxrwx   1 root   root   30 May 22  2018 initrd.img.old -> boot/initrd.img-4.4.0-1060-aws
drwxr-xr-x  21 root   root 4.0K May 31  2018 lib
drwxr-xr-x   2 root   root 4.0K May 22  2018 lib64
drwx------   2 root   root  16K May 22  2018 lost+found
drwxr-xr-x   2 root   root 4.0K May 22  2018 media
drwxr-xr-x   2 root   root 4.0K May 22  2018 mnt
drwxr-xr-x   2 root   root 4.0K May 22  2018 opt
dr-xr-xr-x 140 root   root    0 Mar 24 06:06 proc
drwx------   7 root   root 4.0K Mar 24 06:07 root
drwxr-xr-x  24 root   root  940 Mar 24 06:06 run
drwxr-xr-x   2 root   root  12K Oct 12  2018 sbin
drwxr-xr-x   2 root   root 4.0K May 31  2018 snap
drwxr-xr-x   2 root   root 4.0K May 22  2018 srv
dr-xr-xr-x  13 root   root    0 Mar 24 06:06 sys
drwxrwxrwt   8 root   root 4.0K Mar 24 06:07 tmp
drwxr-xr-x  10 root   root 4.0K May 22  2018 usr
drwxr-xr-x  13 root   root 4.0K May 22  2018 var
lrwxrwxrwx   1 root   root   27 Oct 12  2018 vmlinuz -> boot/vmlinuz-4.4.0-1069-aws
lrwxrwxrwx   1 root   root   27 May 22  2018 vmlinuz.old -> boot/vmlinuz-4.4.0-1060-aws
total 32K
drwxr-xr-x 2 root   root 4.0K Mar 24 06:06 input
drwx--x--x 2 ubuntu root  16K Mar 24 06:06 lost+found
drwxr-xr-x 2 root   root 4.0K Mar 24 06:06 out
drwxr-xr-x 2 root   root 4.0K Mar 24 06:06 reference
drwxr-xr-x 7 root   root 4.0K Mar 24 06:06 snakemake
/data1/input:
total 0
/data1/snakemake:
total 28K
drwxr-xr-x 2 root root 4.0K Mar 24 06:06 configs
drwxr-xr-x 2 root root 4.0K Mar 24 06:06 envs
drwxr-xr-x 2 root root 4.0K Mar 24 06:06 jnotebooks
-rw-r--r-- 1 root root  328 Mar 24 06:06 p.py
drwxr-xr-x 2 root root 4.0K Mar 24 06:06 rules
drwxr-xr-x 2 root root 4.0K Mar 24 06:06 scripts
-rw-r--r-- 1 root root 2.2K Mar 24 06:06 snk


/data1/snakemake/envs:
total 8.0K
-rw-r--r-- 1 root root  318 Mar 24 06:06 blat_env.yaml
-rw-r--r-- 1 root root 1.8K Mar 24 06:06 scikit.yml

/data1/snakemake/jnotebooks:
total 0


/data1/snakemake/scripts:
total 8.0K
-rw-r--r-- 1 root root 1.4K Mar 24 06:06 adapterAlignmentToBed.py
-rw-r--r-- 1 root root  929 Mar 24 06:06 genotype.R
running snakemake cmsdef/done2.txt --snakefile p.smk --force -j1 --keep-target-files --keep-remote --latency-wait 0 --scheduler ilp --attempt 1 --force-use-threads --max-inventory-time 0 --allowed-rules write --nocolor --notemp --no-hooks --nolock --use-conda in docker image snakemake/snakemake:v6.0.0...
Unable to find image 'snakemake/snakemake:v6.0.0' locally
v6.0.0: Pulling from snakemake/snakemake
5b676b6f8d00: Pulling fs layer
98cb3cecd5c8: Pulling fs layer
f10974a71498: Pulling fs layer
74cfd1e16fdb: Pulling fs layer
9da70d002e9f: Pulling fs layer
a52d44520cf3: Pulling fs layer
74cfd1e16fdb: Waiting
9da70d002e9f: Waiting
a52d44520cf3: Waiting
98cb3cecd5c8: Verifying Checksum
98cb3cecd5c8: Download complete
5b676b6f8d00: Verifying Checksum
5b676b6f8d00: Download complete
f10974a71498: Verifying Checksum
f10974a71498: Download complete
a52d44520cf3: Verifying Checksum
a52d44520cf3: Download complete
74cfd1e16fdb: Verifying Checksum
74cfd1e16fdb: Download complete
5b676b6f8d00: Pull complete
98cb3cecd5c8: Pull complete
9da70d002e9f: Verifying Checksum
9da70d002e9f: Download complete
f10974a71498: Pull complete
74cfd1e16fdb: Pull complete
9da70d002e9f: Pull complete
a52d44520cf3: Pull complete
Digest: sha256:73fc3c3c5f898ce4b2010515e59bc068cc458b1ad63831146bf7837dc9b3f598
Status: Downloaded newer image for snakemake/snakemake:v6.0.0
Error: Snakefile "p.smk" not found.
Wed Mar 24 06:07:50 UTC 2021
/data1/out/:
total 24K
-rw-r--r-- 1 root   root   0 Mar 24 06:07 Be9ch8zerwpa.error
-rw-r--r-- 1 root   root  64 Mar 24 06:07 Be9ch8zerwpa.md5sum.txt
-rwxr-xr-x 1 ubuntu root 19K Mar 24 06:07 Be9ch8zerwpa.log
total 36K
drwx--x--x 2 ubuntu root  16K Mar 24 06:06 lost+found
drwxr-xr-x 2 root   root 4.0K Mar 24 06:06 reference
drwxr-xr-x 2 root   root 4.0K Mar 24 06:06 input
drwxr-xr-x 7 root   root 4.0K Mar 24 06:06 snakemake
drwxr-xr-x 2 root   root 4.0K Mar 24 06:07 tmp
drwxr-xr-x 2 root   root 4.0K Mar 24 06:07 out
/data1/input/:
total 0
/data1/snakemake/:
total 28K
-rw-r--r-- 1 root root  328 Mar 24 06:06 p.py
drwxr-xr-x 2 root root 4.0K Mar 24 06:06 scripts
drwxr-xr-x 2 root root 4.0K Mar 24 06:06 envs
drwxr-xr-x 2 root root 4.0K Mar 24 06:06 rules
-rw-r--r-- 1 root root 2.2K Mar 24 06:06 snk
drwxr-xr-x 2 root root 4.0K Mar 24 06:06 configs
drwxr-xr-x 2 root root 4.0K Mar 24 06:06 jnotebooks

/data1/snakemake/scripts:
total 8.0K
-rw-r--r-- 1 root root  929 Mar 24 06:06 genotype.R
-rw-r--r-- 1 root root 1.4K Mar 24 06:06 adapterAlignmentToBed.py

/data1/snakemake/envs:
total 8.0K
-rw-r--r-- 1 root root 1.8K Mar 24 06:06 scikit.yml
-rw-r--r-- 1 root root  318 Mar 24 06:06 blat_env.yaml

/data1/snakemake/rules:
total 12K
-rw-r--r-- 1 root root 3.3K Mar 24 06:06 limaSplit.smk
-rw-r--r-- 1 root root  842 Mar 24 06:06 reports.smk
-rw-r--r-- 1 root root 1.9K Mar 24 06:06 callUMI.smk


/data1/snakemake/jnotebooks:
total 0
Traceback (most recent call last):
  File "./aws_upload_output_update_json.py", line 157, in <module>
    raise Exception("output file {} upload to {} failed. %s".format(source, bucket + '/' + target) % e)
Exception: output file /data1/snakemake/cmsdef/done2.txt upload to cmsdef/done2.txt failed. [Errno 2] No such file or directory: '/data1/snakemake/cmsdef/done2.txt'
Filesystem      Size  Used Avail Use% Mounted on
udev            475M     0  475M   0% /dev
tmpfs            97M  2.9M   94M   3% /run
/dev/nvme0n1p1  7.7G  4.3G  3.5G  56% /
tmpfs           483M     0  483M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           483M     0  483M   0% /sys/fs/cgroup
/dev/nvme1n1    9.8G   23M  9.2G   1% /data1
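
One possible explanation, not confirmed against the Tibanna source: the workflow file list is joined with spaces and split on whitespace again before download_workflow.py requests each key. That would be consistent with the log listing 'jnotebooks/Read' and 'Lengths-Copy1.ipynb' as separate entries, and with the HeadObject 404 for the key ending in jnotebooks/Read. Because the script exits on that exception, later files such as p.smk are never fetched. Below is a minimal Python 3 sketch of the suspected failure mode, two encodings that survive spaces, and a pre-flight check for finding names to rename. The joined file name and the helper `paths_with_whitespace` are assumptions for illustration, not Tibanna code.

```python
import json
import shlex

# Assumed original names, reconstructed from the split entries in the log.
workflow_files = [
    "envs/scikit.yml",
    "jnotebooks/Read Lengths-Copy1.ipynb",
    "p.smk",
]

# Suspected failure mode (an assumption): join with spaces, split on whitespace.
naive = " ".join(workflow_files).split()

# Alternative 1: shell-style quoting keeps each name as a single token.
quoted = " ".join(shlex.quote(name) for name in workflow_files)
roundtrip_shlex = shlex.split(quoted)

# Alternative 2: pass the list as JSON so no whitespace parsing is involved.
roundtrip_json = json.loads(json.dumps(workflow_files))


def paths_with_whitespace(paths):
    """Hypothetical pre-flight helper: return the paths containing whitespace."""
    return [p for p in paths if any(ch.isspace() for ch in p)]


if __name__ == "__main__":
    print("naive split:   ", naive)
    print("shlex roundtrip:", roundtrip_shlex)
    print("json roundtrip: ", roundtrip_json)
    print("rename first:  ", paths_with_whitespace(workflow_files))
```

Running the check over the workflow directory listing before launching a job would flag the files to rename, which is the workaround described at the top of this issue.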