File gets deleted if the name is too long #5200

Closed
aliabbasjaffri opened this issue Jan 4, 2021 · 3 comments · Fixed by #5201
aliabbasjaffri commented Jan 4, 2021

Bug Report

Description

  • I tried adding a model file whose name had its parameters appended, which made the file name very long
  • dvc add failed with this error: unexpected error - [Errno 63] File name too long
  • It also deleted the file from its folder
  • Please refer to the attached screenshot
    Screenshot 2021-01-04 at 13 55 34

Reproduce

  • Copy a file with a long name into a folder
  • ls (verify file exists)
  • dvc add <file_name> (results in error)
  • ls (file not present)

Expected

  • The file should not be deleted from the folder

Environment information

  • dvc version output
DVC version: 1.10.2 (osxpkg)
---------------------------------
Platform: Python 3.7.9 on Darwin-20.1.0-x86_64-i386-64bit
Supports: All remotes
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s2s1
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk1s2s1
Repo: dvc, git

Additional Information (if any)

  • dvc add --verbose output
dvc add ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_\[mae__tweedie-nloglik@1-65\]2017_bestscore_4.5141.dat --verbose
2021-01-04 14:02:17,725 DEBUG: Check for update is enabled.
2021-01-04 14:02:17,727 DEBUG: fetched: [(3,)]
2021-01-04 14:02:18,040 DEBUG: Adding 'ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_[mae__tweedie-nloglik@1-65]2017_bestscore_4.5141.dat' to '.gitignore'.
2021-01-04 14:02:18,041 DEBUG: Path '/Users/a.jaffri/Downloads/dvc/model/ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_[mae__tweedie-nloglik@1-65]2017_bestscore_4.5141.dat' inode '18832587'
2021-01-04 14:02:18,042 DEBUG: fetched: []
2021-01-04 14:02:18,160 DEBUG: Path 'ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_[mae__tweedie-nloglik@1-65]2017_bestscore_4.5141.dat' inode '18832587'
2021-01-04 14:02:18,160 DEBUG: fetched: []
2021-01-04 14:02:18,160 DEBUG: {'ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_[mae__tweedie-nloglik@1-65]2017_bestscore_4.5141.dat': 'modified'}
2021-01-04 14:02:18,161 DEBUG: Path '/Users/a.jaffri/Downloads/dvc/model/ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_[mae__tweedie-nloglik@1-65]2017_bestscore_4.5141.dat' inode '18832587'
2021-01-04 14:02:18,161 DEBUG: fetched: [('1607622730429545472', '73420677', '2a5ff67f5dd61b4be951f4e23b4f5f5a', '1609765338160652032')]
2021-01-04 14:02:18,161 DEBUG: Computed stage: 'ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_[mae__tweedie-nloglik@1-65]2017_bestscore_4.5141.dat.dvc' md5: 'None'
2021-01-04 14:02:18,163 DEBUG: Saving 'ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_[mae__tweedie-nloglik@1-65]2017_bestscore_4.5141.dat' to '../.dvc/cache/2a/5ff67f5dd61b4be951f4e23b4f5f5a'.
2021-01-04 14:02:18,164 DEBUG: Assuming '/Users/a.jaffri/Downloads/dvc/.dvc/cache/2a/5ff67f5dd61b4be951f4e23b4f5f5a' is unchanged since it is read-only
2021-01-04 14:02:18,165 DEBUG: Created 'reflink': ../.dvc/cache/.cache_type_test_file -> .2DkQ3K3jagzEP2Vv2ti4zb
2021-01-04 14:02:18,166 DEBUG: Removing '/Users/a.jaffri/Downloads/dvc/model/.2DkQ3K3jagzEP2Vv2ti4zb'
2021-01-04 14:02:18,166 DEBUG: Removing '/Users/a.jaffri/Downloads/dvc/.dvc/cache/.cache_type_test_file'
2021-01-04 14:02:18,166 DEBUG: Removing '/Users/a.jaffri/Downloads/dvc/model/ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_[mae__tweedie-nloglik@1-65]2017_bestscore_4.5141.dat'
2021-01-04 14:02:18,170 DEBUG: Cache type 'reflink' is not supported: reflink is not supported
Adding...
2021-01-04 14:02:18,172 DEBUG: fetched: [(24,)]
2021-01-04 14:02:18,173 ERROR: unexpected error - [Errno 63] File name too long: '/Users/a.jaffri/Downloads/dvc/model/ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_[mae__tweedie-nloglik@1-65]2017_bestscore_4.5141.dat.A42iEPyqyqc2kZaf886P4q.tmp'
------------------------------------------------------------
Traceback (most recent call last):
  File "dvc/main.py", line 90, in main
  File "dvc/command/add.py", line 24, in run
  File "dvc/repo/__init__.py", line 60, in wrapper
  File "dvc/repo/scm_context.py", line 4, in run
  File "dvc/repo/add.py", line 85, in add
  File "dvc/repo/add.py", line 124, in _process_stages
  File "site-packages/funcy/decorators.py", line 39, in wrapper
  File "dvc/stage/decorators.py", line 36, in rwlocked
  File "site-packages/funcy/decorators.py", line 60, in __call__
  File "dvc/stage/__init__.py", line 453, in commit
  File "dvc/output/base.py", line 297, in commit
  File "site-packages/funcy/decorators.py", line 39, in wrapper
  File "dvc/cache/base.py", line 40, in use_state
  File "site-packages/funcy/decorators.py", line 60, in __call__
  File "dvc/cache/base.py", line 317, in save
  File "dvc/cache/base.py", line 326, in _save
  File "dvc/cache/base.py", line 203, in _save_file
  File "dvc/cache/base.py", line 141, in link
  File "dvc/cache/base.py", line 148, in _link
  File "dvc/remote/slow_link_detection.py", line 38, in wrapper
  File "dvc/cache/base.py", line 166, in _try_links
  File "dvc/cache/base.py", line 182, in _do_link
  File "dvc/tree/local.py", line 182, in copy
  File "dvc/system.py", line 32, in copy
  File "shutil.py", line 121, in copyfile
OSError: [Errno 63] File name too long: '/Users/a.jaffri/Downloads/dvc/model/ek006_xgb_modelbooster_dart__maxdepth_15__minchildweight_1-5__eta_0-05__subsample_0-9__colsamplebytree_1__objective_reg_tweedie__tweedievariancepower_1-65__treemethod_hist__evalmetric_[mae__tweedie-nloglik@1-65]2017_bestscore_4.5141.dat.A42iEPyqyqc2kZaf886P4q.tmp'
------------------------------------------------------------
2021-01-04 14:02:18,732 DEBUG: Version info for developers:
DVC version: 1.10.2 (osxpkg)
---------------------------------
Platform: Python 3.7.9 on Darwin-20.1.0-x86_64-i386-64bit
Supports: All remotes
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s2s1
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk1s2s1
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-01-04 14:02:18,733 DEBUG: Analytics is enabled.
2021-01-04 14:02:18,915 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/d7/bhrr7mcd40s6gm1pcs1qn085mxf330/T/tmphb5wban3']'
2021-01-04 14:02:18,917 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/d7/bhrr7mcd40s6gm1pcs1qn085mxf330/T/tmphb5wban3']'
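
The log above shows the order of operations behind the data loss: the file is saved to the cache, the workspace copy is removed, and only then does the copy back to a temporary name ending in `.tmp` fail with Errno 63. A minimal Python sketch of a failure-safe ordering follows. It is not the change made in #5201, and `checkout_copy` and the temporary-suffix format are hypothetical, chosen only for illustration.

```python
import os
import shutil
import uuid


def checkout_copy(cache_path, workspace_path):
    """Copy cache_path over workspace_path without deleting the original first.

    Hypothetical helper: the temporary file is created next to the target, and
    the target is only replaced once the copy has fully succeeded.
    """
    tmp_path = "{}.{}.tmp".format(workspace_path, uuid.uuid4().hex)
    try:
        # May raise OSError (e.g. "File name too long") before anything is lost.
        shutil.copyfile(cache_path, tmp_path)
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, workspace_path)
    except OSError:
        # Clean up a partial temporary file; the original is still in place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this ordering, an over-long temporary name still raises an error, but the workspace file is left untouched.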

efiop commented Jan 4, 2021

Reproducer for ext4:

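#!/bin/bash

set -e
set -x

# Start from a fresh repository
rm -rf myrepo
mkdir myrepo
cd myrepo

git init
dvc init

# Longest file name the current filesystem allows
NAME_MAX=$(getconf NAME_MAX .)
# Random alphanumeric name of exactly NAME_MAX characters
FNAME=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w "$NAME_MAX" | head -n 1)
mkdir data
echo foo > "data/$FNAME"
# Any suffix appended to this name (such as a .tmp suffix) exceeds NAME_MAX
dvc add data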
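
The reproducer above works because the file name already uses all NAME_MAX bytes, so any temporary suffix appended to it overflows the limit. A rough Python sketch of a pre-flight check is shown here. `fits_name_max` is a hypothetical helper and not part of DVC, and `os.pathconf` is only available on Unix.

```python
import os


def fits_name_max(path, suffix="", directory="."):
    """Return True if basename(path) + suffix fits within NAME_MAX for directory."""
    name_max = os.pathconf(directory, "PC_NAME_MAX")  # Unix-only
    name = os.path.basename(path) + suffix
    # NAME_MAX counts bytes, not characters, so encode before measuring.
    return len(os.fsencode(name)) <= name_max


if __name__ == "__main__":
    # Suffix shape taken from the error message in the report above.
    tmp_suffix = ".A42iEPyqyqc2kZaf886P4q.tmp"
    print(fits_name_max("model.dat", tmp_suffix))
```

Running such a check before the workspace file is touched would let a tool refuse the operation up front instead of failing halfway through.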

efiop added the p0-critical label (Critical issue. Needs to be fixed ASAP.) Jan 4, 2021
efiop self-assigned this Jan 4, 2021

efiop commented Jan 4, 2021

Hi @aliabbasjaffri !

Thanks for reporting this issue! I suppose you have a backup of that file, right? If not, the file is not lost: it is in your cache and can be recovered. I'm preparing a fix for this NAME_MAX issue, but we should definitely handle this more gracefully in general 🙁
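
For anyone without a backup, here is a minimal Python sketch of recovering the file by hand. It assumes the cache layout visible in the verbose log, where md5 `2a5ff67f5dd61b4be951f4e23b4f5f5a` is stored at `.dvc/cache/2a/5ff67f5dd61b4be951f4e23b4f5f5a`. The function names are hypothetical, and the layout may differ in other DVC versions.

```python
import os
import shutil


def cache_path_for(md5, cache_dir=".dvc/cache"):
    """Map an md5 hex digest to the two-level cache layout seen in the log."""
    return os.path.join(cache_dir, md5[:2], md5[2:])


def restore_from_cache(md5, dest, cache_dir=".dvc/cache"):
    """Copy the cached object back into the workspace under a (shorter) name."""
    src = cache_path_for(md5, cache_dir)
    # copyfile copies contents only, so dest does not inherit the
    # read-only permissions of the cache entry.
    shutil.copyfile(src, dest)
    return dest
```

For example, `restore_from_cache("2a5ff67f5dd61b4be951f4e23b4f5f5a", "model/ek006_model.dat")`, run from the repository root, would copy the cached object to a new, shorter name (the destination name here is made up).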

aliabbasjaffri (Author) commented

No worries. I had a backup. I was testing out dvc as part of incorporating it into our MLOps pipelines.
In fact, I am really impressed by the tool and appreciate the quick support from the team :-)

efiop mentioned this issue Jan 4, 2021