Unexpected Error [Errno36] File name too long #3423
Hi @rsomani95 ! Sorry for the delay. Could you please provide verbose log? I.e. run your command with But the issue here is that your filename is percent-encoded. E.g.
aka
has a length of 227 chars when percent-encoded, but only 79 chars when decoded. And so when our Now, the question here is what is using those percent-encoded names. I have a few ideas:
So in order to debug further, please show us the verbose log as requested above 🙂 Thanks for reporting this issue! |
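To make the length difference concrete, here is a minimal sketch using Python's standard `urllib.parse`. The sample name is hypothetical (it is not one of the files from this thread); the point is only that each non-ASCII character turns into several `%XX` escapes when percent-encoded, while the filesystem limit applies to the bytes of whatever name is actually written to disk.

```python
from urllib.parse import quote, unquote

# Hypothetical sample name, chosen only for illustration.
decoded = "посёлок.jpg"

# quote() replaces each non-ASCII character with %XX escapes of its UTF-8 bytes.
encoded = quote(decoded)

print(encoded)
print(len(encoded), "characters when percent-encoded")
print(len(decoded), "characters when decoded")
# Linux filename limits are counted in bytes, not characters.
print(len(decoded.encode("utf-8")), "bytes when decoded (UTF-8)")

# Decoding is lossless, so the shorter form carries the same information.
assert unquote(encoded) == decoded
```

Each Cyrillic letter here takes 2 bytes in UTF-8 but 6 characters once percent-encoded, so an encoded name reaches the limit roughly three times sooner.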
Hi @efiop! Thanks a ton for the detailed response, appreciate it!
For files like this... '09-may-2018-portugal-lisbon-maltas-christabelle-standing-on-the-stage-during-the-second-dress-rehearsal-of-the-second-semi-final-at-the-eurovision-song-contest-the-final-takes-place-on-the-12th-may-2018-photo-jrg-carstensendpa-mm7hb4.jpg.BsQCnKzVGHypo4MWtSTpYg.tmp' ... it isn't an encoding-decoding issue, but the filename is just too long, correct?
Verbose Error Message (Truncated):
2020-03-02 17:58:43,075 DEBUG: checking if 'data/film_grab_and_google/train/shot_location/shot_location_exterior_nature_wetlands/677px-%D0%91%D0%B5%D0%BB%D0%BE%D1%80%D1%83%D1%87%D0%B5%D0%B9%D1%81%D0%BA%D0%B0%D1%8F_%D0%A3%D0%96%D0%94_%D0%B2_%D1%80%D0%B0%D0%B9%D0%BE%D0%BD%D0%B5_%D0%BF%D0%BE%D1%81%D1%91%D0%BB%D0%BA%D0%B0_%D0%A1%D0%B0%D0%BC%D0%B5%D0%BD%D0%B6%D0%B0.jpg'('{'md5': 'b8261e689bc14d83cc90091d06b89715'}') has changed.
2020-03-02 17:58:43,076 DEBUG: 'data/film_grab_and_google/train/shot_location/shot_location_exterior_nature_wetlands/677px-%D0%91%D0%B5%D0%BB%D0%BE%D1%80%D1%83%D1%87%D0%B5%D0%B9%D1%81%D0%BA%D0%B0%D1%8F_%D0%A3%D0%96%D0%94_%D0%B2_%D1%80%D0%B0%D0%B9%D0%BE%D0%BD%D0%B5_%D0%BF%D0%BE%D1%81%D1%91%D0%BB%D0%BA%D0%B0_%D0%A1%D0%B0%D0%BC%D0%B5%D0%BD%D0%B6%D0%B0.jpg' doesn't exist.
2020-03-02 17:58:43,076 DEBUG: Cache type 'reflink' is not supported: reflink is not supported
2020-03-02 17:58:43,081 DEBUG: SELECT count from state_info WHERE rowid=?
2020-03-02 17:58:43,081 DEBUG: fetched: [(93025,)]
2020-03-02 17:58:43,081 DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
2020-03-02 17:58:43,094 ERROR: unexpected error - [Errno 36] File name too long: '/home/rahul/github_projects/CinemaNet-Dataset/data/film_grab_and_google/train/shot_location/shot_location_exterior_nature_wetlands/677px-%D0%91%D0%B5%D0%BB%D0%BE%D1%80%D1%83%D1%87%D0%B5%D0%B9%D1%81%D0%BA%D0%B0%D1%8F_%D0%A3%D0%96%D0%94_%D0%B2_%D1%80%D0%B0%D0%B9%D0%BE%D0%BD%D0%B5_%D0%BF%D0%BE%D1%81%D1%91%D0%BB%D0%BA%D0%B0_%D0%A1%D0%B0%D0%BC%D0%B5%D0%BD%D0%B6%D0%B0.jpg.Am5WZHo2cnELtXmLYDroQd.tmp'
------------------------------------------------------------
Traceback (most recent call last):
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/main.py", line 49, in main
ret = cmd.run()
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/command/data_sync.py", line 30, in run
recursive=self.args.recursive,
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/repo/__init__.py", line 28, in wrapper
ret = f(repo, *args, **kwargs)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/repo/pull.py", line 31, in pull
targets=targets, with_deps=with_deps, force=force, recursive=recursive
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/repo/checkout.py", line 67, in _checkout
filter_info=filter_info,
File "/home/rahul/anaconda3/lib/python3.7/site-packages/funcy/decorators.py", line 39, in wrapper
return deco(call, *dargs, **dkwargs)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/stage.py", line 161, in rwlocked
return call()
File "/home/rahul/anaconda3/lib/python3.7/site-packages/funcy/decorators.py", line 60, in __call__
return self._func(*self._args, **self._kwargs)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/stage.py", line 992, in checkout
filter_info=filter_info,
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/output/base.py", line 301, in checkout
filter_info=filter_info,
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 974, in checkout
path_info, checksum, force, progress_callback, relink, filter_info
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 991, in _checkout
path_info, checksum, force, progress_callback, relink, filter_info
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 903, in _checkout_dir
self.link(entry_cache_info, entry_info)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 368, in link
self._link(from_info, to_info, self.cache_types)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 375, in _link
self._try_links(from_info, to_info, link_types)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/remote/slow_link_detection.py", line 38, in wrapper
result = f(remote, *args, **kwargs)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 393, in _try_links
self._do_link(from_info, to_info, link_method)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 409, in _do_link
link_method(from_info, to_info)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/remote/local.py", line 155, in copy
System.copy(from_info, tmp_info)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/dvc/system.py", line 38, in copy
return shutil.copyfile(src, dest)
File "/home/rahul/anaconda3/lib/python3.7/site-packages/pyfastcopy/__init__.py", line 77, in copyfile
with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
OSError: [Errno 36] File name too long: '/home/rahul/github_projects/CinemaNet-Dataset/data/film_grab_and_google/train/shot_location/shot_location_exterior_nature_wetlands/677px-%D0%91%D0%B5%D0%BB%D0%BE%D1%80%D1%83%D1%87%D0%B5%D0%B9%D1%81%D0%BA%D0%B0%D1%8F_%D0%A3%D0%96%D0%94_%D0%B2_%D1%80%D0%B0%D0%B9%D0%BE%D0%BD%D0%B5_%D0%BF%D0%BE%D1%81%D1%91%D0%BB%D0%BA%D0%B0_%D0%A1%D0%B0%D0%BC%D0%B5%D0%BD%D0%B6%D0%B0.jpg.Am5WZHo2cnELtXmLYDroQd.tmp'
------------------------------------------------------------
Everything you've mentioned so far suggests to me that cleaning up the file-names on my end would be the cleanest approach. Thanks again for all the help :) |
Sorry for the delay, @rsomani95 , had a conf run this week, so didn't check notifications that much.
Yes, correct 🙁 Looks like you are trying to store a lot of info in your filenames; maybe consider creating a meta file alongside or something. It is hard for me to tell if you really have strong reasons to organise the storage that way, but personally it feels suboptimal. And, well, dvc's suffix is striking the last nail there. We could shorten the prefix to something like 3-4 chars (because when you |
Hey @efiop, that's alright. The filenames were based off of google searches and so, we retained a lot of info. Anyways, I ended up just shortening the filenames to go around this issue. Thanks for your feedback. From my end, this issue can be closed. |
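For anyone landing here with the same problem, one way to shorten names without losing the information in them is to keep the original names in a meta file next to the data, as suggested above. The following is only a sketch under assumptions: `short_name`, `shorten_all`, the `filenames.json` file name and the 40-character cut-off are all made up for illustration and are not part of dvc.

```python
import hashlib
import json
from pathlib import Path


def short_name(original: str, keep: int = 40) -> str:
    """Build a shorter name: truncated stem + short hash of the full name + extension."""
    p = Path(original)
    # The hash of the full original name keeps truncated names distinct.
    digest = hashlib.md5(original.encode("utf-8")).hexdigest()[:8]
    return f"{p.stem[:keep]}-{digest}{p.suffix}"


def shorten_all(directory, meta_file: str = "filenames.json", keep: int = 40) -> dict:
    """Rename every file in `directory` and record new -> original names in a JSON meta file."""
    directory = Path(directory)
    mapping = {}
    # sorted() materialises the listing first, so renaming while looping is safe.
    for path in sorted(directory.iterdir()):
        if not path.is_file() or path.name == meta_file:
            continue
        new = short_name(path.name, keep)
        path.rename(directory / new)
        mapping[new] = path.name
    (directory / meta_file).write_text(
        json.dumps(mapping, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return mapping
```

Run it on a copy of the data first; the renames are not reversible unless the meta file is kept.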
Ok, closing for now then. @rsomani95 Thank you so much for the feedback! 🙏 |
DVC version:
OS: Ubuntu 18.04
When trying to dvc pull from a GCS bucket, I face the following error:
ERROR: unexpected error - [Errno 36] File name too long: '/home/rahul/github_projects/CinemaNet-Dataset/data/film_grab_and_google/train/shot_location/shot_location_exterior_nature_wetlands/677px-%D0%91%D0%B5%D0%BB%D0%BE%D1%80%D1%83%D1%87%D0%B5%D0%B9%D1%81%D0%BA%D0%B0%D1%8F_%D0%A3%D0%96%D0%94_%D0%B2_%D1%80%D0%B0%D0%B9%D0%BE%D0%BD%D0%B5_%D0%BF%D0%BE%D1%81%D1%91%D0%BB%D0%BA%D0%B0_%D0%A1%D0%B0%D0%BC%D0%B5%D0%BD%D0%B6%D0%B0.jpg.uqi3uTLbYsUU6FJNjzm9EC.tmp'
At first, I thought this was an issue with the linux filesystem (I have ext4) because of a limit in the max no. of characters in the file name. This can be seen with
getconf NAME_MAX / # (== 255 on my system)
(more details here). I tried changing this manually but to no avail.
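The same limit can be read from Python with `os.pathconf`, which is available on Unix-like systems. This is a small sketch; `name_max` is just a hypothetical helper name. As far as I know, the 255-byte limit on ext4 is part of the filesystem format, which would explain why changing it manually has no effect.

```python
import os


def name_max(path: str = "/") -> int:
    """Return the maximum length, in bytes, of a single file name on the filesystem holding `path`."""
    return os.pathconf(path, "PC_NAME_MAX")


# Equivalent to `getconf NAME_MAX /`
print(name_max("/"))
```

Note that the limit applies to each path component separately, not to the full path.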
However, the following scenario makes me think that there's more going on.
I have a local copy of the data I was trying to pull from the remote. And when I try to add that folder in a new folder initialised using dvc init --no-scm, I can do so with some persistence.
When first running dvc add shot_location (data-dir = shot_location), I get the same error, but for a different file. Every time I re-run dvc add ..., I get the same error, but for a different file, until finally I'm able to add the file successfully. Here's what that process looks like:
Repeatedly trying dvc checkout data.dvc from the GCS remote did not behave this way; it kept throwing the same error over and over again.
Now after this, I tested whether I could checkout this data after deleting it
And I was able to successfully, with the caveat that none of the files that threw this unexpected error exist.
NOTE
The filenames that throw these unexpected errors aren't fully accurate. They have the format {long_filename}.jpg.{garbled_string}.tmp (# characters > 255) whereas on disk, the actual names are just {long_filename}.jpg (# characters < 255).
Also, I haven't modified the config files to include symlinks or hardlinks, both on remote and local.
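Since the failing names only go over the limit once the temporary suffix is appended, the affected files can be listed up front. The sketch below is an assumption-laden illustration, not dvc code: `names_too_long_with_suffix` is a hypothetical helper, the limit of 255 is the NAME_MAX value reported above, and the suffix length is taken from the suffix visible in the error message on the assumption that other temporary suffixes have the same length.

```python
import os

# Length of the suffix seen in the error message (assumed to be typical).
TMP_SUFFIX_LEN = len(".Am5WZHo2cnELtXmLYDroQd.tmp")


def names_too_long_with_suffix(root, extra=TMP_SUFFIX_LEN, limit=255):
    """List files under `root` whose name would exceed `limit` bytes with `extra` bytes appended."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # The limit is counted in bytes, so measure the encoded name.
            if len(os.fsencode(name)) + extra > limit:
                hits.append(os.path.join(dirpath, name))
    return hits
```

Calling `names_too_long_with_suffix("shot_location")` on the local copy should return the files that need renaming before the data can be added or checked out.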