Reached link limit when downloading a large number of bundles (140000+) #427

Closed
GPelayo opened this issue Aug 13, 2019 · 3 comments
Labels: BUG, demoed, orange (Done by the Azul, Data Browser and Portal team)

Comments

@GPelayo

GPelayo commented Aug 13, 2019

WARNING:hca:Download task failed: DSSFile(name='project_0.json', uuid='e0009214-c0a0-4a7b-96e2-d6a83e966ce0', version='2019-07-09T221320.395000Z', sha256='ba4df5b43e0bdff717f6d81b5aaaa941987fb50d3a91498b47a49353abbbedee', size=6366, indexed=True, replica='aws')
Traceback (most recent call last):
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ubuntu/dcp-venv/lib/python3.6/site-packages/hca/dss/__init__.py", line 580, in _download_and_link_to_filestore
    hardlink(file_store_path, file_path)
  File "/home/ubuntu/dcp-venv/lib/python3.6/site-packages/hca/dss/util/__init__.py", line 57, in hardlink
    os.link(source, link_name)
OSError: [Errno 31] Too many links: '.hca/v2/files_2_4/ba/4df5/ba4df5b43e0bdff717f6d81b5aaaa941987fb50d3a91498b47a49353abbbedee' -> '8ac31d30-3e22-494e-a8c2-0bb837fa7d4e.2019-08-01T200147.756920Z/project_0.json'
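The paths in the error suggest that the client keeps a single copy of each file in a content-addressed store under .hca/v2/files_2_4/, keyed by the file's SHA-256, and hardlinks that copy into every bundle directory. The layout below (a two-character prefix directory, then a four-character one) is inferred from the one path in the traceback, not taken from the client's source. Because project_0.json has identical content in many bundles, all of those bundle entries point at one inode. Each os.link call increments that inode's link count until the filesystem's per-inode limit is reached (65,000 on ext4), at which point os.link fails with EMLINK (errno 31). A minimal sketch of both effects, using hypothetical helper names and a throwaway temporary directory:

```python
import os
import tempfile

# Assumed store layout, inferred from the path in the traceback:
# <root>/<sha256[:2]>/<sha256[2:6]>/<sha256>
def store_path(sha256, root=".hca/v2/files_2_4"):
    return os.path.join(root, sha256[:2], sha256[2:6], sha256)

def link_count_after(n_bundles):
    """Hardlink one stored file into n_bundles hypothetical bundle directories
    and return the inode's link count (the store entry plus one per bundle)."""
    with tempfile.TemporaryDirectory() as tmp:
        stored = os.path.join(tmp, "stored_file")
        with open(stored, "w") as f:
            f.write("{}")
        for i in range(n_bundles):
            bundle_dir = os.path.join(tmp, f"bundle_{i}")
            os.mkdir(bundle_dir)
            os.link(stored, os.path.join(bundle_dir, "project_0.json"))
        return os.stat(stored).st_nlink

print(store_path("ba4df5b43e0bdff717f6d81b5aaaa941987fb50d3a91498b47a49353abbbedee"))
print(link_count_after(3))  # 4: the store entry plus three bundle links
```

With 140,000+ bundles that all contain the same project_0.json, the count for that one inode passes ext4's limit long before the download finishes.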
theathorn added the BUG label on Aug 14, 2019
hannes-ucsc added the orange label (Done by the Azul, Data Browser and Portal team) on Aug 15, 2019
@GPelayo
Author

GPelayo commented Aug 16, 2019

I made a quick fix to hardlink in hca/dss/util/__init__.py, changing...

import errno
import os

def hardlink(source, link_name):
    try:
        os.link(source, link_name)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise

...so that if os.link raises an OSError with errno EMLINK, the source file is copied to the link's location instead of being linked. Here is the modified snippet:

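import errno
import os
import shutil

def hardlink(source, link_name):
    try:
        os.link(source, link_name)
    except OSError as e:
        if e.errno == errno.EMLINK:
            # The source inode has too many links: copy the file instead.
            shutil.copyfile(source, link_name)
        elif e.errno != errno.EEXIST:
            raise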

While this fix let me download a large number of files, I have only tested it on Ubuntu with Python 3.7, and it doesn't completely resolve this issue.

@jessebrennan
Collaborator

@hannes-ucsc and I have discussed designs for this, but have not reached a final agreement (a prerequisite to finishing this ticket).

@hannes-ucsc
Contributor

We should implement the quick fix plus logging. If additional improvements can be implemented on top of that, we should file separate tickets.
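A minimal sketch of "the quick fix plus logging", assuming the standard logging module and a module-level logger; the log message is hypothetical, not taken from the client. Reaching the real limit takes tens of thousands of links, so the usage below exercises the EMLINK branch by patching os.link with unittest.mock, which also lets the fallback be checked on platforms other than the one the fix was tested on.

```python
import errno
import logging
import os
import shutil
import tempfile
from unittest import mock

logger = logging.getLogger(__name__)

def hardlink(source, link_name):
    """Hardlink source to link_name; fall back to a copy if the source inode
    has hit the filesystem's hard-link limit (EMLINK)."""
    try:
        os.link(source, link_name)
    except OSError as e:
        if e.errno == errno.EMLINK:
            # Hypothetical message: makes each fallback visible in the logs.
            logger.warning("Too many links to %s; copying to %s instead",
                           source, link_name)
            shutil.copyfile(source, link_name)
        elif e.errno != errno.EEXIST:
            raise

# Usage: simulate EMLINK instead of creating ~65,000 real links.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "store_file")
    dst = os.path.join(tmp, "project_0.json")
    with open(src, "w") as f:
        f.write('{"example": true}')
    emlink = OSError(errno.EMLINK, os.strerror(errno.EMLINK))
    with mock.patch("os.link", side_effect=emlink):
        hardlink(src, dst)
    with open(dst) as f:
        print(f.read())           # {"example": true}
    print(os.stat(dst).st_nlink)  # 1: an independent copy, not a link
```

The trade-off is disk usage: every fallback writes a full copy, so files linked beyond the limit are no longer deduplicated. The warning at least makes that visible.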

theathorn modified the milestones: Q3 2019 Milestone 3, Q4 2019 Milestone 1 on Sep 23, 2019
theathorn modified the milestones: Q4 2019 Milestone 1, Q4 2019 Milestone 2 on Oct 25, 2019
theathorn modified the milestones: Q4 2019 Milestone 2, Q4 2019 Milestone 3 on Nov 20, 2019
theathorn removed this from the Q4 2019 Milestone 3 milestone on Jan 28, 2020
Development

No branches or pull requests

4 participants