Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to simple built-in FileFetcher on import #3881

Merged
merged 16 commits into from
Dec 14, 2022
Merged

Switch to simple built-in FileFetcher on import #3881

merged 16 commits into from
Dec 14, 2022

Conversation

dafeder
Copy link
Member

@dafeder dafeder commented Nov 23, 2022

The external FileFetcher library makes file downloads slower, and has led to failures/corruptions of data. Replacing it with a single FileFetcher job that copies a file all at once with Guzzle/cURL will lead to faster, more stable imports of large files. Our experience has been that downloading the file in chunks leads to more problems than it solves, at least as it works now.

The new FileFetcherJob class is all reused code from the FileFetcher library - specifically, the "LastResort" processor. Under this new paradigm the idea of a "processor" is no longer relevant. It's possible that in the future we would want to re-introduce the concept if our one-size solution does not actually fit all.

Local testing showed that a large remote CSV file that took 65 minutes to import completed in 20 minutes after making this change.

QA Steps

  • Create datasets with both remote and local file URLs in downloadURL
  • Run import and confirm file is successfully imported

@dafeder dafeder marked this pull request as ready for review December 2, 2022 16:46
@dafeder dafeder requested a review from janette December 2, 2022 16:47
@janette janette merged commit edd187c into 2.x Dec 14, 2022
@janette janette deleted the copy-import branch December 14, 2022 22:56
dafeder added a commit that referenced this pull request Dec 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants