Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extremely slow _wp_handle_upload() #373

Closed
shahin8r opened this issue Jan 18, 2020 · 15 comments
Closed

Extremely slow _wp_handle_upload() #373

shahin8r opened this issue Jan 18, 2020 · 15 comments

Comments

@shahin8r
Copy link

shahin8r commented Jan 18, 2020

Hey guys,

Love this plugin - been using it for years.

I've stumbled across a problem with _wp_handle_upload() where it's taking way too long to handle the uploaded file. I discovered this when trying to import attachments programmatically with media_handle_sideload().

I've wrote a small script on a fresh WP install to test this out with an 12KB image.

$starttime_upload = microtime(true);
_wp_handle_upload($file, ['test_form' => false, 'test_size' => false], current_time('mysql'), 'wp_handle_sideload');
$endtime_upload = microtime(true) - $starttime_upload;

echo "took {$endtime_upload}s to handle upload";

Which outputs:

took 2.6300148963928s to handle upload

If I disable S3-Uploads plugin I get:

took 0.029120206832886s to handle upload

Just to check I ran wp s3-uploads verify and it verified in around 2-300ms.

Not sure if this is expected or not - would appreciate any input you guys might have on this.

@hubertnguyen
Copy link

hubertnguyen commented Feb 3, 2020

~300ms sound right if your server is located in a different region than the S3 Bucket. Not sure why it would take 2.6 seconds, it depends on how "chatty" the protocol is.

@asretux
Copy link

asretux commented Feb 27, 2020

I experience a similar issue using WordPress 5.3.

As far as I can tell, they changed the logic behind wp_unique_filename, doing a scandir on the S3 directory. I am handling a website with 400k+ attachment uploads, and a single upload in 5.3 issues a 30s timeout.

@hubertnguyen
Copy link

@asretux : ouch, if that's the case it's a serious problem for scalability!

@rmccue
Copy link
Member

rmccue commented Mar 2, 2020

It looks like this may be due to https://core.trac.wordpress.org/changeset/46822 - we've not yet replicated this, but from reading the code seems fairly obvious.

This might need changing in WordPress core. We'll continue to look into this issue.

@jakubkol
Copy link

jakubkol commented Mar 20, 2020

The same issue. Scanning a directory with 208614 files takes over 90s:

time wp eval '@scandir("s3://wp-bucket/uploads/2020/02");'

real	1m38.598s
user	0m22.520s
sys	0m2.320s

Is there a workaround for this if I don't want to override /wp-includes/functions.php containing wp_unique_filename()?

@jakubkol
Copy link

jakubkol commented Mar 21, 2020

Workaround

Place every new uploaded file to a unique name subdirectory. Then the slow scandir function scans only an empty directory:

add_filter( 'upload_dir', function( $uploads ) {
  $uniqid = uniqid();
  $uploads['path'] .= '/' . $uniqid;
  $uploads['url'] .= '/' . $uniqid;
  return $uploads;
} );

@asretux
Copy link

asretux commented Apr 28, 2020

@jakubkol I don't think creating a new directory for each file, in terms of scalability, would be a good workaround. On sites with thousands of uploads per day, like a news magazine portal, the number of subdirectories per month would surpass the threshold number for optimal performance.

@jakubkol
Copy link

@asretux

Amazon S3 has a flat structure instead of a hierarchy like you would see in a file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using a shared name prefix for objects (that is, objects have names that begin with a common string).

Source: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/using-folders.html

@tvup
Copy link

tvup commented Apr 28, 2020

1.3 million in that directory:

time wp eval '@scandir("s3://dkwebshops/uploads/2020/04");'


real    3m51.058s
user    1m36.401s
sys     0m2.539s

@joehoyle
Copy link
Member

joehoyle commented Jul 3, 2020

Mmm this is a good issue! Indeed, the scandir added to Core in wp_unique_filename causes a full scan on the current month's dir.

The logic / use case WordPress is protecting against is valid, and I don't see a way of achieving the same algorithm without doing some kind of scan on the currently uploaded files. However, I think there is a significant performance improvement we can make by adjusting the scandir call to effectively do an S3 prefix listObjects call with the name of the uploaded file. This means we'd only need to do (essentially) '@scandir("s3://dkwebshops/uploads/2020/04/my-file*");' (S3 supports an arbitrary prefix, unlike the filesystem). Doing so would mean a much smaller results set (mostly none even).

The challenge is how to appropriately hook in this behavior. wp_unique_filename accepts a $unique_filename_callback param, which we might be able to provide as an override further up the call-stack for the "usual" wp_handle_upload calls, so this would at least be faster on normal upload workflows.

I don't see any other lower level hooks in Core, and I'd rather be able to fix this without needing a patch to WordPress.

If we can provide a copied-but-modified version of wp_unique_filename which does a scandir on the more specific path prefix, and hook it as required, I think this should be possible.

@joehoyle
Copy link
Member

joehoyle commented Jul 3, 2020

Another larger hack, which might work would be to inspect debug_backtrace in our implementation of scandir (via dir_readdir on the streamwrapper) and look for the appropriate call stack and argument, to make the scan more specific.

joehoyle added a commit that referenced this issue Jul 3, 2020
Please for give me for what I have done.

Fixes #373.
@joehoyle
Copy link
Member

joehoyle commented Jul 3, 2020

Mmm ok, after trying the above approach, the filters via wp_handle_upload etc don't provide enough context to change this call. I took a less sound approach.

@joehoyle
Copy link
Member

joehoyle commented Jul 7, 2020

I've also opened a Trac ticket with patch for a potential Core change: https://core.trac.wordpress.org/ticket/50587

@rmccue
Copy link
Member

rmccue commented Jul 7, 2020

Patch has been committed: https://core.trac.wordpress.org/changeset/48369

@joehoyle can you prepare the filter for this? We can add it immediately even though it won't take effect until the next major WP version is out, and then leave the (now-inert) backtrace hack for a few versions afterwards.

@joehoyle
Copy link
Member

joehoyle commented Jul 8, 2020

This has been fixed in version 2.2.2 of S3 Uploads.

@joehoyle joehoyle closed this as completed Jul 8, 2020
carlalexander added a commit to ymirapp/wordpress-plugin that referenced this issue Aug 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants