Improve ext storage performance with stat cache [$5] #7910
Comments
@karlitschek @craigpg can we schedule this? Some external storage backends make needlessly many calls to the remote just for simple things like checking whether a file exists. Some calls are repeated within a single PHP run, which is bad for performance.
Why is FTP listed as check off/internal stat cache? My testing 20 minutes ago showed several hundred connections in just a few minutes to create one file and upload three more. Tested SFTP/FTP/WebDAV: each of these has the issue.
@deklar from what I saw last time I checked (which was months ago), FTP should already be using PHP's stat cache, since all FTP calls go through PHP's FS APIs which, AFAIK, use a stat cache. Maybe your version of PHP, or some combination of plugins or config, is disabling it somehow?
Here in the documentation of It also says Last time I tried was with early OC 7 when testing FTP against a server that only allowed 5 connections per IP. I haven't tried again, maybe something got broken since then.
Definitely one of the key areas to work on for 8
I don't see how my stat() would be anything other than the built-in version that PHP came with. At this point I think the best way forward is to have someone verify that it does in fact work for them with OC 7 (they will need access to the FTP server logs to verify this; I can supply these if needed). If nobody on OC 7 can easily reproduce the problem (in other words, if it works properly), I'd be interested in knowing the PHP version, the compile-time options, and having a copy of the php.ini file. Also keep in mind that the number of connections per IP is not the same thing as the number of connections per minute. I did not see more than one or two connections at one time, but I did see several hundred back to back (one terminates, another one starts). This may be why your testing was successful.
I'm not even sure if this is related or not, but I switched from the PHP built in opcache to xcache. Am I completely off base thinking that there are simply too many calls going to the FTP side (and others) that really don't need to happen? If SQL holds each file location, the last time it has been touched, the size, etc - then there would be no need to read a directory (remote or otherwise) unless you are going to modify that file, pull that file for reading, or delete that file (and of course upload a new file). For example, when I upload a file to a SFTP map, there should be one connection to drop that file in place, and perhaps a second one to verify the file size - but the rest should be done internally, no? Here is an example of just FTP on what I have been tracking: Uploading a single 120KB file:
(this is where it gets really really bad, as if it wasn't bad enough before)
@deklar like I said, ownCloud is using the PHP functions, so if those are issuing more than one command to the FTP server then it is partially PHP's fault. But I agree that some buffering should be done on the ownCloud side as well. If you're willing to dig deeper, I suggest you set up XDebug and a debugger, then step into where the calls are done in Normally if I do agree that uploading shouldn't require that many calls, not sure why it is happening. The only place where I'd expect a flood of calls is when scanning for changes.
I dug into this a bit last night. It was clear that PHP is handling the FTP calls, but I'm not going to be too quick to blame PHP here just yet, as something on the OC side is asking for those calls to be made (for example, the deletion of the file after it was uploaded, only to re-upload it again; I suspect that is not the PHP side). That alone is enough reason not to run the FTP end, and I'm not even sure whether this is also what takes place on local file systems. I do like OC, it is pretty unique and offers a lot for an individual or a group of people, but my goal was to offer this to customers at some point, which is looking less likely.
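One way to see which side is asking for the calls, without a debugger, is to put a counting proxy in front of the backend and run a single operation through it. The following is only a sketch in Python rather than ownCloud's PHP: `CountingStorage`, `FakeRemote` and `naive_upload` are hypothetical names, and the upload sequence is invented for illustration, not taken from the FTP logs.

```python
from collections import Counter


class CountingStorage:
    """Proxy that counts every method call forwarded to a wrapped backend."""

    def __init__(self, backend):
        self._backend = backend
        self.calls = Counter()

    def __getattr__(self, name):
        # Only reached for attributes that are not defined on the proxy itself.
        target = getattr(self._backend, name)
        if not callable(target):
            return target

        def counted(*args, **kwargs):
            self.calls[name] += 1
            return target(*args, **kwargs)

        return counted


class FakeRemote:
    """Hypothetical in-memory stand-in for an FTP/SFTP/WebDAV backend."""

    def __init__(self):
        self.files = {}

    def file_exists(self, path):
        return path in self.files

    def put(self, path, data):
        self.files[path] = data

    def filesize(self, path):
        return len(self.files[path])


def naive_upload(storage, path, data):
    """Invented upload path that asks the remote the same question twice."""
    existed = storage.file_exists(path)   # check before writing
    storage.put(path, data)
    if not storage.file_exists(path):     # same question asked again
        raise IOError("upload failed")
    return existed, storage.filesize(path)


remote = CountingStorage(FakeRemote())
existed, size = naive_upload(remote, "docs/report.txt", b"x" * 120)
print(dict(remote.calls))
```

The per-method totals in `remote.calls` show how many round trips one logical operation costs, which is the figure to compare against the server-side connection log.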
@PVince81 @icewind1991 I thought a lot about #7897, #11712 and ideas from #11797. External filesystem backends will wildly vary in capabilities and we should
Another Idea: We could check the mtime / etag behavior of the storage when configuring the mount point:
the necessary scanning strategy might depend on the remote source:
Future backends should be easier to implement if we can automatically determine a scanning strategy. The scanners could measure how long a scan takes and adjust the rescan interval accordingly: minutes, hours, days or weeks.
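The idea of deriving the rescan interval from the measured scan time could look roughly like this. It is a sketch only: the factor of 100 (a scan should cost about 1% of the time between scans), the one-minute floor and the one-week ceiling are assumptions made for illustration, and `rescan_interval` and `describe` are hypothetical names.

```python
MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR
WEEK = 7 * DAY


def rescan_interval(scan_seconds, budget_factor=100):
    """Return seconds to wait before the next scan.

    The interval is budget_factor times the measured scan duration,
    clamped to the range [1 minute, 1 week]. Both bounds are assumptions.
    """
    if scan_seconds < 0:
        raise ValueError("scan duration cannot be negative")
    raw = scan_seconds * budget_factor
    return int(min(max(raw, MINUTE), WEEK))


def describe(interval):
    """Map an interval in seconds to the coarse unit it falls into."""
    for label, unit in (("weeks", WEEK), ("days", DAY), ("hours", HOUR)):
        if interval >= unit:
            return label
    return "minutes"


for measured in (0.2, 90, 3600, 100000):
    interval = rescan_interval(measured)
    print(measured, interval, describe(interval))
```

A fast local-network mount ends up rescanned every few minutes, while a backend whose full scan takes an hour is only revisited every few days.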
I noticed today there's also a problem with etag propagation: when you have a deep folder structure on S3 and create a file inside the last folder, the etag of every parent folder is updated. For some reason there's a long delay for every etag update. I guess it is maybe internally trying to check whether the folder exists or something. Using the oc_filecache table as stat cache will also help a lot with this (#11712)
@PVince81 8.0 8.1 ?? THX
This is more of an ongoing topic. I'd say backlog for now, but we need to keep it in sight when improving external storage.
moving to 8.1-next - just not to forget 🙊
DAV doesn't have a stat cache and it looks bad with server to server sharing, see #13882
@icewind1991 would it make sense to implement a generic stat cache as a storage wrapper?
Stat cache storage wrapper ticket here: #13971
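For discussion, here is roughly what a generic stat cache wrapper could look like. It is a Python sketch, not the design from the ticket: `StatCacheWrapper`, `InMemoryStorage` and the `stat`/`put`/`unlink` method names are assumptions made for illustration. The point is that the wrapper works for any backend, remembers negative results too, and drops entries whenever it performs a write itself.

```python
import posixpath


class InMemoryStorage:
    """Hypothetical backend; counts how often stat() reaches the 'remote'."""

    def __init__(self):
        self.files = {}
        self.remote_stat_calls = 0

    def stat(self, path):
        self.remote_stat_calls += 1
        if path not in self.files:
            return None  # None means "does not exist"
        return {"type": "file", "size": len(self.files[path])}

    def put(self, path, data):
        self.files[path] = data

    def unlink(self, path):
        self.files.pop(path, None)


class StatCacheWrapper:
    """Wraps any storage exposing stat/put/unlink and caches stat results."""

    def __init__(self, storage):
        self._storage = storage
        self._stats = {}

    def stat(self, path):
        if path not in self._stats:
            # Misses are cached as None, so repeated existence checks are free.
            self._stats[path] = self._storage.stat(path)
        return self._stats[path]

    def file_exists(self, path):
        return self.stat(path) is not None

    def _invalidate(self, path):
        # A write changes the entry itself and usually its parent's mtime.
        self._stats.pop(path, None)
        self._stats.pop(posixpath.dirname(path), None)

    def put(self, path, data):
        self._storage.put(path, data)
        self._invalidate(path)

    def unlink(self, path):
        self._storage.unlink(path)
        self._invalidate(path)


backend = InMemoryStorage()
storage = StatCacheWrapper(backend)
before = (storage.file_exists("a.txt"), storage.file_exists("a.txt"))
calls_before_write = backend.remote_stat_calls
storage.put("a.txt", b"hello")
after = storage.file_exists("a.txt")
```

This sketch only invalidates on writes that go through the wrapper. Changes made directly on the remote would still need a lifetime limit or a rescan to be noticed.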
This is an ongoing effort, and we're past feature freeze. Moving to 9.0
While grepping the code I just noticed that the GDrive library and also the AWS library both have the option to use Memcache. Not sure if it uses it for caching results or like a stat cache. Something to look into, maybe.
See how it's done for SMB here #21648
Feature freeze => 9.1
Many ext storage implementations are calling remote APIs for every call of stat, is_dir, filetype, filemtime, file_exists, etc.
Note that many file operations use these functions intensively, for example to make sure that a file doesn't exist before overwriting.
Some ext storages already use PHP's functions that have their own internal stat cache (FTP, etc).
To make file operations faster (and browsing as well), we should add a local cache for all ext storage implementations that need one. See the Dropbox one as an example.
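To illustrate why one cached stat result is enough for all of the functions listed above, here is a small sketch in Python rather than PHP. `remote_stat` and the fake listing are hypothetical stand-ins for a real remote API call, and `functools.lru_cache` plays the role of the local cache. There is no invalidation here, so this only models read-only checks within one run.

```python
from functools import lru_cache

remote_calls = {"count": 0}

# Hypothetical remote content, standing in for a real backend.
FAKE_REMOTE = {
    "photos": {"type": "dir", "mtime": 1400000000, "size": 0},
    "photos/cat.jpg": {"type": "file", "mtime": 1400000500, "size": 120000},
}


def remote_stat(path):
    """Pretend round trip to the remote; returns None for a missing path."""
    remote_calls["count"] += 1
    return FAKE_REMOTE.get(path)


@lru_cache(maxsize=None)
def stat(path):
    # Cached per path, including None for paths that do not exist.
    return remote_stat(path)


def file_exists(path):
    return stat(path) is not None


def is_dir(path):
    info = stat(path)
    return info is not None and info["type"] == "dir"


def filetype(path):
    info = stat(path)
    return info["type"] if info else None


def filemtime(path):
    info = stat(path)
    return info["mtime"] if info else None


checks = (
    file_exists("photos/cat.jpg"),
    is_dir("photos/cat.jpg"),
    filetype("photos/cat.jpg"),
    filemtime("photos/cat.jpg"),
)
print(checks, remote_calls["count"])
```

All four checks are answered from a single remote call. Without the cache, the same sequence would hit the remote four times.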
CC @karlitschek @icewind1991 (this is something I just noticed while fixing Openstack bugs)