-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Duplicate mimetype insertion during scan #5199
Comments
@icewind1991 @schiesbn @DeepDiver1975 @karlitschek |
Ok, here are my steps with MySQL:
Expected resultNavigates to the 'sftp' subdir Actual resultStays in the current dir and the network console has this error in a HTML page:
Sometimes it is another exception about constraint violation when insert a file into the DB. It doesn't seem to happen for smaller image files. It seems to be a race condition. |
Okay, it seems to be a conflict between the scanning triggered by scan.php and the list retrieval from list.php, which also triggers a scanning. Before navigating to the subdir, if you observe the network console and wait for the scan.php call to finish, then click on the subdir, it will all work fine. |
@icewind1991 @DeepDiver1975 is scanning supposed to be exclusive / locking other scans out or is it allowed for multiple scan processes to run at the same time ? |
Getting closer, it looks like there are two server calls that conflict: First call: Initiates the background scan Second call: Scans as part of getDirectoryContent(): The second call tries to insert the same file into the DB which causes a duplicate key exception. After both calls are done, the entry for that file appears once in the DB. I'll try to pinpoint what is triggering this call and why the second call doesn't find the file in the DB. |
Ok, the first call is from scan.php which triggers the background scan and the second one from list.php to get the directory contents. If at the time where the second call is running, the background hasn't finished yet or hasn't started, the second call will believe the scan is incomplete here in lib/private/files/view.php:
and when the scan is incomplete, getDirectoryContent() will then call $scanner->scan() which itself is checking whether we have data in the cache in lib/private/files/cache/scanner.php in the scanFile() method:
When no cache data is available, it will put/insert the scanned file into the DB. If between this the first call has inserted the row already, the duplicate key error will occur. I'm not sure yet how the two scanning processes are supposed to lock each other out. I'll look into the getWatcher() call above which might be what I'm looking for. |
@icewind1991 can you please have a look? THX |
@PVince81 The query is only inserting a single mimetype. So you can just go to where the insert happens, catch the duplicate key exception and ignore it? |
Yes, @karlitschek suggested to do that as a workaround. |
@PVince81 Proper design would suggest that there is only a single place where mimetype insertion happens. |
@bantu The point is that this can happen if several php process run at the same time and try to insert the same primary key into the db. Catching the duplicate key exception should be fine |
Or use |
@tanghus Excellent point. I used this in the past with MySQL. I wasn't sure that this works with all databases :-) |
@karlitschek amazingly it works even with Oracle ;) AFAIRC performance with SQLite isn't optimal, but when is it ever that. |
@karlitschek You seem to have misunderstood or I wasn't clear enough.
was in reference to
My point is that there should be only a single place where the mimetype insert SQL is executed and thus only a single place where the duplicate key error has to be ignored. |
There are at least two code paths that reach that single place: the scanner from scan.php and the getDirectoryContent() from list.php I'll have a try catching and ignoring the exception directly where the insert is done.
Should this conflict/exception ignoring be logged ? |
I would log it with level DEBUG especially until we are 100% that we found and fixed the problem. In normal operation, especially in a heavy load setup, this should not be logged. It's just normal |
From what I see catching the exception isn't enough, we need to return an id. |
see here: http://stackoverflow.com/questions/1132571/implementing-update-if-exists-in-doctrine-orm |
@karlitschek Probably not. That performs a delete+insert which changes the autoincrement id according to the link you posted. |
@PVince81 Please link to work-in-progress patch. |
After discussing we found out that REPLACE INTO wouldn't work with Oracle DB and others. |
CC @butonic FYI |
Ok, i guess it's probably a transactions thing. Transaction 1: (scan.php)
In parallel, Transaction 2 is started: (list.php) Which means that the current solution with try{ insert } catch () { select id } can't be used. |
@PVince81 handler for this exception needs to refresh the mime cache and retry https://github.com/owncloud/core/blob/master/lib/private/files/cache/cache.php#L88 |
I've moved the duplicate exception catching to scanFile() as it's the place we expect it to happen while two scan processes happen. Now the problem with duplicate mimetype insertion remains. Ignoring the exception cause image previews to not be rendered until the browser is refreshed. |
When two scanning processed run at the same time, for example when scan.php and list.php were called at the same time on an external storage mountpoint, duplicate insertion errors can occurs. These errors are now logged and ignored. Since both scans are running in parallel transactions, they don't see each other's changes directly in the DB which can cause duplicate insertion errors for mime types as well, but those mime types can't be selected yet. The solution to this is to force-reload the mimetypes list after the transaction is finished. Fixes #5199
Ok, I found out that there is an explicit DB transaction is started when scanning a folder. To make sure that we still get the mime type id after the scan is done (by list.php), we force-reload the mimetype id after the transaction is finished. I've updated the PR. |
Steps to reproduce:
These aren't exact steps and it doesn't always happen, might be a race condition.
It does seem to happen more often when changing the config though.
It seems that the files scanning is trying to insert the same mimetype multiple times into the DB:
The text was updated successfully, but these errors were encountered: