Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization failure, Deadlock found when trying to get lock #14757

Closed
PVince81 opened this issue Mar 9, 2015 · 40 comments
Closed

Serialization failure, Deadlock found when trying to get lock #14757

PVince81 opened this issue Mar 9, 2015 · 40 comments

Comments

@PVince81
Copy link
Contributor

PVince81 commented Mar 9, 2015

Steps to reproduce

  1. Prepare a local folder "lots" with 1600+ files
  2. Setup the sync client and let it upload all the files
  3. Rename "lots" to "lots2", wait for sync
  4. Rename "lots2" to "lots3"

Expected result

Renamed properly.

Actual result

Very strange internal server error:

{"reqId":"913558a89371e56c7248d7c0ac24f820","remoteAddr":"127.0.0.1","app":"remote","message":"An exception occurred while executing 'UPDATE `oc_filecache` SET `path` = ?, `path_hash` = ?, `name` = ?, `parent` =? WHERE `fileid` = ?' with params [\"files\\\/lots3\", \"6496454381429272daca5f1453986396\", \"lots3\", \"3\", 8]:\n\nSQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction","level":4,"time":"2015-03-09T09:45:40+00:00","method":"MOVE","url":"\/owncloud\/remote.php\/webdav\/lots2"}

And then the sync client reuploads all the files which takes a while...

Versions

ownCloud 8.0.0 / stable8 found while testing #14752
mariadb-10.0.14-2.3.x86_64
php5-5.6.6-1.1.x86_64
openSUSE Factory

Not sure if this can be reproduced consistently, will try again.

That could be another argument for closeCursor.

@DeepDiver1975 @icewind1991 @MorrisJobke

@PVince81
Copy link
Contributor Author

PVince81 commented Mar 9, 2015

Hmm I wonder if it's due to #14752 which introduces a transaction because that's the branch I got the issue on.

@MorrisJobke
Copy link
Contributor

@PVince81 Also @felixboehm mentioned that we maybe need to handle aborted transactions because of conflicts somehow.

@PVince81
Copy link
Contributor Author

PVince81 commented Mar 9, 2015

Hmmm just now I noticed that while I renamed "lots3" to "lots4", now I have two duplicate local folders called "lots3" and "lots4".

Aborted transactions ? You mean when the sync client tries to do something but the server is still busy from the previous parallel operation ?

@MorrisJobke
Copy link
Contributor

No aborted transaction as in: The database says "no I can't do this transaction, because there was a parallel concurrent transaction and this transaction isn't valid anymore"

@PVince81
Copy link
Contributor Author

PVince81 commented Mar 9, 2015

I tried to reproduce the issue but this run didn't. One thing I observed though is that the sync client sends a MOVE operation for EVERY file inside the renamed folder instead of just renaming the folder. It doesn't seem to always do that though.

@PVince81
Copy link
Contributor Author

PVince81 commented Mar 9, 2015

I'm trying to sequentially rename "lotsX" to "lotsY" and see what happens, in the hope to reproduce the issue.
There's more weird stuff happening:

{"reqId":"174cfbf66b485a8274cb2bc8071e6a21","remoteAddr":"127.0.0.1","app":"core","message":"unable to rename, file does not exists : files_versions\/lots9.v","level":3,"time":"2015-03-09T10:12:04+00:00","method":"MOVE","url":"\/owncloud\/remote.php\/webdav\/lots9"}

(notice that lots9 is a folder, not a file)

@PVince81
Copy link
Contributor Author

PVince81 commented Mar 9, 2015

I tried several times and didn't manage to reproduce the deadlock issue.
Maybe it was a one off or race condition of some sorts... but with the nasty side effect of having the folder duplicate itself.

@christianrj
Copy link

Maybe this has something to do with one these bugs that I previously reported?
#10711
#11847
#12659
#12661

@lparmentier-quatral
Copy link

We've trying OC 8.1.0.6 beta2 on a cloud (HaProxy on frontend apache, HaProxy on frontend MariaDB Cluster) and there is the same problem. Not due to renaming directory, so not due to #14752 cause we've the last version where it's resolved, and we caused the problem without renaming directory.

@MCoLoL
Copy link

MCoLoL commented Jul 17, 2015

hi all, we have the same problem with OC8.0.5 and client version 1.8.4 ( build 2531 ), when we upload big files ( eg 300MB), the synchronization will abort with "Internal server error"
In the owncloud log, we can find:

Jul 17 09:29:02 sdcoca15 ownCloud[24989]: {remote} An exception occurred while executing 'UPDATE oc_filecache SET mtime = ?, etag=? WHERE (mtime <> ? OR etag <> ? ) AND fileid = ? ' with params [1437118142, "55a8aebee9f3a", 1437118142, "55a8aebee9f3a", "11251183"]:#12#012SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

Probably is the same bug?
thanks

@PVince81
Copy link
Contributor Author

PVince81 commented Oct 8, 2015

Is anyone still seeing this with 8.1.3 with transactional redis locking enabled ? (https://doc.owncloud.org/server/8.1/admin_manual/configuration_files/files_locking_transactional.html)

@michi-roth
Copy link
Contributor

We are using OC 8.1.2 with transactional filelocking and MariaDB and are still getting this error.

@PVince81
Copy link
Contributor Author

Can you check the "Admin" page and under "Status" you should see a message that says that locking is enabled.

@michi-roth
Copy link
Contributor

The message is visible.
Before activating the new locking we often had deadlocks in the table "file_locks" (I am not sure of the correct name, but is was the table for the file locking). After activating the new lock we only have deadlocks while updating mtime and etag in the "file_cache" table.#

Also our behaviour is, that after the message the sync client does not sync the new files. I think this is because of the unchanged etag

@PVince81
Copy link
Contributor Author

Hmmm, the table "file_locks" doesn't exist in 8.1.x. The app "files_locking" is something different and doesn't use the database.
Did you actually upgrade to 8.2RC2 ?

@michi-roth
Copy link
Contributor

I just upgraded to OC 8.2.0 and got the same error again. The updated file is shown in the web interface but no client downloads the changes because the etag of the parentfolder is not changed.

The oc_file_locks table is still in the database and the admin page does not show the message for the activated transactional locking. But I don't think that this has something to do with the error while updating the database. What is ownCloud doing when a DB transaction fails? I would assume that ownCloud retries to change the DB but I doesn't look so.

The exact error message:

{"reqId":"DEPld/dVX2BKKGWwRuGK","remoteAddr":"137.250.18.11","app":"index","message":"Exception: {"Exception":"Doctrine\DBAL\Exception\DriverException","Message":"An exception occurred while executing 'UPDATE oc_filecache SET mtime = ?, etag=? WHERE (mtime <> ? OR etag <> ? ) AND fileid = ? ' with params [1445422901, \"56276735aadd0\", 1445422901, \"56276735aadd0\", 2511469]:\n\nSQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction","Code":0,"Trace":"#0 \/var\/www\/owncloud-8.2.0\/3rdparty\/doctrine\/dbal\/lib\/Doctrine\/DBAL\/DBALException.php(116): Doctrine\DBAL\Driver\AbstractMySQLDriver->convertException('An exception oc...', Object(Doctrine\DBAL\Driver\PDOException))\n#1 \/var\/www\/owncloud-8.2.0\/3rdparty\/doctrine\/dbal\/lib\/Doctrine\/DBAL\/Statement.php(172): Doctrine\DBAL\DBALException::driverExceptionDuringQuery(Object(Doctrine\DBAL\Driver\PDOMySql\Driver), Object(Doctrine\DBAL\Driver\PDOException), 'UPDATE oc_file...', Array)\\n#2 \\\/var\\\/www\\\/owncloud-8.2.0\\\/lib\\\/private\\\/db\\\/statementwrapper.php(73): Doctrine\\\\DBAL\\\\Statement->execute(Array)\\n#3 \\\/var\\\/www\\\/owncloud-8.2.0\\\/lib\\\/private\\\/db.php(147): OC_DB_StatementWrapper->execute(Array)\\n#4 \\\/var\\\/www\\\/owncloud-8.2.0\\\/lib\\\/private\\\/files\\\/cache\\\/cache.php(300): OC_DB::executeAudited('UPDATE*PREFIX...', Array)\n#5 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/cache\/changepropagator.php(79): OC\Files\Cache\Cache->update(2511469, Array)\n#6 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/cache\/updater.php(117): OC\Files\Cache\ChangePropagator->propagateChanges(NULL)\n#7 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/view.php(1038): OC\Files\Cache\Updater->update('\/thumbnails\/251...')\n#8 \/var\/www\/owncloud-8.2.0\/lib\/private\/files\/view.php(585): OC\Files\View->basicOperation('file_put_conten...', 'thumbnails\/2517...', Array, '\x89PNG\r\n\x1A\n\x00\x00\x00\rIHD...')\n#9 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(1048): OC\Files\View->file_put_contents('thumbnails\/2517...', '\x89PNG\r\n\x1A\n\x00\x00\x00\rIHD...')\n#10 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(894): OC\Preview->storePreview(2517738, 32, 32)\n#11 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(1162): OC\Preview->resizeAndStore(2517738)\n#12 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(770): OC\Preview->generatePreview(2517738)\n#13 \/var\/www\/owncloud-8.2.0\/lib\/private\/preview.php(795): OC\Preview->getPreview()\n#14 \/var\/www\/owncloud-8.2.0\/core\/ajax\/preview.php(63): OC\Preview->showPreview()\n#15 \/var\/www\/owncloud-8.2.0\/lib\/private\/route\/route.php(154) : runtime-created function(1): require_once('\/var\/www\/ownclo...')\n#16 [internal function]: __lambda_func(Array)\n#17 \/var\/www\/owncloud-8.2.0\/lib\/private\/route\/router.php(282): call_user_func('\x00lambda_83', Array)\n#18 \/var\/www\/owncloud-8.2.0\/lib\/base.php(831): OC\Route\Router->match('\/core\/preview.p...')\n#19 \/var\/www\/owncloud-8.2.0\/index.php(39): OC::handleRequest()\n#20 {main}","File":"\/var\/www\/owncloud-8.2.0\/3rdparty\/doctrine\/dbal\/lib\/Doctrine\/DBAL\/Driver\/AbstractMySQLDriver.php","Line":115}","level":3,"time":"2015-10-21T12:21:41+02:00"}

@nickvergessen
Copy link
Contributor

Duplicate of #20226 and #20555 ?

@michi-roth
Copy link
Contributor

#20555 refers to the oc_file_locks table whereas our problem occurs in the oc_filecache table. Can this to problems be related?

@PVince81
Copy link
Contributor Author

Actually with the new locking this should not happen any more.

But if you observed it on 8.2.0 where the new locking exists, then there might be an additional issue.

@PVince81 PVince81 added this to the 9.0-current milestone Dec 22, 2015
@catborise
Copy link

Hi,
i am using 8.2.2 and windows client 2.1.0

I add a file to sync folder which is a big file about 4Gb.
Firstly, application try to sync that file and give error like insufficent space. Then i change the quota and change the file name...

Then every sync operation gives

Jul 17 09:29:02 sdcoca15 ownCloud[24989]: {remote} An exception occurred while executing 'UPDATE oc_filecache SET mtime = ?, etag=? WHERE (mtime <> ? OR etag <> ? ) AND fileid = ? ' with params [1437118142, "55a8aebee9f3a", 1437118142, "55a8aebee9f3a", "11251183"]:#12#012SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

error.

i am not using redis cache, only php cache...

@PVince81
Copy link
Contributor Author

Please update to 8.2.2, it solves a few locking issues when using DB locking (like stray locks)

@catborise you seem to have a different issue, so you say it always happen when hitting insufficent space ? Maybe that code path somehow doesn't free the database table ?

@michi-roth
Copy link
Contributor

Hello everybody,
we updated to 9.0RC1 on our test system and are still getting the problem.
We are using redis and mariadb.

{"reqId":"SuM+doLziYR6e8RC6p6c","remoteAddr":"xxx.xxx.xxx.xxx","app":"webdav","message":"Exception: {\"Message\":\"An exception occurred while executing 'UPDATE `oc_filecache` SET `mtime` = ?, `etag` = ?, `size`=? WHERE (`mtime` <> ? OR `etag` <> ? OR `size` <> ? OR `mtime` IS NULL OR `etag` IS NULL OR `size` IS NULL) AND `fileid` = ? ' with params [1455720965, \\\"56c48a05b0476\\\", -1, 1455720965, \\\"56c48a05b0476\\\", -1, 2334640]:

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction\",\"Exception\":\"Doctrine\DBAL\Exception\DriverException\",\"Code\":0,\"Trace\":\"#0 3rdparty/doctrine/dbal/lib/Doctrine/DBAL/DBALException.php(116): Doctrine\DBAL\Driver\AbstractMySQLDriver->convertException('An exception oc...', Object(Doctrine\DBAL\Driver\PDOException))
#1 3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(836): Doctrine\DBAL\DBALException::driverExceptionDuringQuery(Object(Doctrine\DBAL\Driver\PDOMySql\Driver), Object(Doctrine\DBAL\Driver\PDOException), 'UPDATE `oc_file...', Array)
#2 lib/private/db/connection.php(183): Doctrine\DBAL\Connection->executeQuery('UPDATE `oc_file...', Array, Array, NULL)
#3 lib/private/files/cache/cache.php(308): OC\DB\Connection->executeQuery('UPDATE `*PREFIX...', Array)
#4 lib/private/files/cache/propagator.php(67): OC\Files\Cache\Cache->update(2334640, Array)
#5 lib/private/files/view.php(1242): OC\Files\Cache\Propagator->propagateChange('.local/share/ev...', 1455720965)
#6 lib/private/files/view.php(1337): OC\Files\View->getCacheEntry(Object(OC\Files\Storage\Wrapper\Encryption), '.local/share/ev...', '/home/.local/sh...')
#7 apps/dav/lib/connector/sabre/directory.php(225): OC\Files\View->getDirectoryContent('/home/.local/sh...')
#8 3rdparty/sabre/dav/lib/DAV/Tree.php(195): OCA\DAV\Connector\Sabre\Directory->getChildren()
#9 3rdparty/sabre/dav/lib/DAV/Server.php(857): Sabre\DAV\Tree->getChildren('home/.local/sha...')
#10 3rdparty/sabre/dav/lib/DAV/Server.php(911): Sabre\DAV\Server->addPathNodesRecursively(Array, Object(Sabre\DAV\PropFind))
#11 3rdparty/sabre/dav/lib/DAV/CorePlugin.php(336): Sabre\DAV\Server->getPropertiesForPath('home/.local/sha...', Array, 1)
#12 [internal function]: Sabre\DAV\CorePlugin->httpPropFind(Object(Sabre\HTTP\Request), Object(Sabre\HTTP\Response))
#13 3rdparty/sabre/event/lib/EventEmitterTrait.php(105): call_user_func_array(Array, Array)
#14 3rdparty/sabre/dav/lib/DAV/Server.php(459): Sabre\Event\EventEmitter->emit('method:PROPFIND', Array)
#15 3rdparty/sabre/dav/lib/DAV/Server.php(248): Sabre\DAV\Server->invokeMethod(Object(Sabre\HTTP\Request), Object(Sabre\HTTP\Response))
#16 apps/dav/appinfo/v1/webdav.php(54): Sabre\DAV\Server->exec()
#17 remote.php(137): require_once('/var/www/ownclo...')
#18 {main}\",\"File\":\"3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Driver/AbstractMySQLDriver.php\",\"Line\":115}","level":4,"time":"2016-02-17T14:56:05+00:00"}

@PVince81
Copy link
Contributor Author

Some cleanup background jobs can also cause the DB to be locked for too long: #22243

@ghost ghost added this to the 9.0.1-next-maintenance milestone Feb 23, 2016
@ghost ghost removed this from the 9.0-current milestone Feb 23, 2016
@ghost ghost assigned icewind1991 Feb 23, 2016
@MorrisJobke MorrisJobke modified the milestones: 9.0.2-current-maintenance, 9.0.1 Apr 11, 2016
@MorrisJobke
Copy link
Contributor

But if you observed it on 8.2.0 where the new locking exists, then there might be an additional issue.

8.2.2: https://github.com/owncloud/enterprise/issues/1150

@MorrisJobke
Copy link
Contributor

@PVince81 @icewind1991 I could have a look at this instance - what are we looking for? What information is needed to debug this?

@PVince81 PVince81 assigned PVince81 and unassigned icewind1991 Apr 19, 2016
@PVince81
Copy link
Contributor Author

@MorrisJobke not sure. It might depend for what query it happens, then need to find out why it happens in the first place.

@PVince81
Copy link
Contributor Author

Looking at @michi-roth's stack trace here #14757 (comment) it looks like the file was modified outside of ownCloud, and at the time of the PROPFIND ownCloud noticed the change and tried to update the cache and propagate it.

@michi-roth did you explicitly modify files outside of ownCloud, directly in the data folder ? If not, then this might be a side-effect of another issue.
Or potentially an occurrence of a similar issue before where the exception happened after updating the file on-disk but before the cache could be updated.

@PVince81
Copy link
Contributor Author

PVince81 commented Apr 22, 2016

So far there are only to possibly long-running transactions:

Normally as I said before, the new file locking from 8.2/9.0 should be able to prevent the DB code to be reached in the first place.

Let's see whether there are some code paths that slip through the cracks and still produce this condition.

@PVince81
Copy link
Contributor Author

I found this fix 26939a2 but it seems that it was already in 9.0 RC1, where the report has been reported again.

@PVince81
Copy link
Contributor Author

@michi-roth can you check whether you have a setting "file_system_check_changes" in your config.php ? Your last stack trace seems to imply that you have explicitly set this value to 1, while the default is 0. Usually setting it to 0 would solve the issue.

@PVince81
Copy link
Contributor Author

PVince81 commented Apr 22, 2016

For now I'll assume that you had "filesystem_check_changes" set to 1. Or maybe it was an external storage for which update detection is enabled anyway, so the situation is likely to happen there too.

Here is my first test run:

  1. Set "filesystem_check_changes" to 1 in config.php
  2. Create a folder "test" in the web UI
  3. Upload about 70 files into the "test" folder
  4. In the source code, add sleep(10) here in the loop https://github.com/owncloud/core/blob/v9.0.1/lib/private/files/cache/cache.php#L521
  5. Rename "test" to "test2" in the web UI.
  6. Set breakpoint in https://github.com/owncloud/core/blob/v9.0.1/lib/private/files/cache/scanner.php#L364
  7. On the CLI with cadaver, do a "ls" on the root folder, the one containing the "test" folder

The sleep call will make sure that every entry in the folder takes 10 to be renamed in the database.
Then when doing the "ls" / PROPFIND, continued debugging until the scanner tries to scan "files/test2". At this point, observe that the lock is preventing it to enter the folder and returns the old cache data.

This test did NOT reproduce the issue and showed that locking worked properly so far.

@PVince81
Copy link
Contributor Author

Side note: the web UI's regular "getstoragestats.php" call will also trigger a scan while trying to computer the used space. That one also successfully was excluded thanks to the lock.

@PVince81
Copy link
Contributor Author

Next step: try the reverse condition: have the scanner run first and then try and rename.

@PVince81
Copy link
Contributor Author

So far I could not reproduce this exact issues. All my attempts at reproducing it lead to a deadlock on oc_file_lock, but not oc_filecache, which were documented here: #20555 (comment)

@PVince81
Copy link
Contributor Author

I'm out of ideas for this and was unable to reproduce the deadlock on oc_filecache.
With a bit of luck the fix from #20555 (comment) might help mitigate/eliminate this case altogether.

Moving to 9.0.3 to keep in scope, and hopefully you guys will have upgraded to 9.0.2 and reported back whether the fix worked.

Thanks!

@PVince81 PVince81 modified the milestones: 9.0.3-next-maintenance, 9.0.2-current-maintenance Apr 25, 2016
@PVince81
Copy link
Contributor Author

PVince81 commented Jun 2, 2016

Is anyone still seeing this in 9.0.2 ?
Also make sure to have the proper transaction isolation mode: https://doc.owncloud.org/server/9.0/admin_manual/configuration_database/linux_database_configuration.html#mysql-mariadb-with-binary-logging-enabled

@michi-roth
Copy link
Contributor

We were able to find the reason for the error in our configuration. The problem was the MariaDB Galera cluster when write requests were sent to all servers deadlock can occur. The solution was to send all write requests to a single server.
A more detailed explanation can be found under:
http://severalnines.com/blog/avoiding-deadlocks-galera-set-haproxy-single-node-writes-and-multi-node-reads

@centredecalcul
Copy link

Verified ! No error since I changed my HAproxy configuration, thank you !

@PVince81
Copy link
Contributor Author

@michi-roth thanks a lot for the hint. I raised a ticket for the documentation, maybe having it as a hint there would help other people with this issue: owncloud-archive/documentation#2490

@lock
Copy link

lock bot commented Aug 4, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants