Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commits silently fail after node quorum rejects changes (Percona cluster + Galera plugin) #3175

Closed
tonykiselis opened this issue Jun 5, 2018 · 10 comments

Comments

@tonykiselis
Copy link

Bug Report

Q A
BC Break no
Version ^2.0

Summary

Commit from code side passes as if commit was successful when no data was written into the database. Issue reveals itself when Doctrine is being used with Percona multi-master cluster with galera plugin.

Commits silently fail. You end up with "lost data" as rest of the code executes as if everything went fine.

Establish Percona database cluster with galera plugin. Run multiple queries which would use Pessimistic lock to different nodes at the same time. Durining the quorum the requested will be rejected. However, the transanction from code side will pass as if commit was successful despite of it.

To fix: Doctrine\DBAL\Connection::commit() consistently returns false when commit didn't happen. However, this information is tightly wrapped and inaccessible from the Doctrine's API.

Best in my opinion: thow exception on Doctrine\ORM\Connection::commit() when it calls Doctrine\DBAL\Connection::commit() and it returns false. This will uphold the consistency instead of dealing with bool values.

To have exception being thrown after failure to commit in this instance.

@morozov
Copy link
Member

morozov commented Jun 5, 2018

Looks like all implementors of the Connection::commit() violate the signature and return nothing and/or throw an exception in the case of failure. The wrapper connection even re-declares the return type as void.

As part of 2.x, we can return the boolean value consistently across platforms, in 3.x (#3005) we should start throwing exceptions.

@tonykiselis what about rollbacks? Besides having been called in a wrong state, can it fail and therefore have to be handled by the caller?

@morozov morozov added this to the 2.8.0 milestone Jun 5, 2018
@morozov morozov self-assigned this Jun 5, 2018
@tonykiselis
Copy link
Author

tonykiselis commented Jun 5, 2018

I found that weird too. Although, it's also kinda weird that in most cases it should throw exception so you can wrap commit in a try/catch, but on deeper level it returns bool?

Well, there is no rollback. I mean, you can try/catch commit which fails in this case and it will pass as data would have been committed. Although no data was actually committed. I don't feel qualified to comment on inner workings of Percona / galera cluster, but I recall that changes are 'discarded' if nodes fail to find consensus during the quorum.

This happens, as I've mentioned, when multiple nodes try to write into same record. Nodes communicate and ask other members of the cluster if it's okey for them to commit. If some other node currently has that row locked, transaction doesn't proceed and is discarded. However, from code side in doctrine, commit does not throw exception on this part and you end up with nasty situation.

Currently hacking around the problem by extending DBAL\Connection class, copying it in it's entirety due to one private property $_transactionNestingLevel and modifying commit method to throw exception if $this->_conn->commit() returns false.

@morozov
Copy link
Member

morozov commented Jun 5, 2018

Although, it's also kinda weird that in most cases it should throw exception so you can wrap commit in a try/catch, but on deeper level it returns bool?

According to the method signature,

/**
* Commits a transaction.
*
* @return bool TRUE on success or FALSE on failure.
*/
public function commit();

neither of the implementors are supposed to throw exceptions but are supposed to return booleans. This indeed is not OO-friendly and most likely was borrowed from PDO.

Well, there is no rollback. I mean, you can try/catch commit which fails in this case and it will pass as data would have been commited. Although none is present.

Sorry, I should have asked it clearer. My question was, should we fix the rollBack() implementations in the same way as the commit() ones? Right now the method signature

/**
* Rolls back the current transaction, as initiated by beginTransaction().
*
* @return bool TRUE on success or FALSE on failure.
*/
public function rollBack();

suggests that implementors should return a boolean value, however, none (or at least not all) of them do. From your experience with Percona, is it possible that a rollback fails in the way a caller can/should handle the failure?

@tonykiselis
Copy link
Author

Our issue, well, of which we are at least aware of now is with commit silent failures in described circumstances. Cannot really comment on rollbacks. I would need to check on that, now that you've mentioned it.

@morozov
Copy link
Member

morozov commented Jun 12, 2018

While the fix for this issue seems trivial, I'm having a hard time creating a functional test which would fail on COMMIT using a regular setup (w/o replication or messing around with the DB internals). @Ocramius, @Majkl578, any ideas?

@Majkl578
Copy link
Contributor

I only have one idea: violating an initially deferred foreign key that fails on COMMIT, not after the statement which violated the constraint.

@morozov
Copy link
Member

morozov commented Jun 14, 2018

Besides the reported bug, I see the following issues which don't help to fix it:

  1. Inconsistent indication of the commit failure across drivers. Despite the fact that the interface states that commit() will return a boolean, only the MySQLi driver does that. All others throw an exception.

    We can easily make MySQLi throw the exception and solve the problem but it would be against the interface and would be a BC break.

  2. The implementations of the rollBack() are in the exact same inconsistent state. IMO, if we fix commit(), we should fix rollBack() in the same way.

  3. The connection wrapper can use savepoints which are implemented via Connection::exec() which in turn can throw all sorts of exceptions. Therefore, even if we implemen returning a boolean from the wrapper, it still can throw exceptions.

With the above said and the fact a commit failure cannot be consistently reproduced across the platforms, can we just have a patch like this without any tests?

     /**
      * Commits the current transaction.
      *
-     * @return void
+     * {@inheritDoc}
      *
      * @throws \Doctrine\DBAL\ConnectionException If the commit failed due to no active transaction or
      *                                            because the transaction was marked for rollback only.
@@ -1305,12 +1305,13 @@ class Connection implements DriverConnection
         $this->connect();
 
         $logger = $this->_config->getSQLLogger();
+        $result = true;
 
         if ($this->_transactionNestingLevel == 1) {
             if ($logger) {
                 $logger->startQuery('"COMMIT"');
             }
-            $this->_conn->commit();
+            $result = $this->_conn->commit();
             if ($logger) {
                 $logger->stopQuery();
             }
@@ -1329,6 +1330,8 @@ class Connection implements DriverConnection
         if (false === $this->autoCommit && 0 === $this->_transactionNestingLevel) {
             $this->beginTransaction();
         }
+
+        return $result;
     }
 
     /**

@morozov morozov modified the milestones: 2.8.0, 2.9.0 Jun 19, 2018
@morozov morozov removed this from the 2.9.0 milestone Dec 2, 2018
@pulzarraider
Copy link
Contributor

This is probably php/PDO issue: https://bugs.php.net/bug.php?id=66528

@morozov
Copy link
Member

morozov commented Jun 26, 2019

To fix: Doctrine\DBAL\Connection::commit() consistently returns false when commit didn't happen. However, this information is tightly wrapped and inaccessible from the Doctrine's API.

I'm closing this issue since the originally suggested change has been implemented in #3588. If the problem still persists, we can reopen it or file a new more specific issue.

@morozov morozov closed this as completed Jun 26, 2019
@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants