Management of record locks #3631

martingg88 · 2015-02-20T04:53:22Z

do you have any idea about following issue?

caused by: com.orientechnologies.common.concur.OTimeoutException: Can not lock record for 2000 ms. seems record is deadlocked by other record

lvca · 2015-02-20T09:42:57Z

This could be also a timeout problem. Could you explain better how it happens?

martingg88 · 2015-02-23T03:38:20Z

i do not have any idea what cause about this issue. It will happen but not too often. anyway. i have upgrade orientdb to latest version. let me try the latest version and let you know if i facing this same issue again. thanks

vtliem · 2015-02-25T03:26:25Z

Maybe relative to this #3593

martingg88 · 2015-02-25T07:33:15Z

@ivca the following query cause the OTimeoutException: Can not lock record for 2000 ms. seems record is deadlocked by other record. but it wont' happen too often. anyidea?

BEGIN
LET account = SELECT * FROM account WHERE email = "ck@hotmail.com" LIMIT 1
LET user = SELECT EXPAND(in("own")) FROM account WHERE email = "ck@hotmail.com"
LET payment = SELECT EXPAND(out("own")[@Class ="payment"]) FROM $user
COMMIT
RETURN [$account, $user, $payment]

martingg88 · 2015-02-25T13:27:48Z

checked. the issue is related to storage.record.lockTimeout.
should increase the lockTimeout to max value?

andrii0lomakin · 2015-02-26T06:34:45Z

yes, if it fix this issue so no problems with this.

martingg88 · 2015-02-26T07:14:59Z

so what is the right value to be adjusted for locktimeout. it is not too good if i set the locktimeout to a greater/max value..right?

theheapdump · 2015-04-01T16:39:02Z

public int next(OrientBaseGraph graph, String class) {
        String query = String.format(
            "update (select from  osequence  where classname = '%s' ) increment value = 1 RETURN after @this lock record", class) ;
        Iterable<OrientElement> iterable = graph.command(new OCommandSQL(query)).execute();
        return iterable.iterator().next().getProperty("value");
      }

i'm gettingCaused by: com.orientechnologies.common.concur.OTimeoutException: Can not lock record for 2000 ms. seems record is deadlocked by other record if i call the above API in a multi threaded environment .

I spawn a thread say 20 times and my run method is

    public void run() {
        ograph = factory.getTx();
            createGraph(oHelper.next( ograph , "Test"));
            ograph.commit();
            ograph.shutdown();
    }

the first thread gets the sequencer but the next n number of threads fail with the exception as in the bug .

i need immediate solution , please help

lvca · 2015-04-01T23:58:33Z

@anmoldeep0123 Why locking the record?

theheapdump · 2015-04-02T01:23:50Z

@lvca need to lock the record or I end up in concurrent modification exception, in multi threaded environment, and we do not want to catch the exception and try the default locking behaviour, I think the optimistic locking.

kowalot · 2015-04-06T18:58:32Z

The same happened to me ( version 2.0.5). I use heavily LOCK RECORD and have never seen that error before with older OrientDB. All my queries are optimized&fast mostly by RID

com.orientechnologies.orient.core.exception.ODatabaseException: Error on retrieving record #46:916 (cluster: XXXXX)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:1605)
    at com.orientechnologies.orient.core.tx.OTransactionNoTx.loadRecord(OTransactionNoTx.java:80)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:1437)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:117)
    at com.orientechnologies.orient.core.iterator.OIdentifiableIterator.readCurrentRecord(OIdentifiableIterator.java:278)
    at com.orientechnologies.orient.core.iterator.ORecordIteratorClusters.hasNext(ORecordIteratorClusters.java:147)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.fetchFromTarget(OCommandExecutorSQLSelect.java:1300)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearch(OCommandExecutorSQLSelect.java:433)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.execute(OCommandExecutorSQLSelect.java:391)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:64)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.executeCommand(OAbstractPaginatedStorage.java:1189)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.command(OAbstractPaginatedStorage.java:1178)
    at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:71)
    at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.query(ODatabaseDocumentTx.java:630)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLUpdate.execute(OCommandExecutorSQLUpdate.java:214)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:64)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.executeCommand(OAbstractPaginatedStorage.java:1189)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.command(OAbstractPaginatedStorage.java:1178)
    at com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:63)
    at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.command(ONetworkProtocolBinary.java:1178)
    at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.executeRequest(ONetworkProtocolBinary.java:385)
    at com.orientechnologies.orient.server.network.protocol.binary.OBinaryNetworkProtocolAbstract.execute(OBinaryNetworkProtocolAbstract.java:220)
    at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:69)
Caused by: com.orientechnologies.common.concur.OTimeoutException: Can not lock record for 2000 ms. seems record is deadlocked by other record
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.acquireWriteLock(OAbstractPaginatedStorage.java:1293)
    at com.orientechnologies.orient.core.tx.OTransactionAbstract.lockRecord(OTransactionAbstract.java:118)
    at com.orientechnologies.orient.core.id.ORecordId.lock(ORecordId.java:282)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.lockRecord(OAbstractPaginatedStorage.java:1790)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:1430)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:694)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:1572)
    ... 23 more
``

lvca · 2015-04-06T19:01:47Z

Maybe on highly concurrent clients the default value of 2 seconds it's too low.

Could you try setting this value higher? Put this in last section of config/orientdb-server-config.xml file. Below an example with 10secs:

<entry name="storage.record.lockTimeout" value="10000"/>

kowalot · 2015-04-06T19:28:05Z

Ok, small update. This was a first time when I started to use UPDATE with following syntax

UPDATE Customer SET FormNotificationStage = 2 RETURN AFTER  LOCK RECORD  WHERE FormNotificationStage=1 LIMIT 1

It gets a record from a waiting list by many remote clients (but in that case ONE). Usually I do UPDATE RID SET .... LOCK RECORD and so on.
When I removed/ revamped that command (avoiding UPDATE CLASS) which is single running command (no batches or transactions), everything went to normal state.

Strange but deadlocks appeared on record 0:1 ?!
I hope it helps.

kowalot · 2015-04-06T19:32:43Z

One more info. There is no records matching WHERE criteria (even no records in Customer). It's enough to get stuck with infinite lock.

kowalot · 2015-04-07T18:49:55Z

Update:
Please find below how you can repeat it on any database (2.0.5)
CONSOLE 1:

orientdb> connect remote:localhost/X root root

Connecting to database [remote:localhost/X] with user 'root'...OK
orientdb {db=X}> UPDATE V SET NotExistingField = 2 RETURN AFTER  LOCK RECORD  WHERE NotExistingField=1 LIMIT 1

Updated record(s) '[]' in 0,751000 sec(s).

orientdb {db=X}>

AFTER IT, LAUNCH CONSOLE 2

orientdb> connect remote:localhost/X root root

Connecting to database [remote:localhost/X] with user 'root'...
orientdb> 
Error: com.orientechnologies.common.exception.OException: Error on creation of shared resource

Error: com.orientechnologies.orient.core.exception.ORecordNotFoundException: The record with id '#0:1' not found

Error: com.orientechnologies.orient.core.exception.ODatabaseException: Error on retrieving record #0:1 (cluster: internal)

Error: com.orientechnologies.orient.core.exception.ODatabaseException: Error on retrieving record #0:1 (cluster: internal)

Error: com.orientechnologies.common.concur.OTimeoutException: Can not lock record for 2000 ms. seems record is deadlocked by other record

kowalot · 2015-04-12T11:19:06Z

I spent couple of hours analyzing record locking system. Maybe I don't fully understand it (so excuse me if I am wrong) but I strongly advice to review it. Let me share some of my concerns I hope it would be useful to build better OrientDB.

1. UPDATE with LOCK RECORD

EX locks are applied for whole searching process. So let's imagine we execute a query on class with 10 milions of records

UPDATE 10MILLION_OF_RECORD_CLASS SET NotExistingField = 2 RETURN AFTER  LOCK RECORD  WHERE OnlyOneRecordMatchThisExp=1

10 milions of time EX lock will be acquired. Shouldn't a shared lock be upgraded to EX when filtering matches?
Please find below nice article how it works in sql server

http://sqlblog.com/blogs/kalen_delaney/archive/2009/11/13/update-locks.aspx

"While SQL Server is searching, it shouldn’t need to acquire an EXCLUSIVE lock; it only needs the EXCLUSIVE lock when the data to be changed is found"

2. EX locks are not fully released?

Second my big concern is method unlockRecord

https://github.com/orientechnologies/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/storage/impl/local/OAbstractPaginatedStorage.java#L1766

Even if it's correct from some reason (but I believe it's not) it should be better commented why EX is omitted from releasing locks. Someone can blindly believe that it does no matter what kind of strategy is used.

I checked it and in endResponse of binary protocol I could easily find in storage write locks not released during UPDATE LOCK RECORD operation. I believe I shouldn't find any acquired lock in locks buckets of PaginatedStorage.

3. EX lock acquired twice

During UPDATE operation EX lock is acquired and released twice with "kind of break" among locks.

First time here within assignTarget (acquired and released)

https://github.com/orientechnologies/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/sql/OCommandExecutorSQLSelect.java#L420

Second time here (acquired and released)

https://github.com/orientechnologies/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/sql/OCommandExecutorSQLSelect.java#L433

I am not sure what would happen if another transaction would try to do the same interfering during break.

4. LOCKS BUCKETS in the future can generate deadlocks in transactions

OrientDB uses lock bucket based on hash of the RecordID. It's not predictable for end user to modify records in order guaranteeing deadlock avoidance. It might be a problem when locks are kept across transaction scope.

5. DEADLOCK definition

I am not sure if all issues should be classified as a deadlock. Lock acquiring timeout is much better. Since OrientDB keeps lock only for one record classical deadlock is rather not possible unless another locked internal resources are involved in different order.

kowalot · 2015-04-12T11:31:05Z

#3841 may refer to mentioned issues

kowalot · 2015-04-14T20:02:39Z

This library would be really good instead of ReentrantReadWriteLock
https://code.google.com/p/concurrent-locks/
just to have fully upgradable locks

kowalot · 2015-04-15T10:23:29Z

Guys,
I did some test and integrated this library changing a bit how OrientDB handles record locking.
It doesn't remove all problems but all tests passed and it is more open for upgradable locks and has eliminated the problem mentioned by me here (UPDATE CLASS LOCK RECORD)

kowalot@82da242

If you like it I may clean it and prepare PR.

andrii0lomakin · 2015-04-27T08:47:26Z

Hi, implementation of record locks was changed in 2.0 rc1 . Before this version we had situation when UPDATE SQL expressions left some records unlocked. Could you double check now whether you still have this issue ?

andrii0lomakin · 2015-04-27T14:26:46Z

About your comments some of them really worth of considering. I mean 4 and 5. For 5-th item we are going to create lock manager to avoid deadlocks in tx.

andrii0lomakin · 2015-04-27T14:27:41Z

About items from 1 to 3 could you update links to the code so we may discuss them farther.

andrii0lomakin · 2015-04-27T14:29:55Z

About 4-th item we will use map based lock manager.

andrii0lomakin · 2015-04-27T14:33:29Z

About update lock it was not supported since 2013 do not think it is good idea to use not supported library but we may borrow idea and implement our own lock.

kowalot · 2015-04-27T18:12:03Z

The main concern is about OAbstractPaginatedStorage.unlockRecord()

https://github.com/orientechnologies/orientdb/blob/develop/core/src/main/java/com/orientechnologies/orient/core/storage/impl/local/OAbstractPaginatedStorage.java#L1692

As you see it does nothing for all lockingStrategy different than DEFAULT. I found the place where it is released and understood why but it may be error prone in the future.

It would be good to use locking mechanism which calling sequence is immune to missing lock releasing like callInLock()

After fixing #3841 I didnt catch any problem like this.

As far as I remember line

https://github.com/orientechnologies/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/sql/OCommandExecutorSQLSelect.java#L440

acquired lock once for initializing iterator when UPDATE CLASS is being executed.

fetchFromTarget()
https://github.com/orientechnologies/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/sql/OCommandExecutorSQLSelect.java#L453

was a second place when lock was acquired (because of Reetrant) but released only once.

andrii0lomakin · 2015-04-28T10:00:52Z

yes , you are right. such approach is fragile but we work on mechanics when if run everything in tx mode and rollback/commit of tx will unlock all locked records.

andrii0lomakin · 2015-04-29T07:09:52Z

Fixed in issue #3641 .

martingg88 · 2015-05-06T04:42:42Z

I'm very regret that we still have this issue for version 2.1-rc2. can you guys fix this issue again
? thanks

andrii0lomakin · 2015-05-06T06:43:46Z

which one ?

andrii0lomakin · 2015-05-06T06:44:28Z

more precisely in which circumstances you have this issue . Because according to @kowalot this issue is fixed.

martingg88 · 2015-05-06T07:22:41Z

the issue is still the same for #3641 that i have lock timeout and com.orientechnologies.common.concur.lock.OLockException: Cannot unlock a never acquired lock. thanks.

martingg88 · 2015-05-06T13:58:30Z

FYI. make sure you try to run your script at parallel way like what i have mentioned in #3641. I bet you will have this issue if you run any update/select query in parallel.

lvca self-assigned this Feb 20, 2015

lvca added the waiting reply label Apr 6, 2015

lvca added this to the 2.0.7 milestone Apr 6, 2015

lvca removed the waiting reply label Apr 13, 2015

lvca assigned tglman and unassigned lvca Apr 13, 2015

lvca modified the milestones: 2.1-rc2, 2.0.7 Apr 13, 2015

lvca changed the title ~~record deadlock~~ Management of record locks Apr 13, 2015

lvca assigned andrii0lomakin and unassigned tglman Apr 14, 2015

kowalot mentioned this issue Apr 20, 2015

Deadlock issue on mass update #3841

Closed

andrii0lomakin added the in progress label Apr 27, 2015

andrii0lomakin added a commit that referenced this issue Apr 27, 2015

Issue #3631 array based record lock manager was replaced by map based.

2790583

andrii0lomakin removed the in progress label Apr 27, 2015

andrii0lomakin closed this as completed Apr 29, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Management of record locks #3631

Management of record locks #3631

martingg88 commented Feb 20, 2015

lvca commented Feb 20, 2015

martingg88 commented Feb 23, 2015

vtliem commented Feb 25, 2015

martingg88 commented Feb 25, 2015

martingg88 commented Feb 25, 2015

andrii0lomakin commented Feb 26, 2015

martingg88 commented Feb 26, 2015

theheapdump commented Apr 1, 2015

lvca commented Apr 1, 2015

theheapdump commented Apr 2, 2015

kowalot commented Apr 6, 2015

lvca commented Apr 6, 2015

kowalot commented Apr 6, 2015

kowalot commented Apr 6, 2015

kowalot commented Apr 7, 2015

kowalot commented Apr 12, 2015

kowalot commented Apr 12, 2015

kowalot commented Apr 14, 2015

kowalot commented Apr 15, 2015

andrii0lomakin commented Apr 27, 2015

andrii0lomakin commented Apr 27, 2015

andrii0lomakin commented Apr 27, 2015

andrii0lomakin commented Apr 27, 2015

andrii0lomakin commented Apr 27, 2015

kowalot commented Apr 27, 2015

andrii0lomakin commented Apr 28, 2015

andrii0lomakin commented Apr 29, 2015

martingg88 commented May 6, 2015

andrii0lomakin commented May 6, 2015

andrii0lomakin commented May 6, 2015

martingg88 commented May 6, 2015

martingg88 commented May 6, 2015

Management of record locks #3631

Management of record locks #3631

Comments

martingg88 commented Feb 20, 2015

lvca commented Feb 20, 2015

martingg88 commented Feb 23, 2015

vtliem commented Feb 25, 2015

martingg88 commented Feb 25, 2015

martingg88 commented Feb 25, 2015

andrii0lomakin commented Feb 26, 2015

martingg88 commented Feb 26, 2015

theheapdump commented Apr 1, 2015

lvca commented Apr 1, 2015

theheapdump commented Apr 2, 2015

kowalot commented Apr 6, 2015

lvca commented Apr 6, 2015

kowalot commented Apr 6, 2015

kowalot commented Apr 6, 2015

kowalot commented Apr 7, 2015

kowalot commented Apr 12, 2015

1. UPDATE with LOCK RECORD

2. EX locks are not fully released?

3. EX lock acquired twice

4. LOCKS BUCKETS in the future can generate deadlocks in transactions

5. DEADLOCK definition

kowalot commented Apr 12, 2015

kowalot commented Apr 14, 2015

kowalot commented Apr 15, 2015

andrii0lomakin commented Apr 27, 2015

andrii0lomakin commented Apr 27, 2015

andrii0lomakin commented Apr 27, 2015

andrii0lomakin commented Apr 27, 2015

andrii0lomakin commented Apr 27, 2015

kowalot commented Apr 27, 2015

andrii0lomakin commented Apr 28, 2015

andrii0lomakin commented Apr 29, 2015

martingg88 commented May 6, 2015

andrii0lomakin commented May 6, 2015

andrii0lomakin commented May 6, 2015

martingg88 commented May 6, 2015

martingg88 commented May 6, 2015