Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Management of record locks #3631

Closed
martingg88 opened this issue Feb 20, 2015 · 32 comments
Closed

Management of record locks #3631

martingg88 opened this issue Feb 20, 2015 · 32 comments
Assignees
Milestone

Comments

@martingg88
Copy link

do you have any idea about following issue?

caused by: com.orientechnologies.common.concur.OTimeoutException: Can not lock record for 2000 ms. seems record is deadlocked by other record

@lvca
Copy link
Member

lvca commented Feb 20, 2015

This could be also a timeout problem. Could you explain better how it happens?

@lvca lvca self-assigned this Feb 20, 2015
@martingg88
Copy link
Author

i do not have any idea what cause about this issue. It will happen but not too often. anyway. i have upgrade orientdb to latest version. let me try the latest version and let you know if i facing this same issue again. thanks

@vtliem
Copy link

vtliem commented Feb 25, 2015

Maybe relative to this #3593

@martingg88
Copy link
Author

@ivca the following query cause the OTimeoutException: Can not lock record for 2000 ms. seems record is deadlocked by other record. but it wont' happen too often. anyidea?

BEGIN
LET account = SELECT * FROM account WHERE email = "ck@hotmail.com" LIMIT 1
LET user = SELECT EXPAND(in("own")) FROM account WHERE email = "ck@hotmail.com"
LET payment = SELECT EXPAND(out("own")[@Class ="payment"]) FROM $user
COMMIT
RETURN [$account, $user, $payment]

@martingg88
Copy link
Author

checked. the issue is related to storage.record.lockTimeout.
should increase the lockTimeout to max value?

@andrii0lomakin
Copy link
Member

yes, if it fix this issue so no problems with this.

@martingg88
Copy link
Author

so what is the right value to be adjusted for locktimeout. it is not too good if i set the locktimeout to a greater/max value..right?

@theheapdump
Copy link
Contributor

public int next(OrientBaseGraph graph, String class) {
        String query = String.format(
            "update (select from  osequence  where classname = '%s' ) increment value = 1 RETURN after @this lock record", class) ;
        Iterable<OrientElement> iterable = graph.command(new OCommandSQL(query)).execute();
        return iterable.iterator().next().getProperty("value");
      }

i'm gettingCaused by: com.orientechnologies.common.concur.OTimeoutException: Can not lock record for 2000 ms. seems record is deadlocked by other record if i call the above API in a multi threaded environment .

I spawn a thread say 20 times and my run method is

    public void run() {
        ograph = factory.getTx();
            createGraph(oHelper.next( ograph , "Test"));
            ograph.commit();
            ograph.shutdown();
    }

the first thread gets the sequencer but the next n number of threads fail with the exception as in the bug .

i need immediate solution , please help

@lvca
Copy link
Member

lvca commented Apr 1, 2015

@anmoldeep0123 Why locking the record?

@theheapdump
Copy link
Contributor

@lvca need to lock the record or I end up in concurrent modification exception, in multi threaded environment, and we do not want to catch the exception and try the default locking behaviour, I think the optimistic locking.

@kowalot
Copy link
Contributor

kowalot commented Apr 6, 2015

The same happened to me ( version 2.0.5). I use heavily LOCK RECORD and have never seen that error before with older OrientDB. All my queries are optimized&fast mostly by RID

com.orientechnologies.orient.core.exception.ODatabaseException: Error on retrieving record #46:916 (cluster: XXXXX)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:1605)
    at com.orientechnologies.orient.core.tx.OTransactionNoTx.loadRecord(OTransactionNoTx.java:80)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:1437)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:117)
    at com.orientechnologies.orient.core.iterator.OIdentifiableIterator.readCurrentRecord(OIdentifiableIterator.java:278)
    at com.orientechnologies.orient.core.iterator.ORecordIteratorClusters.hasNext(ORecordIteratorClusters.java:147)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.fetchFromTarget(OCommandExecutorSQLSelect.java:1300)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearch(OCommandExecutorSQLSelect.java:433)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.execute(OCommandExecutorSQLSelect.java:391)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:64)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.executeCommand(OAbstractPaginatedStorage.java:1189)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.command(OAbstractPaginatedStorage.java:1178)
    at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:71)
    at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.query(ODatabaseDocumentTx.java:630)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLUpdate.execute(OCommandExecutorSQLUpdate.java:214)
    at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:64)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.executeCommand(OAbstractPaginatedStorage.java:1189)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.command(OAbstractPaginatedStorage.java:1178)
    at com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:63)
    at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.command(ONetworkProtocolBinary.java:1178)
    at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.executeRequest(ONetworkProtocolBinary.java:385)
    at com.orientechnologies.orient.server.network.protocol.binary.OBinaryNetworkProtocolAbstract.execute(OBinaryNetworkProtocolAbstract.java:220)
    at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:69)
Caused by: com.orientechnologies.common.concur.OTimeoutException: Can not lock record for 2000 ms. seems record is deadlocked by other record
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.acquireWriteLock(OAbstractPaginatedStorage.java:1293)
    at com.orientechnologies.orient.core.tx.OTransactionAbstract.lockRecord(OTransactionAbstract.java:118)
    at com.orientechnologies.orient.core.id.ORecordId.lock(ORecordId.java:282)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.lockRecord(OAbstractPaginatedStorage.java:1790)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:1430)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:694)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:1572)
    ... 23 more
``

@lvca
Copy link
Member

lvca commented Apr 6, 2015

Maybe on highly concurrent clients the default value of 2 seconds it's too low.

Could you try setting this value higher? Put this in last section of config/orientdb-server-config.xml file. Below an example with 10secs:

<entry name="storage.record.lockTimeout" value="10000"/>

@lvca lvca added this to the 2.0.7 milestone Apr 6, 2015
@kowalot
Copy link
Contributor

kowalot commented Apr 6, 2015

Ok, small update. This was a first time when I started to use UPDATE with following syntax

UPDATE Customer SET FormNotificationStage = 2 RETURN AFTER  LOCK RECORD  WHERE FormNotificationStage=1 LIMIT 1

It gets a record from a waiting list by many remote clients (but in that case ONE). Usually I do UPDATE RID SET .... LOCK RECORD and so on.
When I removed/ revamped that command (avoiding UPDATE CLASS) which is single running command (no batches or transactions), everything went to normal state.

Strange but deadlocks appeared on record 0:1 ?!
I hope it helps.

@kowalot
Copy link
Contributor

kowalot commented Apr 6, 2015

One more info. There is no records matching WHERE criteria (even no records in Customer). It's enough to get stuck with infinite lock.

@kowalot
Copy link
Contributor

kowalot commented Apr 7, 2015

Update:
Please find below how you can repeat it on any database (2.0.5)
CONSOLE 1:

orientdb> connect remote:localhost/X root root

Connecting to database [remote:localhost/X] with user 'root'...OK
orientdb {db=X}> UPDATE V SET NotExistingField = 2 RETURN AFTER  LOCK RECORD  WHERE NotExistingField=1 LIMIT 1

Updated record(s) '[]' in 0,751000 sec(s).

orientdb {db=X}> 

AFTER IT, LAUNCH CONSOLE 2

orientdb> connect remote:localhost/X root root

Connecting to database [remote:localhost/X] with user 'root'...
orientdb> 
Error: com.orientechnologies.common.exception.OException: Error on creation of shared resource

Error: com.orientechnologies.orient.core.exception.ORecordNotFoundException: The record with id '#0:1' not found

Error: com.orientechnologies.orient.core.exception.ODatabaseException: Error on retrieving record #0:1 (cluster: internal)

Error: com.orientechnologies.orient.core.exception.ODatabaseException: Error on retrieving record #0:1 (cluster: internal)

Error: com.orientechnologies.common.concur.OTimeoutException: Can not lock record for 2000 ms. seems record is deadlocked by other record

@kowalot
Copy link
Contributor

kowalot commented Apr 12, 2015

I spent couple of hours analyzing record locking system. Maybe I don't fully understand it (so excuse me if I am wrong) but I strongly advice to review it. Let me share some of my concerns I hope it would be useful to build better OrientDB.

1. UPDATE with LOCK RECORD

  • EX locks are applied for whole searching process. So let's imagine we execute a query on class with 10 milions of records
UPDATE 10MILLION_OF_RECORD_CLASS SET NotExistingField = 2 RETURN AFTER  LOCK RECORD  WHERE OnlyOneRecordMatchThisExp=1 

10 milions of time EX lock will be acquired. Shouldn't a shared lock be upgraded to EX when filtering matches?
Please find below nice article how it works in sql server

http://sqlblog.com/blogs/kalen_delaney/archive/2009/11/13/update-locks.aspx

"While SQL Server is searching, it shouldn’t need to acquire an EXCLUSIVE lock; it only needs the EXCLUSIVE lock when the data to be changed is found"

2. EX locks are not fully released?

Second my big concern is method unlockRecord

https://github.com/orientechnologies/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/storage/impl/local/OAbstractPaginatedStorage.java#L1766

Even if it's correct from some reason (but I believe it's not) it should be better commented why EX is omitted from releasing locks. Someone can blindly believe that it does no matter what kind of strategy is used.

I checked it and in endResponse of binary protocol I could easily find in storage write locks not released during UPDATE LOCK RECORD operation. I believe I shouldn't find any acquired lock in locks buckets of PaginatedStorage.

3. EX lock acquired twice

During UPDATE operation EX lock is acquired and released twice with "kind of break" among locks.

First time here within assignTarget (acquired and released)

https://github.com/orientechnologies/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/sql/OCommandExecutorSQLSelect.java#L420

Second time here (acquired and released)

https://github.com/orientechnologies/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/sql/OCommandExecutorSQLSelect.java#L433

I am not sure what would happen if another transaction would try to do the same interfering during break.

4. LOCKS BUCKETS in the future can generate deadlocks in transactions

OrientDB uses lock bucket based on hash of the RecordID. It's not predictable for end user to modify records in order guaranteeing deadlock avoidance. It might be a problem when locks are kept across transaction scope.

5. DEADLOCK definition

I am not sure if all issues should be classified as a deadlock. Lock acquiring timeout is much better. Since OrientDB keeps lock only for one record classical deadlock is rather not possible unless another locked internal resources are involved in different order.

@kowalot
Copy link
Contributor

kowalot commented Apr 12, 2015

#3841 may refer to mentioned issues

@lvca lvca assigned tglman and unassigned lvca Apr 13, 2015
@lvca lvca modified the milestones: 2.1-rc2, 2.0.7 Apr 13, 2015
@lvca lvca changed the title record deadlock Management of record locks Apr 13, 2015
@kowalot
Copy link
Contributor

kowalot commented Apr 14, 2015

This library would be really good instead of ReentrantReadWriteLock
https://code.google.com/p/concurrent-locks/
just to have fully upgradable locks

@lvca lvca assigned andrii0lomakin and unassigned tglman Apr 14, 2015
@kowalot
Copy link
Contributor

kowalot commented Apr 15, 2015

Guys,
I did some test and integrated this library changing a bit how OrientDB handles record locking.
It doesn't remove all problems but all tests passed and it is more open for upgradable locks and has eliminated the problem mentioned by me here (UPDATE CLASS LOCK RECORD)

kowalot@82da242

If you like it I may clean it and prepare PR.

@andrii0lomakin
Copy link
Member

Hi, implementation of record locks was changed in 2.0 rc1 . Before this version we had situation when UPDATE SQL expressions left some records unlocked. Could you double check now whether you still have this issue ?

@andrii0lomakin
Copy link
Member

About your comments some of them really worth of considering. I mean 4 and 5. For 5-th item we are going to create lock manager to avoid deadlocks in tx.

@andrii0lomakin
Copy link
Member

About items from 1 to 3 could you update links to the code so we may discuss them farther.

@andrii0lomakin
Copy link
Member

About 4-th item we will use map based lock manager.

@andrii0lomakin
Copy link
Member

About update lock it was not supported since 2013 do not think it is good idea to use not supported library but we may borrow idea and implement our own lock.

@kowalot
Copy link
Contributor

kowalot commented Apr 27, 2015

The main concern is about OAbstractPaginatedStorage.unlockRecord()

https://github.com/orientechnologies/orientdb/blob/develop/core/src/main/java/com/orientechnologies/orient/core/storage/impl/local/OAbstractPaginatedStorage.java#L1692

As you see it does nothing for all lockingStrategy different than DEFAULT. I found the place where it is released and understood why but it may be error prone in the future.

It would be good to use locking mechanism which calling sequence is immune to missing lock releasing like callInLock()

After fixing #3841 I didnt catch any problem like this.

As far as I remember line

https://github.com/orientechnologies/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/sql/OCommandExecutorSQLSelect.java#L440

acquired lock once for initializing iterator when UPDATE CLASS is being executed.

fetchFromTarget()
https://github.com/orientechnologies/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/sql/OCommandExecutorSQLSelect.java#L453

was a second place when lock was acquired (because of Reetrant) but released only once.

@andrii0lomakin
Copy link
Member

yes , you are right. such approach is fragile but we work on mechanics when if run everything in tx mode and rollback/commit of tx will unlock all locked records.

@andrii0lomakin
Copy link
Member

Fixed in issue #3641 .

@martingg88
Copy link
Author

I'm very regret that we still have this issue for version 2.1-rc2. can you guys fix this issue again
? thanks

@andrii0lomakin
Copy link
Member

which one ?

@andrii0lomakin
Copy link
Member

more precisely in which circumstances you have this issue . Because according to @kowalot this issue is fixed.

@martingg88
Copy link
Author

the issue is still the same for #3641 that i have lock timeout and com.orientechnologies.common.concur.lock.OLockException: Cannot unlock a never acquired lock. thanks.

@martingg88
Copy link
Author

FYI. make sure you try to run your script at parallel way like what i have mentioned in #3641. I bet you will have this issue if you run any update/select query in parallel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

7 participants