Distributed OrientDB loses data when a node is restarted while under load #8162

Closed
KluSe opened this issue Mar 19, 2018 · 2 comments

KluSe commented Mar 19, 2018

OrientDB Version: 2.2.30

Java Version: 1.8.0_45

OS: Ubuntu 14.04.3 LTS

Distributed Config: 3 nodes, no sharding

Expected behavior

When a node is rebooted and misses writes as a result, those writes should be replicated to it when it rejoins the cluster.

Actual behavior

Rebooting a node while there are concurrent write accesses on all nodes sometimes causes the node to lose a record and not replicate it afterwards.

Steps to reproduce

We run a test cluster of three nodes behind a load balancer, each with a Java-based application server (Apache Karaf). We start a load test that calls a Java-based REST API, which in turn uses the OrientDB client to write to the local node. The load test runs 10 threads in parallel, load-balanced across all nodes. We use one technical user, and each write consists of two new vertices and an edge connecting them, wrapped in a transaction; there are no unique indexes.
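For anyone trying to reproduce this, it may help if the load test records which writes were acknowledged, so they can be checked on every node after the restart. The following is only a rough Python sketch of such a harness, not the test used here: `run_load`, `write_once`, and `fake_write` are hypothetical names, and `fake_write` is a stand-in for the real REST call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_load(write_once, total_writes, threads=10):
    """Call write_once(i) total_writes times from a pool of worker threads.

    Returns (acknowledged, failed): the ids returned by successful writes,
    and (index, exception) pairs for the writes that raised.
    """
    acknowledged, failed = [], []

    def attempt(i):
        # Capture the exception instead of letting it abort the whole run.
        try:
            return i, write_once(i), None
        except Exception as exc:
            return i, None, exc

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in submission order.
        for i, record_id, error in pool.map(attempt, range(total_writes)):
            if error is None:
                acknowledged.append(record_id)
            else:
                failed.append((i, error))
    return acknowledged, failed


def fake_write(i):
    """Hypothetical stand-in for the REST call; fails for a few indexes."""
    if i % 7 == 3:
        raise ConnectionError("node restarting")
    return f"offer-{i}"


acknowledged, failed = run_load(fake_write, total_writes=20)
print(len(acknowledged), "acknowledged,", len(failed), "failed")
```

Every id in `acknowledged` should later be found on all three nodes; an acknowledged id that is absent from one node would correspond to the behavior described in this issue.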

When we reboot the second node (clean shutdown, no kill -9 or poweroff), the other two nodes continue working. But after the second node rejoins the cluster, sometimes certain records are missing:

Query on node 1 ("ErzeugtAm" is the createDate):

orientdb {db=b-OS}> SELECT @rid, ErzeugtAm FROM TAAAngebot WHERE ErzeugtAm < '2018-03-19 08:39:35' AND ErzeugtAm > '2018-03-19 08:39:20';

+----+-------+-------------------+
|#   |RID    |ErzeugtAm          |
+----+-------+-------------------+
|0   |#57:747|2018-03-19 08:39:28|
|1   |#58:384|2018-03-19 08:39:23|
|2   |#58:385|2018-03-19 08:39:31|
|3   |#59:436|2018-03-19 08:39:21|
|4   |#59:437|2018-03-19 08:39:29|
|5   |#59:438|2018-03-19 08:39:34|
+----+-------+-------------------+

6 item(s) found. Query executed in 0.025 sec(s).

Query on node 2:

orientdb {db=b-OS}> SELECT @rid, ErzeugtAm FROM TAAAngebot WHERE ErzeugtAm < '2018-03-19 08:39:35' AND ErzeugtAm > '2018-03-19 08:39:20';

+----+-------+-------------------+
|#   |RID    |ErzeugtAm          |
+----+-------+-------------------+ <--- record #57:747 is missing!
|0   |#58:384|2018-03-19 08:39:23|
|1   |#58:385|2018-03-19 08:39:31|
|2   |#59:436|2018-03-19 08:39:21|
|3   |#59:437|2018-03-19 08:39:29|
|4   |#59:438|2018-03-19 08:39:34|
+----+-------+-------------------+

5 item(s) found. Query executed in 0.066 sec(s).

Query on node 3:

orientdb {db=b-OS}> SELECT @rid, ErzeugtAm FROM TAAAngebot WHERE ErzeugtAm < '2018-03-19 08:39:35' AND ErzeugtAm > '2018-03-19 08:39:20';

+----+-------+-------------------+
|#   |RID    |ErzeugtAm          |
+----+-------+-------------------+
|0   |#57:747|2018-03-19 08:39:28|
|1   |#58:384|2018-03-19 08:39:23|
|2   |#58:385|2018-03-19 08:39:31|
|3   |#59:436|2018-03-19 08:39:21|
|4   |#59:437|2018-03-19 08:39:29|
|5   |#59:438|2018-03-19 08:39:34|
+----+-------+-------------------+

6 item(s) found. Query executed in 0.02 sec(s).
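The comparison of the three result sets can also be done mechanically. A minimal Python sketch, using only the RIDs from the console output above; the function name `missing_per_node` is hypothetical:

```python
# RIDs returned by the same query on each node, copied from the output above.
results = {
    "node 1": {"#57:747", "#58:384", "#58:385", "#59:436", "#59:437", "#59:438"},
    "node 2": {"#58:384", "#58:385", "#59:436", "#59:437", "#59:438"},
    "node 3": {"#57:747", "#58:384", "#58:385", "#59:436", "#59:437", "#59:438"},
}


def missing_per_node(results):
    """For each node, list the RIDs seen on at least one node but not on it."""
    all_rids = set().union(*results.values())
    return {node: sorted(all_rids - rids) for node, rids in results.items()}


missing = missing_per_node(results)
for node, rids in missing.items():
    print(node, "is missing", rids)
```

With the data above, only node 2 reports a missing RID (#57:747).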

orientdb {db=b-OS}> list clusters

CLUSTERS (collections)
+----+--------------------------------------+----+------------------------------------+-----+------------+---------------+--------------------+
|#   |NAME                                  |  ID|CLASS                               |COUNT|OWNER_SERVER| OTHER_SERVERS |AUTO_DEPLOY_NEW_NODE|
+----+--------------------------------------+----+------------------------------------+-----+------------+---------------+--------------------+
...
|87  |taaangebot                            |  57|TAAAngebot                          |  179|   b-OS-3   |[b-OS-2,b-OS-1]|        true        |
|88  |taaangebot_1                          |  58|TAAAngebot                          |   95|   b-OS-1   |[b-OS-2,b-OS-3]|        true        |
|89  |taaangebot_2                          |  59|TAAAngebot                          |   94|   b-OS-1   |[b-OS-2,b-OS-3]|        true        |
|90  |taaangebot_3                          |  60|TAAAngebot                          |   18|   b-OS-2   |[b-OS-1,b-OS-3]|        true        |
...
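The missing record can be related to this cluster list through its RID, whose first number is the cluster id (`#<cluster-id>:<position>`). A small Python sketch using only the values shown above; the helper names are hypothetical, and the lookup only identifies the owning server, it does not explain why the record is missing:

```python
# Cluster id -> (cluster name, owner server), copied from `list clusters` above.
CLUSTER_OWNERS = {
    57: ("taaangebot", "b-OS-3"),
    58: ("taaangebot_1", "b-OS-1"),
    59: ("taaangebot_2", "b-OS-1"),
    60: ("taaangebot_3", "b-OS-2"),
}


def cluster_id(rid):
    """Extract the cluster id from a RID of the form '#<cluster-id>:<position>'."""
    return int(rid.lstrip("#").split(":")[0])


def owner_of(rid):
    """Return (cluster name, owner server) for the cluster the RID belongs to."""
    return CLUSTER_OWNERS[cluster_id(rid)]


print(owner_of("#57:747"))
```

For #57:747 this gives cluster `taaangebot`, owned by b-OS-3, that is, a cluster owned by a node other than the restarted one.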

There are no visible errors in the logfiles:
server-1.log
server-2.log
server-3.log

Regards,
KluSe


tglman commented Apr 16, 2018

Hi @KluSe,

Similar issues have already been fixed in recent hotfixes; please update to the latest hotfix (2.2.34).

Let us know if you can still reproduce the problem with the latest hotfix.

Regards

tglman self-assigned this on Apr 16, 2018

denislemercier commented

Hello,

I had the same issue with OrientDB 3.0.4: a few documents are missing after a node restart.

I found the following:
If I remove the entire database instance folder while the node is down and then restart the node, it works fine (the entire database is transferred).

Maybe the delta synchronization (OSyncDatabaseDeltaTask) is causing the problem.
