Distributing Shards Between Servers #8234

Closed · hossein-md opened this issue Apr 23, 2018 · 19 comments

@hossein-md

OrientDB Version: 2.2.33

OS: Linux

Expected behavior

I have 3 servers and use the following config (default-distributed-db-config.json):

{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": "majority",
  "executionMode": "undefined",
  "readYourWrites": true,
  "newNodeStrategy": "static",
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
    "shard_01": {
      "servers": ["n01"]
    },
    "shard_02": {
      "servers": ["n01","n02"]
    },
    "shard_03": {
      "servers": ["n03"]
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}

I insert some vertices (using the Java API, graph model) and I have two problems (actually three!):

  1. I expect the vertices to be spread across the shards according to some policy, e.g. round-robin (see the sketch below)
  2. I expect the database folder of 'n01' to be larger than those of 'n02' and 'n03'
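
For reference, the cluster selection strategy that controls how new records rotate across a class's clusters can be changed with ALTER CLASS ... CLUSTERSELECTION. Below is a minimal sketch through the same graph API (the connection URL and credentials are placeholders from this thread's setup; note that in a distributed deployment cluster ownership/locality, discussed later in this thread, can still pin writes to local clusters):

import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

public class ClusterSelectionSketch {
  public static void main(String[] args) {
    // Placeholder URL/credentials; adjust to your own n01 address and root password.
    OrientGraph graph = new OrientGraph("remote:n01-ip/myDB", "root", "123");
    try {
      // Ask OrientDB to rotate new User records across the class's clusters.
      graph.command(new OCommandSQL("ALTER CLASS User CLUSTERSELECTION round-robin")).execute();
    } finally {
      graph.shutdown();
    }
  }
}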

Actual behavior

Here are my problems:
First:
All the vertices go to a single server. E.g., if I use OrientGraph graph = new OrientGraph(n01-ip), all the vertices go to shard_01 (not even shard_02!).
My code is: graph.addVertex("class:" + className, properties)

Second:
The database folders on 'n02' and 'n03' were smaller than the one on 'n01'. But when I shut down 'n02' and 'n03' and deleted their databases, the database was copied from n01 to n02 and n03, and afterwards all the database folders were the same size.

Third:
I can't read data! I use this code (FYI: keyName refers to a unique property and keyValue is its value):

Iterator<Vertex> iterator = graph.getVertices(className + "." + keyName, keyValue).iterator();

And I get this error in my Java output:

Exception in thread "main" com.orientechnologies.orient.server.distributed.ODistributedException: No active nodes found to execute command: sql.select rid from index:User.idNumber where key = ?
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.throwSerializedException(OChannelBinaryAsynchClient.java:449)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:400)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:283)
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:167)
	at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2362)
	at com.orientechnologies.orient.client.remote.OStorageRemote$27.execute(OStorageRemote.java:1209)
	at com.orientechnologies.orient.client.remote.OStorageRemote$2.execute(OStorageRemote.java:206)
	at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:251)
	at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetry(OStorageRemote.java:203)
	at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperation(OStorageRemote.java:214)
	at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1185)
	at com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:69)
	at com.orientechnologies.orient.core.index.OIndexRemoteOneValue.get(OIndexRemoteOneValue.java:46)
	at com.orientechnologies.orient.core.index.OIndexRemoteOneValue.get(OIndexRemoteOneValue.java:36)
	at com.orientechnologies.orient.core.index.OIndexAbstractDelegate.get(OIndexAbstractDelegate.java:58)
	at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:262)
	at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:40)
	at com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.getVertices(OrientBaseGraph.java:780)
	at graphdb.Orientdb.findVertex(Orientdb.java:10)

And this is from "dserver.sh":

11:38:42:910 WARNI I/O Error on distributed channel (clientId=8 reqType=121 error=java.io.InvalidClassException: com.orientechnologies.orient.core.sql.query.OBasicResultSet; no valid constructor)
11:38:53:001 WARNI [n01] Timeout (10001ms) on waiting for synchronous responses from nodes=[n03, n02] responsesSoFar=[n02] request=(id=0.56 task=gossip timestamp: 1524467322999 lockManagerServer: n01)
11:38:53:002 WARNI [n01]->[n03] Server 'n03' did not respond to the gossip message (db=myDB, timeout=10000ms), but cannot be set OFFLINE by configuration

Does anyone run OrientDB in a distributed/cloud setup where not every server holds all the shards?

Thanks

@hossein-md (Author) commented Apr 23, 2018

More information!
I tested this: using 'console.sh' I connected to 'n01' and ran this query:

select * from cluster:shard_01 limit 10

The output is correct!
And I ran this (FYI: shard_01, shard_02 and shard_03 are the shards of MyClass):

select * from MyClass limit 10

It returns the following error:

Error: com.orientechnologies.orient.server.distributed.ODistributedException: No active nodes found to execute command: sql.select * from MyClass limit 10

@hossein-md (Author)

I found an issue that covers part of my problem: #7887

@bipulkarnani commented Apr 23, 2018

Hi @hossein-md,

A probable reason for your first issue: I don't know if you have gone through issue #7110, where @tglman mentions this:

From what you say it seems that you are looking for the ROUND_ROBIN_REQUEST setting instead of ROUND_ROBIN_CONNECT. The difference is that ROUND_ROBIN_CONNECT chooses a server each time a new session is created (which in your case is limited by the pool), whereas ROUND_ROBIN_REQUEST chooses one each time you execute a remote operation (like a query).

And there is another concept relevant to the sharding you are doing, cluster locality: https://orientdb.com/docs/last/Distributed-Sharding.html#cluster-locality
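
To try the client-side load balancing mentioned above, here is a hedged sketch based on the load-balancing example in the 2.2 documentation (the OStorageRemote constant names and exact behaviour should be verified against the client version in use; the URL and credentials are placeholders):

import com.orientechnologies.orient.client.remote.OStorageRemote;
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

public class RoundRobinSketch {
  public static void main(String[] args) {
    ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:n01-ip/myDB"); // placeholder address
    // Assumption: property name and enum as shown in the 2.2 load-balancing docs.
    // ROUND_ROBIN_REQUEST picks a server per remote operation; ROUND_ROBIN_CONNECT per new session.
    db.setProperty(OStorageRemote.PARAM_CONNECTION_STRATEGY,
        OStorageRemote.CONNECTION_STRATEGY.ROUND_ROBIN_REQUEST.toString());
    db.open("root", "123");
    OrientGraph graph = new OrientGraph(db); // wrap the document connection with the graph API
    // ... run queries/inserts here ...
    graph.shutdown();
  }
}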

@hossein-md (Author)

Thanks @bipulkarnani
I think one of my problems is half-solved!
I thought I had to create the database in non-distributed mode (as in https://orientdb.com/docs/last/Distributed-Architecture.html), because it says "With releases < v2.2.6 the creation of a database on multiple nodes could cause synchronization problems when clusters are automatically created. Please create the databases before to run in distributed mode." and I mistakenly read 2.2.33 as being < 2.2.6!
When I create the database while running "dserver.sh", the vertices go to servers "n01" and "n02", which seems correct according to "Cluster Locality". But what should I do if I want to send vertices to 'n03'? (See the sketch below.)
Thanks
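
On sending vertices to 'n03': the sharding documentation shows that a vertex can be created directly in a chosen cluster, which in this thread's config would be shard_03 (owned by n03). A minimal sketch, with placeholder connection details:

import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

public class ShardTargetSketch {
  public static void main(String[] args) {
    // Placeholder URL/credentials from this thread's setup.
    OrientGraph graph = new OrientGraph("remote:n01-ip/myDB", "root", "123");
    try {
      // Create the vertex in the shard_03 cluster, which the config above assigns to n03.
      Vertex v = graph.addVertex("class:User,cluster:shard_03");
      v.setProperty("name", "name06"); // sample property
      graph.commit();
    } finally {
      graph.shutdown();
    }
  }
}

The equivalent SQL form is INSERT INTO User CLUSTER shard_03 SET ...; either way the write still has to satisfy the configured writeQuorum.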

@bipulkarnani commented Apr 23, 2018

Hi @hossein-md ,

I also tried rotating my connection across different nodes in the cluster and saw similar behaviour: it always connects to the first node in the connection string. Hence I am not sure how to write to node n03.

@saeedtabrizi (Contributor)

@hossein-md
I just tested your scenario with ODB v3.0.0 and it works normally.
I attach some output here:
(four screenshots attached in the original issue)

I guess you have a misconfiguration in your distributed setup.
Saeed Tabrizi

@saeedtabrizi (Contributor)

Hi @luigidellaquila
I think this issue can be closed.
Thanks

@hossein-md (Author)

Hi @saeedtabrizi
Thank you for your test.
I use version 2.2.33 and I haven't tested version 3.0.0, but I'll share all my steps with you. Could you please tell me which part(s) I misconfigured?

  1. I've copied two instances of OrientDB onto my PC (orientdb-community-importers-2.2.33.zip)
    (screenshot: foldername)

  2. I ran n01 and n02 with 'server.bat' and set the password to 123. The servers were then stopped with Ctrl+C.
    (screenshots: Output_n01, Output_n02)

  3. I've changed the 'default-distributed-db-config.json':

{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": "majority",
  "executionMode": "undefined",
  "readYourWrites": true,
  "newNodeStrategy": "static",
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
	"user":{
         "servers" : [ "n01" ]
    },
    "user_1":{
         "servers" : [ "n02" ]
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}

  4. I've changed 'hazelcast.xml' for both n01 and n02 (just the network part: enabled tcp-ip and disabled multicast):
	<network>
		<port auto-increment="true">2434</port>
		<join>
			<multicast enabled="false">
				<multicast-group>235.1.1.1</multicast-group>
				<multicast-port>2434</multicast-port>
			</multicast>
			<tcp-ip enabled="true">
				<member>localhost:2434</member>
				<member>localhost:2435</member>
			 </tcp-ip>
		</join>
	</network>
  5. I ran 'dserver.bat', first n01 then n02, and set the node names n01 and n02:
    Node name [BLANK=auto generate it]: n01
    Node name [BLANK=auto generate it]: n02

n02 output


+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|Name  |Status|Databases                           |Conns|StartedOn|Binary           |HTTP             |UsedMemory             |
+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|n02(*)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|2    |06:32:53 |192.168.56.1:2425|192.168.56.1:2481|107.67MB/3.56GB (2.96%)|
|n01(@)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|1    |06:31:36 |192.168.56.1:2424|192.168.56.1:2480|108.83MB/3.56GB (2.99%)|
+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+

2018-05-17 06:33:11:857 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql]
2018-05-17 06:33:11:986 INFO  OrientDB Studio available at http://192.168.56.1:2481/studio/index.html
2018-05-17 06:33:11:987 INFO  OrientDB Server is active v2.2.33 (build 77584cd6827f647cf4aa231cf27bd6f10bc04e2c, branch 2.2.x).

n01 output

CLUSTER CONFIGURATION [wQuorum: true] (LEGEND: X = Owner, o = Copy)
+-------------+----+-----------+----------+------+------+
|             |    |           |          |MASTER|MASTER|
|             |    |           |          |ONLINE|ONLINE|
|             |    |           |          |static|static|
+-------------+----+-----------+----------+------+------+
|CLUSTER      |  id|writeQuorum|readQuorum| n01  | n02  |
+-------------+----+-----------+----------+------+------+
|*            |    |     2     |    1     |  X   |  o   |
|e_4          |  21|     2     |    1     |  o   |  X   |
|e_5          |  22|     2     |    1     |  o   |  X   |
|e_6          |  23|     2     |    1     |  o   |  X   |
|e_7          |  24|     2     |    1     |  o   |  X   |
|followed_by_1|  26|     2     |    1     |  o   |  X   |
|followed_by_2|  27|     2     |    1     |  o   |  X   |
|followed_by_3|  28|     2     |    1     |  o   |  X   |
|followed_by_4|  29|     2     |    1     |  o   |  X   |
|internal     |   0|     2     |    1     |      |      |
|sung_by      |  41|     2     |    1     |  o   |  X   |
|sung_by_1    |  42|     2     |    1     |  o   |  X   |
|sung_by_2    |  43|     2     |    1     |  o   |  X   |
|sung_by_3    |  44|     2     |    1     |  o   |  X   |
|v_3          |  12|     2     |    1     |  o   |  X   |
|v_5          |  14|     2     |    1     |  o   |  X   |
|v_6          |  15|     2     |    1     |  o   |  X   |
|v_7          |  16|     2     |    1     |  o   |  X   |
|written_by   |  33|     2     |    1     |  o   |  X   |
|written_by_4 |  37|     2     |    1     |  o   |  X   |
|written_by_6 |  39|     2     |    1     |  o   |  X   |
|written_by_7 |  40|     2     |    1     |  o   |  X   |
+-------------+----+-----------+----------+------+------+


2018-05-17 06:33:12:465 INFO  [n01] Distributed servers status (*=current @=lockManager[n01]):

+---------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|Name     |Status|Databases                           |Conns|StartedOn|Binary           |HTTP             |UsedMemory             |
+---------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|n02      |ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|2    |06:32:53 |192.168.56.1:2425|192.168.56.1:2481|107.67MB/3.56GB (2.96%)|
|n01(*)(@)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|3    |06:31:36 |192.168.56.1:2424|192.168.56.1:2480|70.22MB/3.56GB (1.93%) |
+---------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
  6. Create a database (MyDB) using Studio
    (screenshot: createdb)
CLUSTER CONFIGURATION [wQuorum: true] (LEGEND: X = Owner, o = Copy)
+--------+----+-----------+----------+------+------+
|        |    |           |          |MASTER|MASTER|
|        |    |           |          |ONLINE|ONLINE|
|        |    |           |          |static|static|
+--------+----+-----------+----------+------+------+
|CLUSTER |  id|writeQuorum|readQuorum| n01  | n02  |
+--------+----+-----------+----------+------+------+
|*       |    |     2     |    1     |  X   |  o   |
|e_4     |  21|     2     |    1     |  o   |  X   |
|e_5     |  22|     2     |    1     |  o   |  X   |
|e_6     |  23|     2     |    1     |  o   |  X   |
|e_7     |  24|     2     |    1     |  o   |  X   |
|internal|   0|     2     |    1     |      |      |
|user    |    |     2     |    1     |  X   |      |
|user_1  |    |     2     |    1     |      |  X   |
|v_3     |  12|     2     |    1     |  o   |  X   |
|v_5     |  14|     2     |    1     |  o   |  X   |
|v_6     |  15|     2     |    1     |  o   |  X   |
|v_7     |  16|     2     |    1     |  o   |  X   |
+--------+----+-----------+----------+------+------+
2018-05-17 06:41:22:478 INFO  [n01] Distributed servers status (*=current @=lockManager[n01]):

+---------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|Name     |Status|Databases                           |Conns|StartedOn|Binary           |HTTP             |UsedMemory             |
+---------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|n02      |ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|2    |06:32:53 |192.168.56.1:2425|192.168.56.1:2481|138.76MB/3.56GB (3.81%)|
|         |      |MyDB=ONLINE (MASTER)                |     |         |                 |                 |                       |
|n01(*)(@)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|6    |06:31:36 |192.168.56.1:2424|192.168.56.1:2480|73.57MB/3.56GB (2.02%) |
|         |      |MyDB=ONLINE (MASTER)                |     |         |                 |                 |                       |
+---------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
CLUSTER CONFIGURATION [wQuorum: true] (LEGEND: X = Owner, o = Copy)
+--------+----+-----------+----------+------+-------------+
|        |    |           |          |MASTER|   MASTER    |
|        |    |           |          |ONLINE|SYNCHRONIZING|
|        |    |           |          |static|   static    |
+--------+----+-----------+----------+------+-------------+
|CLUSTER |  id|writeQuorum|readQuorum| n01  |     n02     |
+--------+----+-----------+----------+------+-------------+
|*       |    |     2     |    1     |  X   |      o      |
|internal|   0|     2     |    1     |      |             |
|user    |    |     2     |    1     |  X   |             |
|user_1  |    |     2     |    1     |      |      X      |
+--------+----+-----------+----------+------+-------------+


2018-05-17 06:41:14:526 INFO  [n02] Installed database 'MyDB' (LSN=LSN{segment=6, position=52})
2018-05-17 06:41:14:529 INFO  [n02] Publishing ONLINE status for database n02.MyDB...
2018-05-17 06:41:14:535 INFO  [n02] Reassigning ownership of clusters for database MyDB...
2018-05-17 06:41:14:541 INFO  [n02] Reassignment of clusters for database 'MyDB' completed (classes=10)
2018-05-17 06:41:14:545 INFO  [n02] Setting new distributed configuration for database: MyDB (version=13)

CLUSTER CONFIGURATION [wQuorum: true] (LEGEND: X = Owner, o = Copy)
+--------+-----------+----------+------+------+
|        |           |          |MASTER|MASTER|
|        |           |          |ONLINE|ONLINE|
|        |           |          |static|static|
+--------+-----------+----------+------+------+
|CLUSTER |writeQuorum|readQuorum| n01  | n02  |
+--------+-----------+----------+------+------+
|*       |     2     |    1     |  X   |  o   |
|e_4     |     2     |    1     |  o   |  X   |
|e_5     |     2     |    1     |  o   |  X   |
|e_6     |     2     |    1     |  o   |  X   |
|e_7     |     2     |    1     |  o   |  X   |
|internal|     2     |    1     |      |      |
|user    |     2     |    1     |  X   |      |
|user_1  |     2     |    1     |      |  X   |
|v_3     |     2     |    1     |  o   |  X   |
|v_5     |     2     |    1     |  o   |  X   |
|v_6     |     2     |    1     |  o   |  X   |
|v_7     |     2     |    1     |  o   |  X   |
+--------+-----------+----------+------+------+
2018-05-17 06:41:14:551 INFO  [n02] Broadcasting new distributed configuration for database: MyDB (version=13)

2018-05-17 06:41:14:557 INFO  [n02]<-[n01] Received new status n02.MyDB=SYNCHRONIZING
2018-05-17 06:41:14:562 INFO  [n02]<-[n01] Received updated status n01.MyDB=BACKUP
2018-05-17 06:41:14:569 INFO  [n02]<-[n01] Received updated status n01.MyDB=ONLINE
2018-05-17 06:41:14:571 INFO  [n02] Received updated status n02.MyDB=ONLINE
2018-05-17 06:41:22:478 INFO  [n02] Distributed servers status (*=current @=lockManager[n01]):

+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|Name  |Status|Databases                           |Conns|StartedOn|Binary           |HTTP             |UsedMemory             |
+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|n02(*)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|2    |06:32:53 |192.168.56.1:2425|192.168.56.1:2481|138.76MB/3.56GB (3.81%)|
|      |      |MyDB=ONLINE (MASTER)                |     |         |                 |                 |                       |
|n01(@)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|4    |06:31:36 |192.168.56.1:2424|192.168.56.1:2480|48.57MB/3.56GB (1.33%) |
+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+

2018-05-17 06:41:32:117 INFO  [n02] Distributed servers status (*=current @=lockManager[n01]):

+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|Name  |Status|Databases                           |Conns|StartedOn|Binary           |HTTP             |UsedMemory             |
+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|n02(*)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|2    |06:32:53 |192.168.56.1:2425|192.168.56.1:2481|139.09MB/3.56GB (3.82%)|
|      |      |MyDB=ONLINE (MASTER)                |     |         |                 |                 |                       |
|n01(@)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|6    |06:31:36 |192.168.56.1:2424|192.168.56.1:2480|73.57MB/3.56GB (2.02%) |
|      |      |MyDB=ONLINE (MASTER)                |     |         |                 |                 |                       |
+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
  7. Connect to the DB using console.bat:
orientdb> connect remote:localhost:2424/MyDB root 123

Disconnecting from the database [null]...OK
Connecting to database [remote:localhost:2424/MyDB] with user 'root'...OK

CONFIGURED SERVERS
+----+----+------+-----------+-------------------+-----------------+-----------------+----------------+----------------+---------+
|#   |Name|Status|Connections|StartedOn          |Binary           |HTTP             |UsedMemory      |FreeMemory      |MaxMemory|
+----+----+------+-----------+-------------------+-----------------+-----------------+----------------+----------------+---------+
|0   |n02 |ONLINE|2          |2018-05-17 06:32:53|192.168.56.1:2425|192.168.56.1:2481|149.36MB (4.10%)|152.14MB (4.18%)|3.56GB   |
|1   |n01 |ONLINE|5          |2018-05-17 06:31:36|192.168.56.1:2424|192.168.56.1:2480|89.34MB (2.45%) |226.66MB (6.23%)|3.56GB   |
+----+----+------+-----------+-------------------+-----------------+-----------------+----------------+----------------+---------+
  8. Run these commands with console.bat:
orientdb {db=MyDB}> ALTER DATABASE MINIMUMCLUSTERS 2

Database updated successfully.

orientdb {db=MyDB}> create class User extends V
Class created successfully. Total classes in database now: 12.

orientdb {db=MyDB}> create property User.name string

Property created successfully with id=1.

orientdb {db=MyDB}> clusters


CLUSTERS (collections)
+----+---------+----+---------+-----+------------+-------------+--------------------+
|#   |NAME     |  ID|CLASS    |COUNT|OWNER_SERVER|OTHER_SERVERS|AUTO_DEPLOY_NEW_NODE|
+----+---------+----+---------+-----+------------+-------------+--------------------+
|0   |_studio  |  25|_studio  |    1|    n01     |    [n02]    |        true        |
|1   |default  |   3|         |    0|    n01     |    [n02]    |        true        |
|2   |e        |  17|E        |    0|    n01     |    [n02]    |        true        |
|3   |e_1      |  18|E        |    0|    n01     |    [n02]    |        true        |
|4   |e_2      |  19|E        |    0|    n01     |    [n02]    |        true        |
|5   |e_3      |  20|E        |    0|    n01     |    [n02]    |        true        |
|6   |e_4      |  21|E        |    0|    n02     |    [n01]    |        true        |
|7   |e_5      |  22|E        |    0|    n02     |    [n01]    |        true        |
|8   |e_6      |  23|E        |    0|    n02     |    [n01]    |        true        |
|9   |e_7      |  24|E        |    0|    n02     |    [n01]    |        true        |
|10  |index    |   1|         |    0|    n01     |    [n02]    |        true        |
|11  |internal |   0|         |    3|            |             |                    |
|12  |manindex |   2|         |    0|    n01     |    [n02]    |        true        |
|13  |ofunction|   6|OFunction|    0|    n01     |    [n02]    |        true        |
|14  |orole    |   4|ORole    |    3|    n01     |    [n02]    |        true        |
|15  |oschedule|   8|OSchedule|    0|    n01     |    [n02]    |        true        |
|16  |osequence|   7|OSequence|    0|    n01     |    [n02]    |        true        |
|17  |ouser    |   5|OUser    |    3|    n01     |    [n02]    |        true        |
|18  |user     |  26|User     |    0|    n01     |             |       false        |
|19  |user_1   |  27|User     |    0|    n02     |             |       false        |
|20  |v        |   9|V        |    0|    n01     |    [n02]    |        true        |
|21  |v_1      |  10|V        |    0|    n01     |    [n02]    |        true        |
|22  |v_2      |  11|V        |    0|    n01     |    [n02]    |        true        |
|23  |v_3      |  12|V        |    0|    n02     |    [n01]    |        true        |
|24  |v_4      |  13|V        |    0|    n01     |    [n02]    |        true        |
|25  |v_5      |  14|V        |    0|    n02     |    [n01]    |        true        |
|26  |v_6      |  15|V        |    0|    n02     |    [n01]    |        true        |
|27  |v_7      |  16|V        |    0|    n02     |    [n01]    |        true        |
+----+---------+----+---------+-----+------------+-------------+--------------------+
|    |TOTAL    |    |         |   10|            |             |                    |
+----+---------+----+---------+-----+------------+-------------+--------------------+

  9. Insert two vertices into User:

orientdb {db=MyDB}> insert into User set name = 'name01'

Inserted record 'User#26:0{name:name01} v1' in 0.004000 sec(s).

orientdb {db=MyDB}> insert into User set name = 'name02'

Inserted record 'User#26:1{name:name02} v1' in 0.002000 sec(s).

orientdb {db=MyDB}> clusters


CLUSTERS (collections)
+----+---------+----+---------+-----+------------+-------------+--------------------+
|#   |NAME     |  ID|CLASS    |COUNT|OWNER_SERVER|OTHER_SERVERS|AUTO_DEPLOY_NEW_NODE|
+----+---------+----+---------+-----+------------+-------------+--------------------+
|0   |_studio  |  25|_studio  |    1|    n01     |    [n02]    |        true        |
|1   |default  |   3|         |    0|    n01     |    [n02]    |        true        |
|2   |e        |  17|E        |    0|    n01     |    [n02]    |        true        |
|3   |e_1      |  18|E        |    0|    n01     |    [n02]    |        true        |
|4   |e_2      |  19|E        |    0|    n01     |    [n02]    |        true        |
|5   |e_3      |  20|E        |    0|    n01     |    [n02]    |        true        |
|6   |e_4      |  21|E        |    0|    n02     |    [n01]    |        true        |
|7   |e_5      |  22|E        |    0|    n02     |    [n01]    |        true        |
|8   |e_6      |  23|E        |    0|    n02     |    [n01]    |        true        |
|9   |e_7      |  24|E        |    0|    n02     |    [n01]    |        true        |
|10  |index    |   1|         |    0|    n01     |    [n02]    |        true        |
|11  |internal |   0|         |    3|            |             |                    |
|12  |manindex |   2|         |    0|    n01     |    [n02]    |        true        |
|13  |ofunction|   6|OFunction|    0|    n01     |    [n02]    |        true        |
|14  |orole    |   4|ORole    |    3|    n01     |    [n02]    |        true        |
|15  |oschedule|   8|OSchedule|    0|    n01     |    [n02]    |        true        |
|16  |osequence|   7|OSequence|    0|    n01     |    [n02]    |        true        |
|17  |ouser    |   5|OUser    |    3|    n01     |    [n02]    |        true        |
|18  |user     |  26|User     |    2|    n01     |             |       false        |
|19  |user_1   |  27|User     |    0|    n02     |             |       false        |
|20  |v        |   9|V        |    0|    n01     |    [n02]    |        true        |
|21  |v_1      |  10|V        |    0|    n01     |    [n02]    |        true        |
|22  |v_2      |  11|V        |    0|    n01     |    [n02]    |        true        |
|23  |v_3      |  12|V        |    0|    n02     |    [n01]    |        true        |
|24  |v_4      |  13|V        |    0|    n01     |    [n02]    |        true        |
|25  |v_5      |  14|V        |    0|    n02     |    [n01]    |        true        |
|26  |v_6      |  15|V        |    0|    n02     |    [n01]    |        true        |
|27  |v_7      |  16|V        |    0|    n02     |    [n01]    |        true        |
+----+---------+----+---------+-----+------------+-------------+--------------------+
|    |TOTAL    |    |         |   12|            |             |                    |
+----+---------+----+---------+-----+------------+-------------+--------------------+
  10. Connect to OrientDB with console.bat on port 2425:
orientdb {db=MyDB}> connect remote:localhost:2425/MyDB root 123

Disconnecting from the database [MyDB]...OK
Connecting to database [remote:localhost:2425/MyDB] with user 'root'...OK

CONFIGURED SERVERS
+----+----+------+-----------+-------------------+-----------------+-----------------+----------------+----------------+---------+
|#   |Name|Status|Connections|StartedOn          |Binary           |HTTP             |UsedMemory      |FreeMemory      |MaxMemory|
+----+----+------+-----------+-------------------+-----------------+-----------------+----------------+----------------+---------+
|0   |n02 |ONLINE|2          |2018-05-17 06:32:53|192.168.56.1:2425|192.168.56.1:2481|182.65MB (5.02%)|118.85MB (3.26%)|3.56GB   |
|1   |n01 |ONLINE|6          |2018-05-17 06:31:36|192.168.56.1:2424|192.168.56.1:2480|141.34MB (3.88%)|174.66MB (4.80%)|3.56GB   |
+----+----+------+-----------+-------------------+-----------------+-----------------+----------------+----------------+---------+
  11. Insert three vertices into User:
orientdb {db=MyDB}> insert into User set name = 'name03'

Inserted record 'User#27:0{name:name03} v1' in 0.009000 sec(s).

orientdb {db=MyDB}> insert into User set name = 'name04'

Inserted record 'User#27:1{name:name04} v1' in 0.002000 sec(s).

orientdb {db=MyDB}> insert into User set name = 'name05'

Inserted record 'User#27:2{name:name05} v1' in 0.002000 sec(s).

orientdb {db=MyDB}> clusters


CLUSTERS (collections)
+----+---------+----+---------+-----+------------+-------------+--------------------+
|#   |NAME     |  ID|CLASS    |COUNT|OWNER_SERVER|OTHER_SERVERS|AUTO_DEPLOY_NEW_NODE|
+----+---------+----+---------+-----+------------+-------------+--------------------+
|0   |_studio  |  25|_studio  |    1|    n01     |    [n02]    |        true        |
|1   |default  |   3|         |    0|    n01     |    [n02]    |        true        |
|2   |e        |  17|E        |    0|    n01     |    [n02]    |        true        |
|3   |e_1      |  18|E        |    0|    n01     |    [n02]    |        true        |
|4   |e_2      |  19|E        |    0|    n01     |    [n02]    |        true        |
|5   |e_3      |  20|E        |    0|    n01     |    [n02]    |        true        |
|6   |e_4      |  21|E        |    0|    n02     |    [n01]    |        true        |
|7   |e_5      |  22|E        |    0|    n02     |    [n01]    |        true        |
|8   |e_6      |  23|E        |    0|    n02     |    [n01]    |        true        |
|9   |e_7      |  24|E        |    0|    n02     |    [n01]    |        true        |
|10  |index    |   1|         |    0|    n01     |    [n02]    |        true        |
|11  |internal |   0|         |    3|            |             |                    |
|12  |manindex |   2|         |    0|    n01     |    [n02]    |        true        |
|13  |ofunction|   6|OFunction|    0|    n01     |    [n02]    |        true        |
|14  |orole    |   4|ORole    |    3|    n01     |    [n02]    |        true        |
|15  |oschedule|   8|OSchedule|    0|    n01     |    [n02]    |        true        |
|16  |osequence|   7|OSequence|    0|    n01     |    [n02]    |        true        |
|17  |ouser    |   5|OUser    |    3|    n01     |    [n02]    |        true        |
|18  |user     |  26|User     |    0|    n01     |             |       false        |
|19  |user_1   |  27|User     |    3|    n02     |             |       false        |
|20  |v        |   9|V        |    0|    n01     |    [n02]    |        true        |
|21  |v_1      |  10|V        |    0|    n01     |    [n02]    |        true        |
|22  |v_2      |  11|V        |    0|    n01     |    [n02]    |        true        |
|23  |v_3      |  12|V        |    0|    n02     |    [n01]    |        true        |
|24  |v_4      |  13|V        |    0|    n01     |    [n02]    |        true        |
|25  |v_5      |  14|V        |    0|    n02     |    [n01]    |        true        |
|26  |v_6      |  15|V        |    0|    n02     |    [n01]    |        true        |
|27  |v_7      |  16|V        |    0|    n02     |    [n01]    |        true        |
+----+---------+----+---------+-----+------------+-------------+--------------------+
|    |TOTAL    |    |         |   13|            |             |                    |
+----+---------+----+---------+-----+------------+-------------+--------------------+
  12. Run these commands:
orientdb {db=MyDB}> select * from cluster:user_1

+----+-----+------+------+
|#   |@RID |@CLASS|name  |
+----+-----+------+------+
|0   |#27:0|User  |name03|
|1   |#27:1|User  |name04|
|2   |#27:2|User  |name05|
+----+-----+------+------+

3 item(s) found. Query executed in 0.003 sec(s).
orientdb {db=MyDB}> select from User

Error: com.orientechnologies.orient.server.distributed.ODistributedException: No active nodes found to execute command: sql.select from User

orientdb {db=MyDB}>

n02 output:

2018-05-17 06:41:32:117 INFO  [n02] Distributed servers status (*=current @=lockManager[n01]):

+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|Name  |Status|Databases                           |Conns|StartedOn|Binary           |HTTP             |UsedMemory             |
+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|n02(*)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|2    |06:32:53 |192.168.56.1:2425|192.168.56.1:2481|139.09MB/3.56GB (3.82%)|
|      |      |MyDB=ONLINE (MASTER)                |     |         |                 |                 |                       |
|n01(@)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|6    |06:31:36 |192.168.56.1:2424|192.168.56.1:2480|73.57MB/3.56GB (2.02%) |
|      |      |MyDB=ONLINE (MASTER)                |     |         |                 |                 |                       |
+------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+

2018-05-17 07:02:04:854 WARNI I/O Error on distributed channel (clientId=5 reqType=121 error=java.io.InvalidClassException: com.orientechnologies.orient.core.sql.query.OBasicResultSet; no valid constructor)
2018-05-17 07:04:04:852 WARNI I/O Error on distributed channel (clientId=2 reqType=121 error=java.io.InvalidClassException: com.orientechnologies.orient.core.sql.query.OBasicResultSet; no valid constructor)
2018-05-17 07:06:04:845 WARNI [n02] Timeout (120001ms) on waiting for synchronous responses from nodes=[n01] responsesSoFar=[] request=(id=1.313 task=command_sql(select from User))

n01 output:

2018-05-17 06:41:22:478 INFO  [n01] Distributed servers status (*=current @=lockManager[n01]):

+---------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|Name     |Status|Databases                           |Conns|StartedOn|Binary           |HTTP             |UsedMemory             |
+---------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+
|n02      |ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|2    |06:32:53 |192.168.56.1:2425|192.168.56.1:2481|138.76MB/3.56GB (3.81%)|
|         |      |MyDB=ONLINE (MASTER)                |     |         |                 |                 |                       |
|n01(*)(@)|ONLINE|GratefulDeadConcerts=ONLINE (MASTER)|6    |06:31:36 |192.168.56.1:2424|192.168.56.1:2480|73.57MB/3.56GB (2.02%) |
|         |      |MyDB=ONLINE (MASTER)                |     |         |                 |                 |                       |
+---------+------+------------------------------------+-----+---------+-----------------+-----------------+-----------------------+

2018-05-17 07:06:05:098 WARNI [n01]->[n02] Error on sending message to distributed node (java.net.SocketException: Software caused connection abort: socket write error) retrying (1/3)
  13. I also tested this:
orientdb {db=MyDB}> select * from cluster:user

Error: com.orientechnologies.orient.server.distributed.ODistributedException: No active nodes found to execute command: sql.select * from cluster:user

orientdb {db=MyDB}>

n02 output:

2018-05-17 07:14:50:365 WARNI I/O Error on distributed channel (clientId=6 reqType=121 error=java.io.InvalidClassException: com.orientechnologies.orient.core.sql.query.OBasicResultSet; no valid constructor)
2018-05-17 07:15:05:166 WARNI [n02] Timeout (10001ms) on waiting for synchronous responses from nodes=[n01] responsesSoFar=[] request=(id=1.421 task=gossip timestamp: 1526525095165 lockManagerServer: n01)
2018-05-17 07:15:05:166 WARNI [n02]->[n01] Server 'n01' did not respond to the gossip message (db=GratefulDeadConcerts, timeout=10000ms), but cannot be set OFFLINE by configuration

n01 output:

2018-05-17 07:06:05:098 WARNI [n01]->[n02] Error on sending message to distributed node (java.net.SocketException: Software caused connection abort: socket write error) retrying (1/3)
2018-05-17 07:15:05:167 WARNI [n01]->[n02] Error on sending message to distributed node (java.net.SocketException: Software caused connection abort: socket write error) retrying (1/3)

Thanks.

@saeedtabrizi (Contributor)

Hi @hossein-md
Please send your configured setup as a zip file to my personal email (you know the address).
Note: please don't send large files or any sensitive data by email.
Thanks

@hossein-md (Author)

Hi @saeedtabrizi,

I sent it to your email.

Thanks

@saeedtabrizi (Contributor) commented May 17, 2018

@hossein-md,
I checked your database and found a problem with the write quorum value, and also a problem finding the LSN (Log Sequence Number) in your distributed database.
Once I recreated the database and set the right quorum value, the database synced and I could retrieve all or part of the data from any cluster that is online.
If you feel this answer is enough for you, please close this issue.

@lvca, @Laa This issue occurred because the LSN was not recovered correctly after a failed transaction on each node, so after any CRUD operation an exception was thrown in the console.

Thanks
Saeed
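
For reference on the quorum arithmetic: with "writeQuorum": "majority" the quorum is N/2 + 1 over the N master servers, so it is 2 both for the two-node reproduction above and for the original three-node setup; every write therefore needs an acknowledgement from a second server before it succeeds. Lowering it (e.g. "writeQuorum": 1 in default-distributed-db-config.json, as tested in the next comment) removes the wait for a second acknowledgement, at the cost of weaker consistency guarantees.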

@hossein-md (Author)

Hi @saeedtabrizi,

I tested writeQuorum = 1 and I got the same errors.
I also tested writeQuorum = 1 with readQuorum = 2, and in this case the queries select * from cluster:user and select * from User didn't work either.

Thanks

@saeedtabrizi (Contributor)

Hi @hossein-md
I will send an email about my tested instances asap.

@HassounDev

@saeedtabrizi I tried the scenario that you implemented, because I had the same problem as @hossein-md.

It works when I use select from cluster:myclass_a1, but when I use select * from myclass it throws:

Error: com.orientechnologies.orient.enterprise.channel.binary.OResponseProcessingException: Exception during response processing
Error: java.lang.ArrayIndexOutOfBoundsException: -1 

@sedirmohammed commented Aug 8, 2018

Hey @saeedtabrizi, how did you get the records to be stored on the local cluster? That does not work for me... All records are stored on one specific node, even when the local node owns the matching cluster...

@georgiana-b commented Aug 9, 2018

Hi guys!

@adler4566 I think I also encountered this problem. I was trying to keep only one cluster on each node, which would also be the owner of that cluster. The idea was that no node should hold the entire database locally, only the cluster it owns.

I didn't manage to achieve this. If I use autoDeploy: true in distributed-config.json, each node always syncs the entire database, not just its cluster.

@sedirmohammed commented Aug 9, 2018

Hey @georgiana-b, this is very interesting, because I have exactly the same goal as you! But why does this happen? This can't be the desired behaviour of a distributed database... I work with big data, and with this behaviour I can't use this database. Maybe someone encountered the same problem and managed to fix it?

@georgiana-b

Hi @luigidellaquila!

Just so I know whether it's worth looking further into this issue: does this feature exist in OrientDB? Is it possible to shard the database so that each node keeps only one cluster locally, not the entire database?
If not, are you guys interested in developing this?

@mmacfadden (Contributor)

@georgiana-b It might be worth taking a look at these issues:

They both relate closely to this topic. One thing to consider is that while you definitely want to be able to distribute data across clusters, and/or clusters across servers in some distributed-hash-table fashion, you may ALSO want some sort of replication factor in case you lose a server. So you might want each cluster to exist on N servers where N >= 1. That way, if you lose a single server you don't lose availability of data.
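
As an illustration of that replication-factor idea, here is a hedged sketch of the "clusters" section of this thread's default-distributed-db-config.json with each shard listed on two servers, so that losing any single node keeps every shard available (the pairings are arbitrary examples, not a recommendation):

  "clusters": {
    "internal": {},
    "shard_01": { "servers": ["n01", "n02"] },
    "shard_02": { "servers": ["n02", "n03"] },
    "shard_03": { "servers": ["n03", "n01"] },
    "*": { "servers": ["<NEW_NODE>"] }
  }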
