Skip to content

Conversation

@ijihang
Copy link
Contributor

@ijihang ijihang commented Jun 22, 2021

While the cluster is running, the coordinator node may go down, which may cause the session to be unable to read or write properly.i add a session interface to connect multiple nodes.Users can now define a list that contains multiple servers that can be connected. When one of the servers goes down, try again. If it fails, connect to the next one. If all the nodes in the list fail, an exception will be thrown.

@HTHou HTHou added the Module - Cluster PRs for the cluster module label Jun 22, 2021

private boolean reconnect() {
boolean flag = false;
for (int i = 1; i <= Config.RETRY_NUM; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always try the first RETRY_NUM clients, it's not good. Maybe you can record a nextHostIndex to try next hostId.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in new commit,Instead of trying the first node in the list every time, try the current node again, and then try the next node in the list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should connect nodes randomly, so that we can reduce the pressure on one node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, in new commit,connect nodes randomly

this.enableCacheLeader = enableCacheLeader;
}

public synchronized void clusterOpen() throws IoTDBConnectionException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's no need to create a new open function, you could put the logic which just create SessionConnection who use the nodeUrl in the contructSessionConn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,i modify in new commit,using the old open method

this.enableCacheLeader = enableCacheLeader;
}

public Session(List<String> nodeUrls, String username, String password) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better using java doc to specify the URL format clearly

throws StatementExecutionException, IoTDBConnectionException {
try {
logger.info("{} execute sql {}", defaultSessionConnection.getEndPoint(), sql);
logger.info("{} execute sql {}", defaultEndPoint, sql);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defaultSessionConnection.getEndPoint() may not equal defaultEndPoint as long as we enable redirection

defaultSessionConnection.getEndPoint(),
sql,
e.getEndPoint());
logger.debug("{} redirect query {} to {}", defaultEndPoint, sql, e.getEndPoint());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same as above

Comment on lines +168 to +169
EndPoint endPoint = parseNodeUrl(nodeUrl);
endPointsList.add(endPoint);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseNodeUrl method may return null, we can not add one null endpoint to the endPointsList

Comment on lines 179 to 188
}
String ip = split[0];
try {
int rpcPort = Integer.parseInt(split[1]);
return endPoint.setIp(ip).setPort(rpcPort);
} catch (NumberFormatException e) {
logger.warn("parse url fail {}", nodeUrl);
}
return endPoint;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user given one wrong URL format, in the code implement, the NumberFormatException may occur, you will return one empty endPoint.
In my opinion, if the seed URL format error, It's better rethrow the exception to the user.

Comment on lines 158 to 160
if (endPoint == null) {
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its better not allow null endPoint added in the endPointList in the previous parseSeedNodeUrls method. In other words, abnormal judgment should be made as soon as possible.

flag = true;
if (transport != null) {
transport.close();
int currHostIndex = endPointList.indexOf(session.defaultEndPoint);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currHostIndex maybe -1, so you should handle it.

} catch (InterruptedException e1) {
logger.error("reconnect is interrupted.", e1);
Thread.currentThread().interrupt();
int tag = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tag->tryHostNum

}

private boolean reconnect() {
boolean flag = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flag -> connectedSuccess

Copy link
Member

@neuyilan neuyilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@mychaow mychaow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Module - Cluster PRs for the cluster module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants