HDDS-2199 In SCMNodeManager dnsToUuidMap cannot track multiple DNs on the same host #1551
Conversation
/label ozone
💔 -1 overall
This message was automatically generated.
The checkstyle and findbugs warnings are in files that this patch has not changed. I think the unit test failure is unrelated, and the integration and acceptance tests seem to be having some issues at the moment. I will rebase and push again to see how another run goes.
Force-pushed from a06fa0c to 9dcff9b
💔 -1 overall
This message was automatically generated.
Findbugs is warning about this method:
As this is the only place dnsToUuidMap gets modified and it is synchronized, I think it is OK.
I think the integration tests are failing due to HDDS-2187, so once it gets committed we can rebase this and see if it gets a better test run.
… the given host / address rather than a single node
Force-pushed from 10db15e to b7c5d2a
Thanks for the patch @sodonnel. Overall the concept looks good to me; I have a few questions about the implementation...
@@ -295,7 +297,33 @@ public ScmInfo getScmInfo() throws IOException {
    boolean auditSuccess = true;
    try {
      NodeManager nodeManager = scm.getScmNodeManager();
      Node client = nodeManager.getNodeByAddress(clientMachine);
I am trying to understand why this big block is not just as simple as:
Node client = null;
List<DatanodeDetails> possibleClients =
    nodeManager.getNodesByAddress(clientMachine);
if (possibleClients.size() > 0) {
  client = possibleClients.get(0);
}
It seems to be logic to find a datanode which is on the same host as the client. I am not sure we need this tricky randomization (or choosing the first possible datanode): if client is null, we don't need to sort (handled by the sort method below); if there are multiple datanodes on the same host, we can choose the first one, as in the topology sort it doesn't matter which one is chosen.
But please correct me if I am wrong.
I am not certain how this sortDatanodes() call is used. Is it on the read path or the write path? I was assuming it is on the read path, but the write path may be different if all the cluster DNs are passed into the method - then you would always get a match.
A list of DNs (UUIDs) is passed into the method, and then we retrieve a list of DatanodeDetails running on the client machine. The client machine can then be set to one of those DatanodeDetails, but it is not guaranteed that the first in the list will match one of the UUIDs passed into the method.
E.g. this is passed in:
DN0, DN5, DN10, DN15
On the client machine we have:
DN1, DN6, DN10 and DN16
So only DN10 matches one of the nodes passed in. If we just picked the first one (DN1), it would look like there is no DN on the client machine, and then when the list and client machine are passed into sortByDistanceCost() at line 355, it would not give the expected result.
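For illustration, a minimal sketch of the filtering described above, assuming nodes is the resolved List<DatanodeDetails> for the UUIDs passed into sortDatanodes(); the variable names are illustrative, not the exact patch code:
// Sketch only: prefer a DN that is both on the client host and in the pipeline,
// so the later object-equality check in getDistanceCost() can return zero.
Node client = null;
List<DatanodeDetails> possibleClients =
    nodeManager.getNodesByAddress(clientMachine);
for (DatanodeDetails dn : possibleClients) {
  if (nodes.contains(dn)) {
    client = dn;
    break;
  }
}
if (client == null && !possibleClients.isEmpty()) {
  // No DN on the client host is part of the pipeline; any of them will do.
  client = possibleClients.get(0);
}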
Looking at where sortDatanodes() is used, it seems to be called from the OM when performing lookup file or lookup key. So that suggests it is only used on the read path, and hence at most 3 DNs should be passed in along with one client address.
The code could be simplified a little, but I think we do need to filter the list of returned nodes down to only the nodes it cares about, due to what I said in the comment above.
However, thinking about this some more, I think we can avoid the random selection. In the case where there is only 1 DN per host, the DN matching the client would always be sorted first, so we don't really need to randomize the first node returned when all nodes are on the same host. I will refactor this and see how it looks.
E.g. this is passed in:
DN0, DN5, DN10, DN15
On the client machine we have:
DN1, DN6, DN10 and DN16
Not sure if I understand correctly. I think there are two parameters:
- The hostname of the client (e.g. DN10)
- The list of datanode UUIDs (DN1, DN6, DN10)
The task is to sort the list of datanode UUIDs, prioritizing the client machine.
First the client hostname is converted to a UUID. This is the part which is replaced by this complex block. If it cannot be converted it can be null. After that the topology logic sorts the list (sortByDistanceCost).
IMHO this functionality can be implemented in 4 lines of code, but correct me if I am wrong.
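For reference, here are the two steps combined into a minimal sketch of the simplified flow; the sortByDistanceCost signature shown is an assumption based on this discussion, not verified against the patch:
// Sketch of the simplified flow (assumed names and signatures).
// 1. Convert the client hostname to a node; leave it null if it cannot be resolved.
Node client = null;
List<DatanodeDetails> possibleClients =
    nodeManager.getNodesByAddress(clientMachine);
if (!possibleClients.isEmpty()) {
  client = possibleClients.get(0);
}
// 2. Let the topology sort the pipeline nodes by distance from the client;
//    a null client is handled inside the sort.
List<? extends Node> sorted =
    topology.sortByDistanceCost(client, nodes, nodes.size());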
When there are multiple DNs on a given host (host1), you pass in the client as host1.
We do a lookup for host1 and get the DN UUIDs (actually DatanodeDetails objects) DN1, DN6, DN10 and DN16, which are on this host.
The pipeline passed in is DN4, DN5 and DN10. I.e. only one of these nodes is on the client machine, but the client machine has other nodes not involved in this pipeline.
If it were just the hostname/IP used for the later comparison, your logic would be fine. However...
The matched client DatanodeDetails object is passed to sortByDistanceCost() later in the same method, which calls this for each pipeline node:
NetworkTopologyImpl.getDistanceCost(reader, nodes.get(i));
where reader is the client Node object we found.
In getDistanceCost, the first few lines do this:
if ((node1 != null && node2 != null && node1.equals(node2)) ||
    (node1 == null && node2 == null)) {
  return 0;
}
I.e., if both parameters are non-null and are the same object, it returns a distance of zero; otherwise it goes through more logic to calculate a distance.
Going back to the example above - your logic would return DN1, which would not give a zero distance cost in getDistanceCost() when compared with any of the pipeline nodes.
My more complex logic would return DN10, which would return a zero cost when compared with pipeline node DN10, as they are the same object.
So after a rather long example, the summary is that if the cost calculation were based on hostname your logic would be fine, but as it compares the actual node objects, I think we need the more complex logic, unfortunately! I have not followed the rest of the logic in getDistanceCost to see if a cost of zero would fall out in the end, but I suspect it will be some small non-zero value, as both nodes will be at the same level.
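To put rough numbers on that example, a hypothetical illustration of the two choices (the variable names are made up for the illustration; only the zero versus non-zero distinction matters):
// Hypothetical illustration only.
// dn10Pipeline: the DatanodeDetails for DN10 taken from the pipeline list.
// dn1OnHost / dn10OnHost: DatanodeDetails looked up for the client host.
int costWithDn10 = topology.getDistanceCost(dn10OnHost, dn10Pipeline);
// 0, because they are the same registered node object (node1.equals(node2)).
int costWithDn1 = topology.getDistanceCost(dn1OnHost, dn10Pipeline);
// non-zero: different node objects, so the full distance calculation runs
// and the client no longer looks local to any pipeline node.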
Reflecting on this issue some more, I think the simplified logic you have suggested is better, and the problem is better solved in getDistanceCost - rather than just comparing whether the node objects are the same, we should test whether they have the same hostname and, if so, treat that as a zero distance match too.
Unfortunately, as that method takes Node objects rather than DatanodeDetails, this is not trivial to do.
The code path in question here is only relevant for clusters with more than one datanode on the same host, and by definition that is a non-production setup. The only consequence of the change you have suggested over my original code is that the client may sometimes get the wrong 'cost to reach a datanode' on test clusters - nothing will fail, so the impact of this issue is very low.
Therefore, if you are happy, I think we should commit the latest version (which has your simplified logic) and create a follow-up Jira to look into fixing getDistanceCost somehow.
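Just to make the follow-up idea concrete, a hypothetical sketch of a hostname-aware check at the top of getDistanceCost; the instanceof cast and the getHostName() accessor are assumptions about how it could be done, not something in this patch:
// Hypothetical follow-up sketch only - not the current implementation.
if ((node1 != null && node2 != null && node1.equals(node2)) ||
    (node1 == null && node2 == null)) {
  return 0;
}
// Possible extension: two distinct datanodes on the same host are also "local".
if (node1 instanceof DatanodeDetails && node2 instanceof DatanodeDetails) {
  String host1 = ((DatanodeDetails) node1).getHostName();
  String host2 = ((DatanodeDetails) node2).getHostName();
  if (host1 != null && host1.equals(host2)) {
    return 0;
  }
}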
      String dnsName, String uuid) {
    Set<String> dnList = dnsToUuidMap.get(dnsName);
    if (dnList == null) {
      dnList = new HashSet<>();
Do we need a concurrent HashSet here? dnsToUuidMap is thread-safe, but what happens if one reader and one writer start to use the same HashSet?
I have fixed this one by creating the set with ConcurrentHashMap.newKeySet(), which creates a concurrent set.
   * @param dnsName String representing the hostname or IP of the node
   * @param uuid String representing the UUID of the registered node.
   */
  private void addEntryTodnsToUuidMap(String dnsName, String uuid) {
It can also be synchronized, and HashSet -> ConcurrentHashSet.
I have fixed this one by creating the set with ConcurrentHashMap.newKeySet(), which creates a concurrent set, and I synchronized the method.
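For clarity, a sketch of the fixed method with both changes applied, reconstructed from the diff fragments above (it assumes java.util.Set and java.util.concurrent.ConcurrentHashMap are imported, and should be read as an approximation of the committed code rather than an exact copy):
/**
 * Adds the given UUID to the set of datanode UUIDs registered from dnsName.
 * The method is synchronized and the inner set is concurrent, so a reader
 * holding an existing set stays safe while another DN on the same host
 * registers.
 *
 * @param dnsName String representing the hostname or IP of the node
 * @param uuid String representing the UUID of the registered node.
 */
private synchronized void addEntryTodnsToUuidMap(
    String dnsName, String uuid) {
  Set<String> dnList = dnsToUuidMap.get(dnsName);
  if (dnList == null) {
    dnList = ConcurrentHashMap.newKeySet();
    dnsToUuidMap.put(dnsName, dnList);
  }
  dnList.add(uuid);
}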
… is also synchronized when modifying dnsToUuidMap
💔 -1 overall
This message was automatically generated.
…emove the random lookup
💔 -1 overall
This message was automatically generated.
+1 Thanks for the update @sodonnel. Let's do the complex part in the follow-up Jira. I have pushed it to trunk...
…n the same host Closes apache#1551
Often in test clusters and tests, we start multiple datanodes on the same host.
In SCMNodeManager.register() there is a map of hostname -> datanode UUID called dnsToUuidMap.
If several DNs register from the same host, the entry in the map will be overwritten and the last DN to register will 'win'.
This means that the method getNodeByAddress() does not return the correct DatanodeDetails object when many DNs are registered from the same address.
This method is only used in SCMBlockProtocolServer.sortDatanodes() to allow it to see if one of the nodes matches the client, but it will also need to be used by the decommission code.
Perhaps we could change the getNodeByAddress() method to return a list of DNs? In normal production clusters there should be only one returned, but in test clusters there may be many. Any code looking for a specific DN entry would need to iterate the list and match on the port number too, as host:port would be the unique definition of a datanode.
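A rough sketch of what the list-returning lookup and a port-aware caller could look like; getNodeByUuid(), the STANDALONE port name, and the helper below are illustrative assumptions rather than the actual API:
// NodeManager side: return every registered DN known for this hostname/IP.
public List<DatanodeDetails> getNodesByAddress(String address) {
  List<DatanodeDetails> results = new LinkedList<>();
  Set<String> uuids = dnsToUuidMap.get(address);
  if (uuids == null) {
    return results; // unknown address: empty list rather than null
  }
  for (String uuid : uuids) {
    DatanodeDetails dn = getNodeByUuid(uuid); // assumed UUID -> node lookup
    if (dn != null) {
      results.add(dn);
    }
  }
  return results;
}

// Caller side: when host:port identifies the node, filter on the port as well.
// (Illustrative only - real callers may key on a different port.)
private DatanodeDetails findByAddressAndPort(
    List<DatanodeDetails> candidates, int port) {
  for (DatanodeDetails dn : candidates) {
    if (dn.getPort(DatanodeDetails.Port.Name.STANDALONE).getValue() == port) {
      return dn;
    }
  }
  return null;
}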