Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-28146: Make ServerManager rsAdmins map thread safe #5461

Merged
merged 5 commits into from
Oct 23, 2023
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -23,12 +23,13 @@
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;
Expand Down Expand Up @@ -128,7 +129,8 @@ public class ServerManager {
* Map of admin interfaces per registered regionserver; these interfaces we use to control
* regionservers out on the cluster
*/
private final Map<ServerName, AdminService.BlockingInterface> rsAdmins = new HashMap<>();
private final ConcurrentMap<ServerName, AdminService.BlockingInterface> rsAdmins =
new ConcurrentHashMap<>();

/** List of region servers that should not get any more new regions. */
private final ArrayList<ServerName> drainingServers = new ArrayList<>();
Expand Down Expand Up @@ -718,14 +720,19 @@ public static void closeRegionSilentlyAndWait(ClusterConnection connection, Serv
public AdminService.BlockingInterface getRsAdmin(final ServerName sn) throws IOException {
AdminService.BlockingInterface admin = this.rsAdmins.get(sn);
if (admin == null) {
LOG.debug("New admin connection to " + sn.toString());
if (sn.equals(master.getServerName()) && master instanceof HRegionServer) {
// A master is also a region server now, see HBASE-10569 for details
admin = ((HRegionServer) master).getRSRpcServices();
} else {
admin = this.connection.getAdmin(sn);
}
this.rsAdmins.put(sn, admin);
return this.rsAdmins.computeIfAbsent(sn, server -> {
LOG.debug("New admin connection to " + server.toString());
if (server.equals(master.getServerName()) && master instanceof HRegionServer) {
// A master is also a region server now, see HBASE-10569 for details
return ((HRegionServer) master).getRSRpcServices();
} else {
try {
return this.connection.getAdmin(server);
} catch (IOException e) {
throw new RuntimeException(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should throw this IOException out? This is a problem of computeIfAbsent, as the functional interface does not throw IOException...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, it seems we need some synchronization. A concurrent map is pointless if you do unsyncrhonized writes on it. You could use ConcurrentMapUtils.computeIfAbsentEx to get around the exception issue.

That said, looking at the impls of getRSRpcServices and getAdmin, I don't think we really need to cache these at all. getRSRpcServices directly returns an instance variable. getAdmin already does it's own caching.

So it might make sense to simply remove the rsAdmins map. @Apache9 what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computeIfAbsentEx seems like a good idea here. I don't have an informed opinion regarding whether the caching is really necessary, but I'm happy to rip it out if we think that's worth doing now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, checked the code, as @bbeaudreault mentioned, we already have a cache in ConnectionImplementation,

return (AdminProtos.AdminService.BlockingInterface) computeIfAbsentEx(stubs, key, () -> {

So I think we can just remove the cache here.

}
}
});
}
return admin;
}
Expand Down