-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-28436 Use connection url to specify the connection registry inf… #5770
Conversation
@stoty FYI |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionRegistries.java
Outdated
Show resolved
Hide resolved
...java/org/apache/hadoop/hbase/client/TestBasicReadWriteWithDifferentConnectionRegistries.java
Show resolved
Hide resolved
@stoty I named it ConnectionRegistryCreator in the new PR, and added a test in hbase-client for testing URI parsing and fallback. PTAL. Thanks. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM
Ping @ndimiduk . Could you please take a look at this? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: the content of the scheme part, Is there a larger discussion somewhere that I've missed?
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionRegistryFactory.java
Show resolved
Hide resolved
* specified connection url {@code uri} will not take effect, we will load all the related | ||
* configurations from the given Configuration instance {@code conf} | ||
*/ | ||
static ConnectionRegistry create(URI uri, Configuration conf, User user) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I did a search on URI vs. URL in Java and I conclude that URI is the correct object type for our use-case.
private static final ImmutableMap<String, ConnectionRegistryCreator> CREATORS; | ||
static { | ||
ImmutableMap.Builder<String, ConnectionRegistryCreator> builder = ImmutableMap.builder(); | ||
for (ConnectionRegistryCreator factory : ServiceLoader.load(ConnectionRegistryCreator.class)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use ServiceLoader
anywhere else? I think that this is an okay use of this runtime feature, however it also means that we're exposing ourselves to user-provided ConnectionRegistryCreator
implementations. I suspect that this is not intentional, given the class is marked IA.Private
. On the other hand, this could be a powerful point of extension for HBase clients running in sophisticated environments.
What do you think about making ConnectionRegistryCreator
IA.Public
and supporting this as a point of client extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also load SaslClientAuthenticationProvider with ServiceLoader.
I think later we could change this to IA.LimitedPrivate(xxx)
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionRegistryCreator.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionRegistryCreator.java
Outdated
Show resolved
Hide resolved
static { | ||
ImmutableMap.Builder<String, ConnectionRegistryCreator> builder = ImmutableMap.builder(); | ||
for (ConnectionRegistryCreator factory : ServiceLoader.load(ConnectionRegistryCreator.class)) { | ||
builder.put(factory.protocol(), factory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wikipedia says that schemes should be case-insensitive. We should normalize these before adding them to the map.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionRegistryFactory.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
public String protocol() { | ||
return "hbase+rpc"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part gets messy. I think the convention that has arisen around URI's with a '+' in their scheme section is used to indicate a protocol+transport. What does that mean for us? What do we even call our current RPC implementation of "Hadoop RPC plus protobuf cell back-channel", just "HBase RPC", so hbaserpc
, which maybe is short for hbaserpc+tcp
? Say down the road we support an hbase rpc over HTTP or over GRPC (which itself supports http/2 and http/3 as transports, as well as grpc-web) or over UDP, what then? What do we call our existing Thrift ("hbase+thrift" ?) and REST gateways ("rest+http(s)" ?)
By contrast, the zookeeper client isn't a protocol at all. It's just a location of an expected service type. So then we can call it just "zookeeper" or "zk" for short.
I'm not saying we expect to have all these transport mechanisms, but we should think through what we want this part of our public API to look like and give precise meaning to the scheme section of the URI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just followed the similar way with what we have in phoenix.
https://phoenix.apache.org/classpath_and_url.html
Agree that we should take care of the schema part. Maybe we could start a discussion thread again on the mailing list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO having the first part be just hbase to denote that your talking to hbase makes the most sense. Then the second part can be whatever. Based on what technology we have today and for years past and the foreseeable future, zk and rpc make sense. In this case, rpc doesn't even mean "talk to hbase over rpc" it means "use RpcConnectionRegistry". I could see that being confusing, we could consider calling it bootstrap or something.
The suffix tells our code whether to parse the authority as a list of zk servers or as a list of bootstrap nodes. That unrelated to grpc vs custom rpc, etc.
If someone wanted to create a grpc or http2 or w/e protocol, they'd still be talking to hbase and they'd still need someway to bootstrap the connection. So I bet the scheme would stay the same and which communication protocol to use would be a query parameter.
We could argue what's most to spec but since the +suffix stuff is largely not spec'd I think we should consider user intuitiveness and how our code uses it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose, the way to think about this connection URI is that is it just for locating and initiating a connection to a cluster. Even if we were to pursue a more complete specification as I've advocated, a client still has non-trivial reliance on other configuration parameters. It's unrealistic to expect that we could roll up those details in the scheme portion of the URI. You've convinced me.
Given the menu of options that we have available today, what you have implemented here seems fine. Can we add "hbase" without the "+..." part and let that be the default bootstrap mechanism for the current hbase version? In 2.6 that would be an alias for hbase+zk, for 3.0 that would be an alias for hbase+rpc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the scheme part should contain the 'zk' or 'rpc' information, so we could know how to decode the other part of the URI.
And for other parameters, we could add them as the parameter of the URI in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe another choice is to introduce a parameter like 'registry'. So the URI could be
hbase://zk1:2181,zk2:2181,zk3:2181/xxx?registry=zk
or
hbase://rs1:16010/?registry=rpc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I don't think that option works well because the point of the scheme is to tell the code how to parse the authority. For rpc vs zk, we need different parsing. It seems odd (even if possible) to put that in the query param. To me we could put things in the query param that affect anything else about the connection other than how to parse the uri
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IUC the standard Hbase client libraries don't support any other procol than HBaseRPC.
IF they were to ever support say REST and/or Thrift, then we could still add new variant for them as needed.
However, the current API probably has a lot of built-in assumptions of using RPC, and even if that were to be solved, achieving and maintaining parity for the alternate protocols would be monumental job, and we can make up new protocol variants for those when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other issue we run into with Phoenix quite bit is that the cluster configurations still have to match in several aspects, like timeouts, TLS/SASL settings, etc, otherwise the client either can not even connect, or experiences errors due timeout / buffer size etc mismathces.
I think that some of that may also be a problem when configuring replication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other issue we run into with Phoenix quite bit is that the cluster configurations still have to match in several aspects, like timeouts, TLS/SASL settings, etc, otherwise the client either can not even connect, or experiences errors due timeout / buffer size etc mismathces.
I think that some of that may also be a problem when configuring replication.
For replication in hbase, there is configuration map in peer configuration, so we could add configurations specific for connecting to the peer cluster.
*/ | ||
@RunWith(Parameterized.class) | ||
@Category({ MediumTests.class, ClientTests.class }) | ||
public class TestBasicReadWriteWithDifferentConnectionRegistries { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test.
Let me conclude a bit. Finally we focus on the scheme part of the URI. @ndimiduk thought the second part should be communication protocol when connecting to hbase cluster, so use For me I prefer we use Thanks. |
If no other feedbacks, I will move forward to update the PR to address the review comments. The basic direction is still the same as beginning, we use Thanks. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Any other concerns? Thanks. |
Gently ping again... This is very important for the final 3.0.0 release, as we want to limit zookeeper in internal use only. Thanks. |
I don't think our compatibility policy allows limiting zookeeper to internal use only in 3.0 , @Apache9. |
Ah, maybe I misguided you... I do not mean we want to completely remove zookeeper in 3.0.0 release, we just want to provide a way to hide zookeeper inside HBase, beside the zookeeper based ways. For now, there is still a place where we must expose zookeeper out, when configuring a replication peer. We should provide a way to specify a remote cluster without zookeeper, but you are still free to use zookeeper there. Thanks. |
I was referring to the email thread on In my reading the conclusion was that we should not remove the ZK connection configuration path for the Connection object, only deprecate it in 3.0. This does not directly affect this ticket, I was only reflecting on your comment about making ZK connections internal only. |
We are on the same page here. This PR still support specifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Let's go with this. Nice work @Apache9 , thanks for promoting a thoughtful conversation.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionRegistryFactory.java
Show resolved
Hide resolved
Thanks @ndimiduk . Will merge recently if no other concerns. @bbeaudreault My plan is to apply this to branch-2+, i.e, 2.7.0+, since there are still other under going works to make this feature ready, especially that we still need to add more documentation about this feature. Please let me know if you want this in 2.6.0 too. |
Given there is more work to do here, might as well keep it to 2.7.0+ |
Got it. Thanks. |
…ormation (apache#5770) Signed-off-by: Istvan Toth <stoty@apache.org> Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Reviewed-by: Bryan Beaudreault <bbeaudreault@apache.org> (cherry picked from commit e3761ba)
…ormation (apache#5770) Signed-off-by: Istvan Toth <stoty@apache.org> Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Reviewed-by: Bryan Beaudreault <bbeaudreault@apache.org>
…ormation (apache#5770) Signed-off-by: Istvan Toth <stoty@apache.org> Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Reviewed-by: Bryan Beaudreault <bbeaudreault@apache.org>
…ormation