-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-28547 Support specifying connection configuration through queri… #5853
Conversation
@stoty @ndimiduk @bbeaudreault FYI This is another choice comparing to #5850 . In #5850, we map some configuration values between URI queries, so users could specify some connection registry specific configurations through URI queries. While in this PR, we just apply all the queries to our configuration, so users could set any configuration they like in the URI. The solution in this PR is more flexible, as there could lots of configurations which need to be specified when connecting to a hbase cluster, not only for connection registry. So I think here we need to consider more, thoughts? Thanks. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
This is super scary. What is the use case? |
To make sure users can fully initialize a connecton to a HBase cluster with a single URI. In the current implementation, we still need to read configurations from the given Configuration instance, especially that if security is enabled, we need to know the keytab location, principal name, etc., not only some timeout. |
Forgive me if I'm wrong. To me there seems to be no difference between the two solutions, we can both specify only the core configuration in the URI, and other configurations are optional and have the same default values. In fact, it is how we plan to use that other configurations are placed in the configuration and only the core configurations are specified in the URI. Also thanks for the great work ! |
In this PR, you are free to reset any configurations you like through the URI queries. But anyway, we could introduce something like allowlist/blocklist to make us feel safer... WDYT? THanks. |
Considering that this PR will bring more freedom to our users, +1 on this PR. Thanks. |
@anmolnar can you expand on your concerns here? |
@@ -30,9 +30,9 @@ | |||
* Connection registry creator implementation for creating {@link ZKConnectionRegistry}. | |||
*/ | |||
@InterfaceAudience.Private | |||
public class ZKConnectionRegistryCreator implements ConnectionRegistryURIFactory { | |||
public class ZKConnectionRegistryURIFactory implements ConnectionRegistryURIFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good name choice 👍
if (StringUtils.isBlank(uri.getRawQuery())) { | ||
return Collections.emptyMap(); | ||
} | ||
return Splitter.on('&').trimResults().splitToStream(uri.getRawQuery()).map(kv -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why manually decode values when you can call getQuery()
, which explicitly does the decoding for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or do we have org.apache.http.client.utils.URLEncodedUtils
on the classpath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure but I'm trying to not rely on libs other than guava as much as possible, to minimize our dependency scope...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I would also prioritize a smaller dependency set. There's an open PR for Guava to add some support, but I guess nothing there yet.
For me, if we allow overriding any configurations through URI queries, it may raise security concerns, for example, what if the the URI contains some query parameters about HDFS related configurations? But FWIW, it can only affect the configuration used to communicate with the cluster specified by this URI, so maybe it just dependx on whether you think the target HBase cluster is dangerous or not? If you think the target HBase cluster is safe then there should be no trouble... |
From my perspective, on client-side security, the client application is already a "compromised" actor. I assume that whomever controls that application already has full control over the configuration objects that it presents to the hbase-client library code. Any service application that exposes creation of an hbase-client to user-provided input is already exposed to whatever attacks this approach opens up. Unless of course I'm missing something. |
In real world, after we introduce the connection URI, a possible use case is that, some users may introduce a hbase connector on their system, and let the users of the system submit an URI for connecting, in this way the URI may come from outside their control. But anyway, the URI instance passed to the ConnectionFactory API is still controlled by them, so they could use allowlist/blocklist to filter the queries if they have security concerns, that' true. |
I agree with Nick. I also think it's a bit odd to require two separate configuration points for a working connection. So previously you had to specify a Configuration, and now you can use URL instead. You can audio combine them, but to require both seems odd, and I find it very unlikely that a realistic user wouldn't need to specify a bunch of options, like timeouts, etc. So URL should be able to configure everything imo. But also, in trying to think what would be a security concern if it were compromised. It doesn't matter if someone specified hdfs settings because they won't be used by the client. |
…es of the connection uri
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
+1 from me. Still, I would like to hear @anmolnar 's arguments against. |
Ping @anmolnar |
Thanks @ndimiduk Will merge this PR on next Monday if no other concerns this weekend. |
…es of the connection uri (apache#5853) Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
…es of the connection uri