Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds new accumulo command #5073

Merged
merged 10 commits into from
Dec 16, 2024
Merged

Adds new accumulo command #5073

merged 10 commits into from
Dec 16, 2024

Conversation

kevinrr888
Copy link
Member

@kevinrr888 kevinrr888 commented Nov 18, 2024

closes #5034

I did a deep dive into the code executed by accumulo check-server-config to figure out what is being checked to see if an instance can be avoided or what could be checked without an instance. Here is root code for reference:

public static void main(String[] args) {
try (var context = new ServerContext(SiteConfiguration.auto())) {
context.getConfiguration();
}
}

Here is a summary of my findings:

  • SiteConfiguration.auto() does validation: checks that the deprecated accumulo-site.xml file doesn't exist, loads properties from the config file (accumulo.properties), checks that both deprecated and the replacement properties aren't both declared in the file, replaces deprecated properties with their non-deprecated versions, and returns a new SiteConfiguration object (the SiteConfiguration object constructor calls ConfigCheckUtil.validate(config) to validate the config passed in)
  • context.getConfiguration() does validation. However, as far as I can tell, the only validation done from this call requires an instance (the only validation seems to stem from ZooBasedConfiguration/PropSnapshot/PropStore/PropStoreKey/etc.). Validates at least the instance.secret property (maybe more). Since this is related to an instance, seems fine that it wouldn't be validated here anyways.
  • new ServerContext(SiteConfiguration.auto()) (= new ServerContext(new ServerInfo(SiteConfiguration.auto()))) does validation. This creates a new ServerInfo(siteConfig) passing that to the ServerContext constructor. There is no validation done in the ServerContext constructor: most fields are memoized suppliers (the only one we access is ServerConfigurationFactory serverConfFactory which is accessed through context.getConfiguration() and we went into that code above). The ones that aren't memoized suppliers don't do any validation. But there is validation done from constructing ServerInfo. We will take what we can (does some form of validation and is not reliant on an instance) and use that in our new check.

With this info, I wrote a new check CheckServerConfigNoInstance which, to the best of my knowledge, is all the validation that the current CheckServerConfig would be able to do without a running instance. So, this PR adds a new accumulo check-server-config-no-inst (in addition to the existing accumulo check-server-config) which performs the checks that accumulo check-server-config does without the checks that require a running Accumulo instance. Useful for (at least mostly) validating the accumulo.properties file without an instance.

I'm not sure which branch this should target. Targeting 2.1 for now since the change could be made as early as 2.1 and it may be a desireable command here, but not sure if we would want to add a new command in the 2.1 line.

Any input/suggestions on this PR would be appreciated.

Adds new `accumulo check-server-config-no-inst` command which performs the checks that `accumulo check-server-config` does without the checks that require a running Accumulo instance. Useful for (at least mostly) validating the `accumulo.properties` file without a running instance.
@kevinrr888 kevinrr888 added this to the 2.1.4 milestone Nov 18, 2024
@kevinrr888 kevinrr888 self-assigned this Nov 18, 2024
@kevinrr888 kevinrr888 linked an issue Nov 18, 2024 that may be closed by this pull request
@dlmarion
Copy link
Contributor

@kevinrr888 - wondering if we could just run the no-instance checks if the error is thrown when an instance doesn't exist so that we can re-use the command? Thoughts?

@kevinrr888
Copy link
Member Author

@kevinrr888 - wondering if we could just run the no-instance checks if the error is thrown when an instance doesn't exist so that we can re-use the command? Thoughts?

I considered that, but thought it might hide some issues. For example, if a user expects that they have a running instance and run accumulo check-server-config without a running instance, it would do less checks than may be expected and would not result in any no-instance errors. To me, it seemed safest to split these into two separate commands and keep the no-instance errors of the original command.

@dlmarion
Copy link
Contributor

I was thinking a stern warning could printed to the screen that no instance is running so we are doing a reduced set of checks. I assume someone is going to run this interactively and not via a script. If we don't want to make that assumption, we could return a non-zero code.

@kevinrr888
Copy link
Member Author

I was thinking a stern warning could printed to the screen that no instance is running so we are doing a reduced set of checks. I assume someone is going to run this interactively and not via a script. If we don't want to make that assumption, we could return a non-zero code.

Yeah, I think a warning like that would be a way to replace the extra command. However, another thing I'm thinking of is it might be tricky to identify what a no-instance exception is. There is a certain exception that currently occurs, but that exception may change.

@dlmarion
Copy link
Contributor

We could just raise our own exception, NoInstanceException, for example.

Deleted CheckServerConfigNoInstance and moved functionality to
CheckServerConfig
@kevinrr888
Copy link
Member Author

I removed the new check and now check if an instance exists before the full set of checks runs in CheckServerConfig. The instance is checked with VolumeManager.getInstanceIDFromHdfs(instanceIdPath, hadoopConfig). Coincidentally, the vars needed for this call perform all* the checks that can be performed without a running instance: SiteConfiguration.auto(), VolumeManagerImpl.get(siteConfig, hadoopConfig), and new ServerDirs(siteConfig, hadoopConfig).

I wanted to avoid just surrounding the existing code:

    try (var context = new ServerContext(SiteConfiguration.auto())) {
      context.getConfiguration();
    }

with a try-catch as the exception that occurs is a few layers deep and may be changed in the future and it would not be clear that changing those exceptions would affect this command. Also, if we surrounded it with a try-catch we would have to identify what is an exception relating to no instance from many potential exceptions. Calling the method directly avoids these issues.

*to the best of my knowledge

@kevinrr888
Copy link
Member Author

One issue with this current impl is that the stack trace will still show an exception (this time a ConnectException) even though the calling code handles all Exceptions. My guess is that the exception is occurring in another thread. So, an output of accumulo check-server-config would look like this with no instance running:

2024-11-19T12:16:25,916 [conf.SiteConfiguration] INFO : Found Accumulo configuration on classpath at /home/krathbun/Desktop/github/fluo-uno/install/accumulo-2.1.4-SNAPSHOT/conf/accumulo.properties
2024-11-19T12:16:26,300 [fs.VolumeManager] ERROR: Problem reading instance id out of hdfs at hdfs://localhost:8020/accumulo/instance_id
java.net.ConnectException: Call From groot/10.0.0.182 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) ~[?:?]
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481) ~[?:?]
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:930) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:845) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1571) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1513) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1410) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) ~[hadoop-client-api-3.3.6.jar:?]
	at jdk.proxy2/jdk.proxy2.$Proxy30.getListing(Unknown Source) ~[?:?]
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:689) ~[hadoop-client-api-3.3.6.jar:?]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.base/java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) ~[hadoop-client-api-3.3.6.jar:?]
	at jdk.proxy2/jdk.proxy2.$Proxy31.getListing(Unknown Source) ~[?:?]
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1702) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1686) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1113) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:149) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1188) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1185) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1195) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.accumulo.server.fs.VolumeManager.getInstanceIDFromHdfs(VolumeManager.java:214) ~[accumulo-server-base-2.1.4-SNAPSHOT.jar:2.1.4-SNAPSHOT]
	at org.apache.accumulo.server.conf.CheckServerConfig.main(CheckServerConfig.java:54) ~[accumulo-server-base-2.1.4-SNAPSHOT.jar:2.1.4-SNAPSHOT]
	at org.apache.accumulo.server.conf.CheckServerConfig.execute(CheckServerConfig.java:78) ~[accumulo-server-base-2.1.4-SNAPSHOT.jar:2.1.4-SNAPSHOT]
	at org.apache.accumulo.start.Main.lambda$execKeyword$0(Main.java:122) ~[accumulo-start-2.1.4-SNAPSHOT.jar:2.1.4-SNAPSHOT]
	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:205) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:600) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:652) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:773) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:347) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1632) ~[hadoop-client-api-3.3.6.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1457) ~[hadoop-client-api-3.3.6.jar:?]
	... 28 more
2024-11-19T12:16:26,300 [conf.CheckServerConfig] WARN : Performed only a subset of checks (which passed). There is a problem relating to the instance:
Can't tell if Accumulo is initialized; can't read instance id at hdfs://localhost:8020/accumulo/instance_id
and no further checks could be done. If this is unexpected, make sure an instance is running and re-run the command.

The subset of the checks have still occurred and passed (noted in the output), but the exception unfortunately makes the info less visible and it might be better if the exception was not seen here.

@dlmarion
Copy link
Contributor

Regarding the exception being printed, that might be due to the logging configuration for the client (log4j2.properties). If that is set to direct output to a file, then you likely would not see that. For that reason, any output that we have from the command may want to use System.out or System.err

@kevinrr888
Copy link
Member Author

kevinrr888 commented Nov 19, 2024

Gotcha. Changed to print the warning. That logged error from my running of it was my only concern with this, but if it's fine to you then I have no problems with it.

@dlmarion dlmarion requested a review from ctubbsii November 19, 2024 20:52
Checks the accumulo config at the given file path if one is provided.
Otherwise, attempts to find and check the config file. Useful for
checking the accumulo config without a running instance.
Copy link
Contributor

@dlmarion dlmarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested alternate wording, up to you if you decide to take it or not. Otherwise, looks good.

- now only expect a user-provided file path
- avoid unneccessary catch and rethrow of an exception
- update desc and command name

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
@kevinrr888
Copy link
Member Author

Thanks for the reviews @dlmarion @ctubbsii
If I could get a final approval from you Christopher, I'll get this merged in. Thanks

import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;

@AutoService(KeywordExecutable.class)
public class CheckAccumuloConfig implements KeywordExecutable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give the new command name, I'd rename the file to CheckAccumuloProperties. After that, this should be fine to merge.

@kevinrr888 kevinrr888 merged commit 7b8b213 into apache:2.1 Dec 16, 2024
8 checks passed
@kevinrr888 kevinrr888 deleted the 2.1-feature-5034 branch December 16, 2024 15:38
asfgit pushed a commit that referenced this pull request Dec 17, 2024
* Remove main method and move implementation to the execute method in
  the new CheckAccumuloProperties command
* Add the new KeywordExecutable class to the IT
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Utility to check configuration cannot be run prior to init
3 participants