-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Apache Accumulo connector and documentation #5030
Conversation
Thanks for sending this! A couple of initial questions:
|
The checkstyle.xml is a leftover artifact from where I had originally developed the connector. It is no different -- I'll remove it. As far as the Copyright, I'll look into it. |
@electrum Would you be open to creating the standard Apache http://www.apache.org/legal/src-headers.html#header-existingcopyright
(emphasis mine) |
Yeah, that sounds like a good idea.
|
We already have a NOTICE file, but it's not at the root of the source tree (primarily because it needs to get packaged and distributed as part of the binary package) https://github.com/prestodb/presto/blob/master/presto-server/NOTICE |
|
||
// Check all the column types, and throw an exception if the types of a map are complex | ||
// While it is a rare case, this is not supported by the Accumulo connector | ||
Set<String> columnNames = new HashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImmutableSet.Builder
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
About tests (sorry, cannot comment on diff because GitHub complains about too long PR): |
The test suites use MiniAccumuloCluster to stand up a single-node instance of HDFS, ZooKeeper, and Accumulo on the node executing the tests, which sounds similar to the goal of Tempto (though I am unfamiliar with Tempto -- I can look into it). Regarding the Travis build, it was failing earlier due to the |
LOG.info("Shutting down MAC"); | ||
accumulo.stop(); | ||
LOG.info("Cleaning up MAC directory"); | ||
FileUtils.forceDelete(macDir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if accumulo.stop()
throws, won't you delete macDir
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a directory under /tmp
, but I should move it to its own try/catch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in one flow you delete it, in other not - it's inconsistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Updated version to 0.151-SNAPSHOT, made API changes based on upstream changes, increased AccumuloQueryRunner to use 4 workers, and updated documentation.
Fixed new import locations in AccumuloStringFunctions.
- Removed unused variables - Made many member variables 'final' - Removed unnecessary Javadoc from self-explanatory methods - Removed unnecessary 'throws' declarations - Constructor arguments are now one per line - Connector config properties now start with 'accumulo' - Removed unused public static final variables in session properties - Remove unused properties in presto-accumulo/pom.xml - Remove zookeeper dependency and add runtime scope to hadoop artifacts - Add exclusions to remove dependency version check plugin - Add exclusions for duplicate classes - Remove presto-main from pom.xml, including code updates to remove classes from the presto-main package - Remove forking of test cases - Remove provided scope from some dependencies - Remove accumulo.properties file - Move Logger to top of file - Remove TPC-H test cases - Remove log4j.properties - Standardize new line wrapping - Fix weird AccumuloSplit serialization problem - More exclusions in pom file - Apply API updates from rebase
- Add flush to AccumuloPageSink For large table inserts, the AccumuloPageSink can cause a JVM crash due to the client-side buffering of metrics. This adds a temporary fix while a better solution can be implemented. - Rename FLOAT to REAL - Remove serialVersionUID - Use getConstructor.newInstance - Use Guice to inject Connector vs static singleton Connector remains singleton in testing code to ensure all tests use the same MiniAccumuloCluster instance - Remove setOptionalConfig - Add exception clause for ZKMetadataManager race condition - Remove provided scope from javax.inject and jackson deps - Wrap Range objects in AccumuloSplit - Fix NPE in listTables - Documentation improvements - Validate exception class - Remove use of ConnectorFactoryContext - Make io.airlift.units provided
Some predicates were being pushed to Accumulo for server-side filtering, however the current implementation actually slows queries down. These iterators were stale code -- best to remove them for now until a sufficient implementation surfaces.
- Use StandardErrorCodes where applicable - Remove commented out tests - Remove stale 'addColumn'
It looks like the |
working with data. By default, the first column of the table definition | ||
is set to the Accumulo row ID. This should be the primary key of your | ||
table, and keep in mind that any ``INSERT statements`` containing the same | ||
row ID is effectively an UPDATE as far as Accumulo is concerned, as any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this how Accumulo works in general? It seems to me that this should fail the INSERT, and the user should have to issue a DELETE first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is how it works in general. All inserts will either insert a new row or overwrite any existing values.
@cberner My original intent was having the metadata manager be pluggable, but I don't foresee people really doing that since ZooKeeper has to be there in the first place and is a reasonable way of managing the Presto/Accumulo metadata. I'll remove the property and pluggable nature of it. |
👍 Cool, let me know when it's ready for another look. I think the rest of it looked good to me |
@cberner Back to you! |
Squashed the commits together and merged. Thanks! |
A Presto connector for Apache Accumulo. See the RST in presto-docs for more details. I've completed the CLA.