-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Server-generated column statistics #4680
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proto changes look structurally sound; but I have a larger question. Are we fundamentally performing computations here that can't be performed with our existing APIs / table operations? If the answer is "yes", would it be better to try and improve our fundamental operations to handle this use case? ColumnStatisticsRequest looks like a heavy hammer, and I'm sad to see the bespoke logic in server/src/main/java/io/deephaven/server/table/stats/
. (I understand this is mostly a copy of DHE code, but worth raising Qs IMO.)
replication/static/src/main/java/io/deephaven/replicators/ReplicateColumnStats.java
Show resolved
Hide resolved
context.set_code(grpc.StatusCode.UNIMPLEMENTED) | ||
context.set_details('Method not implemented!') | ||
raise NotImplementedError('Method not implemented!') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - every generated implementation looks like this. If you were to implement a python server, you would override this method and offer a real implementation.
Yes, and deliberately so - we do not have a "Count" operation that only counts non-nulls, and we do not have a "unique" operation that only selects the N most unique. Neither is impossible to add, but @rcaudy specifically prefers that for now we just stick to a specialized implementation like DHE has. As one alternative, we could implement the two above aggregations, which would give us these same operations, except now ticking - however, the only consumer of this API is the web UI's column stats popup, which doesn't expect a Table instance, and so doesn't support ticking data. So the only win here would be that we would implement two new aggregation types, and then not need the rest of this new code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a review of everything but the JS* classes.
proto/proto-backplane-grpc/src/main/proto/deephaven/proto/table.proto
Outdated
Show resolved
Hide resolved
replication/static/src/main/java/io/deephaven/replicators/ReplicateColumnStats.java
Show resolved
Hide resolved
server/src/main/java/io/deephaven/server/table/ops/ColumnStatisticsGrpcImpl.java
Show resolved
Hide resolved
server/src/main/java/io/deephaven/server/table/ops/ColumnStatisticsGrpcImpl.java
Outdated
Show resolved
Hide resolved
server/src/main/java/io/deephaven/server/table/ops/ColumnStatisticsGrpcImpl.java
Outdated
Show resolved
Hide resolved
server/src/main/java/io/deephaven/server/table/stats/CharacterChunkedStats.java
Outdated
Show resolved
Hide resolved
server/src/main/java/io/deephaven/server/table/stats/CharacterChunkedStats.java
Outdated
Show resolved
Hide resolved
server/src/main/java/io/deephaven/server/table/stats/CharacterChunkedStats.java
Outdated
Show resolved
Hide resolved
server/src/main/java/io/deephaven/server/table/stats/CharacterChunkedStats.java
Outdated
Show resolved
Hide resolved
server/src/main/java/io/deephaven/server/table/stats/DateTimeChunkedStats.java
Show resolved
Hide resolved
We discussed this at some length before doing the porting. I would prefer to use our aggs, and thus standardize on one set of code for this. What convinced me that we should just port is:
|
…sticsGrpcImpl.java Co-authored-by: Ryan Caudy <rcaudy@gmail.com>
Co-authored-by: Ryan Caudy <rcaudy@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
Also ensures generating BigIntegers with a default range can contain some nulls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor bug, and some code cleanup suggestions.
Reintroduces the column statistics feature from DHE. This code is copied, with a few changes to how it behaves and how it functions:
Note that until #188 is solved, the JS API and web UI will not display the unique value list.