Aggregations (write side) #5082
Conversation
@@ -57,6 +57,10 @@ impl blockchain::Block for Block {
    fn parent_ptr(&self) -> Option<BlockPtr> {
        None
    }

    fn timestamp(&self) -> BlockTime {
        BlockTime::NONE
Should this type be called something like MaybeBlockTime, just to indicate that it may be none even though it's not an Option?
I guess the substreams notion of blocks doesn't fit the aggregations model anyway.
The NONE thing should really go away (maybe except for tests); I just put it there for the few cases where I wasn't sure what's happening (like substreams).
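A minimal sketch of the tradeoff being discussed, assuming a simplified `BlockTime` (this is not graph-node's actual definition): a sentinel `NONE` constant hides the missing case inside the value, while an `Option` would make it explicit in the signature.

```rust
// Simplified stand-in for BlockTime; graph-node's real type differs.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct BlockTime(i64);

impl BlockTime {
    /// Sentinel for "no timestamp"; callers must remember to check for it.
    pub const NONE: BlockTime = BlockTime(i64::MIN);
}

/// The alternative hinted at in the comment: make absence explicit instead
/// of relying on a sentinel, e.g. `fn timestamp(&self) -> Option<BlockTime>`.
fn timestamp_or_none(ts: Option<i64>) -> BlockTime {
    ts.map(BlockTime).unwrap_or(BlockTime::NONE)
}

fn main() {
    assert_eq!(timestamp_or_none(None), BlockTime::NONE);
}
```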
**TODO** It might be necessary to allow `@aggregate` fields that are only
used for some intervals. We could allow that with syntax like
`@aggregate(fn: .., arg: .., interval: "day")`
Feels like more complicated aggregations could benefit from a mapping to do them? Essentially providing a (x, x) -> x function to generalise.
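A rough sketch of that idea, with a hypothetical `aggregate` helper that is not part of this PR: Sum, Max, and Min all reduce to folding values with a `(x, x) -> x` combiner, and a user-supplied combiner would generalise that.

```rust
// Generalize aggregation with a user-supplied combining function.
fn aggregate<T>(values: impl IntoIterator<Item = T>, combine: impl Fn(T, T) -> T) -> Option<T> {
    values.into_iter().reduce(combine)
}

fn main() {
    let sum = aggregate(vec![1, 2, 3], |a, b| a + b); // Some(6)
    let max = aggregate(vec![1, 2, 3], i32::max); // Some(3)
    println!("{sum:?} {max:?}");
}
```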
graph/src/schema/input_schema.rs (outdated)
    Sum,
    Max,
    Min,
    Cnt,
I think we don't lose much by calling this Count; this also reads funny for immature people like me :)
Heh, yeah, not sure why I wrote that. Changed it
graph/src/schema/input_schema.rs (outdated)
    pub fn aggregation<'a>(&self, schema: &'a InputSchema) -> &'a Aggregation {
        schema.inner.type_infos[self.aggregation]
            .aggregation()
            .expect("the aggregation source is an object types")
.expect("the aggregation source is an object types") | |
.expect("the aggregation source is an object type") |
.expect("the aggregation source is an object types") | ||
} | ||
|
||
pub fn agg_type(&self, schema: &InputSchema) -> EntityType { |
I think "aggregation_type" is more readable
    /// The field needed for the finalised aggregation for hourly/daily
    /// values
    fn as_agg_field(&self) -> Field {
same here
impl Aggregation {
    pub fn new(schema: &Schema, pool: &AtomPool, agg_type: &s::ObjectType) -> Self {
        let name = pool.lookup(&agg_type.name).unwrap();
        let id_type = IdType::try_from(agg_type).expect("validation caught any issues here");
Should this return an error instead?
No, we should never get here since validations reject a subgraph where the type of id would fail this conversion. The problem is that we first do validations, and then extract things from the validated AST. It would be much better to do the extraction at the same time as validation, but that's a pretty big change that I didn't want to also shoehorn into this PR.
A big part of this and a couple of previous PRs is to isolate these kinds of conversions by only doing them while we construct the InputSchema; previously, a lot of this happened everywhere the AST was used (since there were no internal data structures to represent our idea of schema information).
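A hedged illustration of that split, with made-up names rather than graph-node's actual API: validation runs first and rejects bad user input, so conversions done while constructing `InputSchema` can treat failure as a programming error.

```rust
// Toy "schema" with just an id type; names and accepted types are
// illustrative only.
struct Raw {
    id_type: String,
}

fn validate(raw: &Raw) -> Result<(), String> {
    // user errors are reported here, before any extraction happens
    match raw.id_type.as_str() {
        "String" | "Bytes" => Ok(()),
        other => Err(format!("unsupported id type `{other}`")),
    }
}

fn build(raw: &Raw) -> String {
    // only called after validate() succeeded; a failure here would be a
    // bug in the indexer, not bad user input
    match raw.id_type.as_str() {
        "String" | "Bytes" => raw.id_type.clone(),
        _ => unreachable!("validation caught any issues here"),
    }
}

fn main() {
    let raw = Raw { id_type: "Bytes".to_string() };
    validate(&raw).expect("schema is valid");
    assert_eq!(build(&raw), "Bytes");
}
```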
impl Aggregation {
    pub fn new(schema: &Schema, pool: &AtomPool, agg_type: &s::ObjectType) -> Self {
        let name = pool.lookup(&agg_type.name).unwrap();
I think these cases could benefit from a lookup_unchecked or something that does the unwrap. There's no scenario where this lookup can fail, right? I guess the ObjectType parsing probably adds it here, but it would probably be worth adding a comment.
There are a couple of similar cases around.
Any failure here is a programming error since the agg_type.name should have been interned already. All a lookup_unchecked could do is lookup(..).unwrap() anyway, but the knowledge that this lookup can't fail (except for genuine bugs) resides here, in the usage of the pool.
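A hedged stand-in for the invariant described here (this `Pool` is not graph-node's `AtomPool`): every name is interned while the pool is built, so looking one of those names up later can only fail because of a genuine bug, which is why `lookup(..).unwrap()` is acceptable at those call sites.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Pool {
    atoms: HashMap<String, u32>,
}

impl Pool {
    /// Add a name to the pool, returning its atom.
    fn intern(&mut self, name: &str) -> u32 {
        let next = self.atoms.len() as u32;
        *self.atoms.entry(name.to_string()).or_insert(next)
    }

    /// Look up a previously interned name.
    fn lookup(&self, name: &str) -> Option<u32> {
        self.atoms.get(name).copied()
    }
}

fn main() {
    let mut pool = Pool::default();
    pool.intern("Stats");
    // unwrap() is justified: "Stats" was interned when the pool was built,
    // so a None here would be a programming error.
    let atom = pool.lookup("Stats").unwrap();
    assert_eq!(atom, 0);
}
```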
chain/substreams/src/mapper.rs (outdated)
        // This is not a great idea: we really always need a timestamp; if
        // substreams doesn't give us one, we use a fixed one which will
        // lead to all kinds of strange behavior
        let timestamp = timestamp
IIRC the clock can be assumed to always be there, so it might be worth adding a log or a metric in case we need to do this.
Since it should always be present, I just made it an error when it's missing (which gets rid of one of the iffy uses of BlockTime::NONE).
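A minimal sketch of that change, with an illustrative signature and error message rather than the actual mapper code: a missing clock becomes an error instead of being papered over with `BlockTime::NONE`.

```rust
// Illustrative only; the real code lives in chain/substreams/src/mapper.rs.
fn block_timestamp(clock_seconds: Option<i64>) -> Result<i64, String> {
    clock_seconds.ok_or_else(|| "substreams block is missing its clock/timestamp".to_string())
}

fn main() {
    assert!(block_timestamp(None).is_err());
    assert_eq!(block_timestamp(Some(1_700_000_000)), Ok(1_700_000_000));
}
```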
The distinction between mutable and immutable object types isn't important enough to express it at the level of the enum.
Make sure we disallow setting and getting interfaces
Also remove `is_immutable` from ObjectTypeExt; code that needs to know the value of these flags should use our own ObjectType
For timeseries, users cannot set the timestamp. Instead, the time for the current block is used.
With that, there's also no need for an environment variable that controls whether aggregations are allowed or not.
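A hedged sketch of what the timeseries commits above describe, with made-up types and field names: the timestamp of a timeseries point is filled in from the block being processed, never from user input.

```rust
struct TimeseriesPoint {
    timestamp: i64, // seconds since the Unix epoch, set from the block
    value: i64,
}

fn new_point(block_timestamp: i64, value: i64) -> TimeseriesPoint {
    // any timestamp supplied by the mapping would be ignored; the block's
    // time always wins
    TimeseriesPoint { timestamp: block_timestamp, value }
}

fn main() {
    let p = new_point(1_700_000_000, 42);
    assert_eq!(p.timestamp, 1_700_000_000);
    assert_eq!(p.value, 42);
}
```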
This PR introduces aggregations (see docs/aggregations.md for details). It only implements the write side, i.e., with this pull request, we populate aggregation tables but there is not yet a way to query aggregations.
Until we have a native Timestamp type, timestamps are represented as an Int8 holding seconds since the Unix epoch.
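For illustration (the helper name is made up, not part of this PR): such a timestamp is just an `i64` counting seconds since the Unix epoch.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Produce a timestamp in the representation described above: seconds since
// the Unix epoch, stored in an i64.
fn unix_seconds_now() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs() as i64
}

fn main() {
    println!("{}", unix_seconds_now());
}
```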
An example subgraph that aggregates ethereum block numbers can be found here. To use it, graph-node must be run with `GRAPH_MAX_SPEC_VERSION="0.1.1"`. To deploy it, graph-cli needs to be patched:
- Check out the graph-tooling repo, using the lutter/agg-minimal branch (this PR)
- Run `pnpm install && pnpm build` in the graph-tooling checkout
- Run `yarn install && pnpm link <path to graph-tooling/packages/cli>` in the checkout of the subgraph
- graph-node