Aggregations (write side) #5082

Merged: 28 commits from lutter/agg into master, Jan 23, 2024
Conversation

@lutter (Collaborator) commented Dec 15, 2023

This PR introduces aggregations (see docs/aggregations.md for details). It only implements the write side; that is, with this pull request we populate aggregation tables, but there is not yet a way to query aggregations.

Until we have a native Timestamp type, timestamps are represented as an Int8 holding seconds since the Unix epoch.
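As a sketch of that interim representation (the type and method names here are illustrative, not the PR's actual code), a timestamp stored in an Int8 column is just an `i64` of whole seconds since the Unix epoch:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Illustrative stand-in for the interim timestamp representation:
/// an Int8 (i64) holding whole seconds since the Unix epoch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockTime(i64);

impl BlockTime {
    /// Build from a SystemTime, truncating to whole seconds.
    fn from_system_time(t: SystemTime) -> Self {
        let secs = t
            .duration_since(UNIX_EPOCH)
            .expect("time before the Unix epoch")
            .as_secs() as i64;
        BlockTime(secs)
    }

    fn as_secs(&self) -> i64 {
        self.0
    }
}

fn main() {
    // 2024-01-23T00:00:00Z is 1_705_968_000 seconds after the epoch.
    let t = BlockTime(1_705_968_000);
    assert_eq!(t.as_secs(), 1_705_968_000);
    let now = BlockTime::from_system_time(SystemTime::now());
    assert!(now.as_secs() > 1_705_968_000);
}
```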

An example subgraph that aggregates Ethereum block numbers can be found here. To use it, graph-node must be run with `GRAPH_MAX_SPEC_VERSION="0.1.1"`. To deploy it, graph-cli needs to be patched:

  • Check out the graph-tooling repo, using the lutter/agg-minimal branch (this PR)
  • Run `pnpm install && pnpm build` in the graph-tooling checkout
  • Run `yarn install && pnpm link <path to graph-tooling/packages/cli>` in the checkout of the subgraph
  • Deploy the subgraph against a local graph-node

```diff
@@ -57,6 +57,10 @@ impl blockchain::Block for Block {
     fn parent_ptr(&self) -> Option<BlockPtr> {
         None
     }

     fn timestamp(&self) -> BlockTime {
         BlockTime::NONE
```
A contributor commented:

Should this type be called something like MaybeBlockTime, just to indicate it may be none even though it's not an Option?

I guess the substreams notion of blocks doesn't fit the aggregations model anyway.

@lutter (Collaborator, author) replied:

The NONE thing should really go away (maybe except for tests). I just put it there for the few cases where I wasn't sure what's happening (like substreams).
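A minimal sketch of the Option-based alternative being discussed (names are illustrative, not graph-node's actual trait): modeling "no timestamp" explicitly forces callers to handle the missing case instead of relying on a `BlockTime::NONE` sentinel.

```rust
/// Illustrative sketch: represent "no timestamp" with Option instead
/// of a BlockTime::NONE sentinel, so the absence is visible in types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockTime(i64); // seconds since the Unix epoch

trait Block {
    /// None for chains (e.g. substreams) that don't carry a usable
    /// block timestamp, rather than a magic sentinel value.
    fn timestamp(&self) -> Option<BlockTime>;
}

struct SubstreamsBlock;

impl Block for SubstreamsBlock {
    fn timestamp(&self) -> Option<BlockTime> {
        None
    }
}

fn main() {
    let b = SubstreamsBlock;
    // Callers must now decide what a missing timestamp means.
    assert_eq!(b.timestamp(), None);
    assert_eq!(b.timestamp().map(|t| t.0).unwrap_or(0), 0);
}
```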


**TODO** It might be necessary to allow `@aggregate` fields that are only
used for some intervals. We could allow that with syntax like
`@aggregate(fn: .., arg: .., interval: "day")`
A contributor commented:

Feels like more complicated aggregations could benefit from a mapping to do them? Essentially providing an `(x, x) -> x` function to generalise.

```rust
    Sum,
    Max,
    Min,
    Cnt,
```
@mangas (Contributor) commented Jan 12, 2024:

I think we don't lose much by calling this Count; this also reads funny for immature people like me :)

@lutter (Collaborator, author) replied:

Heh, yeah, not sure why I wrote that. Changed it.

```rust
pub fn aggregation<'a>(&self, schema: &'a InputSchema) -> &'a Aggregation {
    schema.inner.type_infos[self.aggregation]
        .aggregation()
        .expect("the aggregation source is an object types")
```
A contributor commented:

Suggested change:

```diff
-        .expect("the aggregation source is an object types")
+        .expect("the aggregation source is an object type")
```

```rust
        .expect("the aggregation source is an object types")
}

pub fn agg_type(&self, schema: &InputSchema) -> EntityType {
```
@mangas (Contributor) commented Jan 12, 2024:

I think `aggregation_type` is more readable.


```rust
/// The field needed for the finalised aggregation for hourly/daily
/// values
fn as_agg_field(&self) -> Field {
```
A contributor commented:

same here

```rust
impl Aggregation {
    pub fn new(schema: &Schema, pool: &AtomPool, agg_type: &s::ObjectType) -> Self {
        let name = pool.lookup(&agg_type.name).unwrap();
        let id_type = IdType::try_from(agg_type).expect("validation caught any issues here");
```
A contributor commented:

Should this return an error instead?

@lutter (Collaborator, author) replied:

No, we should never get here, since validations reject a subgraph where the type of id would fail this conversion. The problem is that we first do validations and then extract things from the validated AST. It would be much better to do the extraction at the same time as validation, but that's a pretty big change that I didn't want to also shoehorn into this PR.

A big part of this and a couple of previous PRs is to isolate these kinds of conversions by only doing them while we construct the InputSchema; previously, a lot of this happened everywhere the AST was used, since there were no internal data structures to represent our idea of schema information.


```rust
impl Aggregation {
    pub fn new(schema: &Schema, pool: &AtomPool, agg_type: &s::ObjectType) -> Self {
        let name = pool.lookup(&agg_type.name).unwrap();
```
A contributor commented:

I think these cases could benefit from a `lookup_unchecked` or something that does the unwrap. There's no scenario where this lookup can fail? I guess the ObjectType parsing probably adds it here, but it would probably be worth adding a comment.

There are a couple of similar cases around.

@lutter (Collaborator, author) replied:

Any failure here is a programming error, since `agg_type.name` should have been interned already. All a `lookup_unchecked` could do is `lookup(..).unwrap()` anyway, but the knowledge that this lookup can't fail (except for genuine bugs) resides here, in the usage of the pool.
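A minimal sketch of the interning invariant being described (hypothetical types, not graph-node's real `AtomPool`): once a name has been interned, a later lookup of that name cannot fail, so an unwrap at the use site only guards against a genuine bug.

```rust
use std::collections::HashMap;

/// Illustrative string-interning pool: intern returns a stable atom id,
/// and lookup of a previously interned string always succeeds.
#[derive(Default)]
struct AtomPool {
    atoms: HashMap<String, usize>,
}

impl AtomPool {
    fn intern(&mut self, s: &str) -> usize {
        let next = self.atoms.len();
        *self.atoms.entry(s.to_string()).or_insert(next)
    }

    fn lookup(&self, s: &str) -> Option<usize> {
        self.atoms.get(s).copied()
    }
}

fn main() {
    let mut pool = AtomPool::default();
    let atom = pool.intern("TokenStats");
    // Once interned, a lookup failure can only be a programming error,
    // which is why an unwrap at the use site is acceptable.
    assert_eq!(pool.lookup("TokenStats"), Some(atom));
    assert_eq!(pool.lookup("NeverInterned"), None);
}
```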

```rust
// This is not a great idea: we really always need a timestamp; if
// substreams doesn't give us one, we use a fixed one which will
// lead to all kinds of strange behavior
let timestamp = timestamp
```
A contributor commented:

IIRC the clock can be assumed to always be there, so it might be worth adding a log or a metric in case we ever need to do this.

@lutter (Collaborator, author) replied:

Since it should always be present, I just made a missing timestamp an error (which gets rid of one of the iffy uses of BlockTime::NONE).
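A sketch of the change described (hypothetical names, not the PR's actual code): turn a missing substreams clock into an error instead of substituting a fixed sentinel value.

```rust
/// Illustrative sketch of treating a missing substreams clock as an
/// error rather than falling back to a fixed BlockTime::NONE.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockTime(i64); // seconds since the Unix epoch

fn block_time(timestamp: Option<i64>) -> Result<BlockTime, String> {
    timestamp
        .map(BlockTime)
        .ok_or_else(|| "substreams block is missing its clock/timestamp".to_string())
}

fn main() {
    // Present clock: pass the value through.
    assert_eq!(block_time(Some(1_705_968_000)), Ok(BlockTime(1_705_968_000)));
    // Missing clock: surface an error instead of a sentinel.
    assert!(block_time(None).is_err());
}
```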

For timeseries, users cannot set the timestamp. Instead, the time for the
current block is used.
@lutter merged commit c3f4ec4 into master on Jan 23, 2024; 7 checks passed.
@lutter deleted the lutter/agg branch on January 23, 2024.