This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

docs: introduce new store v3 protocol #665

Closed
wants to merge 5 commits

Conversation

Contributor

@jm-clius jm-clius commented Feb 1, 2024

Relates to waku-org/research#81

Note that this PR will not be merged. It is opened as a convenient way to review the new Store v3 protocol by comparing it to the current Store v2 protocol.

This PR builds on ideas first proposed in https://github.com/waku-org/waku-proto/pull/10/files

Background

This PR introduces a proposal for a new Store protocol design, termed "Store v3" to differentiate it from the current Store "v2" protocol with protocol ID 2.0.0-beta4.

This is a significant redesign and not backwards compatible with previous store versions.

This redesign has at least three purposes:

  1. simplify and flatten the overly nested wire protocol of Store v2, especially around paging info.
  2. be "store sync ready": the sync mechanism currently being developed requires looking up full message contents using a set of deterministic message hashes as keys.
  3. improve message reliability with presence checks: Store should provide a lightweight mechanism to confirm that a message is stored, by looking up its deterministic message hash without having to retrieve the full message content.
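
For orientation, the sketch below assembles the draft's messages from the diff excerpts quoted in the review threads that follow. Field names and numbers are taken from those excerpts where shown; `time_start = 12`, the response-side field numbers for `request_id` and `pagination_cursor`, and the import of the `WakuMessage` definition are assumptions, and `include_data` reflects a rename applied later in the review.

syntax = "proto3";

// The import of the waku.message.v1.WakuMessage definition is omitted here;
// the package path is illustrative.

message WakuMessageKeyValue {
  optional bytes message_hash = 1;                  // deterministic hash (key)
  optional waku.message.v1.WakuMessage message = 2; // full message content (value)
  optional string pubsub_topic = 3;                 // added during review (see below)
}

message StoreRequest {
  string request_id = 1;
  bool include_data = 2; // response should include full message content

  // Filter criteria for content-filtered queries
  optional string pubsub_topic = 10;
  repeated string content_topics = 11;
  optional int64 time_start = 12; // assumed field number
  optional int64 time_end = 13;

  // List of key criteria for lookup queries
  repeated bytes message_hashes = 20;

  // Pagination info. 50 Reserved
  optional bytes pagination_cursor = 51; // message hash (key) to start from (exclusive)
  bool pagination_forward = 52;
  optional uint64 pagination_limit = 53;
}

message StoreResponse {
  string request_id = 1; // assumed field number
  optional uint32 status_code = 10;
  optional string status_desc = 11;
  repeated WakuMessageKeyValue messages = 20;
  optional bytes pagination_cursor = 51; // assumed field number
}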

What to review

  • Please only review the Wire Protocol section.
  • Detailed error handling is purposefully left out of this draft PR, as it complicates the "happy path" specification.
  • Consider known and planned use cases (Status, third parties, store sync protocol) and whether this proposal addresses the requirements.
  • Focus on the protobuf specification, field names, message names and message composition.

Comment on lines +83 to +84
optional string pubsub_topic = 10;
repeated string content_topics = 11;
Member

If your app is using autosharding, should the pubsub_topic be omitted?

Member

Also, if using only the content_topics in the criteria, how would you know on which pubsub topic a message was published? Should the pubsub topic be part of the WakuMessageKeyValue? (I imagine the answer depends on whether the pubsub topic of the message is important to know at all, in the case of static/named sharding.)

Contributor Author

> If your app is using autosharding

For now autosharding is simply a local API feature and does not translate to any wire protocols (in other words, your local API will derive the pubsub topic from your content topics and fill out the pubsub_topic field in the protobuf on the application's behalf).

> if using only the content_topics in the criteria

That would be against the current stipulations, exactly to avoid any ambiguity. The pubsub_topic MUST always be included as a qualifier when stipulating a list of content topics to filter against. See https://github.com/vacp2p/rfc/pull/665/files#diff-98f234d0b94fb690d628e58fe8a5b642c873123819753d1044963cf524931a6dR187-R190 but let me know if it can be clearer. :)
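
For illustration, a well-formed content-filtered query under this stipulation might look as follows in protobuf text format (topic values are hypothetical); the `pubsub_topic` qualifier always accompanies the `content_topics` list:

# Hypothetical content-filtered StoreRequest, protobuf text format
request_id: "query-1"
include_data: true
pubsub_topic: "/waku/2/default-waku/proto"
content_topics: "/my-app/1/chat/proto"
content_topics: "/my-app/1/reactions/proto"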

optional int64 time_end = 13;

// List of key criteria for lookup queries
repeated bytes message_hashes = 20; // Message hashes (keys) to lookup
Member

@richard-ramos richard-ramos Feb 1, 2024

Since message_hashes is likely a unique index, I wonder if it would make sense to use something like this, so the query uses either a set of criteria or a list of hashes.

message StoreRequest {
  ....
  oneof search_criteria {
    Criteria criteria = 10;
    MessageHashes message_hashes = 20;
  }
  ...
}

message Criteria {
  // Filter criteria for content-filtered queries
  optional string pubsub_topic = 1;
  repeated string content_topics = 2;
  optional int64 time_start = 3;
  optional int64 time_end = 4;

  // Pagination info. 50 Reserved
  optional bytes pagination_cursor = 51; // Message hash (key) from where to start query (exclusive)
  bool pagination_forward = 52;
  optional uint64 pagination_limit = 53;
}

message MessageHashes {
  repeated bytes message_hashes = 1;
}

@chaitanyaprem chaitanyaprem Feb 2, 2024

On rethinking, it would be simpler to split these into two different RPCs. A few benefits I see:

  1. It makes the criteria extensible for future use and keeps common criteria like topics irrespective of the type of search.
  2. It makes each search type mutually exclusive.
message HashBasedStoreQuery {
  string request_id = 1;
  CommonCriteria criteria = 11;
  MessageHashes message_hashes = 20;
  bool return_values = 2; // Response should include full message content
}

message TimeBasedStoreQuery {
  string request_id = 1;
  CommonCriteria criteria = 11;

  optional int64 time_start = 3;
  optional int64 time_end = 4;

  // Pagination info. 50 Reserved
  optional bytes pagination_cursor = 51; // Message hash (key) from where to start query (exclusive)
  bool pagination_forward = 52;
  optional uint64 pagination_limit = 53;
}

message CommonCriteria {
  // Filter criteria for content-filtered queries
  optional string pubsub_topic = 1;
  repeated string content_topics = 2;
}

message MessageHashes {
  repeated bytes message_hashes = 1;
}

@kaichaosun

The params proposed by @chaitanyaprem seem pretty complex, and the message hash based query doesn't need to depend on pubsub topic and content topics.

It seems we like splitting the query params based on scenarios, so why not go further and use a different protocol ID for the history query and the hash-based query? The potential benefits:

  • less frequent upgrades to each part of the protocol, which means less migration effort in clients.
  • easier to apply fine-grained access control for different scenarios if needed.
  • likely no need to upgrade the old protocol ID for its original purpose of history queries.
  • avoiding the overloaded vac/store/ identifier: instead use waku/messages for hash-based queries and waku/messages/history for history queries. Also wondering if it's possible to remove vac.

Contributor Author

Thank you for these suggestions and, indeed, these map to three alternatives I did consider. Any of these options may work and I'm certainly willing to consider different pros and cons. I'll briefly summarise my reasoning for settling on a single request-response "flat" wire protocol structure:

As to @richard-ramos's idea of a nested oneof - this comes down to the balance of how much of the protocol we want to "enforce" in the protobuf wire protocol vs stipulating it in the spec and code. The problem with protobufs is that they do not lend themselves to backwards compatibility. If you start enforcing protocol field rules in the protobuf, you quickly end up with a wire protocol that cannot be extended for new use cases (hence almost all fields being optional in the proto, even if mandatory in the protocol). oneof, for example, has issues with backwards compatibility (see https://protobuf.dev/programming-guides/proto3/#backward). If, say, we decide that certain content filter criteria may make sense in combination with message hashes for some use cases, we'll not be able to change the protocol without breaking the wire encoding. A flat structure in the protobuf allows us to define these rules within the protocol without being dictated to by protobuf limitations.

As to @chaitanyaprem's idea of multiple RPCs - these would either have to be nested within a single request and single response message, or split into two different protocol IDs (a la @kaichaosun's idea). There is no other way for the switch to "know" which message to decode off the wire (HashBased or TimeBased). The issue with a nested structure is again that you tie the protocol to a certain expression of the wire protocol (not to mention it being error-prone and complex, as is evident in the current, overly-nested Store protocol). For example, we don't currently need the CommonCriteria for hash based queries, but, indeed, we may think of a use case where we do in future. One way to do that is to keep that field reserved for the HashBased queries, but, then again, how many fields should we reserve, and how many future use cases will we encounter that are not backwards compatible? Should we keep grouping these fields as CommonCriteria even if it is not currently "common", just to cover a future possibility? A flat structure allows all of these scenarios to be specified only in the protocol and code without breaking backwards compatibility (or doing very complex things) on the wire.

As to @kaichaosun's idea of two different protocol IDs - similar to my reply to @chaitanyaprem. If we treat the Store as a key-value store, we certainly need to be able to respond (to both hash-based and content queries) with either the message hash keys, the values, or both. The StoreResponse to either type of query would therefore be the same. To me it seems a bit arbitrary to then split the request into two different protocols (which requires different mounting, config, etc. in libp2p, even though its handling is coupled on the server side in order to produce the same StoreResponse). Similarly, we may want to include content criteria in hash based queries in future, which would feel a bit artificial if done in a different protocol altogether.


> For example, we don't currently need the CommonCriteria for hash based queries, but, indeed, we may think of a use case where we do in future.

One challenge I see with this approach is how a store node would know whether it is supposed to have this message, or whether it is just an invalid hash being queried. As of now the response would just be "not found".
If a store node is only serving specific pubsub or content topics, then having this info in the query could result in a different error response from the store node.

Comment on lines +100 to +101
optional uint32 status_code = 10;
optional string status_desc = 11;
Member

What do you think of making the status_code and desc required, returning 200/OK for successful cases? Thinking from an implementation perspective, having it optional would mean there might be situations in which the status_code is empty. When would this happen?


+1 to this.
In fact, I am wondering if we should standardize status codes across all request/response protocols and define a common enum to be used across them, similar to the HTTP ones:
200 for success
4xx for invalid requests
5xx for server errors
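
A minimal sketch of such a shared enum, with hypothetical names and values mirroring HTTP status codes (proto3 requires the first enum value to be zero, which conveniently doubles as "unset"):

enum StatusCode {
  UNKNOWN = 0; // proto3 default; treat as unset
  OK = 200;
  BAD_REQUEST = 400;
  NOT_FOUND = 404;
  INTERNAL_SERVER_ERROR = 500;
  SERVICE_UNAVAILABLE = 503;
}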

Contributor Author

Ah, the good old protobuf idiosyncrasies! There are no required fields in proto3 (https://protobuf.dev/programming-guides/proto3/#field-labels). The status_code is indeed mandatory (see https://github.com/vacp2p/rfc/pull/665/files#diff-98f234d0b94fb690d628e58fe8a5b642c873123819753d1044963cf524931a6dR253), but, very confusingly, the only way to make a field testably mandatory in protobuf is to mark it as optional.

If a field is optional, the decoder can test whether the field has been set and react accordingly. If a mandatory field has not been set, the decoder can view this as a spec violation and raise some kind of error. Without the optional keyword, the field will be decoded as "implicitly present" - that is, the client will happily decode an unset field as its default value of 0, as if that were the status code populated by the server. With optional, the client can test whether the store service node populated the status code correctly (as required by the spec) and discard/ignore the response as a spec violation if status_code is unset. Hope this makes sense! Protobuf optional fields can be difficult to explain. 😅
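
A schema-level sketch of the presence distinction described above; the second field is hypothetical, added only for contrast:

message StoreResponse {
  // Explicit presence: generated code exposes a presence check, so an
  // unset status_code is distinguishable from a genuine 0 and can be
  // treated as a spec violation.
  optional uint32 status_code = 10;

  // Implicit presence (no `optional`): an unset field silently decodes
  // to its default value 0, indistinguishable from a real zero.
  uint32 implicit_status_code = 12; // hypothetical field, for contrast only
}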

Comment on lines 192 to 194
To filter on time range, the client MUST set `time_start`, `time_end` or both.
An unset `time_start` SHOULD be interpreted as "from the oldest stored entry".
An unset `time_end` SHOULD be interpreted as "up to the youngest stored entry".
Member

@richard-ramos richard-ramos Feb 1, 2024

It's probably worth mentioning that the time_* fields should contain a Unix timestamp in nanoseconds.


I don't think nanosecond timestamps are possible in all languages; I recall that in JS it is probably not possible. Maybe milliseconds then. But I agree we should mention the granularity.

@kaichaosun kaichaosun Feb 2, 2024

Since messages will have a unique hash, does it make sense to use seconds for timestamps everywhere?


> Since messages will have a unique hash, does it make sense to use seconds for timestamps everywhere?

This is about the filter criteria for time-based queries, which don't rely on hashes. Maybe second granularity itself might be OK.

Contributor Author

> It's probably worth mentioning that the time_* fields should contain a Unix timestamp in nanoseconds

Good point!

> I don't think nanosecond timestamps are possible in all languages

But presumably the epoch time can be converted to a nanosecond timestamp with a multiplier? I think it makes sense for these fields to have the same resolution as the timestamp field in the Waku message itself (which is a nanosecond Unix epoch timestamp). None of the clients can or should generate timestamps with nanosecond precision (in fact, I would timestamp Waku messages on second or half-second boundaries) - nanoseconds is just the unit in which our timestamp fields are populated.
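
A sketch of the convention being agreed here, with field numbers following the draft excerpts above; the arithmetic in the comment shows that a second-resolution clock still fits the encoding:

// Time filter fields carry Unix epoch nanoseconds, matching the
// WakuMessage timestamp field. Coarser clocks are scaled by 10^9:
// 1706832000 s -> 1706832000000000000 ns, well within int64 range.
optional int64 time_start = 12;
optional int64 time_end = 13;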

INVALID_CURSOR = 1;
}
Error error = 4;
message StoreRequest {


Would it be better to name it StoreQuery, as we are actually querying the Store and not requesting to store data?

Contributor Author

No strong opinion, but I'm trying to be more consistent in our Request-Response protocols by being explicit about the ...Request and ...Response messages (see Filter as an example). You could argue that we are sending the wire protocol request to the store node to perform a query, though this boils down to semantics.


Agreed, but when you read StoreRequest it immediately suggests that we are requesting information to be stored.
Better to name it StoreQueryRequest then.

Contributor Author

Right. Which would indeed be a rename of the protocol to the "Store Query" protocol, as you suggested elsewhere.

Error error = 4;
message StoreRequest {
string request_id = 1;
bool return_values = 2; // Response should include full message content


Maybe rename it to include_data?


}

message HistoryRPC {
message StoreResponse {


Maybe rename it to StoreQueryResponse?


- `receiverTime`: the UNIX time in nanoseconds at which the `WakuMessage` is received by the receiving node.
- `senderTime`: the UNIX time in nanoseconds at which the `WakuMessage` is generated by its sender.
- `pubsubTopic`: the pubsub topic on which the `WakuMessage` is received.
The store protocol operates as a query protocol for a key-value store of historical Waku messages,


Should we start calling it the Store Query Protocol?


`INVALID_CURSOR` means that the `cursor` field of `HistoryQuery` does not match with the `Index` of any of the `WakuMessage` persisted by the queried node.
### Waku message sorting

The key-value entries in the store MUST be time-sorted by the `WakuMessage` `timestamp` attribute.


Wouldn't it be better to also indicate that messages can be stored grouped by pubsubTopic, and in fact segregated by contentTopic, for easier management and search?

I am thinking it would narrow the search space this way and also help in coming up with different architectures for the underlying stores.

Contributor Author

If I understand your suggestion correctly, this is about the actual storage schema and improving it? That would be the domain of the archive, which this RFC is not concerned with. The aim is simply to design the query interface language, with no recommendations on how persistence/retrieval can be optimised.


OK, but I am worried about a query being run against the whole archive every time, whereas almost all the time the querying client knows the pubsub and content topics it is querying for.
I was not aware that the archive was a different RFC; I will go through it.

Contributor Author

True, but this would still be up to how the Store service node decides to efficiently store and query messages in the DB, and it should not be included in the specification for the query language. There is no RFC for the archive currently.


A `StoreResponse` with a populated `pagination_cursor` indicates that more stored entries match the query than included in the response.

A `StoreResponse` without a populated `pagination_cursor` indicates that


I am also wondering if the response can include some indication that data older than the requested time has been purged or cleaned up.
Otherwise there is no way for the client to know whether this was the only data available for that time period, or whether this node is limited due to its configuration.
This would prompt the client to go to another store node to get additional data.

To filter on content topic, the client MUST populate _both_ the `pubsub_topic` _and_ `content_topics` field.
The client MUST NOT populate either `pubsub_topic` or `content_topics` and leave the other unset.
Both fields MUST either be set or unset.
A mixed content topic filter with just one of either `pubsub_topic` or `content_topics` set, SHOULD be regarded as an invalid request.


Any specific reason for not allowing a pubsubTopic-only filter, given that we allow a general StoreRequest without any content filter at all?

Contributor Author

This is at least partly because, within Waku, the pubsubTopic is simply a "routing qualifier" for specific content topics. In fact, every content filter scenario that we know about so far (also for the Filter protocol) is designed to work on a set of content topics. However, in order for these content topics to be "fully qualified" (i.e. globally unique) they need to be namespaced within a pubsub topic. All this is a long way of saying that the Waku protocols are designed with the implicit assumption that the pubsub topic does not (or should not) carry any "content information", so app features shouldn't be built in a way that violates this. For Filter this assumption also simplifies the filter matching quite a bit. Admittedly, for Store it would be relatively simple to allow filtering on pubsub topic, but my assumption is that the exact same concept of "content filter" should be applicable (and interchangeable) between Filter and Store.

Of course, I'm open to discussing loosening this requirement for Store if we are convinced by a specific use case.


I understand that for Filter it makes more sense, as it is used only by nodes that use light protocols. But Store queries are used by all types of nodes, and I was wondering if it helps in some scenarios to have a pubsubTopic-only filter.

A few scenarios I can think of (see the sketch after this list):

  1. A Store node only serving a single pubsubTopic comes back online and wants to catch up. In this case it can start with a history query for that specific pubsubTopic and then do a sync to ensure it has all the messages. There are alternative ways to sync we can think of for this specific case, but this could be the simplest way.
  2. A use case with static sharding, e.g. Status, where there are too many content topics to specify in a Store query. Wouldn't it be easier to use a filter with pubsubTopic only when the node comes online after being offline for some time? But that may eventually end up in fine-grained filtering at the application/client once all messages are received.

Will think more on this and add if I see any additional scenarios for this specific query.
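
A sketch of the kind of shard-wide catch-up query described in scenario 1, in protobuf text format with hypothetical values; note that it is invalid under the current draft, which requires `content_topics` alongside `pubsub_topic`:

# Hypothetical pubsub-topic-only catch-up query (invalid per the draft rules)
request_id: "catchup-1"
include_data: true
pubsub_topic: "/waku/2/rs/16/32"   # static shard topic, illustrative
time_start: 1706745600000000000    # when the node went offline, in ns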

@@ -16,7 +16,7 @@ This specification explains the `13/WAKU2-STORE` protocol which enables querying
stored by other nodes.
It also supports pagination for more efficient querying of historical messages.

**Protocol identifier***: `/vac/waku/store/2.0.0-beta4`


Is it safe to remove this, given that the Status app already uses this protocol? Should such a deletion follow a deprecation process?

Contributor Author

We will not remove this protocol until all clients have been migrated. In fact, the reason for changing the protocol ID is for us to be able to support both protocols simultaneously, at least for a while. The old RFC will also be kept (clearly indicating that it's a legacy version of the protocol).



and [deterministic message hash](https://rfc.vac.dev/spec/14/#deterministic-message-hashing) as key.
The store can be queried to return either a set of keys or a set of key-value pairs.
Within the store protocol, Waku message keys and values MUST be represented in a `WakuMessageKeyValue` message.
This message MUST contain the deterministic `message_hash` as key.


Waku currently doesn't provide any anti-fraud mechanism for store nodes, which means the client must verify that the message hash is the one required. For clarity, it would be better if the message history query and the hash based query did not return message_hash.

@chaitanyaprem chaitanyaprem Feb 2, 2024

> The params proposed by @chaitanyaprem seem pretty complex, and the message hash based query doesn't need to depend on pubsub topic and content topics.
>
> It seems we like splitting the query params based on scenarios, so why not go further and use a different protocol ID for the history query and the hash-based query?

Now that you mention it and I give it a rethink: rather than making the query complex, we can just have two different RPCs in the same Store protocol, one for the message hash query and one for the time-range based query. That would simplify the protobuf as well. Anyway, there is no rule that a protobuf serve only one RPC.

I don't think having different protocol IDs for each RPC is a good idea, especially since both are just different kinds of Store queries.

I have updated my suggestion as well.

the client MUST NOT set any of the content filter fields,
namely `pubsub_topic`, `content_topic`, `time_start`, or `time_end`.

### Presence queries
@kaichaosun kaichaosun Feb 2, 2024

Without anti-fraud measures, introducing a message presence query seems to add extra complexity without much value.

Contributor Author

In what way would you say this adds complexity? In fact, this is simply a hash based query, and I'm just explaining presence checks as one use case of hash-based queries. Trust assumptions are the same as for any other store query scenario.

As for its value - this allows any client to query a store node for the presence of a message hash known to the client, which makes it easy for e.g. lightpush clients to verify that their pushed messages have indeed been pushed to the network. Of course, the client would have to "trust" the store node, or do redundant random checks against various store nodes to minimise the probability of sybil attacks, but that is currently the case for all other store scenarios too.
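
A sketch of such a presence check, in protobuf text format with hypothetical values: a hash-based lookup with `include_data` false, so the response carries only the hashes the node knows about - the hasKey()-style query described above:

# Hypothetical presence check (lightweight hash lookup)
request_id: "presence-1"
include_data: false
message_hashes: "\x8f\x0a\x1b\x2c"   # deterministic hash, truncated for illustration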

@kaichaosun kaichaosun Feb 8, 2024

For presence queries, a malicious service provider could easily return the requested hash to the client even though the message was never actually persisted in the network. To handle this scenario, the client needs to either maintain multiple such service providers or somehow use a trusted node. This extra process brings the complexity, which should be designed/communicated carefully.

For message hash based queries, since the client can verify that the returned message matches the requested hash, such an attack is not possible.

For time-ranged history queries, it's only a convenient method to catch up; the client should not trust the integrity of the message histories this query brings. Instead, the client needs to check the Merkle root of the messages belonging to the content topic of interest (this is not implemented). And with a performant sync protocol in the future (this is the reason why I think the sync protocol should not be used only between store nodes), there may be less need for such history queries.

Contributor Author

@jm-clius jm-clius Feb 8, 2024

> since the client can verify that the returned message matches the requested hash, such an attack is not possible.

But equally, a malicious node may ignore the request, forcing the client to resend the message or get confirmation from other store nodes. I agree that the client can also do "full checks" by simply requesting the entire message contents. The idea, however, is to provide a very lightweight method for clients to build reliability based on the redundancy and trust assumptions that make sense for their use case.
In every scenario there are two options:

  1. Have some outside information that the node you're using is trustworthy - this is currently the case for all store node implementations, and clients like Status Communities plan to use this model for a very long time (Community Owners set up their own trusted Store service nodes). A lightweight confirmation mechanism makes sense for such a use case, and this use case is going to be with us for the long term. The protocol should provision for it without prescribing the specific client application context.
  2. Use redundancy and randomization to prevent Sybil attacks, similar to how gossipsub etc. works. This would be the case once the Store service is fully decentralized, if clients choose to use this. For regular redundant checks a lightweight mechanism is necessary (a client shouldn't be expected to retrieve its own published message multiple times from multiple nodes). This doesn't preclude more thorough checks where the full message is downloaded and checked against a hash - it just describes one way in which the protocol can be used to build a lightweight "do you know about this set of messages?" mechanism. Think of it as the hasKey() check for a key-value store.

> this is the reason why I think the sync protocol should not be used only between store nodes

I agree. The idea is to build this into a truly decentralized reliability layer.


> Community Owners set up their own trusted Store service nodes

This is a good point, and we should explicitly add such a statement in the spec. I'm not very sure, but my feeling is that using a different protocol ID, or even spec ID, could help make such information more noticeable to users.

> But equally, a malicious node may ignore the request, forcing the client to resend the message or get confirmation from other store nodes.

Yeah, I agree there are more forms of the attack. I'm wondering how to improve or organize the spec to summarize our discussion and get spec readers on the same page.

}
Error error = 4;
message StoreRequest {
string request_id = 1;
Contributor

Maybe use an integer and/or call it nonce?

Contributor Author

I would not depart unnecessarily from what is an established pattern for our other request-response protocols.

optional uint32 status_code = 10;
optional string status_desc = 11;

repeated WakuMessageKeyValue messages = 20;
Contributor

Maybe a map?

Contributor

After coding the nwaku implementation, I think that would be harder to do.

Contributor Author

Indeed. Especially if only one or the other field is present.

@jm-clius
Contributor Author

Thank you for all the comments so far.

I've addressed some of these in fcc214f

Specifically:

  • changed the name of the wire protocol to Store Query (we can perhaps consider Waku Store in future to be the combination of the Query and Archive protocols?). This also affects the libp2p protocol ID.
  • changed return_values to include_data.
  • clarified that timestamps are Unix epoch timestamps in nanoseconds.

cc @kaichaosun @chaitanyaprem @SionoiS

@SionoiS SionoiS mentioned this pull request Mar 27, 2024
Member

@richard-ramos richard-ramos left a comment

I'm requesting a change to the data type for timestamps, because I ran into an interop issue between go-waku and nwaku due to go-waku's protobuffers using int64 and nwaku's implementation using zint, and also because zint makes more sense according to the comments below.
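
A sketch of the requested change, assuming the "zint" mentioned above corresponds to proto3's zigzag-encoded sint64 (an assumption based on this comment, not confirmed in the thread):

optional sint64 time_start = 12;
optional sint64 time_end = 13;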

Comment on lines 74 to 75
optional bytes message_hash = 1; // Globally unique key for a Waku Message
optional waku.message.v1.WakuMessage message = 2; // Full message content as value
Contributor

@SionoiS SionoiS May 6, 2024

When querying with hashes, there is no way to know which shard a message was sent on; WakuMessage does not contain the pubsub topic.

We can either add the pubsub topic to WakuMessage or to the WakuMessageKeyValue.

Member

I'd vote for it to be part of the WakuMessageKeyValue so the WakuMessage protobuffer is not modified!

Contributor Author

Good points. We indeed need the pubsub_topic. @richard-ramos @SionoiS @chaitanyaprem please take a look at 5d69ab5 and see if you agree with the addition.


Looks good.
Just wondering if we should return pubsubTopic, or cluster and shard, since named sharding will be deprecated anyway.

Contributor Author

Not a strong opinion about this, but when we previously discussed this the intuition was:

  • Waku core protocols still operate on the idea of pubsub-topic (i.e. string) defined subnetworks, as a reflection of its libp2p base
  • the string formatting we provide for cluster and shard is run on top of this as a type of application, but it does not change the fundamental network definition

Of course, part of keeping it a string is to remain consistent with filter, relay and lightpush for now, and to avoid potential inconsistency in cross-protocol composition. Perhaps if we do upgrade protocols to cluster-shard we should do all protocols at once (i.e. not now... :) )?


Agreed, better to upgrade all protos in one shot rather than partially. This will avoid confusion as well.

Contributor

@SionoiS SionoiS left a comment

#665 (comment)

This blocks the implementation of Waku sync.

message ContentFilter {
string contentTopic = 1;
}
message WakuMessageKeyValue {


Maybe name it WakuMessageExt?

PagingInfo pagingInfo = 4;
// Full message content and associated pubsub_topic as value
optional waku.message.v1.WakuMessage message = 2;
optional string pubsub_topic = 3;


Should this indeed be optional? What happens in case no pubsub topic is provided in the query?

Contributor Author

It's not optional for content-based queries. The optional keyword in the protobuf simply means that clients can test whether the field has been explicitly set (a requirement to distinguish unset fields from fields simply having the default value) - whether or not a field is mandatory is determined in the protocol specification, not the protobuf. See for example these rules for how pubsub_topic should be used for content queries: https://github.com/vacp2p/rfc/pull/665/files#diff-98f234d0b94fb690d628e58fe8a5b642c873123819753d1044963cf524931a6dR195-R198

@jm-clius
Contributor Author

jm-clius commented Jun 6, 2024

Closing this draft, as the new Store protocol specification has been merged here: https://github.com/waku-org/specs/blob/master/standards/core/store.md

@jm-clius jm-clius closed this Jun 6, 2024