Conversation
optional string pubsub_topic = 10;
repeated string content_topics = 11;
If your app is using autosharding, should the `pubsub_topic` be omitted?
Also, if using only the content_topics in the criteria, how would you know on which pubsub topic a message was published? Should the pubsub topic be part of the WakuMessageKeyValue? (I imagine the answer to this depends on whether the pubsub topic of the message is important to know at all, in the case of static/named sharding.)
> If your app is using autosharding
For now autosharding is simply a local API feature and does not translate to any wire protocols (in other words, your local API will derive the pubsub topic from your content topics and fill out the `pubsub_topic` field in the protobuf on the application's behalf).
> if using only the content_topics in the criteria
That would be against the current stipulations, exactly to avoid any ambiguity. The `pubsub_topic` MUST always be included as a qualifier when stipulating a list of content topics to filter against. See https://github.com/vacp2p/rfc/pull/665/files#diff-98f234d0b94fb690d628e58fe8a5b642c873123819753d1044963cf524931a6dR187-R190 but let me know if it can be clearer. :)
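For illustration, a fully-qualified content filter under these stipulations might look as follows. This is a sketch, not the authoritative spec: the field numbers are taken from the diff context in this thread, and the topic values are placeholders.

```protobuf
// Sketch: a content-filtered StoreRequest with both filter fields populated
// together, the pubsub topic acting as qualifier for the content topics.
message StoreRequest {
  string request_id = 1;
  optional string pubsub_topic = 10;   // e.g. "/waku/2/rs/0/1" (placeholder)
  repeated string content_topics = 11; // e.g. "/my-app/1/chat/proto" (placeholder)
}
```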
optional int64 time_end = 13;

// List of key criteria for lookup queries
repeated bytes message_hashes = 20; // Message hashes (keys) to lookup
Since message_hashes are likely a unique index, I wonder if it would make sense to use something like this, so the query uses either criteria or a list of hashes:
message StoreRequest {
  ...
  oneof search_criteria {
    Criteria criteria = 10;
    MessageHashes message_hashes = 20;
  }
  ...
}

message Criteria {
  // Filter criteria for content-filtered queries
  optional string pubsub_topic = 1;
  repeated string content_topics = 2;
  optional int64 time_start = 3;
  optional int64 time_end = 4;
  // Pagination info. 50 Reserved
  optional bytes pagination_cursor = 51; // Message hash (key) from where to start query (exclusive)
  bool pagination_forward = 52;
  optional uint64 pagination_limit = 53;
}

message MessageHashes {
  repeated bytes message_hashes = 1;
}
On rethinking, it would be simpler to split these into 2 different RPCs.
A few benefits I see:
- This makes the criteria extensible for future use and keeps common criteria like topics irrespective of the type of search.
- Makes each search type mutually exclusive.
message HashBasedStoreQuery {
  string request_id = 1;
  CommonCriteria criteria = 11;
  MessageHashes message_hashes = 20;
  bool return_values = 2; // Response should include full message content
}

message TimeBasedStoreQuery {
  string request_id = 1;
  CommonCriteria criteria = 11;
  optional int64 time_start = 3;
  optional int64 time_end = 4;
  // Pagination info. 50 Reserved
  optional bytes pagination_cursor = 51; // Message hash (key) from where to start query (exclusive)
  bool pagination_forward = 52;
  optional uint64 pagination_limit = 53;
}

message CommonCriteria {
  // Filter criteria for content-filtered queries
  optional string pubsub_topic = 1;
  repeated string content_topics = 2;
}

message MessageHashes {
  repeated bytes message_hashes = 1;
}
The params proposed by @chaitanyaprem seem pretty complex, and the message-hash-based query doesn't need to depend on pubsub topic and content topics.
It seems we like splitting the query params based on scenarios, so why not go further and use a different protocol id for the history query and the hash-based query? The potential benefits it brings:
- less frequent upgrades to some part of the protocol, which means less migration effort in clients.
- easy to apply fine-grained access control for different scenarios if needed.
- likely no need to upgrade the old protocol id for its old purpose of history query.
- no need for an overwhelming identifier like `vac/store/`; instead use `waku/messages` for the hash-based query and `waku/messages/history` for the history-based query. Also wondering if it's possible to remove `vac`..
Thank you for these suggestions and, indeed, these map to three alternatives I did consider. Any of these options may work and I'm certainly willing to consider different pros and cons. I'll briefly summarise my reasoning for coming to the conclusion of a single request-response "flat" structure wire protocol:
As to @richard-ramos' idea of a nested `oneof` - this comes down to the balance of how much of the protocol we want to "enforce" in the protobuf wire protocol vs stipulating this in the spec and code. The problem with protobufs is that they do not lend themselves to backwards compatibility. If you start enforcing some protocol field rules in the protobuf you quickly end up with a wire protocol that cannot be extended for new use cases (hence almost all fields being `optional` in proto, even if mandatory in the protocol). `oneof`, for example, has issues with backwards compatibility (see https://protobuf.dev/programming-guides/proto3/#backward). If, say, we decide that certain content filter criteria may make sense in combination with message hashes for some use cases, we'll not be able to change the protocol without breaking the wire encoding. A flat structure in the protobuf allows us to define these rules within the protocol without being dictated by protobuf limitations.
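For reference, the flat layout being defended here looks roughly as follows. This sketch is assembled from the diff fragments visible in this PR; `time_start = 12` is an assumption inferred from the surrounding field numbers.

```protobuf
// Sketch of the proposed flat StoreRequest (not authoritative).
message StoreRequest {
  string request_id = 1;
  bool return_values = 2;                // Response should include full message content

  // Filter criteria for content-filtered queries
  optional string pubsub_topic = 10;
  repeated string content_topics = 11;
  optional int64 time_start = 12;        // field number assumed
  optional int64 time_end = 13;

  // Key criteria for lookup queries
  repeated bytes message_hashes = 20;    // Message hashes (keys) to lookup

  // Pagination info. 50 reserved
  optional bytes pagination_cursor = 51; // Message hash (key) from where to start query (exclusive)
  bool pagination_forward = 52;
  optional uint64 pagination_limit = 53;
}
```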
As to @chaitanyaprem's idea of multiple RPCs - these would either have to be nested within a single request and single response message or split into two different protocol IDs (a la @kaichaosun's idea). There is no other way for the switch to "know" which message to decode off the wire (`HashBased` or `StoreBased`). The issue with a nested structure is again that you tie the protocol to a certain expression of the wire protocol (not to mention it being error-prone and complex, as is evident in the current, overly-nested Store protocol). For example, we don't currently need the `CommonCriteria` for hash-based queries, but, indeed, we may think of a use case where we do in future. One way to do that is to keep that field `reserved` for the `HashBased` queries, but, then again, how many fields should we reserve and how many future use cases will we encounter that are not backwards compatible? Should we keep grouping these fields as `CommonCriteria` even if it is not currently "common", just to cover a future possibility? A flat structure allows all of these scenarios to be specified only in the protocol and code without breaking backwards compatibility (or doing very complex things) on the wire.
As to @kaichaosun's idea of two different protocol IDs - similar to my reply to @chaitanyaprem. If we treat the Store as a key-value store, we certainly need to be able to respond (to both hash-based and content queries) with either the message hash keys, the values or both. The `StoreResponse` to either type of query would therefore be the same. To me it seems a bit arbitrary to then split the request into two different protocols (that requires different mounting, config, etc. in libp2p, even though its handling is coupled on the server side in order to effect the same StoreResponse). Similarly, we may want to include content criteria in hash-based queries in future, which would feel a bit artificial if done in a different protocol altogether.
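For context, the shared response would be along these lines. This is a sketch: the `status_code`, `status_desc` and `messages` field numbers appear in the diff context of this PR, while the `request_id` and `pagination_cursor` numbering is assumed.

```protobuf
// Sketch of the single StoreResponse returned for both query types.
message StoreResponse {
  string request_id = 1;                      // field number assumed
  optional uint32 status_code = 10;
  optional string status_desc = 11;
  repeated WakuMessageKeyValue messages = 20; // keys, values or both
  optional bytes pagination_cursor = 51;      // field number assumed; unset when no more entries match
}
```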
> For example, we don't currently need the CommonCriteria for hash based queries, but, indeed, we may think of a use case where we do in future.
One challenge I see with this approach is whether a StoreNode would itself know that it is supposed to have this message, or whether it is just an invalid hash being queried. As of now the response would just be "not found".
If a StoreNode is only serving specific pubsub or content topics, then having this info in the query can result in a different error response by the StoreNode.
optional uint32 status_code = 10;
optional string status_desc = 11;
What do you think of making the status_code and desc required, returning a 200/OK for successful cases? Thinking from an implementation perspective, having it optional would mean that there might be situations in which the status_code is empty? When would this happen?
+1 to this.
In fact, I am wondering if we should standardize status codes across all req/resp protocols and define a common enum to be used across them, similar to the HTTP ones (a sketch follows below):
- 200 for success
- 4xx for invalid requests
- 5xx for server errors
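A hypothetical shape for such a shared enum, along HTTP lines; the names and values here are illustrative only and not part of the draft:

```protobuf
// Illustrative common status enum for req/resp protocols (not in the draft).
enum StatusCode {
  STATUS_CODE_UNSPECIFIED = 0; // proto3 requires a zero value; treat as "unset"
  OK = 200;                    // success
  BAD_REQUEST = 400;           // invalid request (4xx family)
  INTERNAL_SERVER_ERROR = 500; // server error (5xx family)
}
```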
Ah, the good old protobuf idiosyncrasies! There are no `required` fields in proto3 (https://protobuf.dev/programming-guides/proto3/#field-labels). The `status_code` is indeed mandatory (see https://github.com/vacp2p/rfc/pull/665/files#diff-98f234d0b94fb690d628e58fe8a5b642c873123819753d1044963cf524931a6dR253), but, very confusingly, the only way to make a field testably mandatory in protobuf is to mark it as `optional`. If a field is `optional`, the decoder can test whether the field has been set or not and react accordingly. If a mandatory field has not been set, the decoder can view this as a spec violation and raise some kind of error. Without the `optional` keyword the field will be decoded as "implicitly present" - that is, the client will happily decode an unset field as its default value of `0`, as if that were the status code populated by the server. With `optional` the client can test whether the store service node populated the status code correctly (as is required by the spec) and discard/ignore the response as a spec violation if the `status_code` is unset. Hope this makes sense! Protobuf `optional` fields can be difficult to explain. 😅
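To make the distinction concrete, here is a sketch contrasting the two declaration styles; the presence-check name in generated code varies by language and is only indicative.

```protobuf
message StoreResponse {
  // Implicit presence (no `optional`): an unset field decodes as 0,
  // indistinguishable from a server that deliberately sent status code 0.
  //   uint32 status_code = 10;

  // Explicit presence: generated code exposes a has_status_code()-style
  // check, so a client can treat an unset field as a spec violation.
  optional uint32 status_code = 10;
  optional string status_desc = 11;
}
```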
To filter on time range, the client MUST set `time_start`, `time_end` or both.
An unset `time_start` SHOULD be interpreted as "from the oldest stored entry".
An unset `time_end` SHOULD be interpreted as "up to the youngest stored entry".
It's probably worth mentioning that the `time_*` fields should contain a Unix timestamp in nanoseconds.
I don't think nanosecond timestamps are possible in all languages; I remember that in JS it is probably not possible. Maybe milliseconds then. But I agree we should mention the granularity.
Since messages will have a unique hash, does it make sense to use seconds for timestamps everywhere?
> Since messages will have a unique hash, does it make sense to use seconds for timestamps everywhere?

This is for the filter criteria of time-based queries, which don't rely on hashes. Maybe second granularity itself might be ok.
> It's probably worth mentioning that time_* fields should contain unix timestamp in nanoseconds

Good point!

> I don't think nanosecond timestamps are possible in all languages

But presumably the epoch time can be converted to a nanosecond timestamp with a multiplier? I think it makes sense for these fields to have the same resolution as the `timestamp` field in the Waku Message itself (which is a nanosecond Unix epoch timestamp). None of the clients can or should generate timestamps with nanosecond specificity (in fact, I would timestamp Waku messages on second or half-second boundaries) - nanoseconds is just the unit in which our timestamp fields are populated.
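Under that convention the time-filter fields would be populated as in this fragment; `time_start = 12` is an assumed field number, and the arithmetic is just the multiplier mentioned above.

```protobuf
// Same resolution as WakuMessage.timestamp: Unix epoch nanoseconds.
// A client working in seconds multiplies by 1e9,
// e.g. 1700000000 s -> 1700000000000000000 ns.
optional int64 time_start = 12; // field number assumed
optional int64 time_end = 13;
```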
  INVALID_CURSOR = 1;
}
Error error = 4;
message StoreRequest {
Would it be better to name it `StoreQuery`, as we are actually querying the Store and not requesting to store data?
No strong opinion, but I'm trying to start being more consistent in our Request-Response protocols by being explicit about the `...Request` and `...Response` messages (see Filter as an example). You could argue that we are sending the wire protocol request to the store node to perform a query, though this boils down to semantics.
Agreed, but when you read `StoreRequest` it immediately suggests that we are requesting information to be stored. Better to name it `StoreQueryRequest` then.
Right. Which would indeed be a rename of the protocol to "Store Query" protocol as you suggested elsewhere.
Error error = 4;
message StoreRequest {
  string request_id = 1;
  bool return_values = 2; // Response should include full message content
Maybe rename it to `include_data`?
}

message HistoryRPC {
message StoreResponse {
Maybe rename it to `StoreQueryResponse`?
- `receiverTime`: the UNIX time in nanoseconds at which the `WakuMessage` is received by the receiving node.
- `senderTime`: the UNIX time in nanoseconds at which the `WakuMessage` is generated by its sender.
- `pubsubTopic`: the pubsub topic on which the `WakuMessage` is received.
The store protocol operates as a query protocol for a key-value store of historical Waku messages,
Should we start calling it Store Query Protocol?
`INVALID_CURSOR` means that the `cursor` field of `HistoryQuery` does not match with the `Index` of any of the `WakuMessage` persisted by the queried node.

### Waku message sorting

The key-value entries in the store MUST be time-sorted by the `WakuMessage` `timestamp` attribute.
Wouldn't it be better to also indicate that messages can be stored grouped by pubsubTopic, and in fact segregated by contentTopic, for easy management and search?
I am thinking it would narrow the search space this way and also help in coming up with different architectures for the underlying stores.
If I understand your suggestion correctly, this is about the actual storage schema and improving that? This would be the domain of the archive, which this RFC is not concerned with. The aim is simply to design the query interface language with no recommendations on how persistence/retrieval can be optimised.
Ok, but I am worried about a query being done on the whole archive all the time, whereas almost all the time the querying client is aware of the pubsub and contentTopics for which it is querying.
Was not aware that the archive was a different RFC, will go through it.
True, but this would still be up to how the Store service node decides to efficiently store and query messages in the DB, and it should not be included in the specification for the query language. There is no RFC for the archive currently.
A `StoreResponse` with a populated `pagination_cursor` indicates that more stored entries match the query than included in the response.

A `StoreResponse` without a populated `pagination_cursor` indicates that
I am also wondering if the response can include some indication showing that data older than the requested time has been purged or cleaned up?
Otherwise there is no way for the client to know whether this was the only data available for that time period, or just that this node is limited due to its configuration.
This would prompt the client to go to another store node to get additional data.
To filter on content topic, the client MUST populate _both_ the `pubsub_topic` _and_ `content_topics` field.
The client MUST NOT populate either `pubsub_topic` or `content_topics` and leave the other unset.
Both fields MUST either be set or unset.
A mixed content topic filter with just one of either `pubsub_topic` or `content_topics` set, SHOULD be regarded as an invalid request.
Any specific reason for not allowing a `pubsubTopic`-only filter, if we are allowing a general StoreRequest without a content filter itself?
This is at least partly because within Waku, the `pubsubTopic` is simply a "routing qualifier" for specific content topics. In fact, every content filter scenario that we know about so far (also for Filter protocol) is designed to work on a set of content topics. However, in order for these content topics to be "fully qualified" (i.e. globally unique) they need to be namespaced within a pubsub topic. All this is a long way of saying the Waku protocols are designed with the implicit assumption that the pubsub topic does not (or should not) carry any "content information", so app features shouldn't be built in a way that violates this. For Filter this assumption also simplifies the filter matching quite a bit. Admittedly, for Store it would be relatively simple to allow filtering on pubsub topic, but my assumption is that the exact same concept of "content filter" should be applicable (and interchangeable) between Filter and Store.
Of course, open to discussing loosening this requirement for Store if we are convinced about a specific use case.
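Spelling out the both-or-neither rule against the draft's field numbers (the topic strings are placeholders):

```protobuf
// Valid (fully qualified):
//   pubsub_topic   = "/waku/2/rs/0/1"
//   content_topics = ["/my-app/1/chat/proto"]
//
// Valid (no content filter): both fields left unset.
//
// Invalid (mixed filter, SHOULD be regarded as an invalid request):
//   content_topics = ["/my-app/1/chat/proto"] with pubsub_topic unset
optional string pubsub_topic = 10;
repeated string content_topics = 11;
```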
I understand that for Filter it makes more sense, as it is used only by nodes that use light protocols. But Store queries are used by all types of nodes, and I was wondering if it helps in some scenarios to have a pubsubTopic-only filter.
A few scenarios I can think of:
- A Store node only serving a single pubsubTopic comes back online and wants to catch up. In this case it can start with a history query for that specific pubsubTopic and then do a sync to ensure it has all the messages. There are alternative ways to sync we can think of for this specific case, but this could be the simplest way.
- A use case with static sharding, e.g. Status, where there are too many content topics to specify in a Store query. Wouldn't it be easier if they used a filter with pubsubTopic only when the node comes online after being offline for some time? But that may eventually end up in fine-grained filtering at the application/client once all messages are received.
Will think more on this and add if I see any additional scenarios for this specific query.
@@ -16,7 +16,7 @@ This specification explains the `13/WAKU2-STORE` protocol which enables querying
stored by other nodes.
It also supports pagination for more efficient querying of historical messages.

**Protocol identifier***: `/vac/waku/store/2.0.0-beta4`
Is it safe to remove it, since the Status app already uses this protocol? Should such a deletion follow a deprecation process?
We will not remove this protocol until all clients have been migrated. In fact, the reason for changing the protocol ID is for us to be able to support both protocols simultaneously, at least for a while. The old RFC will also be kept (clearly indicating that it's a legacy version of the protocol).
and [deterministic message hash](https://rfc.vac.dev/spec/14/#deterministic-message-hashing) as key.
The store can be queried to return either a set of keys or a set of key-value pairs.
Within the store protocol, Waku message keys and values MUST be represented in a `WakuMessageKeyValue` message.
This message MUST contain the deterministic `message_hash` as key.
Waku currently doesn't provide any anti-fraud mechanism for store nodes, which means the client must verify that the message hash is the one required. Message history queries and hash-based queries had better not return `message_hash`, for clarity.
> The params proposed by @chaitanyaprem seem pretty complex, and the message-hash-based query doesn't need to depend on pubsub topic and content topics. It seems we like splitting the query params based on scenarios, so why not go further and use a different protocol id for the history query and the hash-based query

Now that you mention it and I give it a rethink: rather than making the query complex, we can just have 2 different RPCs in the same Store protocol, one for a messageHash-based query and one for a TimeRange-based query. That would simplify the protobuf as well. Anyway, there is no rule that a protobuf serves only 1 RPC.
I don't think having different protocol IDs for each RPC is a good idea, especially since both are just different kinds of Store queries.
I had updated my suggestion as well.
the client MUST NOT set any of the content filter fields,
namely `pubsub_topic`, `content_topic`, `time_start`, or `time_end`.

### Presence queries
Without anti-fraud, introducing a presence-of-message query seems not valuable and adds extra complexity.
In what way would you say this adds complexity? In fact, this is simply a hash-based query and I'm just explaining this as one use case of hash-based queries. Trust assumptions are the same as for any other store query scenario. As for its value - this allows any client to query a store node for the presence of a message hash known to the client, which makes it easy for e.g. lightpush clients to verify that their pushed messages have indeed been pushed to the network. Of course, the client would have to "trust" the store node or do redundant random checks against various store nodes to minimise the probability of sybil attacks, but that is currently the case for all other store scenarios.
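As a sketch, a presence query is then just a hash-based request with the data flag off. In protobuf text format (the values are placeholders, and the flag name was still under discussion in this thread):

```protobuf
# StoreRequest, text format: a presence query returns keys only, no content.
request_id: "query-1"    # placeholder value
return_values: false     # or include_data, per the rename discussed above
message_hashes: "<32-byte message hash of the pushed message>"  # placeholder
```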
For presence queries, a malicious service provider could easily return the requested hash to the client while the message is actually missing from the network's persistence. To handle this scenario, the client needs to either maintain multiple such service providers, or somehow use a trusted node; this extra process brings the complexity, which should be designed/communicated carefully.
For message-hash-based queries, since the client can verify that the returned message matches the requested hash, such an attack is not possible.
For time-ranged history queries, it's only a convenient method to catch up; the client should not trust the integrity of the message histories brought with this query. Instead the client needs to check the Merkle root of messages belonging to the interested content topic (this is not implemented). And with a performant sync protocol in the future (this is the reason why I think the sync protocol should not only be used between store nodes), there may be less need for such history queries.
> since the client can verify the returned message matches the requested hash, such an attack is not possible.

But equally, a malicious node may ignore the request, forcing the client to resend the message or get confirmation from other store nodes. I agree that the client can also do "full checks" by simply requesting the entire message contents. The idea however is to provide a very lightweight method for clients to build reliability based on the redundancy and trust assumptions that make sense for their use case.
In every scenario there are two options:
- Have some outside information that the node you're using is trustworthy - this is currently the case for all store node implementations, and clients like Status Communities plan to use this model for a very long time (Community Owners set up their own trusted Store service nodes). A lightweight confirmation mechanism makes sense for such a use case, and this use case is going to be with us for the long term. The protocol should provision for such a use case without prescribing the specific client application context.
- Use redundancy and randomization to prevent Sybil attacks, similar to how gossipsub etc. works. This would be the case once the Store service is fully decentralized, if clients choose to use this. For regular redundant checks a lightweight mechanism is necessary (a client shouldn't be expected to retrieve its own published message multiple times from multiple nodes). This doesn't preclude more thorough checks where the full message is downloaded and checked against a hash - it just describes one way in which the protocol can be used to build a lightweight "do you know about this set of messages?" mechanism. Think of it as the `hasKey()` check for a key-value store.
> this is the reason why I think the sync protocol should not only be used between store nodes

I agree. The idea is to build this into a truly decentralized reliability layer.
> Community Owners set up their own trusted Store service nodes

This is a good point, and we should explicitly add such a statement in the spec. I'm not very sure, but my feeling is that using a different protocol id, or even spec id, could help make such information more noticeable to users.

> But equally a malicious node may ignore the request, forcing the client to resend the message or get confirmation from other store nodes.

Yeah, I agree there are more forms of the attack. I'm wondering how to improve or organize the spec to summarize our discussion and get spec readers on the same page.
}
Error error = 4;
message StoreRequest {
  string request_id = 1;
Maybe use an integer and/or call it `nonce`?
I would not depart unnecessarily from what is an established pattern for our other req-resp protocols
optional uint32 status_code = 10;
optional string status_desc = 11;

repeated WakuMessageKeyValue messages = 20;
Maybe a `map`?
After coding the nwaku impl. I think that it would be harder to do.
Indeed. Especially if only one or the other field is present.
Thank you for all the comments so far. I've addressed some of these in fcc214f. Specifically:
I'm requesting a change on the data type for timestamps, because I ran into an interop issue between go-waku and nwaku due to go-waku's protobuffers using `int64` and nwaku's impl using `zint`, and also because `zint` makes more sense according to the comments below.
Co-authored-by: richΛrd <info@richardramos.me>
optional bytes message_hash = 1; // Globally unique key for a Waku Message
optional waku.message.v1.WakuMessage message = 2; // Full message content as value
When querying with hashes, there is no way to know which shard a message was sent on; `WakuMessage` does not contain the pubsub topic.
We can either add the pubsub topic to `WakuMessage` or to the `WakuMessageKeyValue`.
I'd vote for it to be part of the `WakuMessageKeyValue`, so the `WakuMessage` protobuffer is not modified!
Good points. We indeed need the `pubsub_topic`. @richard-ramos @SionoiS @chaitanyaprem please take a look at 5d69ab5 and see if you agree with the addition.
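For reference, the resulting message would look roughly like this, assembled from the diff fragments in this PR:

```protobuf
// Sketch of WakuMessageKeyValue after the pubsub_topic addition (5d69ab5).
message WakuMessageKeyValue {
  optional bytes message_hash = 1;                  // Globally unique key for a Waku Message
  // Full message content and associated pubsub_topic as value
  optional waku.message.v1.WakuMessage message = 2;
  optional string pubsub_topic = 3;
}
```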
Looks good.
Just wondering if we should return the pubsubTopic, or the cluster and shard, since named sharding would be deprecated anyway.
Not a strong opinion about this, but when we previously discussed this the intuition was:
- Waku core protocols still operate on the idea of pubsub-topic (i.e. string) defined subnetworks, as a reflection of its libp2p base
- the string formatting we provide for cluster and shard runs on top of this as a type of application, but it does not change the fundamental network definition

Of course, part of keeping it a `string` is to remain consistent with filter, relay and lightpush for now too, and to avoid potential inconsistency in cross-protocol composition. Perhaps if we do upgrade protocols to cluster-shard we should do all protocols at once (i.e. not now... :) )?
Agreed, better to upgrade all protos in one shot rather than partly. This will avoid confusion as well.
This blocks the implementation of Waku sync.
message ContentFilter {
  string contentTopic = 1;
}
message WakuMessageKeyValue {
Maybe naming it `WakuMessageExt`?
Co-authored-by: Danish Arora <35004822+danisharora099@users.noreply.github.com>
PagingInfo pagingInfo = 4;
// Full message content and associated pubsub_topic as value
optional waku.message.v1.WakuMessage message = 2;
optional string pubsub_topic = 3;
Should this indeed be `optional`? What happens in case no pubsub topic is provided to the query?
It's not optional for content-based queries. The `optional` keyword in the protobuf simply means that clients can test whether the field has been explicitly set or not (a requirement to distinguish unset fields from fields simply having the default value) - whether or not a field is mandatory is determined in the protocol specification, not the protobuf. See for example these rules for how `pubsub_topic` should be used for content queries: https://github.com/vacp2p/rfc/pull/665/files#diff-98f234d0b94fb690d628e58fe8a5b642c873123819753d1044963cf524931a6dR195-R198
Closing this draft, as the new Store protocol specification has been merged here: https://github.com/waku-org/specs/blob/master/standards/core/store.md
Relates to waku-org/research#81
Note that this PR will not be merged. It is opened as a convenient way to review the new Store v3 protocol by comparing it to the current Store v2 protocol.
This PR builds on ideas first proposed in https://github.com/waku-org/waku-proto/pull/10/files
Background
This PR introduces a proposal for a new Store protocol design, termed "Store v3" to differentiate it from the current Store "v2" protocol with protocol ID `2.0.0-beta4`. This is a significant redesign and not backwards compatible with previous store versions.
This redesign has at least three purposes:
What to review