-
Regarding compactness of proofs and the extractor, it is important to highlight that these two are conflicting requirements, at least in their practical forms. It is easy to see that the accumulated size of the proofs needed to actually reconstruct the data is at least the size of the data itself. Thus, the more compact the proofs, the more proofs are required, which somewhat defeats the reason for compacting proofs in the first place.
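As a rough, back-of-the-envelope way to state this tension (my notation, not from the write-up):

```latex
% If an extractor reconstructs a file M from t proofs of |\pi| bits each,
% then information-theoretically those proofs must carry at least |M| bits:
t \cdot |\pi| \;\ge\; |M|
\quad\Longrightarrow\quad
t \;\ge\; \frac{|M|}{|\pi|}
% i.e. halving the proof size roughly doubles the number of proofs an
% extractor needs, independent of the concrete PoR construction.
```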
-
Another aspect worth mentioning, which we might want to incorporate into the write-up above, is recursive proof aggregation. What I mean by that is that revealing proofs (sending them over the network) and generating proofs aren't necessarily the same thing. We might want the storage nodes to generate proofs locally on an ongoing basis and aggregate them into a single proof that is then sent over the network on request. This strengthens the proving mechanism in two ways: 1) it allows downtime and failures to be detected retroactively, and 2) it makes it harder to outsource storage of the data to third parties. This then raises a third requirement - recursive aggregatable proofs.
-
Tagging @staheri14 as per previous discussion
-
Dagger remote auditing compression and aggregation
Introduction
The current remote auditing scheme in Dagger is based on the Shacham and Waters Compact Proofs of Retrievability. This scheme relies on bilinear pairings and interactive verification, which we extend with a globally verifiable random oracle (a.k.a. the blockchain) to make it non-interactive. The scheme is public, in the sense that anyone can challenge and verify the prover using publicly available data produced during the setup phase of the scheme.
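As a sketch of what "using the chain as a public random oracle" could look like in practice (a minimal sketch with hypothetical helper names and challenge format, not the actual Dagger implementation):

```python
import hashlib

def derive_challenges(beacon: bytes, dataset_id: bytes, n_blocks: int, q: int):
    """Derive q (block index, coefficient) pairs from a public beacon value.

    `beacon` is assumed to be an unpredictable, globally verifiable value
    (e.g. a recent block hash). Anyone can recompute the same challenge set,
    which is what makes the interactive S&W challenge non-interactive.
    """
    challenges = []
    for counter in range(q):
        seed = hashlib.sha256(beacon + dataset_id + counter.to_bytes(4, "big")).digest()
        index = int.from_bytes(seed[:8], "big") % n_blocks   # which block to open
        coefficient = int.from_bytes(seed[8:16], "big")      # random weight nu_i
        challenges.append((index, coefficient))
    return challenges

# Both prover and verifier can recompute the identical challenge set:
print(derive_challenges(b"\x01" * 32, b"dataset-42", n_blocks=1024, q=4))
```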
High level scheme requirements
The basic requirements of our remote auditing scheme are:
The S&W PoR scheme gives us 1, 2, 3 and 5; however, the size of the public parameters (a.k.a. metadata) is linear in the size of the dataset and the block size. Proofs are also large, in the order of KBs: one block's worth of aggregated data plus one aggregated tag.
S&W high level overview
The goal of the current S&W scheme is to determine whether the dataset is still recoverable. The "Retrievability" in "Proofs of Retrievability" indicates that the scheme aims to detect whether the data is still retrievable in its entirety - the data is still recoverable - or whether it isn't - the data has been lost. Another aspect of the scheme is that it guarantees the data can be retrieved with enough queries (extractability), because the proof contains the queried data in aggregated form.
It accomplishes this with the following steps:
What's important to understand about the scheme is that it sends the data being verified (albeit in aggregated form) as part of the proof. This is important because it protects against potential pre-image attacks.
Strictly speaking, the PoR scheme aims at both recoverability and retrievability. The proofs should allow detecting whether the available data is enough to reconstruct the original dataset - this is related to the erasure coding briefly mentioned in the setup phase. It also allows for retrievability, where, after enough queries, the total aggregated data should be enough to reconstruct the original. The latter property is of lesser importance for our particular use case; the former, however, is of utmost importance.
A simpler way of interpreting the described property is that the scheme should prevent a prover from pre-generating a number of proofs for future challenges and then forgetting the original data.
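For reference, a simplified (single-sector) version of the publicly verifiable S&W construction looks roughly like this; the exact Dagger parameterization may differ:

```latex
% Setup: secret key x, public key v = g^x, random generator u,
% per-block authenticators over blocks m_1, ..., m_n:
\sigma_i = \left( H(i) \cdot u^{m_i} \right)^x

% Challenge: a set Q = \{(i, \nu_i)\} of block indices and random coefficients.
% Prover's aggregated response (note that \mu carries the challenged data):
\mu = \sum_{(i,\nu_i) \in Q} \nu_i \, m_i ,
\qquad
\sigma = \prod_{(i,\nu_i) \in Q} \sigma_i^{\nu_i}

% Public verification via the pairing e and public key v = g^x:
e(\sigma, g) \;\stackrel{?}{=}\;
e\!\left( \prod_{(i,\nu_i) \in Q} H(i)^{\nu_i} \cdot u^{\mu},\; v \right)
```

The proof is the pair (mu, sigma): one aggregated block plus one aggregated tag, which matches the size estimate given above.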
Compression and aggregation requirements
What to aggregate and compress?
The first question to ask is: what needs to be aggregated and compressed?
The two most obvious pieces are the authenticators and the proofs. It would be desirable to store the authenticators in aggregated form (some vector and/or polynomial commitment schemes seem to support this), because they are linear with respect to the size of both the dataset and the number of blocks it is composed of.
At first glance, it would seem that the proofs are already at least partially aggregated, both with regard to the authenticators and the sent data. However, even in its current aggregated form, a proof is massive, in the order of KBs. This isn't scalable across millions of datasets, or even just a few, especially when interfacing with the blockchain; thus further compression and aggregation is required.
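As an illustrative size estimate (assuming BLS12-381, i.e. 48-byte compressed group elements and 32-byte field elements; the block count and sector count below are made-up numbers, not Dagger parameters):

```latex
% Metadata: one authenticator per block, n blocks:
|\text{metadata}| \approx n \cdot 48 \text{ bytes}
\quad\Rightarrow\quad
n = 2^{20} \text{ blocks} \approx 48~\text{MiB of tags}

% Proof: one aggregated block (s sectors of field elements) plus one aggregated tag:
|\pi| \approx s \cdot 32 + 48 \text{ bytes}
\quad\Rightarrow\quad
s = 128 \text{ sectors} \approx 4.1~\text{KiB per proof}
```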
Can we get rid of the aggregated data sent as part of the proof?
As mentioned before, the aggregated data has two functions:
We don't care about extractability in this context, so we can safely ignore it; however, we do need the security guarantees of the first property.
Requirements
The basic requirement for the proof generation is to produce a succinct proof that guarantees (w.h.p.) that the original data was involved in the computation of the proof.
The parts that require compression and aggregation thus are:
Where to aggregate?
Another question is: where should aggregation happen? In the case of the authenticators, it is safe to say that the best place is during the setup phase, and it might very well be implicit in it once we move away from the standard BLS-based S&W scheme.
In the case of the proof, it is a bit more ambiguous. Aggregation and compression can take place either on the prover or on a special "aggregator" node that performs the aggregation. A potentially naive approach might be to take the current BLS-based scheme verbatim and run it as a SNARK, either on the prover side, or by sending the aggregated data and tag to the special aggregator node that executes the SNARK.
It might seem at first that the aggregator role is redundant and everything can be performed on the prover side. However, consider that many nodes are producing several proofs constantly, and these eventually need to be committed to some persistence layer, very likely an external blockchain. This makes further compression and aggregation a requirement. In both cases, whether the prover produces the standard BLS proofs and sends them to the aggregator for further processing, or the prover executes the SNARK locally and sends the result of the execution to the aggregator for further compression and aggregation, an aggregator role seems to be required.
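A minimal sketch of the two placements described above, with the cryptography stubbed out (all type and function names here are hypothetical, not Dagger APIs):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BLSProof:
    mu: bytes      # aggregated data
    sigma: bytes   # aggregated tag

@dataclass
class SuccinctProof:
    data: bytes

# --- stubbed cryptography; placeholders only --------------------------------
def prove_bls(challenge: bytes) -> BLSProof:
    return BLSProof(mu=b"<aggregated block>", sigma=b"<aggregated tag>")

def snark_compress(proofs: List[BLSProof]) -> SuccinctProof:
    return SuccinctProof(data=b"<succinct proof over BLS proofs>")

def snark_aggregate(proofs: List[SuccinctProof]) -> SuccinctProof:
    return SuccinctProof(data=b"<aggregate of succinct proofs>")

# Option A: provers send raw BLS proofs; the aggregator node runs the SNARK.
def option_a(challenges: List[bytes]) -> SuccinctProof:
    bls_proofs = [prove_bls(c) for c in challenges]   # prover side
    return snark_compress(bls_proofs)                 # aggregator side

# Option B: each prover compresses locally; the aggregator only batches
# already-succinct proofs from many provers before posting on-chain.
def option_b(per_prover_challenges: List[List[bytes]]) -> SuccinctProof:
    local = [snark_compress([prove_bls(c) for c in cs])   # prover side
             for cs in per_prover_challenges]
    return snark_aggregate(local)                          # aggregator side
```

In both options the last step before the persistence layer is a further aggregation, which is why the aggregator role appears hard to avoid.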
Recursive aggregatable proofs
Another important requirement is recursive aggregatable proofs.
Revealing proofs (sending them over the network) and generating proofs aren't necessarily the same thing. We might want the storage nodes to generate proofs locally on an ongoing basis and aggregate them into a single proof that is then sent over the network on request.
This strengthens the proving mechanism in two ways: 1) it allows downtime and failures to be detected retroactively, and 2) it makes it harder to outsource storage of the data to third parties.
This then raises a third requirement - recursive aggregatable proofs, where, for example, proof c takes proof b as input, which in turn takes proof a as input.
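One way to picture such an ongoing, recursively aggregated proving schedule (notation mine, a sketch rather than a fixed design):

```latex
% At each local proving epoch t, the node proves against a fresh challenge
% and folds the result into a single running proof of constant size:
\pi_1 = \mathrm{Prove}(\mathrm{chal}_1, D), \qquad
\pi_t = \mathrm{Fold}\big(\pi_{t-1},\ \mathrm{Prove}(\mathrm{chal}_t, D)\big)

% Only \pi_t is revealed on request; verifying it should convince the
% verifier (w.h.p.) that every epoch 1..t was proven against D,
% which is what enables retroactive detection of downtime.
```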
Intuitions and research directions
There are several intuitions and potential research directions to explore. For example, would it be possible to aggregate the authenticators into a commitment to overcome their linear size requirements? Another aspect to look at is what the best construct would be to guarantee that a proof was generated on request using the original data - this could potentially be a SNARK or some other ZK system.
So the potential topics to explore are: