-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Symlink Metadata and Double Packing Avoidance #256
Conversation
Codecov Report
@@ Coverage Diff @@
## main #256 +/- ##
==========================================
+ Coverage 60.13% 60.40% +0.27%
==========================================
Files 36 36
Lines 2744 2743 -1
Branches 677 677
==========================================
+ Hits 1650 1657 +7
+ Misses 631 622 -9
- Partials 463 464 +1
|
impl From<&NodeType> for String { | ||
fn from(r#type: &NodeType) -> Self { | ||
match r#type { | ||
NodeType::PrivateDirectory => "wnfs/priv/dir".into(), | ||
NodeType::PrivateFile => "wnfs/priv/file".into(), | ||
NodeType::PublicDirectory => "wnfs/pub/dir".into(), | ||
NodeType::PublicFile => "wnfs/pub/file".into(), | ||
NodeType::TemporalSharePointer => "wnfs/share/temporal".into(), | ||
NodeType::SnapshotSharePointer => "wnfs/share/snapshot".into(), | ||
} | ||
r#type.to_string() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
/// Create a new Symlink PrivateFile | ||
pub async fn new_symlink( | ||
path: String, | ||
parent_bare_name: Namefilter, | ||
time: DateTime<Utc>, | ||
rng: &mut impl RngCore, | ||
) -> Result<Self> { | ||
// Header stays the same | ||
let header = PrivateNodeHeader::new(parent_bare_name, rng); | ||
// Symlinks have no file content | ||
let content = FileContent::Inline { data: vec![] }; | ||
// Create a new Metadata object | ||
let mut metadata: Metadata = Metadata::new(time); | ||
// Write the original path into the Metadata HashMap | ||
metadata | ||
.0 | ||
.insert(String::from("symlink"), Ipld::String(path)); | ||
// Return self with PrivateFileContent | ||
Ok(Self { | ||
header, | ||
content: PrivateFileContent { | ||
persisted_as: OnceCell::new(), | ||
metadata, | ||
previous: BTreeSet::new(), | ||
content, | ||
}, | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I know you must have thought of making it a new node type. Is there a reason you chose this approach instead?
Thanks for taking on this. We were going to work on this at some point but the spec is not there yet. For private file system in particular it would be nice to have a symlink that would not expose the secret information (the entire header in this case) of the target node. @matheus23 has more ideas about how this should work. He is on vacation now but he is surely going to comment here when he is back. I see this PR reviving the discussion on the topic and could be a base for further work. |
async fn attach_link( | ||
self: &mut Rc<Self>, | ||
node: PrivateNode, | ||
path_segments: &[String], | ||
search_latest: bool, | ||
forest: &mut Rc<PrivateForest>, | ||
store: &impl BlockStore, | ||
) -> Result<()> { | ||
let (path, node_name) = crate::utils::split_last(path_segments)?; | ||
let SearchResult::Found(dir) = self.get_leaf_dir_mut(path, search_latest, forest, store).await? else { | ||
bail!(FsError::NotFound); | ||
}; | ||
|
||
ensure!( | ||
!dir.content.entries.contains_key(node_name), | ||
FsError::FileAlreadyExists | ||
); | ||
|
||
dir.content | ||
.entries | ||
.insert(node_name.clone(), PrivateLink::from(node)); | ||
|
||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting use-case. So the in-memory tree structure might be good enough for fast copies but not updating the ancestry will leak secrets about unrelated directories. I'm trying to think of a way to handle this use-case. Will need to meet with @matheus23 for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm...
I want to point out that this does something to the way that permissions work in WNFS:
In WNFS, each PrivateNode
that's stored has a bunch of inumber
s in its accumulator (the Namefilter
in this version).
One can think of each inumber
as a "path segment" in a path.
You can prove (to a third party) write access to a certain PrivateNode
by providing a signature of an inumber
, signed by the "root owner" of a WNFS.
If you simply copy over a PrivateRef
from one place to another, you won't be updating the whole Namefilter
structure (this is what the attach
logic is doing!).
Concretely this means:
If you have a file at /Doc/Test.png
and you symlink it to /Images/Test.png
, then someone who has write access to /Images
won't be able to modify Test.png
! (Read access will work just fine)
This breaks an invariant in WNFS: That if you have write access to a directory, you'll have write access to its contents (recursively).
I guess in theory you could fix this by including both /Doc
's and /Images
' inumber
s in the Test.png
PrivateNode
.
Can we go back to the drawing board on this? That'd probably be the "right" way of solving deduplication in rs-wnfs (at least on the file level). LMK if that makes sense to you @organizedgrime and let's brainstorm a bit about how this could look like API-wise (probably helpful if we do it live).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in theory you could fix this by including both
/Doc
's and/Images
'inumber
s in theTest.png
PrivateNode
.
I thought about this again and now I think this doesn't work (with name accumulators).
About the symlink let file = root.open_mut(...).await?;
file.get_metadata_mut().put("symlink", Ipld::(...)); (uh fuck we're still missing a I hope you understand that I'm somewhat hesitant to merge a For now: Let's meet live & talk about this again! |
Let me throw out another proposal: With Would you be fine if the deduplication only happened on the file content level? I.e. you'd get efficient deduplication if there's two huge files in the tree you're persisting, but you wouldn't get deduplication when there's two directories with exactly equivalent content? If that's the case, I'd propose a function (If This trades off some metadata leakage (you can see that some blocks in the private forest are linked to from multiple places, perhaps giving you an idea of a file's size, e.g. if you expect this to only happen to a single file) for performance / storage. |
This would be a more short-term thing that I'd be more happy in merging until we have a symlink specification that covers a bunch of use-cases. |
Ok, coming out of the meeting it sounds like this
can be assumed for banyan's stuff atm. Which makes me want to basically say: How about we do the |
@organizedgrime thoughts on the idea in this comment? #256 (comment) |
Having thought about this again, I wouldn't recommend my suggestion anymore. We could try solving this in a layer slightly below the forest, but that'd need a spec extension. Actually it'd be quite similar to the approach used in this PR, so perhaps we revive it. In that case I'd prefer to keep it as a special case for files only, since there's no hierarchy for delegate write access from these files. |
This happened: wnfs-wg/spec#71 Implementation is happening in #306, see this bullet point in the description:
Once that PR lands, we can implement (very efficient) copying of |
Closing in favor of #319 |
Enclosed in this pull request are two relatively trivial changes to PrivateDirectory and PrivateFile which allow us to get more out of our rs-wnfs use case.
attach
andcp
functions withinPrivateDirectory
now haveattach_link
andcp_link
versions. The issue we encountered while implementing this tool in Tomb is that despite our being able to correctly identify when files are duplicates of one another, copying them using thecp
function still creates a newPrivateNode
in thePrivateForest
which, when encrypted and sent to theBlockStore
employed, does not result in CID collisions and results in the double packing of the same file on disk. This is because thecp
function described updates the ancestry and rotates the key associated with thePrivateNode
being copied. This is not something we want to do, we want these CID collisions to occur in our use case so at to minimize our disk footprint. Thus,attach_link
andcp_link
were made as lightweight versions of the original function which maintain ancestry and keys across this linking/copying.write_symlink
function inPrivateDirectory
andsymlink_origin
function inPrivateFile
are also new. They create an interface for writing aPrivateFile
to aPrivateDirectory
where the contents of the file itself are empty, but encoded in theMetadata
is information, a singleString
value associated with the key "symlink".write_symlink
provides an interface for creating suchPrivateFile
s,symlink_origin
provides an interface for retrieving anOption<String>
for a givenPrivateFile
, which will either be the symlink path it points to, or nothing at all if it does not contain saidMetadata
.Strictly speaking, if any of these changes are unwelcome we can implement them ourselves using a trait that is applied to the
PrivateDirectory
structure.