This repository was archived by the owner on Dec 29, 2021. It is now read-only.

Current chunking: content-aware? de-duped? #77

@bnewbold

Three questions about the behavior of the current/stable dat client and the ecosystem libraries:

  1. Are imported files chunked in a "content-aware" way, e.g. using Rabin fingerprinting? I've seen mention of this in the past (e.g. https://blog.datproject.org/2016/02/01/dat-1-0-is-ready/), and I see https://github.com/datproject/rabin, but quick searches of the hyperdrive code base don't turn anything up. (There is a sketch of what content-aware chunking means after this list.)
  2. Does hyperdrive handle full-file de-duplication? E.g., if the same file is added under different names, or a file is added, removed, and re-added, will the metadata feed point to the same blocks in the data feed?
  3. Does hyperdrive handle partial-file de-duplication? E.g., if a long .csv file has text changed in the middle (without changing the chunking or the overall length), will only the mutated chunk get appended to the data feed? The current metadata implementation seems to be based on a chunk offset and length, so I'm not sure how a sparse set of chunks could be referenced. (See the second sketch below.)

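For reference, here is a minimal sketch of what content-defined chunking does, using a simple sliding-window rolling hash. This is not hyperdrive's implementation (real Rabin fingerprinting uses polynomial arithmetic over GF(2), e.g. the datproject/rabin module linked above), and every constant here is an assumption chosen for the demo; it just shows why boundaries that follow content survive edits that shift byte offsets:

```js
const crypto = require('crypto')

const WINDOW = 48         // sliding window size in bytes (assumed)
const MASK = 0x1fff       // boundary mask: ~8 KiB average chunks (assumed)
const MIN = 1024          // minimum chunk size (assumed)
const MAX = 65536         // maximum chunk size (assumed)
const PRIME = 31
const MOD = 1000000007
let POW = 1               // PRIME^(WINDOW-1) mod MOD, used to drop the oldest byte
for (let i = 0; i < WINDOW - 1; i++) POW = (POW * PRIME) % MOD

function chunk (buf) {
  const chunks = []
  let start = 0
  let hash = 0
  for (let i = 0; i < buf.length; i++) {
    // once the window is full, remove the byte sliding out of it
    if (i - start >= WINDOW) hash = (hash - (buf[i - WINDOW] * POW) % MOD + MOD) % MOD
    hash = (hash * PRIME + buf[i]) % MOD
    const len = i - start + 1
    // cut where the hash of the trailing window matches the mask, so
    // boundaries depend on nearby content rather than absolute offsets
    if ((len >= MIN && (hash & MASK) === 0) || len >= MAX) {
      chunks.push(buf.slice(start, i + 1))
      start = i + 1
      hash = 0
    }
  }
  if (start < buf.length) chunks.push(buf.slice(start))
  return chunks
}

// Inserting bytes at the front shifts every fixed-size boundary, but
// content-defined boundaries re-synchronize after the edit, so almost
// all chunks (and therefore their hashes) are reused.
const data = crypto.randomBytes(1 << 20)
const edited = Buffer.concat([Buffer.from('inserted bytes'), data])
const digest = c => crypto.createHash('sha256').update(c).digest('hex')
const before = new Set(chunk(data).map(digest))
const after = chunk(edited).map(digest)
console.log(after.filter(d => before.has(d)).length, 'of', after.length, 'chunks reused')
```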
Questions 1 and 2 are just curiosity about current behavior; I don't see anything in the spec that would prevent clients from implementing these optimizations in the future. Question 3 comes after working on an implementation; maybe I need to go back and re-read the spec.
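To make question 3 concrete, here is a hedged sketch of the metadata layout as I understand it; the `{ offset, blocks }` field names and file paths are illustrative, not hyperdrive's actual wire format:

```js
// Append-only data feed; string entries stand in for chunk contents.
const dataFeed = ['c0', 'c1', 'c2', 'c3']        // chunks of report.csv

// Question 2 (full-file dedup): nothing stops a metadata entry for a
// second name from pointing at the same contiguous range of blocks.
const metadata = {
  '/report.csv':      { offset: 0, blocks: 4 },
  '/report-copy.csv': { offset: 0, blocks: 4 }   // no new data appended
}

// Question 3 (partial-file dedup): edit only the third chunk. The ideal
// outcome appends just the edited chunk and references the sparse
// sequence [0, 1, 4, 3], but a single { offset, blocks } pair cannot
// describe non-contiguous blocks, so the whole file is re-appended:
dataFeed.push('c0', 'c1', 'c2-edited', 'c3')
metadata['/report.csv'] = { offset: 4, blocks: 4 }
```

If that reading is right, expressing the sparse case would need per-chunk pointers or a list of ranges in each metadata entry, which looks like a format question rather than a client-side optimization.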
