This repository was archived by the owner on Dec 29, 2021. It is now read-only.

Current chunking: content-aware? de-duped? #77

@bnewbold

Three questions about the behavior of the current/stable dat client and the ecosystem libraries:

  1. Are imported files chunked in a "content-aware" way, e.g. using Rabin fingerprinting? I've seen mention of this in the past (e.g. https://blog.datproject.org/2016/02/01/dat-1-0-is-ready/), and I see https://github.com/datproject/rabin, but quick searches of the hyperdrive code base don't turn anything up. (There is a sketch of what content-aware chunking means after this list.)
  2. Does hyperdrive handle full-file de-duplication? E.g., if the same file is added under different names, or a file is added, removed, and re-added, will the metadata feed point to the same blocks in the data feed?
  3. Does hyperdrive handle partial-file de-duplication? E.g., if a long .csv file has text changed in the middle (without changing the chunking or the overall length), will only the mutated chunk get appended to the data feed? The current metadata implementation seems to be based on a chunk offset and length, so I'm not sure how a sparse set of chunks could be referenced. (See the second sketch below.)

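For reference, here is a minimal sketch of what content-defined chunking does, using a simple sliding-window rolling hash. This is not hyperdrive's implementation (real Rabin fingerprinting uses polynomial arithmetic over GF(2), e.g. the datproject/rabin module linked above), and every constant here is an assumption chosen for the demo; it just shows why boundaries that follow content survive edits that shift byte offsets:

```js
const crypto = require('crypto')

const WINDOW = 48         // sliding window size in bytes (assumed)
const MASK = 0x1fff       // boundary mask: ~8 KiB average chunks (assumed)
const MIN = 1024          // minimum chunk size (assumed)
const MAX = 65536         // maximum chunk size (assumed)
const PRIME = 31
const MOD = 1000000007
let POW = 1               // PRIME^(WINDOW-1) mod MOD, used to drop the oldest byte
for (let i = 0; i < WINDOW - 1; i++) POW = (POW * PRIME) % MOD

function chunk (buf) {
  const chunks = []
  let start = 0
  let hash = 0
  for (let i = 0; i < buf.length; i++) {
    // once the window is full, remove the byte sliding out of it
    if (i - start >= WINDOW) hash = (hash - (buf[i - WINDOW] * POW) % MOD + MOD) % MOD
    hash = (hash * PRIME + buf[i]) % MOD
    const len = i - start + 1
    // cut where the hash of the trailing window matches the mask, so
    // boundaries depend on nearby content rather than absolute offsets
    if ((len >= MIN && (hash & MASK) === 0) || len >= MAX) {
      chunks.push(buf.slice(start, i + 1))
      start = i + 1
      hash = 0
    }
  }
  if (start < buf.length) chunks.push(buf.slice(start))
  return chunks
}

// Inserting bytes at the front shifts every fixed-size boundary, but
// content-defined boundaries re-synchronize after the edit, so almost
// all chunks (and therefore their hashes) are reused.
const data = crypto.randomBytes(1 << 20)
const edited = Buffer.concat([Buffer.from('inserted bytes'), data])
const digest = c => crypto.createHash('sha256').update(c).digest('hex')
const before = new Set(chunk(data).map(digest))
const after = chunk(edited).map(digest)
console.log(after.filter(d => before.has(d)).length, 'of', after.length, 'chunks reused')
```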
Questions 1 and 2 are just curiosity about current behavior; I don't see anything in the spec that would prevent clients from implementing these optimizations in the future. Question 3 comes after working on an implementation; maybe I need to go back and re-read the spec.
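To make question 3 concrete, here is a hedged sketch of the metadata layout as I understand it; the `{ offset, blocks }` field names and file paths are illustrative, not hyperdrive's actual wire format:

```js
// Append-only data feed; string entries stand in for chunk contents.
const dataFeed = ['c0', 'c1', 'c2', 'c3']        // chunks of report.csv

// Question 2 (full-file dedup): nothing stops a metadata entry for a
// second name from pointing at the same contiguous range of blocks.
const metadata = {
  '/report.csv':      { offset: 0, blocks: 4 },
  '/report-copy.csv': { offset: 0, blocks: 4 }   // no new data appended
}

// Question 3 (partial-file dedup): edit only the third chunk. The ideal
// outcome appends just the edited chunk and references the sparse
// sequence [0, 1, 4, 3], but a single { offset, blocks } pair cannot
// describe non-contiguous blocks, so the whole file is re-appended:
dataFeed.push('c0', 'c1', 'c2-edited', 'c3')
metadata['/report.csv'] = { offset: 4, blocks: 4 }
```

If that reading is right, expressing the sparse case would need per-chunk pointers or a list of ranges in each metadata entry, which looks like a format question rather than a client-side optimization.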
