RW: add a new function "allocate" which reserves some space for a new file #34

hannesm · 2022-10-11T10:06:44Z

This is especially useful for append-only stores and set_partial.

hannesm · 2022-10-11T10:23:31Z

I guess there's still the question how the API should look like. The use case is that one task does a HTTP download and save it to disk -- the size of the artifact is extracted from the HTTP header. While that download is in progress, the archive should not be served to a client (otherwise that would be corrupt). So, either the application itself needs to keep a set of keys not to hand out to clients, or there should be another piece of API that resembles the application to say "I'm finished with writing this artifact, it can now be used in the wild".

Maybe it can be as simple as the left return value is a function unit -> unit, and the KV implementation can then do the necessary bookkeeping to mark the thing as active (in the tar implementation, it'd be adding the key to the map). WDYT?

palainp · 2022-10-13T14:01:14Z

This sounds interesting for that use case! Concerning the API, it really seems to be like int posix_fallocate(int fd, off_t offset, off_t len); which uses an extra offset (that will be 0 in ocaml-tar if I understand correctly). In order not to cut off any future use, would it be possible to add something like this offset?

reynir · 2022-10-17T14:01:53Z

I think allocate is interesting. A question comes to my mind: what is the file initialized with? Is it zeroes? Whatever happens to be there already? Is reading just not allowed? Is it implementation defined? Undefined?

Another function I think we can implement in tar would be a streaming write:

type writer = (Cstruct.t -> (unit, write_error) result Lwt.t) -> unit Lwt.t
val write_stream : t -> key -> writer -> (unit, write_error) result Lwt.t
(** [write_stream t key writer] calls [writer] with a function that writes a file incrementally.
    The file will be committed once [writer] returns. *)

The return type of writer could be changed to e.g. [`Commit | `Abort] Lwt.t to support aborting a write. In tar this would be implemented by delaying writing the header and the first block of the file until writer returns. Writing other files is not possible meanwhile in the case of tar.

While we're adding to Mirage_kv.write_error I think an error signalling that the file can't grow is interesting (maybe it is unsupported by the filesystem, e.g. tar in most cases).

I think it is worth keeping in mind that we are allowed to add functions in implementations without adding them in Mirage_kv.RO and Mirage_kv.RW, and maybe a good step forward is to experiment in implementations what interface makes the most sense.

hannesm · 2022-12-01T12:39:28Z

While I agree that implementations can extend the interface, I find an allocate function very useful.

The underlying semantics are a bit tricky, on the one hand with tar in mind, I'd like a HTTP download to directly push data to disk, on the other hand I'd like only fully checksummed data to be present on disk. This means, an allocate should write the header (but maybe use .part as filename) - and simultaneous allocate(s) are possible - but only when finished the header is for real written (i.e. with the right fileame), and reading is only then possible. If the power fails during any point, we'll be stuck with a broken / bad part on disk (that fortunately doesn't have the expected filename). Then again, inventing filenames isn't good as well, since a user may want to have both foo.part and foo to exist. So, eventually the allocate should simply write the header and fill the content with 0, and the application (opam-mirror) should after a successful download use rename to rename the intermediate file to the real filename.

WDYT? So (a) implement rename in tar (b) add as @palainp suggested an ?offset: to allocate (c) remove the finalise / writer complications?

Together with #37 I'd like to get this into the next major version of mirage-kv.

palainp · 2022-12-02T14:52:20Z

Reading back the documentation I'm now not sure about the ?offset:, as we have If [key] already exists, [Error (Already_present key)] is returned. I can't see any scenario where offset should be non-zero.

The man page for posix_fallocate indicates that subsequent writes to bytes in the specified range (note: [offset ; offset+len[ ) are guaranteed not to fail because of lack of disk space. but to me, using only on new files does make sense only when offset=0 :)

…last modified timestamp) for a new file. This is especially useful for append-only stores and set_partial.

hannesm · 2022-12-03T09:38:57Z

I updated this PR: rebased onto the main branch, and used an Int63.t for the size. I added a second commit to use Ptime.t instead of the int * int64. I'd welcome reviews (it seems at least @reynir @dinosaure @palainp may be interested, maybe @yomimono as well -- as well as @mirage/core)

hannesm · 2022-12-12T09:28:55Z

any opinion/approval of this PR?

CHANGES: * Use ptime directly for RO.last_modified, instead of the int * int64 pair (mirage/mirage-kv#34) * Add RW.allocate to allocate a key and fill it with zero bytes (mirage/mirage-kv#34) * RO.list: return Key.t instead of string (mirage/mirage-kv#37, fixes mirage/mirage-kv#33) * Introduce a custom error for RW.rename with a source which is a prefix of destination (mirage/mirage-kv#37, fixes mirage/mirage-kv#31) * Use Optint.Int63.t for RO.size, RO.get_partial, RO.set_partial (mirage/mirage-kv#37, fixes mirage/mirage-kv#32) * Remove RW.batch (mirage/mirage-kv#37, fixes mirage/mirage-kv#29 mirage/mirage-kv#36, discussed at the MirageOS meeting in November, and on the mirageos-devel mailing list in January 2022) * Key.pp: escape the entire string, not individual fragments (mirage/mirage-kv#35)

hannesm force-pushed the add-allocate branch from 8695f11 to 5717c25 Compare December 3, 2022 09:36

hannesm added 2 commits December 3, 2022 10:36

RW: add a new function "allocate" which reserves some bytes (and the …

e47bce9

…last modified timestamp) for a new file. This is especially useful for append-only stores and set_partial.

use ptime directly

9161b54

hannesm force-pushed the add-allocate branch from 5717c25 to 9161b54 Compare December 3, 2022 09:37

palainp approved these changes Dec 12, 2022

View reviewed changes

reynir approved these changes Dec 12, 2022

View reviewed changes

hannesm merged commit 05c581d into mirage:main Dec 12, 2022

hannesm deleted the add-allocate branch December 12, 2022 11:11

hannesm mentioned this pull request Dec 12, 2022

[new release] mirage-kv (6.0.0) ocaml/opam-repository#22648

Merged

cuihtlauac mentioned this pull request Dec 16, 2022

Repair build broken by mirage-kv 6.0.0 ocaml/ocaml.org#713

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RW: add a new function "allocate" which reserves some space for a new file #34

RW: add a new function "allocate" which reserves some space for a new file #34

hannesm commented Oct 11, 2022

hannesm commented Oct 11, 2022

palainp commented Oct 13, 2022

reynir commented Oct 17, 2022

hannesm commented Dec 1, 2022

palainp commented Dec 2, 2022

hannesm commented Dec 3, 2022

hannesm commented Dec 12, 2022

RW: add a new function "allocate" which reserves some space for a new file #34

RW: add a new function "allocate" which reserves some space for a new file #34

Conversation

hannesm commented Oct 11, 2022

hannesm commented Oct 11, 2022

palainp commented Oct 13, 2022

reynir commented Oct 17, 2022

hannesm commented Dec 1, 2022

palainp commented Dec 2, 2022

hannesm commented Dec 3, 2022

hannesm commented Dec 12, 2022