Replies: 1 comment
Hey @borestad, thanks for bringing this up and sorry for the delayed response. The use case you describe makes total sense, but I'm honestly unsure how I feel about adding support for it. First off, I worry that we could end up coming up with use cases for all sorts of different signals and end up with a heavily polluted CLI with lots of knobs most callers don't care about (and presumably a larger / slower

To the specific question of file contents / hashes as a caching signal, one concrete concern I have is overhead. Checking a file's modtime is (fairly) cheap and independent of the file's size or other metadata, whereas computing a hash is O(n) and, depending on the chosen algorithm, potentially quite slow. Granted, it'd be faster to do it in-process than to shell out, but it's plausible that computing the hashes could overshadow the subprocess overhead for sufficiently large files.

Even if we added an optimization like only recomputing on modtime changes, presumably in practice such files would be expected to be modified-but-not-changed fairly frequently (otherwise why bother checking the contents), so I expect this would require an O(n) pass over the contents on many / most invocations.

I wonder if it would be possible in your use case to either a) not update the file unless it's actually (expected to be) being changed or b) re-compute the hash on write and then update a separate file only when it changes, then having

Curious for your thoughts 🙂
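To make option b) a bit more concrete, here is a minimal Python sketch of the "separate file" idea as I read it: after each write, hash the data file and rewrite a small sidecar stamp file only when the digest differs. The stamp file's modtime then moves only on real content changes, so a modtime-based check could watch it instead of the data file. The function name `update_stamp` and the file names are hypothetical and purely illustrative; this is not part of any existing tool.

```python
import hashlib
from pathlib import Path


def update_stamp(data_file: Path, stamp_file: Path) -> bool:
    """Rewrite stamp_file only if data_file's SHA-256 differs from the stored digest.

    Returns True when the stamp was (re)written, False when it was left untouched.
    Hypothetical helper for illustration only.
    """
    hasher = hashlib.sha256()
    with data_file.open("rb") as f:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    digest = hasher.hexdigest()

    if stamp_file.exists() and stamp_file.read_text().strip() == digest:
        # Content unchanged: leave the stamp (and its modtime) alone.
        return False

    stamp_file.write_text(digest + "\n")
    return True
```

This still costs an O(n) pass per write, but it moves that cost to the writer, which runs once per update, rather than to every cached invocation.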
Hi @dimo414 !
In some of my pipelines I can't rely on the modtime, since the files may be rewritten at any time, with either the same or different content/shasum.
My proposal is to create an alternative to --modtime to be used within a scope.
Example of how today's mechanism works
I'd like the last command to not be run unless the shasum actually changed (without creating something "subshell hackish" like the discussion below)
Similar? #26
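For illustration, the behaviour being asked for (re-run only when the watched file's shasum changes) could be sketched roughly like this in Python. This is a standalone sketch under my own assumptions, not how the tool works and not a proposed flag: `run_if_changed` and `cache_dir` are hypothetical names, and the cache key is simply a SHA-256 over the command and the file's bytes.

```python
import hashlib
import subprocess
from pathlib import Path


def run_if_changed(command: list[str], watched: Path, cache_dir: Path) -> str:
    """Return cached stdout for `command` unless the content of `watched` changed.

    Hypothetical helper: the cache key covers the command and the file's bytes,
    so rewriting the file with identical content does not trigger a re-run.
    """
    hasher = hashlib.sha256()
    hasher.update("\0".join(command).encode())
    hasher.update(b"\0")
    hasher.update(watched.read_bytes())
    cached = cache_dir / hasher.hexdigest()

    if cached.exists():
        # Same command and same content: reuse the stored output.
        return cached.read_text()

    # Content (or command) changed: run the command and store its stdout.
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_text(result.stdout)
    return result.stdout
```

Note that every call reads and hashes the whole watched file, which is exactly the O(n) per-invocation cost raised in the reply above.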