[general] data.table in-place modifications can lead to reproducibility failure in targets pipeline #1041
Closed
MilesMcBain started this conversation in General
See also a similar discussion in the future package: https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html
Thanks Will. Amazing turn-around on an improvement as always. 🥇 So to summarise, my options are now:
Description
I've been mixing {data.table} and {targets}, and the flow-on effects of in-place modifications in the context of a targets pipeline have caught me out a couple of times.
Put simply: in-place modifications of a data.table target A within the context of another target B lead to changes that persist in target A so long as target A is held in memory. If target A is later read from cache, the changes made during target B are not present. This can lead to inconsistent results in downstream computations.
There are workarounds:
I am also wondering if {targets} could have some safeties that would detect or mitigate this situation. For example, targets could verify target dependency hashes before computation on a per-target basis. If a difference is detected, it could either warn and reload, or error.
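The per-target hash check suggested above can be sketched outside of {targets}. This is only an illustration of the idea, not how {targets} works internally: `run_with_hash_check()` is a hypothetical helper, `digest::digest()` is used as a stand-in hash function, and the check here always errors rather than offering the warn-and-reload option.

```r
library(data.table)
library(digest)

# Hypothetical helper: run a command only if its dependency still matches
# the hash that was recorded when the dependency was stored.
run_with_hash_check <- function(dep, stored_hash, command) {
  if (!identical(digest(dep), stored_hash)) {
    stop("dependency changed in memory since it was stored")
  }
  command(dep)
}

target_a <- data.table(x = 1:3)
hash_a <- digest(target_a)  # recorded when target A is stored

# Target B's command modifies its input by reference via `:=`.
target_b <- run_with_hash_check(target_a, hash_a, function(dt) {
  dt[, y := x * 2L]
  sum(dt$y)
})

# A later target that also depends on A now fails the check,
# because A gained a column while it was held in memory.
result_c <- tryCatch(
  run_with_hash_check(target_a, hash_a, function(dt) ncol(dt)),
  error = function(e) conditionMessage(e)
)
print(result_c)
```

The check runs before each command, so the target that performs the mutation still succeeds; it is the next consumer of target A that gets stopped.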
Here's an example of the problem:
Created on 2023-03-29 with reprex v2.0.2
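The same effect can be seen with plain data.table, using an RDS file as a stand-in for the targets cache. This is a minimal sketch under that assumption; `target_a`, `make_target_b()` and `make_target_b_safe()` are hypothetical names, and the `copy()` variant shows a general data.table defensive pattern rather than a recommendation taken from this thread.

```r
library(data.table)

# Stand-in for target A being built and written to the cache.
target_a <- data.table(x = 1:3)
cache_file <- tempfile(fileext = ".rds")
saveRDS(target_a, cache_file)

# Stand-in for target B's command: `:=` adds a column by reference,
# so the caller's object is modified as a side effect.
make_target_b <- function(dt) {
  dt[, y := x * 2L]
  dt[, .(total = sum(y))]
}
target_b <- make_target_b(target_a)

names(target_a)             # "x" "y": the in-memory copy of A has changed
names(readRDS(cache_file))  # "x": the cached copy of A has not

# Defensive variant: take an explicit copy before modifying.
make_target_b_safe <- function(dt) {
  dt <- copy(dt)
  dt[, y := x * 2L]
  dt[, .(total = sum(y))]
}
fresh_a <- readRDS(cache_file)
target_b_safe <- make_target_b_safe(fresh_a)
names(fresh_a)              # "x": the input is left untouched
```

Any downstream step that sees the in-memory `target_a` gets a different object from one that reads `cache_file`, which is the inconsistency described above.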