Releases: ropensci/targets
Cloud metadata and settings to reduce overhead
targets 1.3.0
Invalidating changes
Because of these changes, upgrading to this version of targets will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.
- In the hash_deps() method of the metadata class, exclude symbols which are not actually dependencies, rather than just giving them empty strings. This change decouples the dependency hash from the hash of the target's command (#1108).
Cloud metadata
- Continuously upload metadata files to the cloud during tar_make(), tar_make_clustermq(), and tar_make_future() (#1109). Upload them to the repository specified in the repository_meta tar_option_set() option, and use the bucket and prefix set in the resources tar_option_set() option. repository_meta defaults to the existing repository tar_option_set() option.
- Add new functions tar_meta_download(), tar_meta_upload(), tar_meta_sync(), and tar_meta_delete() to directly manage cloud metadata outside the pipeline (#1109).
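As a rough illustration of the options described above, a _targets.R file might be configured along these lines. The bucket name, prefix, and target are placeholders invented for this sketch, and running it requires valid AWS credentials, so treat it as an outline rather than a tested setup.

```r
# _targets.R (sketch only; "my-bucket" and "my-project" are placeholders)
library(targets)

tar_option_set(
  repository = "aws",        # where target data objects are stored
  repository_meta = "aws",   # where metadata files are uploaded (defaults to `repository`)
  resources = tar_resources(
    aws = tar_resources_aws(bucket = "my-bucket", prefix = "my-project")
  )
)

list(
  tar_target(data, data.frame(x = seq_len(10)))
)
```

Outside the pipeline, functions such as tar_meta_upload() and tar_meta_download() can then be called from the project directory to push or pull the metadata files by hand.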
Other changes
- Fix solution of #1103 so the copy fallback actually runs (@jds485, #1102, #1103).
- Switch back to tempdir() for #1103.
- Move path_scratch_dir_network() to file.path(tempdir(), "targets") and make sure tar_destroy("all") and tar_destroy("cloud") delete it.
- Display tar_mermaid() subgraphs with transparent fills and black borders.
- Allow database$get_data() to work with list columns.
- Disallow functions that access the local data store (including metadata) from inside a target while the pipeline is running (#1055, #1063). The only exception to this is local file targets such as tarchetypes literate programming target factories like tar_render() and tar_quarto().
- In the hash_deps() method of the metadata class, use a new custom sort_chr() function which temporarily sets the LC_COLLATE locale to "C" for sorting. This ensures lexicographic comparisons are consistent across platforms (#1108).
- In tar_source(), use the file argument and keep.source = TRUE to help with interactive debugging (#1120).
- Deprecate seconds_interval in tar_config_get(), tar_make(), tar_make_clustermq(), and tar_make_future(). Replace it with seconds_meta (to control how often metadata gets saved) and seconds_reporter (to control how often to print messages to the R console) (#1119).
- Respect seconds_meta and seconds_reporter for writing metadata and console messages even for currently building targets (#1055).
- Retry all cloud REST API calls with HTTP error codes (429, 500-599) with the exponential backoff algorithm from googleAuthR (#1112).
- For format = "url", only retry on the HTTP error codes above.
- Make cloud temp file instances unique in order to avoid file conflicts with the same target.
- Un-deprecate seconds_interval and seconds_timeout from tar_resources_url(), and implement max_tries arguments in tar_resources_aws() and tar_resources_gcp() (#1127).
- Use file and keep.source in parse() in callr utils and target Markdown.
- Automatically convert "file_fast" format to "file" format for cloud targets.
- In tar_prune() and tar_delete(), do not try to delete pattern targets which have no cloud storage.
- Add new arguments seconds_timeout, close_connection, and s3_force_path_style to tar_resources_aws() to support the analogous arguments in paws.storage::s3() (#1134, @snowpong).
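The sort_chr() item above depends on sorting under the "C" collation locale, where strings are compared by byte value instead of by platform-specific language rules. A minimal base R sketch of that idea follows. The helper name sort_c_locale() is hypothetical and this is not the internal targets implementation.

```r
# Hypothetical helper: sort a character vector under the "C" collation
# locale, then restore whatever collation locale was active before.
sort_c_locale <- function(x) {
  old <- Sys.getlocale(category = "LC_COLLATE")
  on.exit(Sys.setlocale(category = "LC_COLLATE", locale = old), add = TRUE)
  Sys.setlocale(category = "LC_COLLATE", locale = "C")
  sort(x)
}

locale_before <- Sys.getlocale(category = "LC_COLLATE")
symbols <- c("b", "A", "a", "B")
sorted <- sort_c_locale(symbols)
print(sorted) # "A" "B" "a" "b": uppercase sorts first by byte value
```

Because the result no longer depends on the session locale, a hash computed from the sorted names should agree across operating systems.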
CRAN patch
- Fix a documentation issue in an Rd file.
Storage improvements
targets 1.2.1
- Add tar_prune_list() (#1090, @mglev1n).
- Wrap file.rename() in tryCatch() and fall back on a copy-then-remove workaround (@jds485, #1102, #1103).
- Stage temporary cloud upload/download files in tools::R_user_dir(package = "targets", which = "cache") instead of tempdir(). tar_destroy(destroy = "cloud") and tar_destroy(destroy = "all") remove any leftover files from failed uploads/downloads (@jds485, #1102, #1103).
- Use paws.storage instead of all of paws.
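The file.rename() fallback mentioned above can be pictured with a small base R sketch. The helper move_file() is a hypothetical name chosen for illustration, and the actual code in targets may differ in its details.

```r
# Hypothetical helper: move a file, falling back on copy-then-remove
# if the rename fails (for example, across file systems).
move_file <- function(from, to) {
  renamed <- tryCatch(
    # file.rename() signals failure with FALSE plus a warning.
    suppressWarnings(file.rename(from = from, to = to)),
    error = function(condition) FALSE
  )
  if (!isTRUE(renamed)) {
    copied <- file.copy(from = from, to = to, overwrite = TRUE)
    if (!copied) {
      stop("could not move ", from, " to ", to)
    }
    unlink(from)
  }
  invisible(to)
}

source_path <- tempfile()
destination_path <- tempfile()
writeLines("example", source_path)
move_file(source_path, destination_path)
```

Checking the return value matters here because file.rename() usually reports failure by returning FALSE, not by throwing an error.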
Improved crew integration
targets 1.2.0
crew integration
- Do not assume S3 classes when validating crew controllers.
- Suggest a crew controller in the _targets.R file from use_targets().
- Make tar_crew() compatible with crew >= 0.3.0.
- Rename argument terminate to terminate_controller in tar_make().
- Add argument use_crew in tar_make() and add an option in tar_config_set() to make it configurable.
- Write progress data and metadata in target_prepare().
Other improvements
CRAN patch 3
targets 1.1.3
- Decide on nanonext usage in time_seconds_local() at runtime and not installation time. That way, if nanonext is removed after targets is installed, functions in targets still work. Fixes the CRAN issues seen in tarchetypes, jagstargets, and gittargets.
Remarks
R CMD check shows a NOTE with messages such as "#STDOFF 2:05:08.9". This is caused by an issue in the arrow package (apache/arrow#35594) which is in "Suggests:" in the DESCRIPTION file of targets. The NOTE will go away on its own when the next arrow is released to CRAN.
CRAN patch 2
targets 1.1.2
- Remove crew-related startup messages.
Remarks
R CMD check shows a NOTE with messages such as "#STDOFF 2:05:08.9". This is caused by an issue in the arrow package (apache/arrow#35594) which is in "Suggests:" in the DESCRIPTION file of targets. The NOTE will go away on its own when the next arrow is released to CRAN.
CRAN patch
targets 1.1.1
- Pre-compute cli colors and bullets to improve performance in RStudio.
- Use packageStartupMessage() for package startup messages.
Remarks
R CMD check shows a NOTE with messages such as "#STDOFF 2:05:08.9". This is caused by apache/arrow#35594 because the arrow package is in "Suggests:" in the DESCRIPTION file of targets. The NOTE will go away on its own when the next arrow is released to CRAN.
Major improvements to robustness, speed, and {crew} integration
targets 1.1.0
Bug fixes
- Send targets to the appropriate controller in a controller group when crew is used.
General improvements
- Call gc() more appropriately when garbage_collection is TRUE in tar_target().
- Add garbage_collection arguments to tar_make(), tar_make_clustermq(), and tar_make_future() to add optional garbage collection before targets are sent to workers. This is different and independent from the garbage_collection argument of tar_target(). In high-performance computing scenarios, the former controls what happens on the main controlling process, whereas the latter controls what happens on the worker.
- Add garbage_collection and seconds_interval arguments to tar_make(), tar_make_clustermq(), tar_make_future(), and tar_config_set().
- Downsize the tar_runtime object.
- Remove the 100 Kb file size cutoff for determining whether to trust the file timestamp or recompute the hash when checking if a file is up to date (#1062). Instate the "file_fast" format and the trust_object_timestamps option in tar_option_set() as safer alternatives.
as safer alternatives. - Consolidate store constructors.
- Allow crew controller groups (#1065, @mglev1n).
- Expose more exponential backoff configuration parameters through tar_backoff(). The backoff argument of tar_option_set() now accepts output from tar_backoff(), and supplying a numeric is deprecated.
- Fix the exponential backoff rules in the crew scheduling algorithm.
- Implement tar_resources_network() to configure retries and timeouts for internal HTTP/HTTPS requests in specialized targets with format = "url", repository = "aws", and repository = "gcp". Also applies to syncing target files across network file systems in the case of storage = "worker" or format = "file", which previously had a hard-coded seconds_interval = 0.1 and seconds_timeout = 60.
- Deprecate seconds_interval and seconds_timeout in tar_resources_url() in favor of the new equivalent arguments of tar_resources_network().
- Safely withhold a target from its crew controller when the controller is saturated (#1074, @mglev1n).
- Use exponential backoff when appending a target back to the queue in the case of a saturated crew controller.
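Several items above refer to exponential backoff and retries. For readers unfamiliar with the pattern, here is a generic base R sketch. The function retry_with_backoff() and its argument names are hypothetical and do not reflect the scheduling code in targets or crew.

```r
# Hypothetical helper: call `task` until it succeeds, waiting longer
# (up to `max_seconds`) after each failure.
retry_with_backoff <- function(
  task,
  max_tries = 5,
  base_seconds = 0.01,
  max_seconds = 1
) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(task(), error = function(condition) condition)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      # The upper bound on the wait doubles with each failed attempt.
      wait <- min(max_seconds, base_seconds * 2 ^ (attempt - 1))
      # Random jitter spreads out retries from concurrent callers.
      Sys.sleep(stats::runif(1, min = 0, max = wait))
    }
  }
  stop("task failed after ", max_tries, " tries: ", conditionMessage(result))
}

# Simulated flaky operation that fails twice and then succeeds.
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("transient failure")
  "ok"
}
value <- retry_with_backoff(flaky)
```

The cap on the wait keeps a long run of failures from producing unbounded delays, which is one reason backoff schemes commonly expose both a minimum and a maximum interval.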
Speedups
- Cache info about all of _targets/objects/ in tar_callr_inner_try() and update the cache as targets are saved to _targets/objects/ to avoid the overhead of repeated calls to file.exists() and file.info() (#1056).
- Trust the timestamps by default when checking whether files in _targets/objects/ are up to date (#1062). tar_option_set(trust_object_timestamps = FALSE) ignores the timestamps and recomputes the hashes.
- Write to _targets/meta/meta and _targets/meta/progress in timed batches instead of line by line (#1055).
- Reporters now print progress messages in timed batches instead of line by line (#1055).
- The summary and forecast reporters are much faster because they avoid going through data frames.
- Avoid tempfile() when working with the scratch directory.
- Use nanonext::mclock() instead of proc.time() when there is no risk of forked processes.
- Replace withr with slightly faster/leaner base R alternatives.
- Efficiently catch changes to the working directory instead of overburdening the pipeline with calls to setwd() (#1057).
- Invoke tar_options methods in the internals instead of tar_option_get().
- Avoid gsub() in store_init().
- Avoid repeated calls to meta$get_record() in builder_should_run().
. - Mock the store object when creating a record from a metadata row.
- Avoid cli::col_none() to reduce the number of ANSI characters printed to the R console.
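The "timed batches" items above trade a small delay for far fewer file writes. One way to picture the idea is the base R sketch below. The batch_writer() constructor is hypothetical and is not how targets implements its metadata or progress files.

```r
# Hypothetical buffer: collect lines in memory and append them to a file
# at most once every `seconds_interval` seconds.
batch_writer <- function(path, seconds_interval = 1) {
  buffer <- character(0)
  last_flush <- Sys.time()
  flush <- function() {
    if (length(buffer) > 0) {
      cat(paste0(buffer, "\n"), file = path, sep = "", append = TRUE)
      buffer <<- character(0)
    }
    last_flush <<- Sys.time()
  }
  add <- function(line) {
    buffer <<- c(buffer, line)
    elapsed <- as.numeric(difftime(Sys.time(), last_flush, units = "secs"))
    if (elapsed >= seconds_interval) {
      flush()
    }
  }
  list(add = add, flush = flush)
}

log_path <- tempfile()
writer <- batch_writer(log_path, seconds_interval = 60)
for (name in c("target_a", "target_b", "target_c")) {
  writer$add(paste(name, "built"))
}
written_before_flush <- file.exists(log_path) # nothing on disk yet
writer$flush() # a final flush writes whatever is still buffered
written <- readLines(log_path)
```

A design like this needs an explicit flush at the end of the run so that buffered lines are not lost.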
Integration with {crew}
targets 1.0.0
targets is moving to version 1.0.0 because it is significantly more mature than previous versions. Specifically,
- tar_make() now integrates with crew, which will significantly improve the way targets does high-performance computing going forward.
- All other functionality in targets has stabilized. There is still room for smaller new features, but none as large as crew integration, none that will fundamentally change how the package operates.
Major improvements
- Support distributed computing through the crew package in tar_make() (#753). crew itself is still in its early stages and currently lacks the launcher plugins to match the clustermq and future backends, but long-term, crew will be the predominant high-performance computing backend.
Minor improvements
- Add a new store_copy_object() to the store class to enable "fst_dt" and other formats to make deep copies when needed (#1041, @MilesMcBain).
- Add a new copy argument to allow tar_format() formats to set the store_copy_object() method (#1041, @MilesMcBain).
- Shorten the output string returned by tar_format() when default methods are used.
- Add a change_directory argument to tar_source() (#1040, @dipterix).
- In format = "url" targets, implement retries and timeouts when connecting to URLs. The default timeout is 10 seconds, and the default retry interval is 1 second. Both are configurable via tar_resources_url() (#1048).
- Use parallelly::freePort() in tar_random_port().
- Rename a target and a function in the tar_script() example pipeline (#1033, @b-rodrigues).
- Edit the description.
CRAN patch
targets 0.14.3
- Handle encoding errors while trying to process error and warning messages (#1019, @adrian-quintario).
- Fix S3 generic/method consistency.