Skip to content

Releases: ropensci/targets

Cloud metadata and settings to reduce overhead

11 Sep 17:56
Compare
Choose a tag to compare

targets 1.3.0

Invalidating changes

Because of these changes, upgrading to this version of targets will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.

  • In the hash_deps() method of the metadata class, exclude symbols which are not actually dependencies, rather than just giving them empty strings. This change decouples the dependency hash from the hash of the target's command (#1108).

Cloud metadata

  • Continuously upload metadata files to the cloud during tar_make(), tar_make_clustermq(), and tar_make_future() (#1109). Upload them to the repository specified in the repository_meta tar_option_set() option, and use the bucket and prefix set in the resources tar_option_set() option. repository_meta defaults to the existing repository tar_option_set() option.
  • Add new functions tar_meta_download(), tar_meta_upload(), tar_meta_sync(), and tar_meta_delete() to directly manage cloud metadata outside the pipeline (#1109).

Other changes

  • Fix solution of #1103 so the copy fallback actually runs (@jds485, #1102, #1103).
  • Switch back to tempdir() for #1103.
  • Move path_scratch_dir_network() to file.path(tempdir(), "targets") and make sure tar_destroy("all") and tar_destroy("cloud") delete it.
  • Display tar_mermaid() subgraphs with transparent fills and black borders.
  • Allow database$get_data() to work with list columns.
  • Disallow functions that access the local data store (including metadata) from inside a target while the pipeline is running (#1055, #1063). The only exception to this is local file targets such as tarchetypes literate programming target factories like tar_render() and tar_quarto().
  • In the hash_deps() method of the metadata class, use a new custom sort_chr() function which temporarily sets the LC_COLLATE locale to "C" for sorting. This ensures lexicographic comparisons are consistent across platforms (#1108).
  • In tar_source(), use the file argument and keep.source = TRUE to help with interactive debugging (#1120).
  • Deprecated seconds_interval in tar_config_get(), tar_make(), tar_make_clustermq() and tar_make_future(). Replace it with seconds_meta (to control how often metadata gets saved) and seconds_reporter (to control how often to print messages to the R console) (#1119).
  • Respect seconds_meta and seconds_reporter for writing metadata and console messages even for currently building targets (#1055).
  • Retry all cloud REST API calls with HTTP error codes (429, 500-599) with the exponential backoff algorithm from googleAuthR (#1112).
  • For format = "url", only retry on the HTTP error codes above.
  • Make cloud temp file instances unique in order to avoid file conflicts with the same target.
  • Un-deprecate seconds_interval and seconds_timeout from tar_resources_url(), and implement max_tries arguments in tar_resources_aws() and tar_resources_gcp() (#1127).
  • Use file and keep.source in parse() in callr utils and target Markdown.
  • Automatically convert "file_fast" format to "file" format for cloud targets.
  • In tar_prune() and tar_delete(), do not try to delete pattern targets which have no cloud storage.
  • Add new arguments seconds_timeout, close_connection, s3_force_path_style to tar_resources_aws() to support the analogous arguments in paws.storage::s3() (#1134, @snowpong).

CRAN patch

10 Aug 17:49
Compare
Choose a tag to compare
  • Fix a documentation issue in an Rd file.

Storage improvements

10 Aug 17:24
Compare
Choose a tag to compare

targets 1.2.1

  • Add tar_prune_list() (#1090, @mglev1n).
  • Wrap file.rename() in tryCatch() and fall back on a copy-then-remove workaround (@jds485, #1102, #1103).
  • Stage temporary cloud upload/download files in tools::R_user_dir(package = "targets", which = "cache") instead of tempdir(). tar_destroy(destroy = "cloud") and tar_destroy(destroy = "all") remove any leftover files from failed uploads/downloads (@jds485, #1102, #1103).
  • Use paws.storage instead of all of paws.

Improved crew integration

26 Jun 15:13
Compare
Choose a tag to compare

targets 1.2.0

crew integration

  • Do not assume S3 classes when validating crew controllers.
  • Suggest a crew controller in the _targets.R file from use_targets().
  • Make tar_crew() compatible with crew >= 0.3.0.
  • Rename argument terminate to terminate_controller in tar_make().
  • Add argument use_crew in tar_make() and add an option in tar_config_set() to make it configurable.
  • Write progress data and metadata in target_prepare().

Other improvements

  • Allow users to set the default label and level_separation arguments through tar_config_set() (#1085, @Moohan).

CRAN patch 3

23 May 13:07
Compare
Choose a tag to compare

targets 1.1.3

  • Decide on nanonext usage in time_seconds_local() at runtime and not installation time. That way, if nanonext is removed after targets is installed, functions in targets still work. Fixes the CRAN issues seen in tarchetypes, jagstargets, and gittargets.

Remarks

R CMD check shows a NOTE with messages such as "#STDOFF 2:05:08.9". This is caused by an issue in the arrow package (apache/arrow#35594) which is in "Suggests:" in the DESCRIPTION file of targets. The NOTE will go away on its own when the next arrow is released to CRAN.

CRAN patch 2

23 May 00:49
Compare
Choose a tag to compare

targets 1.1.2

  • Remove crew-related startup messages.

Remarks

R CMD check shows a NOTE with messages such as "#STDOFF 2:05:08.9". This is caused by an issue in the arrow package (apache/arrow#35594) which is in "Suggests:" in the DESCRIPTION file of targets. The NOTE will go away on its own when the next arrow is released to CRAN.

CRAN patch

22 May 20:11
Compare
Choose a tag to compare

targets 1.1.1

  • Pre-compute cli colors and bullets to improve performance in RStudio.
  • Use packageStartupMessage() for package startup messages.

Remarks

R CMD check shows a NOTE with messages such as "#STDOFF 2:05:08.9". This is caused by apache/arrow#35594 because the arrow package is in "Suggests", in the DESCRIPTION file of targets. The NOTE will go away on its own when the next arrow is released to CRAN.

Major improvements to robustness, speed, and {crew} integration

22 May 17:04
Compare
Choose a tag to compare

targets 1.1.0

Bug fixes

  • Send targets to the appropriate controller in a controller group when crew is used.

General improvements

  • Call gc() more appropriately when garbage_collection is TRUE in tar_target().
  • Add garbage_collection arguments to tar_make(), tar_make_clustermq(), and tar_make_future() to add optional garbage collection before targets are sent to workers. This is different and independent from the garbage_collection argument of tar_target(). In high-performance computing scenarios, the former controls what happens on the main controlling process, whereas the latter controls what happens on the worker.
  • Add garbage_collection and seconds_interval arguments to tar_make(), tar_make_clustermq(), tar_make_future(), and tar_config_set().
  • Downsize the tar_runtime object.
  • Remove the 100 Kb file size cutoff for determining whether to trust the file timestamp or recompute the hash when checking if a file is up to date (#1062). Instate the "file_fast" format and the trust_object_timestamps option in tar_option_set() as safer alternatives.
  • Consolidate store constructors.
  • Allow crew controller groups (#1065, @mglev1n).
  • Expose more exponential backoff configuration parameters through tar_backoff(). The backoff argument of tar_option_set() now accepts output from tar_backoff(), and supplying a numeric is deprecated.
  • Fix the exponential backoff rules in the crew scheduling algorithm.
  • Implement tar_resources_network() to configure retries and timeouts for internal HTTP/HTTPS requests in specialized targets with format = "url", repository = "aws", and repository = "gcp". Also applies to syncing target files across network file systems in the case of storage = "worker" or format = "file", which previously had a hard-coded seconds_interval = 0.1 and seconds_timeout = 60.
  • Deprecate seconds_interval and seconds_timeout in tar_resources_url() in favor of the new equivalent arguments of tar_resources_network()
  • Safely withhold a target from its crew controller when the controller is saturated (#1074, @mglev1n).
  • Use exponential backoff when appending a target back to the queue in the case of a saturated crew controller.

Speedups

  • Cache info about all of _targets/objects/ in tar_callr_inner_try() and update the cache as targets are saved to _targets/objects/ to avoid the overhead of repeated calls to file.exists() and file.info() (#1056).
  • Trust the timestamps by default when checking whether files in _targets/objects/ are up to date (#1062). tar_option_set(trust_object_timestamps = FALSE) ignores the timestamps and recomputes the hashes.
  • Write to _targets/meta/meta and _targets/meta/progress in timed batches instead of line by line (#1055).
  • Reporters now print progress messages in timed batches instead of line by line (#1055).
  • The summary and forecast reporters are much faster because they avoid going through data frames.
  • Avoid tempfile() when working with the scratch directory.
  • Use nanonext::mclock() instead of proc.time() when there is no risk of forked processes.
  • Replace withr with slightly faster/leaner base R alternatives.
  • Efficiently catch changes to the working directory instead of overburdening the pipeline with calls to setwd() (#1057).
  • Invoke tar_options methods in the internals instead of tar_option_get().
  • Avoid gsub() in store_init().
  • Avoid repeated calls to meta$get_record() in builder_should_run().
  • Mock the store object when creating a record from a metadata row.
  • Avoid cli::col_none() to reduce the number of ANSI characters printed to the R console.

Integration with {crew}

24 Apr 15:54
Compare
Choose a tag to compare

targets 1.0.0

targets is moving to version 1.0.0 because it is significantly more mature than previous versions. Specifically,

  1. tar_make() now integrates with crew, which will significantly improve the way targets does high-performance computing going forward.
  2. All other functionality in targets has stabilized. There is still room for smaller new features, but none as large as crew integration, none that will fundamentally change how the package operates.

Major improvements

  • Support distributed computing through the crew package in tar_make() (#753). crew itself is still in its early stages and currently lacks the launcher plugins to match the clustermq and future backends, but long-term, crew will be the predominant high-performance computing backend.

Minor improvements

  • Add a new store_copy_object() to the store class to enable "fst_dt" and other formats to make deep copies when needed (#1041, @MilesMcBain).
  • Add a new copy argument to allow tar_format() formats to set the store_copy_object() method (#1041, @MilesMcBain).
  • Shorten the output string returned by tar_format() when default methods are used.
  • Add a change_directory argument to tar_source() (#1040, @dipterix).
  • In format = "url" targets, implement retries and timeouts when connecting to URLs. The default timeout is 10 seconds, and the default retry interval is 1 second. Both are configurable via tar_resources_url() (#1048).
  • Use parallelly::freePort() in tar_random_port().
  • Rename a target and a function in the tar_script() example pipeline (#1033, @b-rodrigues).
  • Edit the description.

CRAN patch

08 Mar 12:33
Compare
Choose a tag to compare

targets 0.14.3

  • Handle encoding errors while trying to process error and warning messages (#1019, @adrian-quintario).
  • Fix S3 generic/method consistency.