diff --git a/docs/source/design.md b/docs/source/design.md
index 9d72c6f..b960d89 100644
--- a/docs/source/design.md
+++ b/docs/source/design.md
@@ -378,7 +378,7 @@ We call File an example of an *External Value*, because Files can change in an e
 
 redun naturally understands which Files are used for input vs output based on whether they are passed as arguments or returned as results, respectively. Note, it can lead to confusing behavior to pass a File as input to a Task, alter it, and then return it as a result. That would lead to the recorded call node to be immediately out of date (its input File hash doesn't match anymore). The user should be careful to avoid this pattern.
 
-See the [Validity](values.md#Validity) section below for additional discussion on the general feature.
+See the [Validity](values.md#validity) section below for additional discussion on the general feature.
 
 ### Shell scripting
 
@@ -575,7 +575,7 @@ def main():
 
 Depicted above is an example call graph and job tree (left) for an execution of a workflow (right). When each task is called, a CallNode is recorded along with all the Values used as arguments and return values. As tasks (`main`) call child tasks, children CallNodes are recorded (`task1` and `task2`). "Horizontal" dataflow is also recorded between sibling tasks, such as `task1` and `task2`. Each node in the call graph is identified by a unique hash and each Job and Execution is identified by a unique UUID. This information is stored by default in the redun database `.redun/redun.db`.
 
-The redun backend database provides a durable record of these call graphs for every execution redun performs. This not only provides the backend storage for caching, it also is queryable by users to explore the call graph, using the `redun log`, `redun console`, and `redun repl` commands. For example, if we know that a file `/tmp/data` was produced by redun, we can find out exactly which execution did so, and hence can retrieve information about the code and inputs used to do so. See [querying call graphs](db.md#Querying-call-graphs) for more.
+The redun backend database provides a durable record of these call graphs for every execution redun performs. This not only provides the backend storage for caching, it also is queryable by users to explore the call graph, using the `redun log`, `redun console`, and `redun repl` commands. For example, if we know that a file `/tmp/data` was produced by redun, we can find out exactly which execution did so, and hence can retrieve information about the code and inputs used to do so. See [querying call graphs](db.md#querying-call-graphs) for more.
 
 ## Advanced topics
 
@@ -588,7 +588,7 @@ It's common to use workflow engines to implement Extract Transform Load (ETL) pi
 - With files, we were able to double check if their current state was consistent with our cache by hashing them. With a database or API, it's typically not feasible to hash a whole database. Is there something else we could do?
 - The redun cache contains cached results from all previous runs. Conveniently, that allows for fast reverting to old results if code or input data is changed back to the old state. However, for a stateful system like a database, we likely can't just re-execute arbitrary tasks in any order. Similar to database migration frameworks (South, Alembic, etc), we may need to roll back past tasks before applying new ones.
 
-redun provides solutions to several of these challenges using a concept called (Handles)[values.md#Handles-for-ephemeral-and-stateful-values].
+redun provides solutions to several of these challenges using a concept called [Handles](values.md#handles-for-ephemeral-and-stateful-values).
 
 ### Running without a scheduler
 
@@ -618,7 +618,7 @@ task is far more independent of the parent scheduler, able to interact with the
 resolve complex expressions or recursive tasks.
 
 Third, federated task for submitting to a REST proxy is fire-and-forget; see
-[Federated task](tasks.md#Federated-task) It will trigger a
+[Federated task](tasks.md#federated-task). It will trigger a
 completely separate redun execution to occur, but it only provides the execution
 id back to the caller. It doesn't make sense for the REST proxy to be a full
 executor, since it's not capable enough to handle arbitrary tasks, by design it only handles federated tasks. Plus,
diff --git a/docs/source/scheduler.md b/docs/source/scheduler.md
index ebb5038..0e2f9e8 100644
--- a/docs/source/scheduler.md
+++ b/docs/source/scheduler.md
@@ -178,7 +178,7 @@ with CSE, that the cached value is appropriate to use.
 Task caching operates at the granularity of a single call to a `Task` with concrete
 arguments. Recall that the result of a `Task` might be a value, or another expression
 that needs further evaluation. In its normal mode, caching uses single
-reductions, stepping through the evaluation. See the [Results caching](design.md#Result-caching)
+reductions, stepping through the evaluation. See the [Results caching](design.md#result-caching)
 section, for more information on how this recursive checking works.
 
 Consider the following example:
@@ -206,9 +206,9 @@ To evaluate `out`, the following three task executions might be considered for c
 
 For CSE, we could simply assume that the code was identical for a task, but for
 caching, need to actually check that the code is identical, as defined by the
-[hash of the Task](tasks.md#Task-hashing). Since `Value` objects can represent state in addition
+[hash of the Task](tasks.md#task-hashing). Since `Value` objects can represent state in addition
 to their natural values, we need to check that the output is actually valid before using a cache
-result; see [Validity](values.md#Validity).
+result; see [Validity](values.md#validity).
 
 The normal caching mode (so-called "full") is fully recursive (i.e., uses single reductions),
 hence the scheduler must visit every node in the entire call graph produced by an expression,
diff --git a/docs/source/tasks.md b/docs/source/tasks.md
index 51e26ba..0a87fae 100644
--- a/docs/source/tasks.md
+++ b/docs/source/tasks.md
@@ -240,12 +240,12 @@ Lastly, several task options, such as [`image`](config.md) or [`memory`](config.
 
 Generally not a user-facing option, this is a `Optional[Set[CacheResult]]` specifying an upper bound on which kind of cache results are may be used (default: `None`, indicating that any are allowed).
 
 ### `cache`
-A bool (default: `true`) that defines whether the backend cache can be used to fast-forward through the task's execution. See [Scheduler](scheduler.md#Configuration-options) for more explanation.
+A bool (default: `true`) that defines whether the backend cache can be used to fast-forward through the task's execution. See [Scheduler](scheduler.md#configuration-options) for more explanation.
 A value of `true` is implemented by setting `cache_scope=CacheScope.BACKEND` and `false` by setting `cache_scope=CacheScope.CSE`.
 
 ### `cache_scope`
-A `CacheScope` enum value (default: `CacheScope.BACKEND`) that indicates the upper bound on what scope a cache result may come from. See [Scheduler](scheduler.md#Configuration-options) for more explanation.
+A `CacheScope` enum value (default: `CacheScope.BACKEND`) that indicates the upper bound on what scope a cache result may come from. See [Scheduler](scheduler.md#configuration-options) for more explanation.
 
 * `NONE`: Disable both CSE and cache hits
 * `CSE`: Only reuse computations from within this execution
@@ -254,7 +254,7 @@ A `CacheScope` enum value (default: `CacheScope.BACKEND`) that indicates the upp
 
 ### `check_valid`
 An enum value `CacheCheckValid` (or a string that can be coerced, default: `"full"`) that defines whether the entire subtree of results is checked for validity (`"full"`) or whether just this task's ultimate results need to be valid (`"shallow"`). This can be used to dramatically speed up resuming large workflows.
-See [Scheduler](scheduler.md#Configuration-options) for more explanation.
+See [Scheduler](scheduler.md#configuration-options) for more explanation.
 
 ### `config_args`
 
@@ -716,7 +716,7 @@ which are additional config files that are allowed to specify additional `federa
 
 In addition to primary federated tasks, we provide tools to support REST-based proxy. See
 `redun.federated_tasks.rest_federated_task` and `redun.federated_tasks.launch_federated_task`.
 The proxy has two main features. First, it is designed to help facilitate a fire-and-forget approach
-to launching jobs (see [Running without a scheduler](design.md#Running-without-a-scheduler) ),
+to launching jobs (see [Running without a scheduler](design.md#running-without-a-scheduler)),
 which is useful in implementing a UI. Second, it can help arrange for permissions, such as
 facilitating AWS role switches.