[Data] Allow disabling Task Fusion / Documenting how to avoid it

### Description

When using Ray Data's `map_batches` operation in a pipeline with multiple stages, if two or more consecutive `map_batches` have identical resource requirements (e.g., both request `num_cpus=1`), Ray **may** fuse these stages into a single task. This fusion can be problematic when the stages have very different characteristics—such as one being I/O intensive and the other compute intensive—because:

1. Both tasks get scheduled together, leading to inefficient resource usage.
2. Autoscaling and scheduling cannot independently optimize for the distinct needs of each stage.
3. One slow stage (e.g., I/O) can bottleneck the overall pipeline, even if the next stage could be run in parallel.

Suppose you have:

Stage 1: I/O intensive, originally set to num_cpus=1
Stage 2: Compute intensive, also set to num_cpus=1

Ray may fuse these into a single task, causing the above issues.

### Workaround
To prevent fusion and allow each stage to be scheduled and autoscaled independently, assign different resource requirements to each map_batches stage. For example:

1. Set the I/O intensive stage to num_cpus=0.5
2. Set the compute intensive stage to num_cpus=1.0

This ensures Ray treats each stage as a separate task, enabling better autoscaling and parallelism.

## Request
1. Feature: Provide a way to explicitly disable task fusion for map_batches, or document this fusion behavior and the recommended workaround.
2. Documentation: Clearly explain in the docs how resource requirements affect task fusion, and how to avoid unintended fusion when building multi-stage pipelines with heterogeneous workloads.
3. Long Shot - Use `time(completion of map_batches per row)` of task to understand which stages can be fused



### Use case

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Data] Allow disabling Task Fusion / Documenting how to avoid it #54433

Description

Workaround

Request

Use case

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Data] Allow disabling Task Fusion / Documenting how to avoid it #54433

Description

Description

Workaround

Request

Use case

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions