
[Feature] Speed up the resume process of IterBased loop #1520

@yinaoxiong (Contributor)

Description

What is the feature?

The current resume logic advances the dataloader by calling next(self.dataloader_iterator) for n steps. When n is large this is very slow, because the actual data loading and processing runs for every skipped step. Is there a good way to advance only the indices without executing the real data-loading pipeline?

  1. One possible approach is to agree on a dataset interface with the user that returns dummy data during resume, for example:
class Dataset:

    def __getitem__(self, index):
        if self._skip_flag:
            return fake_data  # cheap dummy data; skipped batches are never used
        # ... actual data loading and processing ...
        return real_data

    def skip(self):
        self._skip_flag = True

    def resume(self):
        self._skip_flag = False



# corresponding handling logic in the training loop
            if (
                hasattr(self.dataloader.dataset, "skip")
                and callable(self.dataloader.dataset.skip)
                and hasattr(self.dataloader.dataset, "resume")
                and callable(self.dataloader.dataset.resume)
            ):
                self.dataloader.dataset.skip()
                for _ in range(self._iter):
                    next(self.dataloader_iterator)
                self.dataloader.dataset.resume()
            else:
                for _ in range(self._iter):
                    next(self.dataloader_iterator)
  2. Approach 1 still requires cooperation from the user. Is it possible to operate on the dataloader itself so that the fast skip is transparent to the user? For example:
                iter_batch_sampler = iter(self.dataloader.batch_sampler)
                for _ in range(self._iter):
                    next(iter_batch_sampler)

Iterating the batch_sampler directly works when num_workers=0, but with multiple workers the restored data order comes out wrong. I would like to know whether there is a good solution for this.

Any other context?

https://discuss.pytorch.org/t/is-there-any-way-to-skip-steps-in-a-dataloader/123201
https://pytorch.org/data/main/dataloader2.html

Snapshot the state of data-preprocessing pipeline (WIP)

Activity

zhouzaida (Collaborator) commented on May 17, 2024

A minimal-change solution is to mock the dataset's __getitem__ method before iterating:

    def run(self) -> None:
        """Launch training."""
        self.runner.call_hook('before_train')
        # In iteration-based training loop, we treat the whole training process
        # as a big epoch and execute the corresponding hook.
        self.runner.call_hook('before_train_epoch')
        if self._iter > 0:
            print_log(
                f'Advance dataloader {self._iter} steps to skip data '
                'that has already been trained',
                logger='current',
                level=logging.WARNING)
            # temporarily mock the dataset's __getitem__ so that the
            # skipped iterations return cheap dummy data
            old_getitem = self.dataloader_iterator.dataset.__getitem__
            self.dataloader_iterator.dataset.__getitem__ = a_new_getitem_method
            for _ in range(self._iter):
                next(self.dataloader_iterator)
            # restore the original __getitem__
            self.dataloader_iterator.dataset.__getitem__ = old_getitem
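
For illustration, a_new_getitem_method above is a placeholder; one hypothetical form (mine, not from the comment) is a stub that ignores the index. Two caveats: dataset[idx] resolves __getitem__ through the class rather than the instance, so the patch may need to be applied to the class depending on how the dataset is indexed; and with num_workers > 0 the dataset is copied into each worker process when the iterator starts, so patching it afterwards in the main process has no effect.

def a_new_getitem_method(self, index):
    # Hypothetical stub: ignore the index and return a constant dummy
    # sample. Skipped batches are discarded, so the value only has to
    # survive the dataloader's collate_fn.
    return 0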
linked a pull request that will close this issue on May 23, 2024: [Fix] Speed up "--resume" #1548
chtzs commented on May 24, 2024

I believe this PR is the cause of the issue: #1471.
While it fixed the resume iteration problem, it also made resuming slow. A suitable solution would be to call the _next_index() method of the DataLoader's built-in iterator, which skips a batch without reading the data.
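
A minimal sketch of that idea (the helper name is mine; this is not the exact code of PR #1548). _next_index() is a private method of PyTorch's _BaseDataLoaderIter, so it may change between versions:

def skip_batches(dataloader_iter, num_batches):
    """Advance the iterator's index stream without loading any data.

    _next_index() just draws the next batch of indices from the sampler.
    This works directly for num_workers=0; a multi-worker iterator
    prefetches index batches as soon as it is created, so the skip must
    account for that prefetching (see PR #1548 for the adopted fix).
    """
    for _ in range(num_batches):
        dataloader_iter._next_index()  # indices are drawn and discarded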

hujh1994 commented on Apr 7, 2025

@zhouzaida's suggestion above (mocking the dataset's __getitem__ method before iterating) does not work with multiple worker processes.

wanghao9610 commented on Apr 12, 2025

I have had the same resume issue, and PR #1548 solved it successfully. Thanks, @chtzs. If anyone gets stuck on this issue, please consider the solution in that PR.
