Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Dataset to use Compose for transforms #7784

Merged
merged 12 commits into from
May 31, 2024

Conversation

surajpaib
Copy link
Contributor

Fixes #7646

Description

A few sentences describing the changes proposed in this pull request.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: Suraj Pai <b.pai@maastrichtuniversity.nl>
Signed-off-by: Suraj Pai <b.pai@maastrichtuniversity.nl>
@surajpaib surajpaib marked this pull request as ready for review May 21, 2024 02:44
@ericspod
Copy link
Member

Hi @surajpaib thanks for the contribution. I think that the other classes inheriting from Dataset should be modified as well to pass along these arguments. They should have kwargs added to their constructors which is then passed to the superconstructor calls. There are other dataset classes which call apply_transform which maybe should be modified as well. The tests I think need to be a bit more thorough, currently they check the type of produced data which isn't indicative of the differing behaviour, you should be inspecting the return value somehow to demonstrate the added arguments have made a difference.

@surajpaib
Copy link
Contributor Author

Hi @ericspod

Thanks for your comments. I'll replace the type check tests with return value based tests for expected functionality.

As for the classes inheriting from Dataset, I think there's a bit of a design discussion to be had.

PersistentDataset and CacheDataset both expect Compose objects for the transform

if not isinstance(transform, Compose):

So ideally one would specify a Compose transform and set the map_items=False there for the behaviour that this fix is expected to handle. However for Dataset since the apply_transform method is applied directly without the Compose wrapping, even if a Compose is specified, the default arguments to apply_transform will be used (the reason for this PR).

If we want to make everything consistent, it would be the easiest to replace the apply_transform call in the Dataset with a Compose forcing similar to PersistentDataset and CacheDataset. This way we could avoid having to specify kwargs at all and potentially complicating how these kwargs are attributed to different fns/classes when applied to inherited classes, etc.

Do you foresee any issues with forcing Compose in the Dataset as well? As it calls apply_transform internally, I don't think there would be any change in functionality.

@ericspod
Copy link
Member

Hi @surajpaib your second proposal may make more sense actually. What we can do in the constructor for Dataset is check to see if the transform given is a Compose instance, if not and also not None this would be wrapped in Compose with default arguments. Users can change this behaviour by passing a Compose object themselves. We'd need a different test to verify passing Compose or not is correct but I think this solution would be more consistent as you say. If you want to modify this PR we can look at it again. Thanks!

surajpaib and others added 2 commits May 23, 2024 01:00
Signed-off-by: Suraj Pai <b.pai@maastrichtuniversity.nl>
@surajpaib surajpaib changed the title Support kwargs on Datasetfor apply_transform fn Update Dataset to use Compose for transforms May 23, 2024
@surajpaib surajpaib changed the title Update Dataset to use Compose for transforms Refactor Dataset to use Compose for transforms May 23, 2024
@KumoLiu
Copy link
Contributor

KumoLiu commented May 24, 2024

ci error log:

======================================================================
ERROR: test_shape_2 (tests.test_arraydataset.TestArrayDataset)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
  File "/Users/runner/work/MONAI/MONAI/tests/test_arraydataset.py", line 100, in test_shape
    data1 = dataset[0]
  File "/Users/runner/work/MONAI/MONAI/monai/data/dataset.py", line 1367, in __getitem__
    return self.dataset[index]
  File "/Users/runner/work/MONAI/MONAI/monai/data/dataset.py", line 106, in __getitem__
    return self._transform(index)
  File "/Users/runner/work/MONAI/MONAI/monai/data/dataset.py", line 1262, in _transform
    data.extend(to_list(dataset[index]))
  File "/Users/runner/work/MONAI/MONAI/monai/data/dataset.py", line 106, in __getitem__
    return self._transform(index)
  File "/Users/runner/work/MONAI/MONAI/monai/data/dataset.py", line 92, in _transform
    return self.transform(data_i)
TypeError: __call__() missing 1 required positional argument: 'lazy'

----------------------------------------------------------------------

Signed-off-by: Suraj Pai <b.pai@maastrichtuniversity.nl>
@surajpaib
Copy link
Contributor Author

Hi @ericspod @KumoLiu

The updated PR is ready for review.

Noting some of the changes from the PR below,

  1. All dataset transforms use Compose now. I've replaced the apply_transform way of applying the transforms with Compose for the Dataset, ZipDataset and NPZDictItemDataset classes.
  2. Removed the Compose checks from PersistentDataset and CacheDataset as these are now done in the Dataset base class.
  3. Added tests for the map_items=False and tuple input scenario for Dataset in test_dataset.py. Modified tests for test_profiling.py and test_arraydataset.py for the Compose changes made. The functionality tested does not change.

Overall, I think these changes make the transform application across Datasets more uniform and simplify underlying code as well.

monai/data/dataset.py Show resolved Hide resolved
monai/data/dataset.py Show resolved Hide resolved
surajpaib and others added 2 commits May 29, 2024 15:59
Signed-off-by: Suraj Pai <b.pai@maastrichtuniversity.nl>
@KumoLiu
Copy link
Contributor

KumoLiu commented May 30, 2024

/build

monai/data/dataset.py Show resolved Hide resolved
monai/data/dataset.py Outdated Show resolved Hide resolved
Copy link
Member

@ericspod ericspod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this now if everyone else is.

surajpaib and others added 2 commits May 30, 2024 13:04
Co-authored-by: Ben Murray <ben.murray@gmail.com>
Signed-off-by: Suraj Pai <b.pai@maastrichtuniversity.nl>
@KumoLiu
Copy link
Contributor

KumoLiu commented May 31, 2024

/build

@KumoLiu KumoLiu merged commit 4029c42 into Project-MONAI:dev May 31, 2024
28 checks passed
slicepaste pushed a commit to slicepaste/MONAI that referenced this pull request Jun 3, 2024
Fixes Project-MONAI#7646 

### Description

A few sentences describing the changes proposed in this pull request.

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [x] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [x] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [x] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/`
folder.

---------

Signed-off-by: Suraj Pai <b.pai@maastrichtuniversity.nl>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Co-authored-by: Ben Murray <ben.murray@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide ability to override map_items=True set by default for Dataset
4 participants