Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cityscapes AutoLabelling dataset #1000

Merged
merged 7 commits into from
May 15, 2023

Conversation

lkdci
Copy link
Contributor

@lkdci lkdci commented May 14, 2023

Cityscapes AutoLabelled dataset were introduced by NVIDIA research group.
paper: Hierarchical Multi-Scale Attention for Semantic Segmentation", https://arxiv.org/abs/2005.10821
Official repo: https://github.com/NVIDIA/semantic-segmentation

This PR includes:

  • CityscapesConcatDataset to support combination of cityscapes subsets.
  • example ddrnet recipe with AL dataset.

@dagshub
Copy link

dagshub bot commented May 14, 2023

Copy link
Contributor

@Louis-Dupont Louis-Dupont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, if you can just add a test for this dataloader, just to make sure that it runs
https://github.com/Deci-AI/super-gradients/blob/master/tests/unit_tests/dataloader_factory_test.py

Thanks 🙏

@lkdci
Copy link
Contributor Author

lkdci commented May 15, 2023

Hi @Louis-Dupont there is a design conflict about the Dataloader creation.

A dataloader - dataset creation strategy can be done in two different way,

First Approach - dataloader factory:

train_dataloader: cityscapes_train

This approach is problematic since it hinder loading default parameters from a default yaml file defined in code. Then when passing the dataset_params through the config recipe, we force default value that we might not want to include, but they are injected within the code, which contradict the yaml approach for building configs.

see this example:

dataloader factory method:

def my_dataset_train(...):
   return get_data_loader(config_name="my_dataset_default", dataset_cls=MyDataset)

my_dataset_default.yaml:

...
train_dataloader_params:
  sampler: "my_data_sampler"

then in main_recipe_config.yaml:

train_dataloader: my_dataset_train

Following this examples we are not able to initiate my dataset without the sampler field, and we might easily miss is injected in the first place into the dataloader params. (This issue was reported before for the coco dataset with infinite sampler in previous versions.)

Seccond Approach - dataset factory:

Explicitly define the dataset type to use without using the wrapper dataloader factory:

Following the previous example, my_dataset_default.yaml, we add the dataset key:

my_dataset_default.yaml:

...
train_dataloader_params:
  dataset: MyDataset
  sampler: "my_data_sampler"

In contrary to the previous approach we are not bounded by the above default params, and we can set a different dataset params file:

my_dataset_custom_params.yaml:

...
train_dataloader_params:
  dataset: MyDataset

Then in then in main_recipe_config.yaml explicitly choose the required dataset_params to use:

defaults:
  - dataset_params: my_dataset_custom_params

IMO this approach is preferable with better visibility, and doesn't involves hidden behavior within the dataloader factory code.

Why not supporting both approaches?

Both approaches are supported within SG, but there is bug to use both for a given dataset, and the following error is raised:

Error
Traceback (most recent call last):
  File "/home/lior.kadoch/PycharmProjects/super-gradients/tests/unit_tests/dataloader_factory_test.py", line 286, in test_cityscapes_al_train_creation
    dl_train = cityscapes_auto_labelling_train()
  File "/home/lior.kadoch/PycharmProjects/super-gradients/src/super_gradients/training/dataloaders/dataloaders.py", line 548, in cityscapes_auto_labelling_train
    return get_data_loader(
  File "/home/lior.kadoch/PycharmProjects/super-gradients/src/super_gradients/training/dataloaders/dataloaders.py", line 80, in get_data_loader
    dataloader = DataLoader(dataset=dataset, **dataloader_params)
TypeError: type object got multiple values for keyword argument 'dataset'

@lkdci lkdci requested a review from Louis-Dupont May 15, 2023 13:31
Copy link
Contributor

@Louis-Dupont Louis-Dupont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Louis-Dupont Louis-Dupont merged commit b4608f6 into master May 15, 2023
@Louis-Dupont Louis-Dupont deleted the feature/ALG-1373_cityscapes_auto_label branch May 15, 2023 14:06
avideci pushed a commit that referenced this pull request May 23, 2023
* CityscapesConcatDataset

* documentation

* ddrnet recipe

* unit test

* docs

* add to init
avideci pushed a commit that referenced this pull request May 23, 2023
* CityscapesConcatDataset

* documentation

* ddrnet recipe

* unit test

* docs

* add to init
geoffrey-g-delhomme pushed a commit to geoffrey-g-delhomme/super-gradients that referenced this pull request May 26, 2023
* CityscapesConcatDataset

* documentation

* ddrnet recipe

* unit test

* docs

* add to init
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants