Refactor datamodule/model testing #329

Merged (21 commits) on Dec 30, 2021
14 changes: 6 additions & 8 deletions conf/defaults.yaml
@@ -1,7 +1,7 @@
config_file: null # This lets the user pass a config filename to load other arguments from

program: # These are the arguments that define how the train.py script works
seed: 1337
seed: 0
output_dir: output
data_dir: data
log_dir: logs
@@ -16,16 +16,17 @@ experiment: # These are arguments specific to the experiment we are running
root_dir: ${program.data_dir}
seed: ${program.seed}
batch_size: 32
num_workers: 4
num_workers: 0


# The values here are taken from the defaults here https://pytorch-lightning.readthedocs.io/en/1.3.8/common/trainer.html#init
# this probably should be made into a schema, e.g. as shown https://omegaconf.readthedocs.io/en/2.0_branch/structured_config.html#merging-with-other-configs
trainer: # These are the parameters passed to the pytorch lightning Trainer object
logger: True
checkpoint_callback: True
Collaborator Author:
Many of these defaults have been renamed; the previous ones now give deprecation warnings. We may want to avoid hardcoding these values and instead let pytorch-lightning assign them automatically.
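
As a rough illustration of that idea, here is a minimal sketch of building the Trainer from the config while dropping keys left at null, so pytorch-lightning falls back to its own defaults for those arguments. The variable names, path, and filtering approach are illustrative only, not the code used in this repo:

```python
from omegaconf import OmegaConf
import pytorch_lightning as pl

# Load the experiment config (path is illustrative).
conf = OmegaConf.load("conf/defaults.yaml")

# Convert the trainer section to a plain dict and drop keys left at null,
# so pytorch-lightning uses its own defaults instead of values hardcoded
# in the YAML.
trainer_args = {
    k: v
    for k, v in OmegaConf.to_container(conf.trainer).items()
    if v is not None
}
trainer = pl.Trainer(**trainer_args)
```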

callbacks: null
default_root_dir: null
detect_anomaly: False
enable_checkpointing: True
gradient_clip_val: 0.0
gradient_clip_algorithm: 'norm'
process_position: 0
@@ -43,16 +44,15 @@ trainer: # These are the parameters passed to the pytorch lightning Trainer object
accumulate_grad_batches: 1
max_epochs: null
min_epochs: null
max_steps: null
max_steps: -1
min_steps: null
max_time: null
limit_train_batches: 1.0
limit_val_batches: 1.0
limit_test_batches: 1.0
limit_predict_batches: 1.0
val_check_interval: 1.0
flush_logs_every_n_steps: 100
log_every_n_steps: 50
log_every_n_steps: 1
accelerator: null
sync_batchnorm: False
precision: 32
@@ -66,9 +66,7 @@ trainer: # These are the parameters passed to the pytorch lightning Trainer object
reload_dataloaders_every_epoch: False
auto_lr_find: False
replace_sampler_ddp: True
terminate_on_nan: False
auto_scale_batch_size: False
prepare_data_per_node: True
plugins: null
amp_backend: 'native'
move_metrics_to_cpu: False
@@ -12,5 +12,5 @@ experiment:
root_dir: "tests/data/bigearthnet"
bands: "all"
num_classes: ${experiment.module.num_classes}
batch_size: 128
batch_size: 1
num_workers: 0
16 changes: 16 additions & 0 deletions conf/task_defaults/bigearthnet_s1.yaml
@@ -0,0 +1,16 @@
experiment:
task: "bigearthnet"
module:
loss: "bce"
classification_model: "resnet18"
learning_rate: 1e-3
learning_rate_schedule_patience: 6
weights: "random"
in_channels: 2
num_classes: 19
datamodule:
root_dir: "tests/data/bigearthnet"
bands: "s1"
num_classes: ${experiment.module.num_classes}
batch_size: 1
num_workers: 0
16 changes: 16 additions & 0 deletions conf/task_defaults/bigearthnet_s2.yaml
@@ -0,0 +1,16 @@
experiment:
task: "bigearthnet"
module:
loss: "bce"
classification_model: "resnet18"
learning_rate: 1e-3
learning_rate_schedule_patience: 6
weights: "random"
in_channels: 12
num_classes: 19
datamodule:
root_dir: "tests/data/bigearthnet"
bands: "s2"
num_classes: ${experiment.module.num_classes}
batch_size: 1
num_workers: 0
2 changes: 1 addition & 1 deletion conf/task_defaults/byol.yaml
@@ -16,5 +16,5 @@ experiment:
- "de-test"
test_splits:
- "de-test"
batch_size: 64
batch_size: 1
num_workers: 0
29 changes: 29 additions & 0 deletions conf/task_defaults/chesapeake_cvpr_5.yaml
@@ -0,0 +1,29 @@
experiment:
task: "chesapeake_cvpr"
module:
loss: "ce"
segmentation_model: "unet"
encoder_name: "resnet50"
encoder_weights: null
encoder_output_stride: 16
learning_rate: 1e-3
learning_rate_schedule_patience: 6
in_channels: 4
num_classes: 5
num_filters: 1
ignore_zeros: False
imagenet_pretraining: False
datamodule:
root_dir: "tests/data/chesapeake/cvpr"
train_splits:
- "de-test"
val_splits:
- "de-test"
test_splits:
- "de-test"
patches_per_tile: 2
patch_size: 64
batch_size: 2
num_workers: 0
class_set: ${experiment.module.num_classes}
use_prior_labels: False
@@ -10,8 +10,9 @@ experiment:
learning_rate_schedule_patience: 6
in_channels: 4
num_classes: 7
num_filters: 256
num_filters: 1
ignore_zeros: False
imagenet_pretraining: False
datamodule:
root_dir: "tests/data/chesapeake/cvpr"
train_splits:
@@ -20,8 +21,9 @@ experiment:
- "de-test"
test_splits:
- "de-test"
patches_per_tile: 200
patch_size: 256
batch_size: 64
patches_per_tile: 2
patch_size: 64
batch_size: 2
Collaborator Author:
BYOLTask tests fail with a batch size of 1 and I have no idea why. SemanticSegmentationTask tests work fine with a batch size of 1. Oh, the mysteries of life...

Member:
If any preprocessing methods use .squeeze(), then they will remove the batch dimension, which will in turn break the forward pass.

Collaborator:
We should update these to specify the only dim that should be squeezed, e.g. .squeeze(dim=1).
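
For reference, a quick illustration of the failure mode and of the dim-restricted form; the tensor shapes below are made up for the example:

```python
import torch

x = torch.randn(1, 3, 64, 64)      # a batch containing a single image: (N, C, H, W)

# squeeze() with no arguments drops *every* size-1 dimension, including the
# batch dimension, so downstream code no longer sees a batched tensor:
print(x.squeeze().shape)           # torch.Size([3, 64, 64])

# Restricting squeeze to a specific dimension leaves the batch dim alone;
# here dim=1 has size 3, so nothing is removed:
print(x.squeeze(dim=1).shape)      # torch.Size([1, 3, 64, 64])
```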

Collaborator Author:
Just tried this (both dim=0 and dim=1) and as soon as it fixes one issue, it creates another one. I don't think I have the time to debug this any further, but if anyone wants to submit a follow-up PR to fix this I would be very happy.

num_workers: 0
class_set: ${experiment.module.num_classes}
use_prior_labels: False
29 changes: 29 additions & 0 deletions conf/task_defaults/chesapeake_cvpr_prior.yaml
@@ -0,0 +1,29 @@
experiment:
task: "chesapeake_cvpr"
module:
loss: "ce"
segmentation_model: "unet"
encoder_name: "resnet50"
encoder_weights: null
encoder_output_stride: 16
learning_rate: 1e-3
learning_rate_schedule_patience: 6
in_channels: 4
num_classes: 5
num_filters: 1
ignore_zeros: False
imagenet_pretraining: False
datamodule:
root_dir: "tests/data/chesapeake/cvpr"
train_splits:
- "de-test"
val_splits:
- "de-test"
test_splits:
- "de-test"
patches_per_tile: 2
patch_size: 64
batch_size: 2
num_workers: 0
class_set: ${experiment.module.num_classes}
use_prior_labels: True
Collaborator Author:
These datamodule settings (use_prior_labels: True) work with BYOLTask but not with SemanticSegmentationTask and I have no idea why.

Collaborator:
Not sure what error you're getting, but BYOLTask doesn't use the masks, so I'm not surprised it passes. The prior labels, I believe, are soft probabilities, so I don't think we've set up the SemanticSegmentationTask loss to handle that.
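
For context, a small sketch of the difference between the hard integer masks cross-entropy usually expects and soft-probability targets. The shapes and tensors are made up for illustration, and the probability-target form relies on PyTorch >= 1.10; this is not the repo's actual loss setup:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 5, 8, 8)             # (N, num_classes, H, W) predictions

# Hard labels: one integer class index per pixel, shape (N, H, W).
hard_target = torch.randint(0, 5, (2, 8, 8))
loss_hard = F.cross_entropy(logits, hard_target)

# Prior labels as soft per-pixel class probabilities, shape (N, num_classes, H, W).
soft_target = torch.softmax(torch.randn(2, 5, 8, 8), dim=1)

# PyTorch >= 1.10 accepts probability targets directly; older versions would
# need a different formulation (e.g. KL divergence against log-softmax outputs).
loss_soft = F.cross_entropy(logits, soft_target)
```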

4 changes: 3 additions & 1 deletion conf/task_defaults/cowc_counting.yaml
@@ -4,7 +4,9 @@ experiment:
model: resnet18
learning_rate: 1e-3
learning_rate_schedule_patience: 2
pretrained: False
datamodule:
root_dir: "tests/data/cowc_counting"
batch_size: 32
seed: 0
batch_size: 1
num_workers: 0
3 changes: 2 additions & 1 deletion conf/task_defaults/cyclone.yaml
@@ -7,5 +7,6 @@ experiment:
pretrained: False
datamodule:
root_dir: "tests/data/cyclone"
batch_size: 32
seed: 0
batch_size: 1
num_workers: 0
2 changes: 1 addition & 1 deletion conf/task_defaults/etci2021.yaml
@@ -12,5 +12,5 @@ experiment:
ignore_zeros: True
datamodule:
root_dir: "tests/data/etci2021"
batch_size: 32
batch_size: 1
num_workers: 0
2 changes: 1 addition & 1 deletion conf/task_defaults/eurosat.yaml
@@ -10,5 +10,5 @@ experiment:
num_classes: 10
datamodule:
root_dir: "tests/data/eurosat"
batch_size: 128
batch_size: 1
num_workers: 0
4 changes: 2 additions & 2 deletions conf/task_defaults/landcoverai.yaml
@@ -10,9 +10,9 @@ experiment:
verbose: false
in_channels: 3
num_classes: 6
num_filters: 256
num_filters: 1
ignore_zeros: False
datamodule:
root_dir: "tests/data/landcoverai"
batch_size: 32
batch_size: 1
num_workers: 0
4 changes: 2 additions & 2 deletions conf/task_defaults/naipchesapeake.yaml
@@ -10,11 +10,11 @@ experiment:
learning_rate_schedule_patience: 2
in_channels: 4
num_classes: 13
num_filters: 64
num_filters: 1
ignore_zeros: False
datamodule:
naip_root_dir: "tests/data/naip"
chesapeake_root_dir: "tests/data/chesapeake/BAYWIDE"
batch_size: 32
batch_size: 2
Collaborator Author:
With a batch size of 1 (what I'm using on all other tests) this breaks and I don't know why.

num_workers: 0
patch_size: 32
@@ -4,18 +4,18 @@ experiment:
loss: "jaccard"
segmentation_model: "unet"
encoder_name: "resnet18"
encoder_weights: null
encoder_weights: null
learning_rate: 1e-3
learning_rate_schedule_patience: 6
verbose: false
in_channels: 26
num_classes: 2
num_filters: 256
num_filters: 1
ignore_zeros: True
datamodule:
root_dir: "tests/data/oscd"
batch_size: 32
batch_size: 1
num_workers: 0
val_split_pct: 0.1
val_split_pct: 0.5
bands: "all"
num_patches_per_tile: 128
num_patches_per_tile: 1
21 changes: 21 additions & 0 deletions conf/task_defaults/oscd_rgb.yaml
@@ -0,0 +1,21 @@
experiment:
task: "oscd"
module:
loss: "jaccard"
segmentation_model: "unet"
encoder_name: "resnet18"
encoder_weights: null
learning_rate: 1e-3
learning_rate_schedule_patience: 6
verbose: false
in_channels: 6
num_classes: 2
num_filters: 1
ignore_zeros: True
datamodule:
root_dir: "tests/data/oscd"
batch_size: 1
num_workers: 0
val_split_pct: 0.5
Collaborator Author:
val_split_pct == 0 breaks the tests and I don't know why.

bands: "rgb"
num_patches_per_tile: 1
2 changes: 1 addition & 1 deletion conf/task_defaults/resisc45.yaml
@@ -10,5 +10,5 @@ experiment:
num_classes: 45
datamodule:
root_dir: "tests/data/resisc45"
batch_size: 128
batch_size: 1
num_workers: 0
@@ -13,5 +13,7 @@ experiment:
ignore_zeros: False
datamodule:
root_dir: "tests/data/sen12ms"
batch_size: 32
band_set: "all"
batch_size: 1
num_workers: 0
seed: 0
20 changes: 20 additions & 0 deletions conf/task_defaults/sen12ms_s1.yaml
@@ -0,0 +1,20 @@
experiment:
task: "sen12ms"
module:
loss: "focal"
segmentation_model: "fcn"
num_filters: 1
encoder_name: "resnet18"
encoder_weights: null
encoder_output_stride: 16
learning_rate: 1e-3
learning_rate_schedule_patience: 2
in_channels: 2
num_classes: 11
ignore_zeros: False
datamodule:
root_dir: "tests/data/sen12ms"
band_set: "s1"
batch_size: 1
num_workers: 0
seed: 0
19 changes: 19 additions & 0 deletions conf/task_defaults/sen12ms_s2_all.yaml
@@ -0,0 +1,19 @@
experiment:
task: "sen12ms"
module:
loss: "ce"
segmentation_model: "unet"
encoder_name: "resnet18"
encoder_weights: null
encoder_output_stride: 16
learning_rate: 1e-3
learning_rate_schedule_patience: 2
in_channels: 13
num_classes: 11
ignore_zeros: False
datamodule:
root_dir: "tests/data/sen12ms"
band_set: "s2-all"
batch_size: 1
num_workers: 0
seed: 0
19 changes: 19 additions & 0 deletions conf/task_defaults/sen12ms_s2_reduced.yaml
@@ -0,0 +1,19 @@
experiment:
task: "sen12ms"
module:
loss: "ce"
segmentation_model: "unet"
encoder_name: "resnet18"
encoder_weights: null
encoder_output_stride: 16
learning_rate: 1e-3
learning_rate_schedule_patience: 2
in_channels: 6
num_classes: 11
ignore_zeros: False
datamodule:
root_dir: "tests/data/sen12ms"
band_set: "s2-reduced"
batch_size: 1
num_workers: 0
seed: 0
@@ -1,7 +1,7 @@
experiment:
task: "so2sat"
module:
loss: "ce"
loss: "focal"
classification_model: "resnet18"
learning_rate: 1e-3
learning_rate_schedule_patience: 6
@@ -10,6 +10,7 @@ experiment:
num_classes: 17
datamodule:
root_dir: "tests/data/so2sat"
batch_size: 128
batch_size: 1
num_workers: 0
bands: "rgb"
unsupervised_mode: False