[Bug]: ValueError: train_batch_size for EfficientAd should be 1. #2491

Open
1 task done
surprise335 opened this issue Jan 9, 2025 · 2 comments

Comments

@surprise335

Describe the bug

I used the EfficientAd model to train a classification task and got the error shown in the logs below. My training dataset contains 164 images: 140 of the "normal" type and 24 of the "anomalous" type.

Dataset

Other (please specify in the text field below)

Model

Other (please specify in the field below)

Steps to reproduce the behavior

.

OS information

OS information:

  • OS: [e.g. Ubuntu 20.04]
  • Python version: [e.g. 3.10.0]
  • Anomalib version: [e.g. 0.3.6]
  • PyTorch version: [e.g. 1.9.0]
  • CUDA/cuDNN version: [e.g. 11.1]
  • GPU models and configuration: [e.g. 2x GeForce RTX 3090]
  • Any other relevant information: [e.g. I'm using a custom dataset]

Expected behavior

11

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

NO

Logs

@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
F1Score class exists for backwards compatibility. It will be removed in v1.1. Please use BinaryF1Score from torchmetrics instead
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name                  | Type                     | Params | Mode 
---------------------------------------------------------------------------
0 | model                 | EfficientAdModel         | 8.1 M  | train
1 | _transform            | Compose                  | 0      | train
2 | normalization_metrics | MetricCollection         | 0      | train
3 | image_threshold       | F1AdaptiveThreshold      | 0      | train
4 | pixel_threshold       | F1AdaptiveThreshold      | 0      | train
5 | image_metrics         | AnomalibMetricCollection | 0      | train
6 | pixel_metrics         | AnomalibMetricCollection | 0      | train
---------------------------------------------------------------------------
8.1 M     Trainable params
0         Non-trainable params
8.1 M     Total params
32.235    Total estimated model params size (MB)
Training: |          | 0/? [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/ubuntu/ltt-files/anomalib-main/train_EfficientAd.py", line 33, in <module>
    train()
  File "/home/ubuntu/ltt-files/anomalib-main/train_EfficientAd.py", line 26, in train
    engine.fit(datamodule=datamodule, model=model)
  File "/home/ubuntu/ltt-files/anomalib-main/src/anomalib/engine/engine.py", line 549, in fit
    self.trainer.fit(model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
  File "/home/ubuntu/anaconda3/envs/anomalib/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 543, in fit
    call._call_and_handle_interrupt(
  File "/home/ubuntu/anaconda3/envs/anomalib/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/anomalib/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 579, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/ubuntu/anaconda3/envs/anomalib/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 986, in _run
    results = self._run_stage()
  File "/home/ubuntu/anaconda3/envs/anomalib/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1030, in _run_stage
    self.fit_loop.run()
  File "/home/ubuntu/anaconda3/envs/anomalib/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 201, in run
    self.on_run_start()
  File "/home/ubuntu/anaconda3/envs/anomalib/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 328, in on_run_start
    call._call_lightning_module_hook(trainer, "on_train_start")
  File "/home/ubuntu/anaconda3/envs/anomalib/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 159, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/ubuntu/ltt-files/anomalib-main/src/anomalib/models/image/efficient_ad/lightning_model.py", line 246, in on_train_start
    raise ValueError(msg)
ValueError: train_batch_size for EfficientAd should be 1.
Training: |          | 0/? [00:00<?, ?it/s]

Code of Conduct

  • I agree to follow this project's Code of Conduct
@surprise335
Author

My Python code:

from anomalib import TaskType
from anomalib.models import Patchcore, Padim, Stfpm, Draem, EfficientAd
from anomalib.engine import Engine
from anomalib.deploy import ExportType
from anomalib.callbacks import ModelCheckpoint
from anomalib.data import Folder
#from loguru import logger

def train():
    # Create the datamodule
    datamodule = Folder(
        name="PointTwo",
        root="datasets/PointTwo",
        normal_dir="nomal",
        abnormal_dir="annomal",
        #test_split_mode=TestSplitMode.SYNTHETIC,
        task=TaskType.CLASSIFICATION,
        val_split_ratio=0.2,
    )
    datamodule.setup()
    model = EfficientAd()
    engine = Engine(max_epochs=100, task=TaskType.CLASSIFICATION,
                    callbacks=[ModelCheckpoint(dirpath='checkpoint/', every_n_epochs=50, save_last=True)])
    # Train the model
    engine.fit(datamodule=datamodule, model=model)
    # Export trained weight
    engine.export(export_type=ExportType.OPENVINO,
                  model=model,
                  export_root='anomalib_weight')

if __name__ == "__main__":
    train()

@FedericoDeBona

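Based on the traceback, EfficientAd raises this error in on_train_start when the train batch size is not 1, and your Folder datamodule does not set it. Pass train_batch_size=1 when you create the datamodule. Here’s the corrected Python code snippet: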

datamodule = Folder(
  name="PointTwo",
  root="datasets/PointTwo",
  normal_dir="nomal",
  abnormal_dir="annomal",
  #test_split_mode=TestSplitMode.SYNTHETIC,
  task=TaskType.CLASSIFICATION,
  val_split_ratio=0.2,
  train_batch_size=1, # Setting batch size to 1 because that's how it's done in the paper
)
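
If you want to catch this before training starts, here is a rough sketch of a pre-flight check. It assumes the datamodule exposes a `train_batch_size` attribute, which is what the check in `on_train_start` appears to read. `check_train_batch_size` is a hypothetical helper, not part of anomalib, and the `SimpleNamespace` objects only stand in for a real datamodule so the snippet runs on its own.

```python
from types import SimpleNamespace

# EfficientAd expects exactly one training image per batch.
REQUIRED_TRAIN_BATCH_SIZE = 1


def check_train_batch_size(datamodule) -> None:
    """Raise ValueError unless the datamodule trains with batch size 1.

    Hypothetical helper: it only assumes a `train_batch_size` attribute.
    """
    batch_size = getattr(datamodule, "train_batch_size", None)
    if batch_size != REQUIRED_TRAIN_BATCH_SIZE:
        msg = (
            "train_batch_size for EfficientAd should be 1, "
            f"got {batch_size!r}."
        )
        raise ValueError(msg)


# Stand-in objects that mimic a datamodule (assumption, not anomalib classes).
bad = SimpleNamespace(train_batch_size=32)
good = SimpleNamespace(train_batch_size=1)

check_train_batch_size(good)  # passes silently
try:
    check_train_batch_size(bad)
except ValueError as err:
    print(err)
```

In your script you could call it as `check_train_batch_size(datamodule)` right before `engine.fit(...)`, so a wrong batch size fails immediately instead of after the trainer has been set up.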
