[Dataset] Add VisA dataset #824

djdameln · 2022-12-30T15:57:21Z

Description

This PR adds the Visual Anomaly (VisA) dataset.
The dataset follows the same format as MVTec, so we could re-use the make_mvtec_dataset function.
The make_mvtec_dataset function was modified slightly so that it accepts more than one mask file naming convention (MVTec uses "000_mask.png", while VisA uses "000.png").
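To illustrate the naming difference, here is a minimal sketch of pairing images with masks when the mask may or may not carry a `_mask` suffix. It is not the code from this PR: `mask_key` and `pair_images_with_masks` are hypothetical helper names, and returning an empty string for a missing mask is an assumption made for the example.

```python
from __future__ import annotations

from pathlib import Path


def mask_key(mask_path: str) -> str:
    """Return the image stem a mask refers to, ignoring an optional '_mask' suffix."""
    stem = Path(mask_path).stem
    return stem[: -len("_mask")] if stem.endswith("_mask") else stem


def pair_images_with_masks(image_paths: list[str], mask_paths: list[str]) -> dict[str, str]:
    """Map each image path to its mask path (hypothetical helper).

    Images without a matching mask are mapped to an empty string (an assumption).
    """
    masks_by_key = {mask_key(path): path for path in mask_paths}
    return {image: masks_by_key.get(Path(image).stem, "") for image in image_paths}
```

With this approach, "000_mask.png" and "000.png" both resolve to the image stem "000", so one matching routine could serve both conventions.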
There was a lot of duplication in the download and extract functionality of the different datasets, so this was moved to a shared location.
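As a rough idea of what a shared download-and-extract helper can look like, here is a standard-library sketch. The function names, signatures and the choice to delete the archive after extraction are assumptions for illustration, not the implementation added in this PR.

```python
from __future__ import annotations

import tarfile
import zipfile
from pathlib import Path
from urllib.request import urlretrieve


def extract(archive: Path, target_dir: Path) -> None:
    """Extract a zip or tar archive into target_dir (trusted archives only)."""
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zip_file:
            zip_file.extractall(target_dir)
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tar_file:
            tar_file.extractall(target_dir)
    else:
        raise ValueError(f"Unsupported archive format: {archive}")


def download_and_extract(root: Path, url: str, filename: str) -> None:
    """Download url to root/filename, extract it into root, then remove the archive."""
    root.mkdir(parents=True, exist_ok=True)
    archive = root / filename
    if not archive.exists():  # skip the download if the archive is already there
        urlretrieve(url, archive)
    extract(archive, root)
    archive.unlink()
```

Each dataset module would then only need to supply its own URL and archive name, for example `download_and_extract(Path("./datasets/visa"), url, "VisA.tar")`, where the URL and file name are placeholders.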

Currently targeted to feature branch, but will re-target to main once #822 has been merged.

Some examples:

Known Issues

~~CI will probably fail, because the dataset is not yet installed on the CI machine.~~

Changes

Bug fix (non-breaking change which fixes an issue)
Refactor (non-breaking change which refactors the code base)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist

My code follows the pre-commit style and check guidelines of this project.
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing tests pass locally with my changes
I have added a summary of my changes to the CHANGELOG (not for minor changes, docs and tests).

* move sample generation to datamodule instead of dataset * move sample generation from init to setup * remove inference stage and add base classes * replace dataset classes with AnomalibDataset * move setup to base class, create samples as class method * update docstrings * refactor btech to new format * allow training with no anomalous data * remove MVTec name from comment * raise NotImplementedError in base class * allow both png and bmp images for btech * use label_index to check if dataset contains anomalous images * refactor getitem in dataset class * use iloc for indexing * move dataloader getters to base class * refactor to add validate stage in setup * implement alternative datamodules solution * small improvements * improve design * remove unused constructor arguments * adapt btech to new design * add prepare_data method for mvtec * implement more generic random splitting function * update docstrings for folder module * ensure type consistency when performing operations on dataset * change imports * change variable names * replace pass with NotImplementedError * allow training on folder without test images * use relative path for normal_test_dir * fix dataset tests * update validation set parameter in configs * change default argument * use setter for samples * hint options for val_split_mode * update assert message and docstring * revert name change dataset vs datamodule * typing and docstrings * remove samples argument from dataset constructor * val/test -> eval * remove Split.Full from enum * sort samples when setting * update warn message * formatting * use setter when creating samples in dataset classes * add tests for new dataset class * add test case for label aware random split * update parameter name in inferencers * move _setup implementation to base class * address codacy issues * fix pylint issues * codacy * update example dataset config in docs * fix test * move base classes to separate files (avoid circular import) * add base classes * 
update docstring * fix imports * validation_split_mode -> val_split_mode * update docs * Update anomalib/data/base/dataset.py Co-authored-by: Joao P C Bertoldo <24547377+jpcbertoldo@users.noreply.github.com> * get length from self.samples * assert unique indices * check is_setup for individual datasets Co-authored-by: Joao P C Bertoldo <24547377+jpcbertoldo@users.noreply.github.com> * remove assert in __getitem__ Co-authored-by: Joao P C Bertoldo <24547377+jpcbertoldo@users.noreply.github.com> * Update anomalib/data/btech.py Co-authored-by: Joao P C Bertoldo <24547377+jpcbertoldo@users.noreply.github.com> * clearer assert message * clarify list inversion in comment * comments and typing * validate contents of samples dataframe before setting * add file paths check * add seed to random_split function * fix expected columns * fix typo * add pedestrian and avenue datasets and video utils * add seed parameter to datamodules * set global seed in test entrypoint * add NONE option to valsplitmode * clarify setup behaviour in docstring * add basic visualization for video datasets * simplify ucsdped implementation * add ucsd and avenue to __all__ * add default value for task * add tests for ucsd and avenue * add tests for video dataset and utils * add download info for avenue dataset * add download info for ucsd pedestrian dataset * more consistent naming * fix path to masks folder in gt dir * pass original image in batch to facilitate visualization * convert mask files for avenue * suppress warning due to torchvision bug * fix bug in avenue masks * store visualizations for each video in separate folder * rename parameters * add warning for clip_length > 1 * fix dataset tests * fix labels tensor shape bug * add pyav to requirements * add description for avenue dataset * use pathlib * Update anomalib/data/avenue.py Co-authored-by: Samet Akcay <samet.akcay@intel.com> * Update anomalib/data/avenue.py Co-authored-by: Samet Akcay <samet.akcay@intel.com> * Update
anomalib/data/utils/video.py Co-authored-by: Samet Akcay <samet.akcay@intel.com> * Update anomalib/data/base/video.py Co-authored-by: Samet Akcay <samet.akcay@intel.com> * Update anomalib/data/base/video.py Co-authored-by: Samet Akcay <samet.akcay@intel.com> * Update anomalib/data/ucsd_ped.py Co-authored-by: Samet Akcay <samet.akcay@intel.com> * import video dataset from base * fix bug when collecting ucsd samples * clean up datamodules tests * fix tests * remove redundant test cases * retrieve masks as numpy array * use pathlib * variable name * pathlib * use preprocesser from arguments * fix indexing bug Co-authored-by: Joao P C Bertoldo <24547377+jpcbertoldo@users.noreply.github.com> Co-authored-by: Samet Akcay <samet.akcay@intel.com>

* add basic support for detection task * use enum for task type * formatting * small bugfix * add unit tests for bounding box conversion * update error message * use as_tensor * typing and docstring * explicit keyword arguments * simplify bbox handling in video dataset * docstring consistency * add missing licenses * add whitespace for readability * add missing license * Update anomalib/data/utils/boxes.py Co-authored-by: Samet Akcay <samet.akcay@intel.com> * Revert "Update anomalib/data/utils/boxes.py" This reverts commit cec6138. * add test case for custom collate function * docstring * add integration tests for detection dataloading * extend and clean up datamodules tests * add detection task type to visualizer tests * only show pred_boxes during inference * add detection support for torch inference * add detection support for openvino inference * test inference for all task types * pylint Co-authored-by: Samet Akcay <samet.akcay@intel.com>

* mask -> mask_dir * properly handle absolute and relative paths * make root path parameter optional * formatting * path -> root * update docs * remove options hint for name parameter * refactor function * Update anomalib/config/config.py Co-authored-by: Samet Akcay <samet.akcay@intel.com> * Update anomalib/config/config.py Co-authored-by: Samet Akcay <samet.akcay@intel.com> * make root and abnormal_dir optional * Update anomalib/data/folder.py Co-authored-by: Samet Akcay <samet.akcay@intel.com> Co-authored-by: Samet Akcay <samet.akcay@intel.com>

* move sample generation to datamodule instead of dataset * move sample generation from init to setup * remove inference stage and add base classes * replace dataset classes with AnomalibDataset * move setup to base class, create samples as class method * update docstrings * refactor btech to new format * allow training with no anomalous data * remove MVTec name from comment * raise NotImplementedError in base class * allow both png and bmp images for btech * use label_index to check if dataset contains anomalous images * refactor getitem in dataset class * use iloc for indexing * move dataloader getters to base class * refactor to add validate stage in setup * implement alternative datamodules solution * small improvements * improve design * remove unused constructor arguments * adapt btech to new design * add prepare_data method for mvtec * implement more generic random splitting function * update docstrings for folder module * ensure type consistency when performing operations on dataset * change imports * change variable names * replace pass with NotImplementedError * allow training on folder without test images * use relative path for normal_test_dir * fix dataset tests * update validation set parameter in configs * change default argument * use setter for samples * hint options for val_split_mode * update assert message and docstring * revert name change dataset vs datamodule * typing and docstrings * remove samples argument from dataset constructor * val/test -> eval * remove Split.Full from enum * sort samples when setting * update warn message * formatting * use setter when creating samples in dataset classes * add tests for new dataset class * add test case for label aware random split * update parameter name in inferencers * move _setup implementation to base class * address codacy issues * fix pylint issues * codacy * update example dataset config in docs * fix test * move base classes to separate files (avoid circular import) * add synthetic dataset 
class * move augmenter to data directory * add base classes * update docstring * use synthetic dataset in base datamodule * fix imports * clean up synthetic anomaly dataset implementation * fix mistake in augmenter * change default split ratio * remove accidentally added file * validation_split_mode -> val_split_mode * update docs * Update anomalib/data/base/dataset.py Co-authored-by: Joao P C Bertoldo <24547377+jpcbertoldo@users.noreply.github.com> * get length from self.samples * assert unique indices * check is_setup for individual datasets Co-authored-by: Joao P C Bertoldo <24547377+jpcbertoldo@users.noreply.github.com> * remove assert in __getitem__ Co-authored-by: Joao P C Bertoldo <24547377+jpcbertoldo@users.noreply.github.com> * Update anomalib/data/btech.py Co-authored-by: Joao P C Bertoldo <24547377+jpcbertoldo@users.noreply.github.com> * clearer assert message * clarify list inversion in comment * comments and typing * validate contents of samples dataframe before setting * add file paths check * add seed to random_split function * fix expected columns * fix typo * add seed parameter to datamodules * set global seed in test entrypoint * add NONE option to valsplitmode * clarify setup behaviour in docstring * add logging message * use val_split_ratio for synthetic validation set * pathlib * make synthetic anomaly available for test set * update configs * add tests * simplify test set splitting logic * update docstring * add missing licence * split_normal_and_anomalous -> split_by_label * VideoAnomalib -> AnomalibVideo Co-authored-by: Joao P C Bertoldo <24547377+jpcbertoldo@users.noreply.github.com>

* deprecate PreProcessor * update configs * update deprecation messages * update video dataset * update inference dataset * move transforms to data module * update and extend transform tests * fix cyclic import * add validity checks for image size and center crop * pass image size as tuple * update path to get_transforms * update error message * fix center crop tuple conversion * update inferencers * remove draem transform config * update changelog * fix cyclic import * add crop size vs image size check * improve readability * mypy * use enum to configure input normalization * update lightning inference * update inference dataset

samet-akcay

Thanks!

ORippler · 2023-01-30T07:41:26Z

Thanks for integrating the dataset! Any thoughts on adding a comparison of methods, similar to the one for MVTec in the README? It would be helpful for those in the community with little compute available :D

samet-akcay · 2023-01-30T10:03:53Z

Hi @ORippler, yeah, it's on our long to-do list. We hope to add it soon :)

djdameln and others added 25 commits October 31, 2022 11:26

Update lightning_inference.py

aac5a47

merge main

ab6cb57

Make val split ratio configurable (#760)

cb06714

* make val split ratio configurable * use DeprecationWarning, update config key

[Datamodules] Update deprecation messages (#764)

ccec2f6

* update deprecation messages * raise warnings as DeprecationWarning

merge main

8141f2f

Bugfixes for Datamodules feature branch (#800)

8601330

* properly handle NoneType mask_dir and add test case * fix wrong deprecation handling

[Datamodules] Fix bug in bbox score to image score conversion (#803)

57d3b4e

handle empty box predictions

Improve handling of test_split_mode='none' and `val_split_mode='non…

192ba94

…e'` (#801) * enable none as split mode * use get to retrieve config keys * update deprecation message and config key

fix to float transform

b21f12c

Detection improvements (#820)

ced7bc9

* apply pixel threshold to bbox detections * allow visualizing normal boxes * normalize box scores * fix bbox logic in base anomaly module * boxes_scores -> box_scores * fix inferencers

merge main

690cb1b

update changelog

4cf8577

update csflow config to new format

89661ba

remove unused imports

9114c7d

line length

1c903f4

refactor make_mvtec_dataset to improve flexibility

b02f2cc

add visa dataset

997c19c

move download and extract functionality to shared location

dbf5a69

move visa subset splitting to separate method

48514cc

djdameln requested review from ashwinvaidya17 and samet-akcay December 30, 2022 15:57

github-actions bot added Config Data labels Dec 30, 2022

update changelog

a6326fc

djdameln added 2 commits January 4, 2023 23:00

cleaner zipfile import

dafe94a

address PR comments

415b4c0

djdameln requested a review from samet-akcay January 4, 2023 22:21

djdameln added 19 commits January 5, 2023 08:09

use tuple instead of list

e457b5e

add missing params to dosctring

68ef76b

add missing licence information

1d23942

COLS -> COLUMNS

d2fda44

typing and variable naming

0139d62

remove duplicate parameter in docstring

8914fa1

im_dir -> image_dir

6b2dcc5

typing and docstring

2d24ed5

typing

6d39434

ValSplitMode -> ValidationSplitMode

6e3816f

add missing licence

fad21b1

rename variable

8df22c6

remove empty comment

ced0342

remove unused class attribute

96f9b5e

[Detection] Compute box score when generating boxes from masks (#828)

0904529

* infer box scores from anomaly maps * discard single pixel boxes * revert discard single pixel boxes * add test case for bbox scores * update torch inferencer * minor refactor

revert val_split_mode -> validation_split_mode

a67c21b

merge feature branch

25ab974

use empty string instead of nan as empty mask path

2b63347

typing

83b152d

samet-akcay approved these changes Jan 6, 2023

Base automatically changed from feature/datamodules to main January 6, 2023 15:45

djdameln added 2 commits January 6, 2023 16:51

Merge branch 'main' into feature/datamodules

1d5e2ba

Merge branch 'feature/datamodules' into da/visa-dataset

d4234b0

djdameln merged commit b06288a into main Jan 6, 2023

djdameln deleted the da/visa-dataset branch January 6, 2023 17:21
