Update classifier container to allow pipeline to go straight to inference #280

Merged (19 commits) on Nov 10, 2023
32 changes: 32 additions & 0 deletions docs/source/User-guide/Classify/Classify.rst
@@ -0,0 +1,32 @@
Classify
=========

.. note:: Run these commands in a Jupyter notebook (or other IDE), ensuring you are working in your `mapreader` Python environment.

.. note:: You will need to update file paths to reflect your own machine's directory structure.

MapReader's ``Classify`` subpackage is used to:

- Train or fine-tune a classifier on annotated patches.
- Use a classifier to infer/predict the labels of unannotated patches.

This is all done within MapReader's ``ClassifierContainer()`` class, which is used to:

- Load models (classifiers).
- Define a labels map.
- Load datasets and dataloaders.
- Define a criterion (loss function), optimizer and scheduler.
- Train and evaluate models using already annotated images.
- Predict labels of unannotated images (model inference).
- Visualize datasets and predictions.

If you already have a fine-tuned model, you can skip to the `Infer labels using a fine-tuned model <https://mapreader.readthedocs.io/en/latest/Classify/Infer.html>`_ page.

If not, you should proceed to the `Train/fine-tune a classifier <https://mapreader.readthedocs.io/en/latest/Classify/Train.html>`_ page.
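
For orientation, here is a minimal sketch of these two entry points (the file paths and labels map are placeholders; see the Train and Infer pages for the full details):

.. code-block:: python

#EXAMPLE
import torch

from mapreader import ClassifierContainer

labels_map = {0: "no_rail_space", 1: "rail_space"}

# Option 1: you already have a fine-tuned model saved by MapReader - load it and go straight to inference
my_model = torch.load("./models/model_checkpoint_6.pkl")
my_classifier = ClassifierContainer(my_model, labels_map)

# Option 2: you still need to train/fine-tune a model - e.g. start from torchvision's "resnet18"
# and pass dataloaders built from your annotated patches (see the Train page)
# my_classifier = ClassifierContainer("resnet18", labels_map, dataloaders)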

.. toctree::
:maxdepth: 1

Train
Infer

172 changes: 172 additions & 0 deletions docs/source/User-guide/Classify/Infer.rst
@@ -0,0 +1,172 @@
Infer using a fine-tuned model
==================================

You can use any classifier (model) to predict labels on unannotated patches.

Initialize ``ClassifierContainer()``
-------------------------------------

To initialize your ``ClassifierContainer()`` for inference, you will need to define:

- ``model`` - The model (classifier) you would like to use.
- ``labels_map`` - A dictionary mapping label indices to their labels (e.g. ``{0: "no_rail_space", 1: "rail_space"}``). This labels map should be the same as the one used when training/fine-tuning the classifier.

There are a number of options for the ``model`` argument:

**1. To load a locally-saved model, use ``torch.load()`` to load your file and then pass this as the ``model`` argument.**

If you have already trained a model using MapReader, your outputs should, by default, be saved in a directory called ``models``.
Within this directory will be ``checkpoint_X.pkl`` and ``model_checkpoint_X.pkl`` files.
Your models are saved in the ``model_checkpoint_X.pkl`` files.

e.g. To load one of these files:

.. code-block:: python

#EXAMPLE
import torch
from mapreader import ClassifierContainer

my_model = torch.load("./models/model_checkpoint_6.pkl")
labels_map = {0: "no_rail_space", 1: "rail_space"}

my_classifier = ClassifierContainer(my_model, labels_map)

.. admonition:: Advanced usage
:class: dropdown

The ``checkpoint_X.pkl`` files contain all the information you had previously loaded into your ``ClassifierContainer()``, except for your model (which is saved in the ``model_checkpoint_X.pkl`` files).
If you have already trained a model using MapReader, you can use these files to reload your previously used ``ClassifierContainer()``.

To do this, set the ``model``, ``labels_map`` and ``dataloaders`` arguments to ``None`` and pass ``load_path="./models/your_checkpoint_file.pkl"`` when initializing your ``ClassifierContainer()``:

.. code-block:: python

#EXAMPLE
my_classifier = ClassifierContainer(None, None, None, load_path="./models/checkpoint_6.pkl")

This will also load the corresponding model file (in this case ``./models/model_checkpoint_6.pkl``).

If you use this option, your criterion, optimizer and scheduler will be reloaded from your previous run.

**2. To load a** `Hugging Face model <https://huggingface.co/models>`__\ **, choose your model, follow the "Use in Transformers" or "Use in timm" instructions to load it, and then pass this as the ``model`` argument.**

e.g. `This model <https://huggingface.co/davanstrien/autotrain-mapreader-5000-40830105612>`__ is based on our `"gold standard" dataset <https://huggingface.co/datasets/Livingwithmachines/MapReader_Data_SIGSPATIAL_2022>`__.
It can be loaded using the `transformers <https://github.com/huggingface/transformers>`__ library:

.. code-block:: python

#EXAMPLE
from mapreader import ClassifierContainer
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

extractor = AutoFeatureExtractor.from_pretrained("davanstrien/autotrain-mapreader-5000-40830105612")
my_model = AutoModelForImageClassification.from_pretrained("davanstrien/autotrain-mapreader-5000-40830105612")
labels_map = {0: "no_rail_space", 1: "rail_space"}

my_classifier = ClassifierContainer(my_model, labels_map)

.. note:: You will need to install the `transformers <https://github.com/huggingface/transformers>`__ library to do this (``pip install transformers``).

e.g. `This model <https://huggingface.co/timm/resnest101e.in1k>`__ is an example of one which uses the `timm <https://huggingface.co/docs/timm/index>`__ library.
It can be loaded as follows:

.. code-block:: python

#EXAMPLE
import timm

from mapreader import ClassifierContainer

labels_map = {0: "no_rail_space", 1: "rail_space"}

my_model = timm.create_model("hf_hub:timm/resnest101e.in1k", pretrained=True, num_classes=len(labels_map))

my_classifier = ClassifierContainer(my_model, labels_map)

.. note:: You will need to install the `timm <https://huggingface.co/docs/timm/index>`__ library to do this (``pip install timm``).

Create dataset and add to ``my_classifier``
---------------------------------------------

You will then need to create a new dataset containing your unannotated patches.
This can be done by loading a dataframe containing the paths to your patches:

.. code-block:: python

from mapreader import PatchDataset

infer = PatchDataset("./patch_df.csv", delimiter="\t", transform="test")

.. note:: You can create this ``.csv`` file using the ``.convert_image(save=True)`` method on your ``MapImages`` object (follow instructions in the `Load <https://mapreader.readthedocs.io/en/latest/User-guide/Load.html>`__ user guidance).

The ``transform`` argument is used to specify which `image transforms <https://pytorch.org/vision/stable/transforms.html>`__ to use on your patch images.
See :ref:`this section<transforms>` for more information on transforms.
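
For example, instead of the named ``"test"`` transform you can pass your own transform. The sketch below assumes ``PatchDataset`` accepts a standard ``torchvision.transforms.Compose`` object - check the linked section for the exact behavior in your version:

.. code-block:: python

#EXAMPLE
from torchvision import transforms

# a hypothetical custom transform pipeline for inference
my_transform = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

infer = PatchDataset("./patch_df.csv", delimiter="\t", transform=my_transform)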

You should then add this dataset to your ``ClassifierContainer()`` (``my_classifier``\):

.. code-block:: python

my_classifier.load_dataset(infer, set_name="infer")

This will create a ``DataLoader`` from your dataset and add it to your ``ClassifierContainer()``\'s ``dataloaders`` attribute.

By default, the ``.load_dataset()`` method will create a dataloader with a batch size of 16 and will not use a sampler.
You can change these by specifying the ``batch_size`` and ``sampler`` arguments respectively.
See :ref:`this section<sampler>` for more information on samplers.
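
For example, to load the dataset with a larger batch size:

.. code-block:: python

#EXAMPLE
my_classifier.load_dataset(infer, set_name="infer", batch_size=32)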

Infer
------

After loading your dataset, you can then simply run the ``.inference()`` method to infer the labels on the patches in your dataset:

.. code-block:: python

my_classifier.inference(set_name="infer")

As with the "test" dataset, to see a sample of your predictions, use:

.. code-block:: python

my_classifier.show_inference_sample_results(label="rail_space", set_name="infer")

Add predictions to metadata and save
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To add your predictions to your patch metadata (saved in ``patch_df.csv``), you will need to add your predictions and confidence values to your ``infer`` dataset's dataframe.

This dataframe is saved as the dataset's ``patch_df`` attribute.
To view it, use:

.. code-block:: python

infer.patch_df

To add your predictions and confidence values to this dataframe use:

.. code-block:: python

import numpy as np

infer.patch_df['predicted_label'] = my_classifier.pred_label  # predicted label (text)
infer.patch_df['pred'] = my_classifier.pred_label_indices  # predicted label index
infer.patch_df['conf'] = np.array(my_classifier.pred_conf).max(axis=1)  # confidence of the predicted label

If you view your dataframe again (by running ``infer.patch_df`` as above), you will see your predictions and confidence values have been added as columns.
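
For example, a quick way to sanity check these new columns (plain `pandas <https://pandas.pydata.org/>`__ operations on the dataframe):

.. code-block:: python

#EXAMPLE
# count how many patches were assigned each label
infer.patch_df["predicted_label"].value_counts()

# inspect the least confident predictions
infer.patch_df.sort_values("conf").head(10)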

From here, you can either save your results using:

.. code-block:: python

infer.patch_df.to_csv("predictions_patch_df.csv", sep="\t")

Or, you can use the ``MapImages`` object to create some visualizations of your results:

.. code-block:: python

from mapreader import load_patches

my_maps = load_patches(patch_paths = "./path/to/patches/*png", parent_paths="./path/to/parents/*png")

infer_df = infer.patch_df.reset_index(names="image_id") # ensure image_id is one of the columns
my_maps.add_metadata(infer_df, tree_level='patch') # add dataframe as metadata
my_maps.add_shape()

parent_list = my_maps.list_parents()
my_maps.show_parent(parent_list[0], column_to_plot="conf", vmin=0, vmax=1, alpha=0.5, patch_border=False)

Refer to the `Load <https://mapreader.readthedocs.io/en/latest/User-guide/Load.html>`__ user guidance for further details on how these methods work.
@@ -1,11 +1,5 @@
Classify
=========

.. note:: Run these commands in a Jupyter notebook (or other IDE), ensuring you are in your `mapreader` python environment.

.. note:: You will need to update file paths to reflect your own machines directory structure.

MapReader's ``Classify`` subpackage is used to train or fine-tune a CV (computer vision) model and use it to predict the labels of patches.
Train/fine-tune a classifier
==============================

If you are new to computer vision/machine learning, `see this tutorial for details on fine-tuning torchvision models <https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html>`__.
This will help you get to grips with the basic steps needed to train/fine-tune a model.
@@ -57,6 +51,8 @@ To see how your labels map to their label indices, call the ``annotated_images.l

annotated_images.labels_map

.. note:: This ``labels_map`` will be needed later.

To view a sample of your annotated images use the ``show_sample()`` method.
The ``label_to_show`` argument specifies which label you would like to show.

@@ -69,7 +65,7 @@ For example, to show your "rail_space" label:

.. todo:: update this pic

.. image:: ../figures/show_image_labels_10.png
.. image:: ../../figures/show_image_labels_10.png
:width: 400px


@@ -82,7 +78,7 @@ The ``.review_labels()`` method, which returns an interactive tool for adjusting

annotated_images.review_labels()

.. image:: ../figures/review_labels.png
.. image:: ../../figures/review_labels.png
:width: 400px


@@ -220,35 +216,22 @@ Train
Initialize ``ClassifierContainer()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MapReader's ``ClassifierContainer()`` class is used to:

- Load models.
- Load dataloaders and labels map.
- Define a criterion (loss function), optimizer and scheduler.
- Train and evaluate models using already annotated images.
- Predict labels of un-annotated images (model inference).
- Visualize datasets and predictions.

You can initialize a ``ClassifierContainer()`` object (``my_classifier``) using:

.. code-block:: python

from mapreader import ClassifierContainer

my_classifier = ClassifierContainer(model, dataloaders, labels_map)
To initialize your ``ClassifierContainer()`` for training, you will need to define:

Your dataloaders and labels map (``annotated_images.labels_map``) should be passed as the ``dataloaders`` and ``labels_map`` arguments respectively.
- ``model`` - The model (classifier) you would like to train.
- ``labels_map`` - A dictionary mapping label indices to their labels (e.g. ``{0: "no_rail_space", 1: "rail_space"}``). If you have loaded annotations using the method above, you can find your labels map at ``annotated_images.labels_map``.
- ``dataloaders`` - The dataloaders containing your train, test and val datasets.

There are a number of options for the ``model`` argument:

**1. To load a model from** `torchvision.models <https://pytorch.org/vision/stable/models.html>`__\ **, pass one of the model names as the ``model`` argument.**

e.g. To load "resnet18":
e.g. To load "resnet18", pass ``"resnet18"`` as the model argument:

.. code-block:: python

#EXAMPLE
my_classifier = ClassifierContainer("resnet18", dataloaders, annotated_images.labels_map)
my_classifier = ClassifierContainer("resnet18", annotated_images.labels_map, dataloaders)

By default, this will load a pretrained form of the model and reshape the last layer to output the same number of nodes as there are labels in your dataset.
You can load an untrained model by specifying ``pretrained=False``.
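
e.g. Assuming ``pretrained`` is accepted as a keyword argument when initializing your ``ClassifierContainer()`` (a sketch of the option described above):

.. code-block:: python

#EXAMPLE
my_classifier = ClassifierContainer("resnet18", annotated_images.labels_map, dataloaders, pretrained=False)
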
@@ -269,7 +252,7 @@ There are a number of options for the ``model`` argument:
num_input_features = my_model.fc.in_features
my_model.fc = nn.Linear(num_input_features, len(annotated_images.labels_map))

my_classifier = ClassifierContainer(my_model, dataloaders, annotated_images.labels_map)
my_classifier = ClassifierContainer(my_model, annotated_images.labels_map, dataloaders)

This is equivalent to passing ``model="resnet18"`` (as above) but further customizations are, of course, possible.
See `here <https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html>`__ for more details of how to do this.
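
One common customization, sketched below with plain PyTorch (this is standard torch/torchvision code rather than a MapReader-specific API), is to freeze the pretrained backbone so that only the newly added final layer is trained:

.. code-block:: python

#EXAMPLE
from torch import nn
from torchvision import models

my_model = models.resnet18(pretrained=True)

# freeze all pretrained weights
for param in my_model.parameters():
    param.requires_grad = False

# replace the final layer; the new layer's weights remain trainable
num_input_features = my_model.fc.in_features
my_model.fc = nn.Linear(num_input_features, len(annotated_images.labels_map))

my_classifier = ClassifierContainer(my_model, annotated_images.labels_map, dataloaders)
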
@@ -289,7 +272,7 @@ There are a number of options for the ``model`` argument:

my_model = torch.load("./models/model_checkpoint_6.pkl")

my_classifier = ClassifierContainer(my_model, dataloaders, annotated_images.labels_map)
my_classifier = ClassifierContainer(my_model, annotated_images.labels_map, dataloaders)

.. admonition:: Advanced usage
:class: dropdown
@@ -321,7 +304,7 @@ There are a number of options for the ``model`` argument:
extractor = AutoFeatureExtractor.from_pretrained("davanstrien/autotrain-mapreader-5000-40830105612")
my_model = AutoModelForImageClassification.from_pretrained("davanstrien/autotrain-mapreader-5000-40830105612")

my_classifier = ClassifierContainer(my_model, dataloaders, annotated_images.labels_map)
my_classifier = ClassifierContainer(my_model, annotated_images.labels_map, dataloaders)

.. note:: You will need to install the `transformers <https://github.com/huggingface/transformers>`__ library to do this (``pip install transformers``).

@@ -335,7 +318,7 @@ There are a number of options for the ``model`` argument:

my_model = timm.create_model("hf_hub:timm/resnest101e.in1k", pretrained=True, num_classes=len(annotated_images.labels_map))

my_classifier = ClassifierContainer(my_model, dataloaders, annotated_images.labels_map)
my_classifier = ClassifierContainer(my_model, annotated_images.labels_map, dataloaders)

.. note:: You will need to install the `timm <https://huggingface.co/docs/timm/index>`__ library to do this (``pip install timm``).

@@ -492,7 +475,7 @@ e.g. to plot the loss during each epoch of training and validation:
legends=["Train", "Valid"],
)

.. image:: ../figures/loss.png
.. image:: ../../figures/loss.png
:width: 400px


@@ -512,7 +495,7 @@ To see a sample of your predictions, use:

my_classifier.show_inference_sample_results(label="rail_space")

.. image:: ../figures/inference_sample_results.png
.. image:: ../../figures/inference_sample_results.png
:width: 400px


@@ -585,7 +568,7 @@ This will save your ``ClassifierContainer()`` as ``classifier.pkl`` and your mod
Infer (predict)
----------------

Once you are happy with your model's predictions, you can then use it to predict labels on the rest of your (un-annotated) patches.
Once you are happy with your model's predictions, you can then use it to predict labels on the rest of your (unannotated) patches.

To do this, you will need to create a new dataset containing your patches:

2 changes: 1 addition & 1 deletion docs/source/User-guide/User-guide.rst
@@ -18,5 +18,5 @@ Please read this User Guide **before** looking through the worked examples.
Download
Load
Annotate
Classify
Classify/Classify
Post-process
7 changes: 4 additions & 3 deletions docs/source/Worked-examples/mnist_pipeline.ipynb
@@ -483,9 +483,10 @@
}
],
"source": [
"my_classifier = ClassifierContainer(\n",
" model=\"resnet18\", dataloaders=dataloaders, labels_map={0: \"3\", 1: \"1\"}\n",
")"
"my_classifier = ClassifierContainer(model=\"resnet18\", \n",
" labels_map={0: \"3\", 1: \"1\"},\n",
" dataloaders=dataloaders\n",
" )"
]
},
{
7 changes: 4 additions & 3 deletions docs/source/Worked-examples/one_inch_pipeline.ipynb
@@ -1640,9 +1640,10 @@
}
],
"source": [
"my_classifier = ClassifierContainer(\n",
" model=\"resnet18\", dataloaders=dataloaders, labels_map={0: \"No\", 1: \"rail space\"}\n",
")"
"my_classifier = ClassifierContainer(model =\"resnet18\",\n",
" labels_map={0: 'No', 1: 'rail space'},\n",
" dataloaders=dataloaders\n",
" )"
]
},
{