
[RFC] Metrics Parameter in training loops #93

Open
oke-aditya opened this issue Dec 15, 2020 · 6 comments
Labels: enhancement (New feature or request) · feature (A new feature request) · Medium Priority (Should be addressed in few days)

Comments

@oke-aditya
Owner

oke-aditya commented Dec 15, 2020

🚀 Feature

Slightly important, since the training engines are concerned here 😅
A metrics parameter in training.

Motivation

Currently, what we do in the train_step, val_step, and fit APIs is calculate metrics from torchvision and return them in a dictionary, but not return the predictions.
This hardcodes the API and is troublesome: train_step will always return metrics and never the results / model outputs.
Let's add a metrics param that opens up this API and works as follows.

Pitch

For detection models in the PyTorch trainers, the param would work as follows.

metrics: str = None

if metrics is None:
    # Just return the raw model outputs.
    # People can then calculate metrics manually and play around as they like.
    # We do no post-processing; the user can apply the post-processing utilities we provide.
    ...
elif metrics == "native":
    # Use the native torchvision metrics which we have now.
    # We can expect these to improve over time, or adopt Lightning metrics here.
    ...
# Maybe; depends on whether we take this dependency.
elif metrics == "coco":
    # Calculate COCO API metrics (we do the post-processing if needed).
    ...
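
As a rough illustration of how such a param could sit in a detection train step (a sketch only; the function name detection_train_step and the exact keys are assumptions, not the actual quickvision API):

from typing import Dict, Optional


def detection_train_step(model, batch, device, metrics: Optional[str] = None) -> Dict:
    # Hypothetical sketch of the proposal above, not the real implementation.
    images, targets = batch
    images = [img.to(device) for img in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

    # Torchvision detection models return a dict of losses in train mode.
    loss_dict = model(images, targets)
    out = {"preds": loss_dict}

    if metrics == "native":
        # Attach whatever the trainer already computes today, e.g. the summed loss.
        out["metrics"] = {"total_loss": sum(loss for loss in loss_dict.values())}
    elif metrics == "coco":
        # Would need post-processing plus the optional pycocotools dependency.
        raise NotImplementedError("COCO metrics are still under discussion")
    return out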

Since the PyTorch Lightning API is completely decoupled from the plain PyTorch one, people can already subclass it and use PL metrics.
E.g.

class CustomTraining(lit_frcnn):
    def validation_step(self, batch, batch_idx):
        images, targets = batch
        targets = [{k: v for k, v in t.items()} for t in targets]

        out = self.model(images)
        # Do whatever metrics you like!

Metrics with Lightning are easier: we can use PL metrics and calculate these on the fly with a param in the Trainer, as in the sketch below.
Not sure about this, but it should be easily possible.
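
For instance, a hedged sketch of such a subclass (it assumes a recent torchmetrics release and recent Lightning hook names; lit_frcnn is the class from the snippet above, and note that MeanAveragePrecision itself may pull in pycocotools depending on the torchmetrics version):

from torchmetrics.detection import MeanAveragePrecision


class CustomTraining(lit_frcnn):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.map_metric = MeanAveragePrecision()

    def validation_step(self, batch, batch_idx):
        images, targets = batch
        targets = [{k: v for k, v in t.items()} for t in targets]

        preds = self.model(images)  # list of dicts with "boxes", "scores", "labels" in eval mode
        self.map_metric.update(preds, targets)

    def on_validation_epoch_end(self):
        self.log("val_map", self.map_metric.compute()["map"])
        self.map_metric.reset()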

Alternatives

This is open to discussion; a really neat solution here could be a major feature upgrade.

Additional context

This will solve a lot of issues like #62 (being able to use the COCO API for detection). We should not have that dependency, so COCO is still a little confusing;
with metrics=None, people can post-process (with the utilities we provide) and use the COCO API themselves.

cc @zlapp @zhiqwang @hassiahk

@oke-aditya oke-aditya added the enhancement New feature or request label Dec 15, 2020
@oke-aditya oke-aditya added feature A new feature request Medium Priority Should be addressed in few days labels Dec 15, 2020
@oke-aditya oke-aditya pinned this issue Dec 15, 2020
@zhiqwang
Contributor

zhiqwang commented Dec 15, 2020

Metrics are a little abstract; maybe we could also integrate wandb or TensorBoard into quickvision to show the performance of the training strategy.

@oke-aditya
Owner Author

oke-aditya commented Dec 15, 2020

People can already use wandb / TensorBoard, e.g. a wandb notebook works out of the box with the current metrics.

We should probably avoid a wandb dependency. Again, PyTorch Lightning provides these loggers out of the box, and the current API is flexible enough to do the same with plain PyTorch too.

By not integrating wandb or such loggers, people can manually log whatever, whenever, and wherever they like.
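
For example, a minimal sketch of such manual logging (it assumes wandb is installed and logged in; the project name and metric keys here are just illustrative):

import wandb

wandb.init(project="quickvision-demo")  # hypothetical project name

# Suppose a train step returned a metrics dict shaped like this (illustrative values):
metrics = {"loss_classifier": 0.42, "loss_box_reg": 0.17}

# Log it manually, wherever and whenever you like.
wandb.log({f"train/{key}": value for key, value in metrics.items()})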

@oke-aditya
Owner Author

oke-aditya commented Dec 16, 2020

After a few discussions over Slack with @zlapp and the PyTorch Lightning team:
Lightning will have chainable metrics with 1.2; there is an issue for it.
We cannot pass trainer flags for Lightning, and that might not be possible soon.

Should we start this refactor? It would mean updating the tests a bit, as well as the docs.
Since the output would now be controlled by the metrics param, we should probably keep consistency within the API.
Let's think it over.

@oke-aditya
Owner Author

oke-aditya commented Dec 21, 2020

So here is the API proposal for metrics. Let's not tie pycocotools to Quickvision.

This is an extra param for train_step and val_step.

metrics: bool = False

if metrics is False:
    # Just return the raw model outputs.
    # People can then calculate metrics manually and play around as they like.
    # We do no post-processing; the user can apply the post-processing utilities we provide.

    # preds contains the predictions made for the train data.
    # For detection models from torchvision we can simply return losses in train and boxes in eval.
    # preds will contain the output of the model, e.g. the output tensor in classification,
    # or the DETR pred logits and boxes. Possibly it will be a nested dictionary with consistent names.

    return {"preds": []}

else:
    # Use the native torchvision metrics which we have now.
    # We can expect these to improve over time, or adopt Lightning metrics here.
    # The "metrics" key will contain whatever metrics we return right now.
    # Here too, preds is the same as above, for consistency.
    return {"preds": [], "metrics": {}}
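
To make the contract concrete, here is a minimal classification-flavoured sketch of the same idea (the function name and nesting are assumptions for illustration, not the final quickvision API):

import torch.nn.functional as F


def classification_train_step(model, batch, device, metrics: bool = False):
    # Hypothetical sketch of the proposed return contract.
    inputs, labels = batch[0].to(device), batch[1].to(device)
    logits = model(inputs)
    loss = F.cross_entropy(logits, labels)

    if metrics is False:
        # Raw model outputs only; users compute whatever they want from these tensors.
        return {"preds": {"logits": logits, "loss": loss}}

    # Otherwise also attach the metrics the trainer computes today.
    accuracy = (logits.argmax(dim=1) == labels).float().mean()
    return {"preds": {"logits": logits, "loss": loss},
            "metrics": {"loss": loss.item(), "accuracy": accuracy.item()}}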

@oke-aditya
Owner Author

@zlapp, using pycocotools or any other metrics will be very simple.

You can just use

train_results = model.train_step(metrics=False)

boxes = train_results["preds"]["boxes"]

train_results will be a dict. Its values will be torch.Tensors, no abstractions here.
Now you can post-process or do whatever you like and use pycocotools, etc., e.g. as in the sketch below.
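
A hedged sketch of that post-processing, converting torchvision-style boxes into COCO-format detection records that pycocotools' COCO.loadRes can consume (the helper name to_coco_detections is hypothetical):

import torch


def to_coco_detections(preds, image_ids):
    # preds: list of torchvision-style dicts with "boxes" (xyxy), "scores", "labels".
    detections = []
    for image_id, pred in zip(image_ids, preds):
        boxes = pred["boxes"].detach().cpu()
        # COCO expects [x, y, width, height] rather than [x1, y1, x2, y2].
        xywh = torch.stack(
            [boxes[:, 0], boxes[:, 1],
             boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]], dim=1)
        for box, score, label in zip(xywh, pred["scores"], pred["labels"]):
            detections.append({
                "image_id": int(image_id),
                "category_id": int(label),
                "bbox": [float(v) for v in box],
                "score": float(score),
            })
    return detections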

Some consistency which we will have:

  1. Every model still returns a dict; it contains preds, or preds + metrics.
  2. What is inside preds is specific to the model, say classification / detection. We will have some consistency in it, though not 100%, the reason being that these are the raw model outputs and they really would not change. We don't want to hide these outputs from the models.
  3. This allows using custom metrics, pycocotools, or even Lightning metrics. We are returning torch tensors wrapped in a dictionary (which is what even DETR or FRCNN do).

Another alternative is that we do not compute metrics at all, but provide utilities to compute them; I'm neither for nor against it.
I think that too can be the way, something along the lines of the sketch below.
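
A tiny hedged sketch of what such a standalone utility could look like (the name compute_classification_metrics is hypothetical):

import torch


def compute_classification_metrics(preds: dict, targets: torch.Tensor) -> dict:
    # Takes the returned preds dict and gives back a metrics dict; users call it themselves.
    logits = preds["logits"]
    accuracy = (logits.argmax(dim=1) == targets).float().mean().item()
    return {"accuracy": accuracy}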

@oke-aditya
Owner Author

OK, so we can now use torchmetrics. It has basically no extra requirements, so we can easily add it.

Also, we will use it to calculate metrics. In case any metric is not available there, we will keep it in quickvision.metrics.
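
For example, a minimal sketch of accumulating a metric with torchmetrics on the returned preds (it assumes a recent torchmetrics release where Accuracy takes a task argument; the tensors below are just stand-ins):

import torch
import torchmetrics  # assumes a recent torchmetrics release

accuracy = torchmetrics.Accuracy(task="multiclass", num_classes=10)

logits = torch.randn(8, 10)          # stand-in for results["preds"]["logits"]
labels = torch.randint(0, 10, (8,))  # stand-in for the batch targets

accuracy.update(logits, labels)
print(accuracy.compute())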
