ENH: eval should have an option to compute metrics with and without clean-ups #472

Closed
NickleDave opened this issue Mar 15, 2022 · 5 comments
Labels: ENH: enhancement; new feature or request

@NickleDave
As in the paper.

Should add the same options as in the predict config, then adapt from this script:
https://github.com/yardencsGitHub/tweetynet/blob/master/article/src/scripts/run_eval_with_and_without_output_transforms.py

@NickleDave

Note that #537 blocks this.
Should also address #471 at the same time.

@NickleDave commented Jan 22, 2023

One issue with implementing this is that, to convert network outputs to a string of labels in the Model._eval method, we actually use a different function from what we use inside core.predict to convert outputs to annotated segments.

In Model._eval we use vak.labeled_timebins.lbl_tb2labels, and in core.predict we use vak.labeled_timebins.lbl_tb2segments.

The function we use in eval turns the outputs into strings in a different way, one that prevents us from using the transforms we use in core.predict. It maps from a vector of labeled timebins to a string of labels by first removing the "unlabeled" / "background" class and then taking the first character every time the character changes, e.g. '000001111000222000' -> '1111222' -> '12'.
So it avoids segmenting the output, and the segments are actually what we use to do post-processing.
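For concreteness, here is a minimal sketch of that collapsing behavior (not vak's actual implementation), assuming 0 is the "unlabeled" class:

import numpy as np

def lbl_tb2labels_sketch(lbl_tb, unlabeled=0):
    lbl_tb = np.asarray(lbl_tb)
    # 1. drop all timebins assigned the "unlabeled" / "background" class
    lbl_tb = lbl_tb[lbl_tb != unlabeled]  # '000001111000222000' -> '1111222'
    # 2. keep only the first timebin each time the label changes
    changes = np.diff(lbl_tb, prepend=lbl_tb[0] - 1) != 0
    return ''.join(str(lbl) for lbl in lbl_tb[changes])  # '1111222' -> '12'

assert lbl_tb2labels_sketch([int(c) for c in '000001111000222000']) == '12'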

A fix for this is just to do the segmenting in eval, i.e., to use lbl_tb2segments instead.
This is actually good, because we then get back onset and offset times, giving us access to these for metrics.
It also means we're more sure we're evaluating on the same thing we use for predictions, which also seems like a good thing.

That can work, but there's other behavior of the function we use in eval that we need to preserve.

For predictions, we need to be able to map from network outputs (argmax(y_pred)) to labels used by an annotator that may have multiple characters, e.g. {0: 'aw', 1: 'br', 2: 'ac', ...}.

But for evaluation, we don't want to do this, since we're computing an edit distance, and we don't want to get unfairly penalized because of the extra characters in multi-character labels. If we used multi-character labels, some mistakes would cost more edits than others, e.g., ("br" -> "aw") would require more edits than ("ac" -> "aw") (since only one character changes in the latter).
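To make the penalty concrete, here is a small self-contained check (the Levenshtein implementation below is a generic textbook version, not vak's metric code):

def levenshtein(a: str, b: str) -> int:
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# one wrong segment should cost exactly one edit, but with multi-character labels:
assert levenshtein('br', 'aw') == 2  # both characters differ
assert levenshtein('ac', 'aw') == 1  # only one character differs
# after mapping to single-character labels, every mistake costs exactly 1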

So we should implement as follows:

  • as in ENH: Convert post-processing to Transform classes that can be specified in config  #537, refactor labeled_timebins module:
    • move to transforms.labeled_timebins sub-package with a functional module, and then add a _classes module as well (import objects from both into transforms.labeled_timebins.__all__).
    • also move the function vak.labeled_timebins._multi_char_labels_to_single_char into vak.labels (make the function "unprivate").
  • add attribute post_tfm to vak.models.engine.Model and use it inside Model._eval
  • replace use of lbl_tb2labels in Model._eval with lbl_tb2segments (but as a class vak.transforms.labeled_timebins.ToSegments)
    • but make sure we convert labelmap to a dict with single char labels first
    • We will use it inside core.eval to ensure we are computing edit distances without unfairly penalizing ourselves:
labelmap = vak.labels.multi_char_labels_to_single_char(labelmap)
post_tfm = vak.transforms.labeled_timebins.ToSegments(labelmap=labelmap)   

@NickleDave commented Jan 29, 2023

I got as far as converting labeled_timebins to transforms and started to add post_tfm to Model, but now I see an issue that I missed.

To be able to just call the same function we use in core.predict we actually need the time bins vector itself from the spectrogram.

There are a couple of ways to work around this:

  1. Don't call the same function (or the new class that represents it as a callable transform); instead, make another transform that's equivalent to what we currently do inside eval, but with the ability to apply post-processing.
  • Add a function labeled_timebins.functional.to_labels_with_postprocessing that does everything up to the last code block of to_segments_with_postprocessing. It can just make a dummy timebins vector given the labeled timebins. We will still need a timebin duration, but we can make that an argument to the function that becomes an init arg of the corresponding transform, then get its value inside core.eval and pass it into the transform when we instantiate it.
  2. Change what we return, so that we have access to the time bins vector t in a batch when we call eval.
  • Modify VocalDataset somehow, so we have an option to specify whether we want to include t in the batch.
  • Get t and then use it: pass it into to_segments_with_postprocessing.
  • The problem is that this requires some refactoring around use of VocalDataset to specify when we do or do not include t; it would be computationally wasteful to always include it in batches when we don't use it during training.
  • But we might want it in training later.
  • And we actually do want it for prediction, since that would let us refactor the predict function to not do a bunch of stuff in the function that should be happening inside the model.predict method.

Option 1 is the quick and easy way to get things done with this version.
Option 2 starts to feel like a whole new branch; I had already planned to refactor the ItemTransforms and datasets anyway.
But I would rather do that in a separate feature, after we rebase version 1.0 on main, after merging in the current branch where I'm developing what's described here.
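Here is a sketch of the dummy-timebins idea from option 1 (the function name and the 2 ms bin size are just for illustration): given only a timebin duration, we can fabricate a timebins vector with the right spacing, so the same segmenting code can run without the spectrogram's real times vector.

import numpy as np

def dummy_timebins(n_timebins: int, timebin_dur: float) -> np.ndarray:
    # bin times 0, timebin_dur, 2 * timebin_dur, ...; only the spacing matters
    return np.arange(n_timebins) * timebin_dur

lbl_tb = np.array([0, 0, 1, 1, 1, 0, 2, 2])
t = dummy_timebins(lbl_tb.shape[-1], timebin_dur=0.002)  # 2 ms bins
# t can now be passed alongside lbl_tb to to_segments_with_postprocessing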

@NickleDave

Update on this one:
The question for version 0.x is how to add this.
It doesn't matter too much, since there's really only one model, but version 1.x will need to be flexible.
So it's more a question of which way of adding it requires the least amount of added parameters / refactoring, while also not having any unintended consequences.

We need to add a post_tfm attribute to the vak.engine.Model class.
It's really only used by the _eval method.
Should it be a required arg?
It will be required in the sense that we are going to always apply this transform to go from labeled timebins to string labels.

It will always be an instance of vak.transforms.labeled_timebins.ToLabelsWithPostprocessing.
We can't easily have a default instance of this class, e.g. one that gets instantiated inside Model.__init__, because it has its own required __init__ args that we don't have access to there, namely timebin_dur and labelmap. Another option might be to make a default transform on the fly inside _eval if there isn't one? But again we need the required args. The eval_data does have the labelmap attribute, but I don't think it has the timebins.
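A sketch of what I mean, with post_tfm as an optional attribute defaulting to None (the other init args here are stand-ins, not the exact signature of vak.models.engine.Model):

class Model:
    def __init__(self, network, loss, optimizer, metrics, post_tfm=None):
        self.network = network
        self.loss = loss
        self.optimizer = optimizer
        self.metrics = metrics
        # used only by _eval; None means "keep the old behavior"
        self.post_tfm = post_tfm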

In a way, the transform we used before was convenient, because it only requires one argument: the labeled timebins themselves.
One option would be to add the post_tfm attribute, but also keep this transform (I currently have it removed in the branch) and set it as the default. This way we preserve the old behavior but give people the option to use the new behavior. To get the new transform without any extra post-processing, you would specify the post_tfm_kwargs but set them like so: post_tfm_kwargs = {min_segment_dur = None, majority_vote = false}. 😠 This is one of those times where you want TOML to have a null, but it doesn't. Right, I think the way to do this will be some sort of converter for this specific option.
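Something like this converter, say (the name and the sentinel string are assumptions, not final):

def convert_post_tfm_kwargs(post_tfm_kwargs: dict) -> dict:
    # TOML has no null, so accept a sentinel string and map it to Python None
    return {
        key: None if val in ('None', 'none') else val
        for key, val in post_tfm_kwargs.items()
    }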

Ok I will proceed like that. ☝️

@NickleDave commented Feb 5, 2023

I got this to a point where it's working.

There's an issue, though. We do the post-processing and then return the labels all in a single call to a single transform class; this means we don't have access to the transformed labeled timebins vector. That means we can't compute accuracy after applying the transformations.

In other words our approach isn't very functional, and it conflates two things: applying the post-processing to the labeled timebins, and converting labeled timebins to either labels (alone) or segments (with labels, onsets, and offsets). We really only want a post_tfm to do the former; it should be applied directly to the output of the network before we take the final step of converting to either labels or segments with labels, onsets, and offsets.

So the inside of _eval should look something like:

out = self.network(x)
lbl_tb = torch.argmax(out, dim=1)  # net output (batch, classes, timebins) -> labeled timebins

if self.post_tfm:
    lbl_tb = self.post_tfm(lbl_tb)

y_pred_labels = transforms.labeled_timebins.to_labels(lbl_tb.cpu().numpy())

Will need to refactor to achieve this.
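For example, a post_tfm applying majority vote could map labeled timebins to labeled timebins, something like this sketch (an approximation of the majority-vote clean-up for illustration, not vak's actual implementation):

import numpy as np

def majority_vote_postprocess(lbl_tb: np.ndarray, unlabeled: int = 0) -> np.ndarray:
    lbl_tb = lbl_tb.copy()
    is_seg = lbl_tb != unlabeled
    # find boundaries between background runs and segment runs
    bounds = np.flatnonzero(np.diff(is_seg.astype(int))) + 1
    for start, stop in zip(np.r_[0, bounds], np.r_[bounds, len(lbl_tb)]):
        if is_seg[start]:
            # within each segment, assign the most frequent label to all bins
            vals, counts = np.unique(lbl_tb[start:stop], return_counts=True)
            lbl_tb[start:stop] = vals[np.argmax(counts)]
    return lbl_tb  # still a labeled-timebins vector, ready for to_labels

Applying it to [0, 0, 1, 2, 1, 1, 0, 2, 2] yields [0, 0, 1, 1, 1, 1, 0, 2, 2]: the stray 2 inside the first segment is voted out, and we can still compute framewise accuracy on the cleaned-up vector afterwards.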

NickleDave added a commit that referenced this issue Feb 6, 2023
- Add post_tfm_kwargs to config/eval.py
- Add post_tfm_kwargs attribute to LearncurveConfig
- Add 'post_tfm_kwargs' option to config/valid.toml
- Add post_tfm_kwargs to LEARNCURVE section of vak/config/valid.toml
- Add use of post_tfm eval in engine.Model
- Add post_tfm_kwargs to core.eval and use with model
  - Add logic in core/eval.py to use post_tfm_kwargs to make post_tfm
  - Use multi_char_labels_to_single_char in core.eval,
    not in transforms, to make sure edit distance is computed
    correctly
- Add post_tfm parameter to vak.models.from_model_config_map
  - Add parameter and put in docstring,
  - Pass argument into Model.from_config
- Add post_tfm_kwargs to TeenyTweetyNet.from_config
- Add post_tfm_kwargs to unit test in test_core/test_eval.py
- Pass post_tfm_kwargs into core.eval in cli/eval.py
- Add parameter post_tfm_kwargs to vak.core.learncurve function,
  pass into calls to core.eval
- Pass post_tfm_kwargs into core.learncurve inside cli.learncurve
NickleDave added a commit that referenced this issue Feb 7, 2023
NickleDave added a commit that referenced this issue Feb 7, 2023
NickleDave added a commit that referenced this issue Feb 8, 2023