Introduce additional active learning sampling strategies #148

PawelPeczek-Roboflow · 2023-11-03T14:32:48Z

Description

On top of the previously added Active Learning capabilities, this PR adds 3 new sampling strategies:

Close to threshold sampling

Description

Sampling method to be used when one wants to sample datapoints whose prediction confidences for certain classes fall within a given range. Works for both detection and classification models, although the behaviour differs slightly.

Degrees of freedom in configuration

selected_class_names - class names to be taken into consideration while sampling - optional (if not given, all classes can be sampled)
threshold and epsilon - the centre and radius of the confidence range that can trigger sampling. For instance, to get datapoints on which the classifier is highly confident (0.8, 1.0), set threshold=0.9, epsilon=0.1. The range is, however, limited to the outcomes of model post-processing (and threshold filtering)
probability - fraction of datapoints matching the sampling criteria that will be persisted
minimum_objects_close_to_threshold - (for detection predictions only) - how many detected objects from the selected classes must be close to the threshold for the datapoint to be accepted
only_top_classes - (for classification predictions only) - flag deciding whether only the top class (multi-class case) or the predicted_classes (multi-label case) should be taken into consideration, to avoid sampling based on non-leading classes in predictions
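
The detection variant of the options above can be sketched roughly as follows. This is a minimal illustration, not the code added in this PR: the function names, the `"class"` / `"confidence"` keys of a detection, the inclusive range bounds and the default of 1 for `minimum_objects_close_to_threshold` are all assumptions made for the example.

```python
import random
from typing import Dict, List, Optional, Set


def is_close_to_threshold(confidence: float, threshold: float, epsilon: float) -> bool:
    # Confidence must fall within [threshold - epsilon, threshold + epsilon].
    # Inclusive bounds are an assumption of this sketch.
    return abs(confidence - threshold) <= epsilon


def count_close_detections(
    detections: List[Dict],
    threshold: float,
    epsilon: float,
    selected_class_names: Optional[Set[str]] = None,
) -> int:
    # Hypothetical detection format: {"class": str, "confidence": float}.
    # When selected_class_names is None, every class is eligible.
    return sum(
        1
        for detection in detections
        if (selected_class_names is None or detection["class"] in selected_class_names)
        and is_close_to_threshold(detection["confidence"], threshold, epsilon)
    )


def should_sample_detection(
    detections: List[Dict],
    threshold: float,
    epsilon: float,
    probability: float,
    selected_class_names: Optional[Set[str]] = None,
    minimum_objects_close_to_threshold: int = 1,  # default is an assumption
) -> bool:
    close = count_close_detections(detections, threshold, epsilon, selected_class_names)
    if close < minimum_objects_close_to_threshold:
        return False
    # Persist only a fraction of the matching datapoints.
    return random.random() < probability
```

With `threshold=0.25` and `epsilon=0.1`, a detection with confidence 0.3 counts as close to the threshold, while one with confidence 0.9 does not.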

Example configuration

{
    "name": "hard_examples",
    "type": "close_to_threshold",
    "selected_class_names": ["a", "b"],
    "threshold": 0.25,
    "epsilon": 0.1,
    "probability": 0.5,
    "tags": ["b"],
    "limits": [
        {"type": "minutely", "value": 10},
        {"type": "hourly", "value": 100},
        {"type": "daily", "value": 1000},
    ],
},

Classes based sampling (for classification)

Description

Sampling method to be used when one wants to sample specific classes from classifier predictions.

Degrees of freedom in configuration

selected_class_names - class names to be taken into consideration while sampling - required
probability - fraction of datapoints matching the sampling criteria that will be persisted
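
As a rough sketch of how these two options could interact (not the implementation from this PR): the function name and the idea of passing the predicted class names as a plain list are assumptions made for illustration.

```python
import random
from typing import Iterable, Set


def should_sample_by_class(
    predicted_classes: Iterable[str],
    selected_class_names: Set[str],
    probability: float,
) -> bool:
    # Hypothetical input: names of the class(es) the classifier predicted,
    # one element for multi-class, possibly several for multi-label.
    if not any(name in selected_class_names for name in predicted_classes):
        return False
    # Persist only a fraction of the matching datapoints.
    return random.random() < probability
```

With `probability=1.0`, as in the example configuration below, every datapoint predicted as one of the selected classes would be a candidate for persisting, subject to the configured limits.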

Example configuration

{
    "name": "underrepresented_classes",
    "type": "classes_based",
    "selected_class_names": ["cat"],
    "probability": 1.0,
    "tags": ["hard-classes"],
    "limits": [
        {"type": "minutely", "value": 10},
        {"type": "hourly", "value": 100},
        {"type": "daily", "value": 1000},
    ],
},

Detection number based sampling (for detection)

Description

Sampling method to be used when one wants to sample datapoints based on their detections (count and classes).

Degrees of freedom in configuration

selected_class_names - class names to be taken into consideration while sampling - optional (if not given, all classes can be sampled)
probability - fraction of datapoints matching the sampling criteria that will be persisted
more_than - lower bound on the number of detected objects (optional; if not given, no lower limit is applied)
less_than - upper bound on the number of detected objects (optional; if not given, no upper limit is applied)
At least one of more_than, less_than must be given
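
The count check and the "at least one bound" rule might look like the sketch below. The function name is hypothetical, and treating both bounds as strict inequalities is an assumption based only on the option names; the actual implementation may differ.

```python
from typing import Optional


def detections_count_matches(
    count: int,
    more_than: Optional[int] = None,
    less_than: Optional[int] = None,
) -> bool:
    # At least one bound must be configured.
    if more_than is None and less_than is None:
        raise ValueError("At least one of `more_than`, `less_than` must be given")
    # Strict comparisons are an assumption of this sketch.
    if more_than is not None and count <= more_than:
        return False
    if less_than is not None and count >= less_than:
        return False
    return True
```

Under these assumptions, `more_than=3` accepts datapoints with 4 or more detections from the selected classes; the `probability` option would then be applied on top of this check.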

Example configuration

{
    "name": "multiple_detections",
    "type": "detections_number_based",
    "probability": 0.2,
    "more_than": 3,
    "tags": ["crowded"],
    "limits": [
        {"type": "minutely", "value": 10},
        {"type": "hourly", "value": 100},
        {"type": "daily", "value": 1000},
    ],
},

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
This change requires a documentation update (deferred, waiting for AL introduction)

How has this change been tested, please provide a testcase or example of how you tested the change?

automated tests added
tested e2e locally

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

Docs updated? What were the changes:

paulguerrie

Looks great! This is going to be really powerful for users!!

PawelPeczek-Roboflow added 15 commits November 2, 2023 17:04

Add sample close to threshold strategy

8a33c3a

Push sampling logic into separate package

400d3c9

Add sampling based on class names

2b363a4

Add sampling based on number of detections

9d988d4

Add automatic initialisation for new sampling strategies

8ad7966

Add basic test for close-to-threshold sampling

e3a3f2d

Add tests for close_to_threshold sampling module

e90734e

Simplify tests

a16d353

Simplify tests

f821db0

Add tests for classes based sampling

dfdc2ac

Add tests for detections number based sampling

f0f0b91

Update tests for configuration

ea1f173

Added dummy api response for new strategies

5135b95

Added dummy api response for new strategies

9f39787

Added dummy api response for new strategies

6285646

PawelPeczek-Roboflow requested review from paulguerrie and probicheaux November 3, 2023 14:32

paulguerrie approved these changes Nov 3, 2023

PawelPeczek-Roboflow merged commit a256262 into main Nov 3, 2023
2 checks passed

PawelPeczek-Roboflow deleted the feature/introduce_additional_active_learning_sampling_strategies branch November 3, 2023 15:42

PawelPeczek-Roboflow restored the feature/introduce_additional_active_learning_sampling_strategies branch November 3, 2023 15:46
