
AUC metrics #3751

Merged
merged 40 commits into master from auc_metrics on Jul 8, 2021

Conversation

@liliarose (Contributor) commented on Jun 25, 2021

Patch description

Adding AUC (Area under ROC curve) metrics as an option for eval_model
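
For reference, the per-class value reported (e.g. AUC___notok__) corresponds to the standard area under the ROC curve of the probabilities the classifier assigns to that class, scored against the gold labels. A minimal sketch of what that number means, computed here with sklearn purely for illustration (the variable names are made up; this is not the PR's implementation):

```python
# Illustration only (not the PR's implementation): one-vs-rest ROC-AUC
# for the __notok__ class from hypothetical per-example probabilities.
from sklearn.metrics import roc_auc_score

probs = [0.95, 0.10, 0.80, 0.30, 0.60]   # hypothetical P(__notok__) per example
labels = [1, 0, 1, 1, 0]                 # 1 = __notok__, 0 = __ok__

print(roc_auc_score(labels, probs))      # ~0.83
```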

Testing steps

  1. Check that the other metric tests still pass (pytest -v -k TestMetrics / pytest -v -k TestMetric).
  2. Check via a unit test that AUC values aggregate correctly across batches (pytest -v -k TestAggregators); see the sketch after this list for the kind of property being checked.
  3. Check that it works via eval_model for both classes with micro aggregation, verifying that each agent's AUC state is reset correctly. (parlai em -mf zoo:dialogue_safety/multi_turn/model -t internal:civil_bias_toxic_comment:civil_bias_toxicity,dialogue_safety -auc 4 -rf parlai_internal/reports/auc_multi_safety_civil_multi.json -ne 3000 -micro True)
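
For step 2, the property being checked is roughly the one sketched below. This is not the actual TestAggregators test and the names are made up: the point is that the AUC of combined data is generally not the mean of per-batch AUCs, so the metric has to carry enough state (the scores themselves, or a summary of them) to recompute the curve when results from different batches or tasks are merged.

```python
# Hedged sketch (not the real test): why AUC needs special aggregation.
from sklearn.metrics import roc_auc_score

batch1 = ([0.9, 0.4], [1, 0])   # (positive-class probs, gold labels)
batch2 = ([0.3, 0.2], [1, 0])

mean_of_batch_aucs = (roc_auc_score(batch1[1], batch1[0])
                      + roc_auc_score(batch2[1], batch2[0])) / 2

combined_probs = batch1[0] + batch2[0]
combined_labels = batch1[1] + batch2[1]
combined_auc = roc_auc_score(combined_labels, combined_probs)

print(mean_of_batch_aucs)  # 1.0
print(combined_auc)        # 0.75: merging batches changes the curve
```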

Logs

[Screenshots of eval_model output, 2021-06-30]

| metric | dialogue_safety | internal:civil_bias_toxic_comment:civil_bias_toxicity | all |
| --- | --- | --- | --- |
| class___notok___f1 | 0.82453 | 0.55437 | 0.68945 |
| AUC___notok__ | 0.98469 | 0.92527 | 0.96373 |

Testing Step 3 (eval_model)

parlai em -mf zoo:dialogue_safety/multi_turn/model -t internal:civil_bias_toxic_comment:civil_bias_toxicity -auc True -rf parlai_internal/reports/auc_multi_safety_civil_single.json -ne 3000

Results:

Testing Step 2 (TestAggregators)

(conda_parlai) wzhang4343@devfair0173:~/ParlAI$ pytest -v -k TestAggregators
===================================================== test session starts =====================================================
platform linux -- Python 3.7.10, pytest-6.2.4, py-1.10.0, pluggy-1.0.0.dev0 -- /private/home/wzhang4343/.conda/envs/conda_parlai/bin/python
cachedir: .pytest_cache
rootdir: /private/home/wzhang4343/ParlAI, configfile: pytest.ini, testpaths: tests, parlai/tasks
plugins: requests-mock-1.9.2, regressions-2.2.0, hydra-core-1.0.6, datadir-1.3.1
collected 997 items / 990 deselected / 7 selected                                                                             

tests/test_metrics.py::TestAggregators::test_auc_metrics PASSED                                                         [ 14%]
tests/test_metrics.py::TestAggregators::test_classifier_metrics PASSED                                                  [ 28%]
tests/test_metrics.py::TestAggregators::test_macro_aggregation PASSED                                                   [ 42%]
tests/test_metrics.py::TestAggregators::test_micro_aggregation PASSED                                                   [ 57%]
tests/test_metrics.py::TestAggregators::test_time_metric PASSED                                                         [ 71%]
tests/test_metrics.py::TestAggregators::test_uneven_macro_aggrevation PASSED                                            [ 85%]
tests/test_metrics.py::TestAggregators::test_unnamed_aggregation PASSED                                                 [100%]

==================================================== slowest 10 durations =====================================================
0.01s call     tests/test_metrics.py::TestAggregators::test_auc_metrics

(9 durations < 0.005s hidden.  Use -vv to show these durations.)
======================================== 7 passed, 990 deselected, 2 warnings in 5.77s ========================================

Testing Step 1 (TestMetric / TestMetrics)

(conda_parlai) wzhang4343@devfair0173:~/ParlAI$ pytest -v -k TestMetric
===================================================== test session starts =====================================================
platform linux -- Python 3.7.10, pytest-6.2.4, py-1.10.0, pluggy-1.0.0.dev0 -- /private/home/wzhang4343/.conda/envs/conda_parlai/bin/python
cachedir: .pytest_cache
rootdir: /private/home/wzhang4343/ParlAI, configfile: pytest.ini, testpaths: tests, parlai/tasks
plugins: requests-mock-1.9.2, regressions-2.2.0, hydra-core-1.0.6, datadir-1.3.1
collected 996 items / 984 deselected / 12 selected                                                                            

tests/test_metrics.py::TestMetric::test_average_metric_additions PASSED                                                 [  8%]
tests/test_metrics.py::TestMetric::test_average_metric_inputs PASSED                                                    [ 16%]
tests/test_metrics.py::TestMetric::test_fixedmetric PASSED                                                              [ 25%]
tests/test_metrics.py::TestMetric::test_macroaverage_additions PASSED                                                   [ 33%]
tests/test_metrics.py::TestMetric::test_sum_metric_additions PASSED                                                     [ 41%]
tests/test_metrics.py::TestMetric::test_sum_metric_inputs PASSED                                                        [ 50%]
tests/test_metrics.py::TestMetrics::test_largebuffer PASSED                                                             [ 58%]
tests/test_metrics.py::TestMetrics::test_multithreaded PASSED                                                           [ 66%]
tests/test_metrics.py::TestMetrics::test_recent PASSED                                                                  [ 75%]
tests/test_metrics.py::TestMetrics::test_shared PASSED                                                                  [ 83%]
tests/test_metrics.py::TestMetrics::test_simpleadd PASSED                                                               [ 91%]
tests/test_metrics.py::TestMetrics::test_verymultithreaded PASSED                                                       [100%]

==================================================== slowest 10 durations =====================================================
0.13s call     tests/test_metrics.py::TestMetrics::test_verymultithreaded
0.09s call     tests/test_metrics.py::TestMetrics::test_largebuffer

(8 durations < 0.005s hidden.  Use -vv to show these durations.)
======================================= 12 passed, 984 deselected, 2 warnings in 7.61s ========================================
(conda_parlai) wzhang4343@devfair0173:~/ParlAI$ pytest -v -k TestMetrics
===================================================== test session starts =====================================================
platform linux -- Python 3.7.10, pytest-6.2.4, py-1.10.0, pluggy-1.0.0.dev0 -- /private/home/wzhang4343/.conda/envs/conda_parlai/bin/python
cachedir: .pytest_cache
rootdir: /private/home/wzhang4343/ParlAI, configfile: pytest.ini, testpaths: tests, parlai/tasks
plugins: requests-mock-1.9.2, regressions-2.2.0, hydra-core-1.0.6, datadir-1.3.1
collected 996 items / 990 deselected / 6 selected                                                                             

tests/test_metrics.py::TestMetrics::test_largebuffer PASSED                                                             [ 16%]
tests/test_metrics.py::TestMetrics::test_multithreaded PASSED                                                           [ 33%]
tests/test_metrics.py::TestMetrics::test_recent PASSED                                                                  [ 50%]
tests/test_metrics.py::TestMetrics::test_shared PASSED                                                                  [ 66%]
tests/test_metrics.py::TestMetrics::test_simpleadd PASSED                                                               [ 83%]
tests/test_metrics.py::TestMetrics::test_verymultithreaded PASSED                                                       [100%]

==================================================== slowest 10 durations =====================================================
0.13s call     tests/test_metrics.py::TestMetrics::test_verymultithreaded
0.09s call     tests/test_metrics.py::TestMetrics::test_largebuffer

(8 durations < 0.005s hidden.  Use -vv to show these durations.)
======================================== 6 passed, 990 deselected, 2 warnings in 5.16s ========================================

Other Info

  • I chose to store the AUC metric in the classifier because, if we ever want to compute it during training, it should be relatively easy to modify the code to do it there.
  • Also, I added 6 tests. I know it would be a lot less code if I looped over some of the commands, but that would make it harder to see exactly which test is which.
  • Please don't be scared by the line count: ~300 lines are in the tests.
  • Code used to generate the graphs (not in the final version):
    # for fun... (this snippet presumably lived inside the eval loop, where
    # curr_auc, opt, task, classifier_agent, and class_indices were in scope)
    import matplotlib.pyplot as plt
    from sklearn import metrics

    fpr, tpr, _, _ = curr_auc._calc_fpr_tpr()
    display = metrics.RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=curr_auc.value())
    display.plot()
    folder = '/'.join(opt['report_filename'].split('/')[:-1])
    fig_path = f"{folder}/AUC_graph_{task}_{classifier_agent.class_list[class_indices]}.png"
    plt.savefig(fig_path)
    plt.clf()
    print(f"graphed {fig_path}")
  • Code used to generate the table above from the report file:
# `report` is the metrics dict read from the JSON written via -rf /
# --report-filename (loading it is omitted here).
interested_metrics = {'AUC___notok__', 'class___notok___f1'}

fields = ['dialogue_safety', 'internal:civil_bias_toxic_comment:civil_bias_toxicity']
rows = []

for metric in interested_metrics:
    row = [metric]
    for field in fields:
        key = field + '/' + metric
        row.append(str(round(report[key], 5)))
    row.append(str(round(report[metric], 5)))
    rows.append('|'.join(row))

print('|'.join(['metric'] + fields + ['all']))
print('|'.join(['---'] * (len(fields) + 2)))
print('\n'.join(rows))

@liliarose liliarose marked this pull request as ready for review June 25, 2021 21:20
@stephenroller (Contributor) left a comment


Generally seems good and I appreciate the feature. Just have a clarification question.

Review comments on parlai/core/torch_classifier_agent.py (3 threads, resolved)
@stephenroller (Contributor) commented:

(That test is fantastic btw)

@liliarose liliarose marked this pull request as ready for review June 30, 2021 22:21
@jxmsML jxmsML self-requested a review July 1, 2021 14:44
@EricMichaelSmith (Contributor) left a comment


Wow, great to have! Yeah, I like the thorough tests. Will defer to others with more context for approval

Review comments on parlai/core/torch_classifier_agent.py (3 threads) and parlai/scripts/eval_model.py (2 threads)
@liliarose liliarose marked this pull request as draft July 6, 2021 20:46
@liliarose liliarose marked this pull request as ready for review July 6, 2021 21:07
@jxmsML (Contributor) left a comment


LGTM!! some nit

@stephenroller stephenroller changed the title from "Auc metrics" to "AUC metrics" on Jul 7, 2021
@liliarose liliarose merged commit 36004c9 into master Jul 8, 2021
@liliarose liliarose deleted the auc_metrics branch July 8, 2021 00:16