[BUG] Incorrect cost for crash, r2 metric.  #339

Closed
@ravinkohli

Description

The cost assigned to an unsuccessful configuration (a crash, timeout or memout) is not the worst, i.e. highest, cost in the run history. I think the issue is that the worst_possible_result of r2 is set to 0 instead of a large negative value.

Currently, we assume that the worst r2 value is 0, but r2 can be negative, so a successful run can end up with a higher cost than a failed one. As a result, for this test, get_incumbent_results returns a failed configuration.
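To illustrate the mismatch, here is a minimal sketch. It assumes the cost is computed as `optimum - score` with an r2 optimum of 1, which is consistent with the r2 losses above 1 in the run history below but is not taken from the library's source. The names `cost_from_score`, `R2_OPTIMUM` and `ASSUMED_WORST_R2` are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


R2_OPTIMUM = 1.0        # best achievable r2
ASSUMED_WORST_R2 = 0.0  # the assumption this issue questions


def cost_from_score(score, optimum=R2_OPTIMUM):
    # Assumed convention (hypothetical helper): lower cost is better.
    return optimum - score


# Predictions that are anti-correlated with the targets give a negative r2.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [4.0, 3.0, 2.0, 1.0]

bad_r2 = r2_score(y_true, y_pred)
success_cost = cost_from_score(bad_r2)
failure_cost = cost_from_score(ASSUMED_WORST_R2)

# The successful-but-poor run is costlier than the "worst possible" failure.
print(bad_r2, success_cost, failure_cost)
```

Because r2 has no lower bound, no finite `worst_possible_result` is guaranteed to exceed every successful cost; a large value only makes the collision unlikely.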

run_history.data

run history: OrderedDict([
(RunKey(config_id=1, instance_id='{"task_id": "346bba22-4b9e-11ec-8898-af6b24cea2d8"}', seed=0, budget=5.555555555555555), RunValue(cost=1.0506822531406963, time=1.7120046615600586, status=<StatusType.SUCCESS: 1>, starttime=1637590341.2953675, endtime=1637590344.0851278, additional_info={'opt_loss': {'mean_absolute_error': 0.848294091048943, 'mean_squared_error': 1.1446978981576077, 'root_mean_squared_error': 1.0698249409683425, 'median_absolute_error': 0.6924790938752838, 'r2': 1.0506822531406963}, 'duration': 1.59676194190979, 'num_run': 2, 'train_loss': 1.0, 'test_loss': {'mean_absolute_error': 0.600238164830327, 'mean_squared_error': 0.6760705147388893, 'root_mean_squared_error': 0.8222350726762325, 'median_absolute_error': 0.4358874619042557, 'r2': 1.0006027803574398}, 'configuration_origin': 'Default'})),
(RunKey(config_id=2, instance_id='{"task_id": "346bba22-4b9e-11ec-8898-af6b24cea2d8"}', seed=0, budget=5.555555555555555), RunValue(cost=1.0506822531406963, time=1.6906914710998535, status=<StatusType.SUCCESS: 1>, starttime=1637590344.3137865, endtime=1637590347.078679, additional_info={'opt_loss': {'mean_absolute_error': 0.848294091048943, 'mean_squared_error': 1.1446978981576077, 'root_mean_squared_error': 1.0698249409683425, 'median_absolute_error': 0.6924790938752838, 'r2': 1.0506822531406963}, 'duration': 1.5626084804534912, 'num_run': 3, 'train_loss': 1.0, 'test_loss': {'mean_absolute_error': 0.600238164830327, 'mean_squared_error': 0.6760705147388893, 'root_mean_squared_error': 0.8222350726762325, 'median_absolute_error': 0.4358874619042557, 'r2': 1.0006027803574398}, 'configuration_origin': 'Random Search'})),
(RunKey(config_id=3, instance_id='{"task_id": "346bba22-4b9e-11ec-8898-af6b24cea2d8"}', seed=0, budget=5.555555555555555), RunValue(cost=1.0506822531406963, time=1.9393565654754639, status=<StatusType.SUCCESS: 1>, starttime=1637590351.282449, endtime=1637590354.308634, additional_info={'opt_loss': {'mean_absolute_error': 0.848294091048943, 'mean_squared_error': 1.1446978981576077, 'root_mean_squared_error': 1.0698249409683425, 'median_absolute_error': 0.6924790938752838, 'r2': 1.0506822531406963}, 'duration': 1.820396900177002, 'num_run': 4, 'train_loss': 1.0, 'test_loss': {'mean_absolute_error': 0.600238164830327, 'mean_squared_error': 0.6760705147388893, 'root_mean_squared_error': 0.8222350726762325, 'median_absolute_error': 0.4358874619042557, 'r2': 1.0006027803574398}, 'configuration_origin': 'Random Search (sorted)'})),
(RunKey(config_id=4, instance_id='{"task_id": "346bba22-4b9e-11ec-8898-af6b24cea2d8"}', seed=0, budget=5.555555555555555), RunValue(cost=1.0506822531406963, time=1.8805546760559082, status=<StatusType.SUCCESS: 1>, starttime=1637590354.5875776, endtime=1637590357.55083, additional_info={'opt_loss': {'mean_absolute_error': 0.848294091048943, 'mean_squared_error': 1.1446978981576077, 'root_mean_squared_error': 1.0698249409683425, 'median_absolute_error': 0.6924790938752838, 'r2': 1.0506822531406963}, 'duration': 1.7624120712280273, 'num_run': 5, 'train_loss': 1.0, 'test_loss': {'mean_absolute_error': 0.600238164830327, 'mean_squared_error': 0.6760705147388893, 'root_mean_squared_error': 0.8222350726762325, 'median_absolute_error': 0.4358874619042557, 'r2': 1.0006027803574398}, 'configuration_origin': 'Random Search'})),
(RunKey(config_id=5, instance_id='{"task_id": "346bba22-4b9e-11ec-8898-af6b24cea2d8"}', seed=0, budget=5.555555555555555), RunValue(cost=1.0506822531406963, time=1.9475481510162354, status=<StatusType.SUCCESS: 1>, starttime=1637590357.8529084, endtime=1637590360.8822083, additional_info={'opt_loss': {'mean_absolute_error': 0.848294091048943, 'mean_squared_error': 1.1446978981576077, 'root_mean_squared_error': 1.0698249409683425, 'median_absolute_error': 0.6924790938752838, 'r2': 1.0506822531406963}, 'duration': 1.8303802013397217, 'num_run': 6, 'train_loss': 1.0, 'test_loss': {'mean_absolute_error': 0.600238164830327, 'mean_squared_error': 0.6760705147388893, 'root_mean_squared_error': 0.8222350726762325, 'median_absolute_error': 0.4358874619042557, 'r2': 1.0006027803574398}, 'configuration_origin': 'Random Search'})),
(RunKey(config_id=6, instance_id='{"task_id": "346bba22-4b9e-11ec-8898-af6b24cea2d8"}', seed=0, budget=5.555555555555555), RunValue(cost=1.0506822531406963, time=1.8912639617919922, status=<StatusType.SUCCESS: 1>, starttime=1637590361.1837845, endtime=1637590364.1609986, additional_info={'opt_loss': {'mean_absolute_error': 0.848294091048943, 'mean_squared_error': 1.1446978981576077, 'root_mean_squared_error': 1.0698249409683425, 'median_absolute_error': 0.6924790938752838, 'r2': 1.0506822531406963}, 'duration': 1.7739710807800293, 'num_run': 7, 'train_loss': 1.0, 'test_loss': {'mean_absolute_error': 0.600238164830327, 'mean_squared_error': 0.6760705147388893, 'root_mean_squared_error': 0.8222350726762325, 'median_absolute_error': 0.4358874619042557, 'r2': 1.0006027803574398}, 'configuration_origin': 'Random Search'})),
(RunKey(config_id=7, instance_id='{"task_id": "346bba22-4b9e-11ec-8898-af6b24cea2d8"}', seed=0, budget=5.555555555555555), RunValue(cost=1.0, time=1.008704423904419, status=<StatusType.TIMEOUT: 2>, starttime=1637590369.4671094, endtime=1637590371.494137, additional_info={'error': 'Timeout', 'configuration_origin': 'Random Search (sorted)'}))
])

incumbent_results

{'configuration_origin': 'Random Search (sorted)', 'error': 'Timeout'}
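The result above is what a plain minimum-cost lookup would produce on this run history. The sketch below is a simplified stand-in, not the actual get_incumbent_results implementation: the `Run` tuple and both helper functions are hypothetical, and only the costs and statuses are copied from the dump above.

```python
from collections import namedtuple

# Simplified stand-in for a run history entry (hypothetical structure).
Run = namedtuple("Run", ["config_id", "cost", "status"])

runs = [
    Run(1, 1.0506822531406963, "SUCCESS"),
    Run(6, 1.0506822531406963, "SUCCESS"),
    Run(7, 1.0, "TIMEOUT"),  # failure cost derived from a worst r2 of 0
]


def naive_incumbent(runs):
    # Picks the lowest cost regardless of whether the run succeeded.
    return min(runs, key=lambda r: r.cost)


def successful_incumbent(runs):
    # One possible guard: only consider runs that finished successfully.
    ok = [r for r in runs if r.status == "SUCCESS"]
    return min(ok, key=lambda r: r.cost) if ok else None


print(naive_incumbent(runs).config_id)       # the timed-out configuration
print(successful_incumbent(runs).config_id)  # a successful configuration
```

Filtering on status is only one option; assigning failures a cost larger than any observed successful cost would address the same symptom on the cost side.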

Labels

bug (Something isn't working), first priority (PRs to be checked as a priority)
