chore: remove custom searcher and DSAT #9949
Docsite preview being generated for this PR. |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## searcher-context-removal #9949 +/- ##
============================================================
- Coverage 54.50% 53.88% -0.62%
============================================================
Files 1255 1240 -15
Lines 156733 153498 -3235
Branches 3601 3599 -2
============================================================
- Hits 85424 82711 -2713
+ Misses 71176 70654 -522
Partials 133 133
very nice. -11k!
For my understanding, is DeepSpeed also part of the searcher context?
Nice work!
}
)

func newCustomSearch(config expconf.CustomConfig) SearchMethod {
I believe the expconf.CustomConfig type still exists; do we want to remove it as well?
This type only seems to be used here in our experiment integration tests
good point, removed it and the calling piece in that intg test
@@ -85,8 +76,6 @@ func NewSearchMethod(c expconf.SearcherConfig) SearchMethod {
		return newAsyncHalvingSearch(*c.RawAsyncHalvingConfig, c.SmallerIsBetter())
	case c.RawAdaptiveASHAConfig != nil:
		return newAdaptiveASHASearch(*c.RawAdaptiveASHAConfig, c.SmallerIsBetter())
	case c.RawCustomConfig != nil:
		return newCustomSearch(*c.RawCustomConfig)
Should we also remove the RawCustomConfig type from SearcherConfigV0?
good idea, but apparently we can't, because of the case where master tries to restore a pre-upgrade custom search experiment. so i've kept the config but treat it the same way we treat the other removed searcher configs.
Backend looks great! Left a few comments on a couple of structs that can potentially be removed
sorry, i'm on a new laptop and having some proto/bindings build issues. those files were added back 😄
5d06c6c to 4865e03
edc1848 to c76a5ca
a32c75c to db5d855
siiiick
16779e7 into searcher-context-removal
delete custom searcher (and DSAT)
Eliminate a couple dozen yaml files for controlling the no_op fixture, each of which was tweaked a half dozen ways by different tests. There were 124 usages of the no_op fixture, and it was very hard to know what any particular test was trying to accomplish. All of these (except 6 from the custom searcher tests, which are removed in an upcoming feature branch) have been re-written to use a new python module for creating noop experiments with obvious behaviors.

By my measurements, a combined total of 34 minutes of effective sleeping were removed from the individual tests of our test suite. The biggest wins were from cases where the test author probably did not realize how long some of the no_op experiments were configured to run for.

Most tests were faithfully preserved, with the following exceptions:
- cluster/test_exp_continue:test_continue_config_file_and_args_cli - converted to a unit test
- cluster/test_exp_continue:test_continue_config_file_and_args_cli - deleted; with new unit test, adds nothing to test_continue_batches
- cluster/test_exp_continue:test_continue_fixing_broken_config - deleted; adds nothing to test_continue_batches
- cluster/test_exp_continue:test_continue_workloads_searcher - deleted since it was really a wlsq test
- cluster/test_exp_continue:test_continue_pytorch_completed_searcher - deleted since it was really a pytorch trainer test
- cluster/test_resource_manager:test_allocation_resources_incremental_release - the test has not been working, I think at least since we defaulted to using det.launch.torch_distributed; the non-chief container was not exiting until the chief exited
- experiment/test_core:test_trial_logs - deleted due to cluster/test_logging
- experiment/test_core:test_log_null_bytes - deleted, but added null bytes to test_logging.py
- experiment/test_noop:test_noop_nan_validations - combined with test_noop_pause
- experiment/test_noop:test_cancel_ten_experiments - this test is dumb, also it was pathologically slow
- experiment/test_noop:test_cancel_ten_paused_experiments - this test is dumb
- experiment/test_noop:test_startup_hook - test_logging tests startup hooks already
- run/test_api:test_run_pause_and_resume_filter_skip_empty - renamed to test_run_in_search_not_pausable_or_resumable to match its intended purpose, also simplify it, also make it stricter, also stop leaking adaptive searches onto the cluster after passing

chore: remove custom searcher and DSAT (#9949)
delete custom searcher (and DSAT)
chore: refactor searcher operations out of master side searchers code gen
Ticket
Description
Removes the custom searcher and DSAT.
Test Plan
Checklist
docs/release-notes/
See Release Note for details.