Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: remove custom searcher and DSAT #9949

Merged
merged 1 commit into from
Sep 24, 2024

Conversation

azhou-determined
Copy link
Contributor

@azhou-determined azhou-determined commented Sep 17, 2024

Ticket

Description

removes custom searcher and DSAT

Test Plan

Checklist

  • Changes have been manually QA'd
  • New features have been approved by the corresponding PM
  • User-facing API changes have the "User-facing API Change" label
  • Release notes have been added as a separate file under docs/release-notes/
    See Release Note for details.
  • Licenses have been included for new code which was copied and/or modified from any external code

@azhou-determined azhou-determined requested review from a team as code owners September 17, 2024 20:23
@azhou-determined azhou-determined requested review from jgongd and amandavialva01 and removed request for a team September 17, 2024 20:23
@cla-bot cla-bot bot added the cla-signed label Sep 17, 2024
@determined-ci determined-ci added the documentation Improvements or additions to documentation label Sep 17, 2024
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci determined-ci requested a review from a team September 17, 2024 20:23
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

Copy link

codecov bot commented Sep 17, 2024

Codecov Report

Attention: Patch coverage is 27.27273% with 8 lines in your changes missing coverage. Please review.

Project coverage is 53.88%. Comparing base (f758303) to head (db5d855).
Report is 1 commits behind head on searcher-context-removal.

Files with missing lines Patch % Lines
master/pkg/schemas/expconf/searcher_config.go 0.00% 5 Missing ⚠️
...ter/pkg/schemas/expconf/zgen_searcher_config_v0.go 0.00% 2 Missing ⚠️
master/pkg/searcher/searcher.go 50.00% 1 Missing ⚠️
Additional details and impacted files
@@                     Coverage Diff                      @@
##           searcher-context-removal    #9949      +/-   ##
============================================================
- Coverage                     54.50%   53.88%   -0.62%     
============================================================
  Files                          1255     1240      -15     
  Lines                        156733   153498    -3235     
  Branches                       3601     3599       -2     
============================================================
- Hits                          85424    82711    -2713     
+ Misses                        71176    70654     -522     
  Partials                        133      133              
Flag Coverage Δ
backend 45.19% <11.11%> (+0.02%) ⬆️
harness 71.00% <100.00%> (-1.75%) ⬇️
web 54.27% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
harness/determined/common/api/errors.py 76.36% <100.00%> (ø)
...s/determined/pytorch/deepspeed/_deepspeed_trial.py 79.88% <ø> (+1.28%) ⬆️
master/internal/api_experiment.go 58.35% <ø> (+1.55%) ⬆️
master/internal/experiment.go 32.96% <ø> (+2.83%) ⬆️
master/internal/experiment/authz_basic_impl.go 9.37% <ø> (+0.55%) ⬆️
master/internal/experiment/authz_permissive.go 2.32% <ø> (+0.15%) ⬆️
master/internal/experiment/authz_rbac.go 0.51% <ø> (+<0.01%) ⬆️
master/pkg/schemas/zgen_schemas.go 1.11% <ø> (ø)
master/pkg/searcher/operations.go 11.53% <ø> (-0.50%) ⬇️
master/pkg/searcher/search_method.go 76.19% <ø> (-2.08%) ⬇️
... and 3 more

... and 7 files with indirect coverage changes

Copy link
Contributor

@rb-determined-ai rb-determined-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice. -11k!

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci determined-ci requested a review from a team September 17, 2024 22:59
@jgongd
Copy link
Contributor

jgongd commented Sep 18, 2024

For my understanding, is deep speed also part of the searcher context as well?
Answer my own question: yes, deep speed imports searcher

from determined import searcher

Copy link
Contributor

@jgongd jgongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

}
)

func newCustomSearch(config expconf.CustomConfig) SearchMethod {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the expconf.CustomConfig type still exists, do we want to remove this as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type only seems to be used here in our experiment integration tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, removed it and the calling piece in that intg test

@@ -85,8 +76,6 @@ func NewSearchMethod(c expconf.SearcherConfig) SearchMethod {
return newAsyncHalvingSearch(*c.RawAsyncHalvingConfig, c.SmallerIsBetter())
case c.RawAdaptiveASHAConfig != nil:
return newAdaptiveASHASearch(*c.RawAdaptiveASHAConfig, c.SmallerIsBetter())
case c.RawCustomConfig != nil:
return newCustomSearch(*c.RawCustomConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also remove the RawCustomConfig type from SearcherConfigV0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, but apparently we can't 'cause of the case where master tries to restore a pre-upgrade custom search experiment. so i've kept the config but treat it like how we treat the other removed searcher configs.

Copy link
Contributor

@amandavialva01 amandavialva01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backend looks great! Left a few comments on a couple of structs that can potentially be removed

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

6 similar comments
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@azhou-determined
Copy link
Contributor Author

There's an import of ProjectSpec from the deleted file webui/react/src/services/stream/wire.ts in webui/react/src/stores/projects.tsx

sorry, i'm on a new laptop and having some proto/bindings build issues. those files were added back 😄

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

3 similar comments
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

4 similar comments
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

Copy link
Contributor

@rb-determined-ai rb-determined-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

siiiick

@azhou-determined azhou-determined merged commit 16779e7 into searcher-context-removal Sep 24, 2024
81 of 94 checks passed
@azhou-determined azhou-determined deleted the remove-custom-searcher branch September 24, 2024 19:04
rb-determined-ai pushed a commit that referenced this pull request Oct 1, 2024
delete custom searcher (and DSAT)
rb-determined-ai pushed a commit that referenced this pull request Oct 22, 2024
delete custom searcher (and DSAT)
azhou-determined added a commit that referenced this pull request Oct 22, 2024
delete custom searcher (and DSAT)
azhou-determined pushed a commit that referenced this pull request Oct 22, 2024
Eliminate a couple dozen yaml files for controlling the no_op fixture,
each of which was tweaked a half dozen ways by different tests.

There were 124 usages of the no_op fixture, and it was very hard to
know what any particular test was trying to accomplish.  All of these
(except 6 from the custom searcher tests, which are removed in an
upcoming feature branch) have been re-written to use a new python module
for creating noop experiments with obvious behaviors.

By my measurements, a combined total of 34 minutes of effective sleeping
were removed from the individual tests of our test suite.  The biggest
wins were from cases where the test author probably did not realize how
long some of the no_op experiments were configured to run for.

Most tests were faithfully preserved, with the following exceptions:

- cluster/test_exp_continue:test_continue_config_file_and_args_cli
    - converted to a unit test
- cluster/test_exp_continue:test_continue_config_file_and_args_cli
    - deleted; with new unit test, adds nothing to test_continue_batches
- cluster/test_exp_continue:test_continue_fixing_broken_config
    - deleted; adds nothing to test_continue_batches
- cluster/test_exp_continue:test_continue_workloads_searcher
    - deleted since it was really a wlsq test
- cluster/test_exp_continue:test_continue_pytorch_completed_searcher
    - deleted since it was really a pytorch trainer test
- cluster/test_resource_manager:test_allocation_resources_incremental_release
    - the test has not been working, I think at least since we defaulted
      to using det.launch.torch_distributed; the non-chief container was
      not exiting until the chief exited
- experiment/test_core:test_trial_logs
    - deleted due to cluster/test_logging
- experiment/test_core:test_log_null_bytes
    - deleted, but added null bytes to test_logging.py
- experiment/test_noop:test_noop_nan_validations
    - combined with test_noop_pause
- experiment/test_noop:test_cancel_ten_experiments
    - this test is dumb, also it was pathologically slow
- experiment/test_noop:test_cancel_ten_paused_experiments
    - this test is dumb
- experiment/test_noop:test_startup_hook
    - test_logging tests startup hooks already
- run/test_api:test_run_pause_and_resume_filter_skip_empty
    - renamed to test_run_in_search_not_pausable_or_resumable to match
      its intended purpose, also simplify it, also make it stricter,
      also stop leaking adaptive searches onto the cluster after passing

chore: remove custom searcher and DSAT (#9949)

delete custom searcher (and DSAT)

chore: refactor searcher operations out of master side searchers

code gen
azhou-determined pushed a commit that referenced this pull request Oct 22, 2024
Eliminate a couple dozen yaml files for controlling the no_op fixture,
each of which was tweaked a half dozen ways by different tests.

There were 124 usages of the no_op fixture, and it was very hard to
know what any particular test was trying to accomplish.  All of these
(except 6 from the custom searcher tests, which are removed in an
upcoming feature branch) have been re-written to use a new python module
for creating noop experiments with obvious behaviors.

By my measurements, a combined total of 34 minutes of effective sleeping
were removed from the individual tests of our test suite.  The biggest
wins were from cases where the test author probably did not realize how
long some of the no_op experiments were configured to run for.

Most tests were faithfully preserved, with the following exceptions:

- cluster/test_exp_continue:test_continue_config_file_and_args_cli
    - converted to a unit test
- cluster/test_exp_continue:test_continue_config_file_and_args_cli
    - deleted; with new unit test, adds nothing to test_continue_batches
- cluster/test_exp_continue:test_continue_fixing_broken_config
    - deleted; adds nothing to test_continue_batches
- cluster/test_exp_continue:test_continue_workloads_searcher
    - deleted since it was really a wlsq test
- cluster/test_exp_continue:test_continue_pytorch_completed_searcher
    - deleted since it was really a pytorch trainer test
- cluster/test_resource_manager:test_allocation_resources_incremental_release
    - the test has not been working, I think at least since we defaulted
      to using det.launch.torch_distributed; the non-chief container was
      not exiting until the chief exited
- experiment/test_core:test_trial_logs
    - deleted due to cluster/test_logging
- experiment/test_core:test_log_null_bytes
    - deleted, but added null bytes to test_logging.py
- experiment/test_noop:test_noop_nan_validations
    - combined with test_noop_pause
- experiment/test_noop:test_cancel_ten_experiments
    - this test is dumb, also it was pathologically slow
- experiment/test_noop:test_cancel_ten_paused_experiments
    - this test is dumb
- experiment/test_noop:test_startup_hook
    - test_logging tests startup hooks already
- run/test_api:test_run_pause_and_resume_filter_skip_empty
    - renamed to test_run_in_search_not_pausable_or_resumable to match
      its intended purpose, also simplify it, also make it stricter,
      also stop leaking adaptive searches onto the cluster after passing

chore: remove custom searcher and DSAT (#9949)

delete custom searcher (and DSAT)

chore: refactor searcher operations out of master side searchers

code gen
rb-determined-ai pushed a commit that referenced this pull request Oct 22, 2024
delete custom searcher (and DSAT)
azhou-determined pushed a commit that referenced this pull request Oct 23, 2024
Eliminate a couple dozen yaml files for controlling the no_op fixture,
each of which was tweaked a half dozen ways by different tests.

There were 124 usages of the no_op fixture, and it was very hard to
know what any particular test was trying to accomplish.  All of these
(except 6 from the custom searcher tests, which are removed in an
upcoming feature branch) have been re-written to use a new python module
for creating noop experiments with obvious behaviors.

By my measurements, a combined total of 34 minutes of effective sleeping
were removed from the individual tests of our test suite.  The biggest
wins were from cases where the test author probably did not realize how
long some of the no_op experiments were configured to run for.

Most tests were faithfully preserved, with the following exceptions:

- cluster/test_exp_continue:test_continue_config_file_and_args_cli
    - converted to a unit test
- cluster/test_exp_continue:test_continue_config_file_and_args_cli
    - deleted; with new unit test, adds nothing to test_continue_batches
- cluster/test_exp_continue:test_continue_fixing_broken_config
    - deleted; adds nothing to test_continue_batches
- cluster/test_exp_continue:test_continue_workloads_searcher
    - deleted since it was really a wlsq test
- cluster/test_exp_continue:test_continue_pytorch_completed_searcher
    - deleted since it was really a pytorch trainer test
- cluster/test_resource_manager:test_allocation_resources_incremental_release
    - the test has not been working, I think at least since we defaulted
      to using det.launch.torch_distributed; the non-chief container was
      not exiting until the chief exited
- experiment/test_core:test_trial_logs
    - deleted due to cluster/test_logging
- experiment/test_core:test_log_null_bytes
    - deleted, but added null bytes to test_logging.py
- experiment/test_noop:test_noop_nan_validations
    - combined with test_noop_pause
- experiment/test_noop:test_cancel_ten_experiments
    - this test is dumb, also it was pathologically slow
- experiment/test_noop:test_cancel_ten_paused_experiments
    - this test is dumb
- experiment/test_noop:test_startup_hook
    - test_logging tests startup hooks already
- run/test_api:test_run_pause_and_resume_filter_skip_empty
    - renamed to test_run_in_search_not_pausable_or_resumable to match
      its intended purpose, also simplify it, also make it stricter,
      also stop leaking adaptive searches onto the cluster after passing

chore: remove custom searcher and DSAT (#9949)

delete custom searcher (and DSAT)

chore: refactor searcher operations out of master side searchers

code gen
rb-determined-ai pushed a commit that referenced this pull request Oct 24, 2024
delete custom searcher (and DSAT)
rb-determined-ai pushed a commit that referenced this pull request Oct 24, 2024
delete custom searcher (and DSAT)
azhou-determined added a commit that referenced this pull request Oct 25, 2024
delete custom searcher (and DSAT)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants