Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hydra: remove hard-coded token limit #6682

Merged
merged 5 commits into from
Sep 20, 2023
Merged

Conversation

hzhou
Copy link
Contributor

@hzhou hzhou commented Sep 19, 2023

Pull Request Description

There are multiple places in hydra that use a hard-coded macro HYD_NUM_TMP_STRING (set to 1000) for temporary arrays. This places unnecessary limit as very long file lines or command lines, while not usual, do appear in real usages. In addition, the hard limit embeds pitfalls for code to segfault if boundary checking is missing. Using dynamic arrays fixes these issue.

Fixes #6681

[skip warnings]

Notes

The remaining usage of HYD_NUM_TMP_STRINGS are used for constructing hydra_pmi_proxy command line argument. Proxy argument list is controlled and should be well below 1000. Majority of argument and environment list are passed in via socket connection. It may still worth for some cleanup. This is left for later.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your companies PR approval manager.

@hzhou hzhou force-pushed the 2309_hydra_args branch 2 times, most recently from 0f68836 to 09c7855 Compare September 19, 2023 17:14
@hzhou
Copy link
Contributor Author

hzhou commented Sep 19, 2023

test:mpich/ch3/tcp
test:mpich/pmi

✔️

@hzhou
Copy link
Contributor Author

hzhou commented Sep 19, 2023

test:mpich/ch4/ofi

A memory leak in the spawn path, detected by the asan tests.

The hard limit of HYD_NUM_TMP_STRINGS is potential for unsafe code.
The code here is safe, but make it local and not depend on arbitrary
macro is cleaner.
When parsing config file, it first load tokens into a pre-allocated
array of size of HYD_NUM_TMP_STRINGS (1000). The hard limit is not safe
in case of very large number of arguments. Use utarray instead.

It seems the previous code also leaks the strdup'ed token strings. This
commit fixes that as well.
Avoid arbitrary limit (HYD_NUM_TMP_STRINGS) on number of hosts, use
utarray instead.
Avoid hard limit of HYD_NUM_TMP_STRINGS, use dynamic array instead.
Also allocate exact size for each token string rather than
HYD_TMP_STRLEN (16K) for each.
It is unsafe to use arbitrary limit -- HYD_NUM_TMP_STRINGS -- to static
allocate exec arg list. Use dynamic array instead.
@hzhou
Copy link
Contributor Author

hzhou commented Sep 19, 2023

test:mpich/ch4/ofi ✔️

@hzhou hzhou requested a review from raffenet September 19, 2023 19:39
Copy link
Contributor

@raffenet raffenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hzhou hzhou merged commit 16e1e60 into pmodels:main Sep 20, 2023
5 checks passed
@hzhou hzhou deleted the 2309_hydra_args branch September 20, 2023 19:25
@tarunkumar987
Copy link

I originally filed this issue. May I know in which version the fix will be available? Is there an ETA?

@hzhou
Copy link
Contributor Author

hzhou commented Oct 6, 2023

It will be in release 4.2. We will release mpich-4.2b1 early this November, and GA release in December or early next January.

@tarunkumar987
Copy link

We are using 3.2.1 version. Is there a way or feasible even to merge the change in 3.2.1. Moving to 4.2 at this time is not feasible for us.

@hzhou
Copy link
Contributor Author

hzhou commented Oct 10, 2023

You could try install/upgrade hydra separately. There is a separate hydra tarball in every release.

@tarunkumar987
Copy link

Is it possible to cherry-pick these changes in 3.2.1?

@hzhou
Copy link
Contributor Author

hzhou commented Oct 10, 2023

I think separately upgrade hydra is more feasible. 3.2.1 is very old at this point. Hydra has gone through significant refactor since then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pm/hydra: Crash when supplying large number of arguments to executable
3 participants