[BSE-4419] Support large counts in MPI gatherv #89
@@ -936,6 +936,9 @@ def _infer_series_arr_type(S: pd.Series, array_metadata=None):
             arr_type = types.Array(arr_type.dtype, 1, "C")

         return arr_type
+    except pa.lib.ArrowMemoryError:  # pragma: no cover
+        # OOM
+        raise
I noticed that when scattering a DataFrame that is too large to fit into memory, S.array can fail. Previously, the error message would be:
bodo.utils.typing.BodoError: data type string for column A not supported yet
Now it looks something like:
pa.lib.ArrowMemoryError: Could not find an empty frame of required size (68719476736)!
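The change in message comes down to handler ordering. Here is a minimal sketch, assuming the enclosing code has a broad handler that turns any inference failure into a "not supported yet" error (that handler is not shown in the diff). `UnsupportedTypeError`, `infer_type`, and `failing_conversion` are hypothetical names, and the built-in `MemoryError` stands in for `pa.lib.ArrowMemoryError`:

```python
class UnsupportedTypeError(Exception):
    """Hypothetical stand-in for the generic 'not supported yet' error."""


def infer_type(convert, reraise_oom=True):
    """Run `convert`; optionally let out-of-memory errors propagate unchanged."""
    try:
        return convert()
    except MemoryError:
        if reraise_oom:
            # New behavior: surface the real cause instead of masking it.
            raise
        # Old behavior (assumed): OOM is reported as an unsupported type.
        raise UnsupportedTypeError("data type not supported yet")
    except Exception:
        raise UnsupportedTypeError("data type not supported yet")


def failing_conversion():
    # Simulates an allocation failure during conversion.
    raise MemoryError("could not allocate buffer")
```

Calling `infer_type(failing_conversion)` raises `MemoryError`, while `infer_type(failing_conversion, reraise_oom=False)` raises the misleading `UnsupportedTypeError`.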
static void c_gatherv(void* send_data, int sendcount, void* recv_data,
                      int* recv_counts, int* displs, int typ_enum,
static_assert(sizeof(MPI_Count) == sizeof(int64_t));
static_assert(sizeof(MPI_Aint) == sizeof(int64_t));
The documentation says that MPI_Aint is a "C type that holds any valid address." On my machine it is typedef'd as long int. We'd need to check that it is still 64 bits on Windows.
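To see why the Windows check matters: the width of C `long` depends on the platform's data model, so a type typedef'd as `long int` is not guaranteed to be 64 bits everywhere. A quick sketch using Python's standard `ctypes` module follows. It reports sizes for the C ABI Python was built against and does not inspect any MPI header, so how a given MPI implementation defines `MPI_Aint` still has to be checked separately (`c_type_widths` is a hypothetical helper):

```python
import ctypes


def c_type_widths():
    """Return the width in bits of a few C types on this platform."""
    return {
        "long": 8 * ctypes.sizeof(ctypes.c_long),
        "long long": 8 * ctypes.sizeof(ctypes.c_longlong),
        "void*": 8 * ctypes.sizeof(ctypes.c_void_p),
    }


widths = c_type_widths()
print(widths)
```

On 64-bit Windows, `long` is 32 bits while pointers are 64 bits, so the `static_assert` is a useful guard against a silent mismatch.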
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #89 +/- ##
=======================================
Coverage ? 77.81%
=======================================
Files ? 160
Lines ? 61927
Branches ? 8754
=======================================
Hits ? 48191
Misses ? 11610
Partials ? 2126
A couple failures on Nightly:
LGTM! Thanks @scott-routledge2
Looks good to me. Thanks @scott-routledge2 !
Changes included in this PR
Changes MPI_Gengatherv and c_gatherv to use MPI_Gatherv_c / MPI_Allgatherv_c. For spawn mode, most cases will take the MPI_Gengatherv path via gather_array, but c_gatherv is also updated here for robustness.
Testing strategy
PR CI, Nightly, locally with:
BODO_NUM_WORKERS=2 python -u test.py
User facing changes
Larger results can be returned from bodo jit functions.
More accurate error message surrounding out of memory when scattering large Series.
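To illustrate why larger results need wider counts: assuming a 32-bit C `int`, the displacement of a rank's data in the receive buffer overflows once the data gathered ahead of it exceeds 2^31 - 1 elements, even when every per-rank count fits on its own. A rough sketch in plain Python (no MPI calls; `displacements` and `fits_in_c_int` are hypothetical helpers):

```python
INT32_MAX = 2**31 - 1  # largest value of a 32-bit signed C int (assumed width)


def displacements(counts):
    """Exclusive prefix sum: the offset of each rank's data in the receive buffer."""
    displs, offset = [], 0
    for count in counts:
        displs.append(offset)
        offset += count
    return displs


def fits_in_c_int(values):
    """True if every value is representable as a non-negative 32-bit int."""
    return all(0 <= v <= INT32_MAX for v in values)


# Three ranks, each sending 1.5 billion elements.
counts = [1_500_000_000] * 3
displs = displacements(counts)

print(fits_in_c_int(counts))  # each count fits individually
print(fits_in_c_int(displs))  # the third displacement (3 billion) does not
```

With 64-bit counts and displacements, as the `_c` variants take, both checks would pass for this input.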
Checklist
[run CI] in your commit message.