Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BSE-4419] Support large counts in MPI gatherv #89

Merged
merged 10 commits into from
Dec 27, 2024

Conversation

scott-routledge2
Copy link
Contributor

@scott-routledge2 scott-routledge2 commented Dec 23, 2024

Changes included in this PR

Changes MPI_Gengatherv and c_gatherv to use MPI_Gatherv_c/ MPI_Allgatherv_c. For spawn mode, most cases will take the MPI_Gengatherv path via gather_array but also including c_gatherv here for robustness.

Testing strategy

PR CI, Nightly, locally with:

import pandas as pd
import bodo

bodo.set_verbose_level(2)

@bodo.jit
def test(df):
    return df

df = pd.Series(["abcd"*10000]*100000)
df = test(df)
print(df.head(15))

BODO_NUM_WORKERS=2 python -u test.py

User facing changes

Larger results can be returned from bodo jit functions.
More accurate error message surrounding out of memory when scattering large Series.

Checklist

  • Pipelines passed before requesting review. To run CI you must include [run CI] in your commit message.
  • I am familiar with the Contributing Guide
  • I have installed + ran pre-commit hooks.

@scott-routledge2 scott-routledge2 changed the base branch from main to scott/update_scatterv_c December 23, 2024 22:23
Base automatically changed from scott/update_scatterv_c to main December 26, 2024 15:09
@@ -936,6 +936,9 @@ def _infer_series_arr_type(S: pd.Series, array_metadata=None):
arr_type = types.Array(arr_type.dtype, 1, "C")

return arr_type
except pa.lib.ArrowMemoryError: # pragma: no cover
# OOM
raise
Copy link
Contributor Author

@scott-routledge2 scott-routledge2 Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed when scattering a Dataframe that is too large to fit into memory, S.array can fail. Before the error message would be:
bodo.utils.typing.BodoError: data type string for column A not supported yet
Now it looks something like:
pa.lib.ArrowMemoryError: Could not find an empty frame of required size (68719476736)!

static void c_gatherv(void* send_data, int sendcount, void* recv_data,
int* recv_counts, int* displs, int typ_enum,
static_assert(sizeof(MPI_Count) == sizeof(int64_t));
static_assert(sizeof(MPI_Aint) == sizeof(int64_t));
Copy link
Contributor Author

@scott-routledge2 scott-routledge2 Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation says that MPI_Aint is a "C type that holds any valid address." On my machine it is typedef'd as long int. We'd need to check that it is still 64 bits on windows.

Copy link

codecov bot commented Dec 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@5fb1896). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #89   +/-   ##
=======================================
  Coverage        ?   77.81%           
=======================================
  Files           ?      160           
  Lines           ?    61927           
  Branches        ?     8754           
=======================================
  Hits            ?    48191           
  Misses          ?    11610           
  Partials        ?     2126           

@scott-routledge2
Copy link
Contributor Author

A couple failures on Nightly:
test_iceberg_ddl.py::test_alter_table_drop_column[tabular] and
test_snowflake_catalog.py::test_filter_pushdown_row_count_caching[test_db_snowflake_catalog0]
These tests appear to be flakey, both pass for me locally and I think they should be safe to ignore.

@scott-routledge2 scott-routledge2 marked this pull request as ready for review December 26, 2024 20:20
@scott-routledge2 scott-routledge2 changed the title [WIP][BSE-4419] Support large counts in MPI gatherv [BSE-4419] Support large counts in MPI gatherv Dec 26, 2024
Copy link
Contributor

@njriasan njriasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @scott-routledge2

Copy link
Collaborator

@ehsantn ehsantn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks @scott-routledge2 !

@scott-routledge2 scott-routledge2 merged commit 986515d into main Dec 27, 2024
112 of 113 checks passed
@scott-routledge2 scott-routledge2 deleted the scott/update_gatherv_c branch December 27, 2024 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants