Add support for converting pyarrow DataFrame to numpy #98

Merged: 6 commits, Jul 29, 2024

Commits on Jul 29, 2024

  1. Add support for converting pyarrow DataFrame to numpy

    Introduced a new function `convert_to_numpy` that converts a pyarrow-backed DataFrame back to numpy dtypes. Added a unit test to check that the converted dtypes match the expected numpy dtypes.
    
    Signed-off-by: DanielAvdar <66269169+DanielAvdar@users.noreply.github.com>
    DanielAvdar committed Jul 29, 2024
    76d4821
  2. Add convert_to_numpy to module exports and imports

    Included `convert_to_numpy` in `__all__` for module exports and updated the relevant imports in tests. This makes `convert_to_numpy` importable from the package and covered by the tests.
    
    Signed-off-by: DanielAvdar <66269169+DanielAvdar@users.noreply.github.com>
    DanielAvdar committed Jul 29, 2024
    d598bf6
  3. Add conversion for string dtype in reverse converter

    Previously, strings were not explicitly handled in the reverse converter, which could lead to inconsistencies. This commit ensures that columns with string dtype are converted to object type, improving the dtype consistency and compatibility.
    
    Signed-off-by: DanielAvdar <66269169+DanielAvdar@users.noreply.github.com>
    DanielAvdar committed Jul 29, 2024
    a0183e3
  4. Add idempotency and immutability checks to test_convert_to_numpy

    Ensure the conversion from pandas to numpy via pyarrow is idempotent by asserting the result is unchanged on a second conversion. Additionally, check that the original dataframe remains unaltered during the process.
    
    Signed-off-by: DanielAvdar <66269169+DanielAvdar@users.noreply.github.com>
    DanielAvdar committed Jul 29, 2024
    5c247b1
  5. Update Codecov workflow paths for more accurate triggers

    Changed the paths in the Codecov workflow configuration to ensure it only runs when relevant files, including the codecov.yml, are modified. This improves the efficiency of our CI/CD pipeline by reducing unnecessary runs.
    
    Signed-off-by: DanielAvdar <66269169+DanielAvdar@users.noreply.github.com>
    DanielAvdar committed Jul 29, 2024
    a1fd60f
  6. Add unit tests for reverse conversion functions

    Created tests for converting data back from PyArrow to Numpy. These tests ensure the correct data types after conversion for timestamps, half floats, durations, and strings. They validate that PyArrow types do not persist in the resulting Numpy dataframes.
    
    Signed-off-by: DanielAvdar <66269169+DanielAvdar@users.noreply.github.com>
    DanielAvdar committed Jul 29, 2024
    e7220c0
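
The commits above describe a reverse conversion in which pyarrow-backed dtypes become numpy dtypes, strings become `object`, and a second conversion changes nothing. The following is only a rough sketch of that idea at the level of dtype names. It is not the PR's `convert_to_numpy` implementation. The helper names `numpy_dtype_name` and `numpy_dtype_plan` are hypothetical, and the mapping table is an assumption based on how pandas names `ArrowDtype` columns (for example `int64[pyarrow]`, `halffloat[pyarrow]`). Timezone-aware timestamps are deliberately left out.

```python
# Hypothetical helpers for illustration; none of these names come from the PR.
_ARROW_SUFFIX = "[pyarrow]"

# Assumed mapping from Arrow type names to numpy dtype names for the
# types the commit messages mention (strings, half floats) plus floats.
_ARROW_TO_NUMPY = {
    "string": "object",
    "halffloat": "float16",
    "float": "float32",
    "double": "float64",
}


def numpy_dtype_name(dtype_name: str) -> str:
    """Map one pyarrow-backed dtype name to a numpy dtype name."""
    if not dtype_name.endswith(_ARROW_SUFFIX):
        # Already a numpy-style name: leave it alone (keeps the mapping idempotent).
        return dtype_name
    arrow_name = dtype_name[: -len(_ARROW_SUFFIX)]
    # Timezone-naive timestamps and durations keep their unit, e.g. "[ns]".
    if arrow_name.startswith("timestamp[") and "tz=" not in arrow_name:
        return "datetime64" + arrow_name[len("timestamp"):]
    if arrow_name.startswith("duration["):
        return "timedelta64" + arrow_name[len("duration"):]
    # Names such as "int64" or "bool" are spelled the same in numpy.
    return _ARROW_TO_NUMPY.get(arrow_name, arrow_name)


def numpy_dtype_plan(dtypes: dict[str, str]) -> dict[str, str]:
    """Build a new column-to-dtype mapping; the input dict is not modified."""
    return {col: numpy_dtype_name(name) for col, name in dtypes.items()}


source = {
    "count": "int64[pyarrow]",
    "name": "string[pyarrow]",
    "seen_at": "timestamp[ns][pyarrow]",
    "ratio": "halffloat[pyarrow]",
    "elapsed": "duration[ns][pyarrow]",
}
plan = numpy_dtype_plan(source)
```

Under these assumptions, a plan like this could be handed to pandas' `DataFrame.astype`, which returns a new frame and leaves the original untouched. Applying `numpy_dtype_plan` to its own output returns the same mapping, which mirrors the idempotency check described in commit 4.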