Add PySpark support #8

Merged
merged 4 commits into from
Jan 30, 2025

Conversation

akmalsoliev
Owner

This pull request includes several updates to dependencies and enhancements to the data frame creation utilities and type validation.

Dependency Updates:

  • Added pyarrow>=11.0.0 to the main dependencies in pyproject.toml.
  • Added pyspark>=3.5.4 and marimo>=0.10.17 to the development dependencies in pyproject.toml.

Enhancements to Data Frame Creation:

  • Updated tests/utils/create_frames.py to import generate_temporary_column_name from narwhals and DataFrame, SparkSession, and lit from pyspark.sql.
  • Added a new function spark_df to create a Spark DataFrame from a dictionary of lists, including handling of null columns and repartitioning.
  • Modified the create_frame_fixture function to include pyspark in the list of data frame types.
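
To illustrate the `spark_df` helper described above, here is a minimal sketch, not the code from this PR. It assumes the null-column handling exists because Spark cannot infer a type for a column that contains only `None`, so such columns are left out of `createDataFrame` and added back with `lit(None)`. The names `split_null_columns`, `to_rows`, `spark_df_sketch`, and `num_partitions`, the cast to `string`, and the local session settings are all hypothetical. The sketch does not use `generate_temporary_column_name`.

```python
from typing import Any


def split_null_columns(
    data: dict[str, list[Any]],
) -> tuple[dict[str, list[Any]], list[str]]:
    """Separate columns holding only None from columns with at least one value."""
    typed: dict[str, list[Any]] = {}
    null_cols: list[str] = []
    for name, values in data.items():
        # Note: an empty list also counts as "all None" here.
        if all(v is None for v in values):
            null_cols.append(name)
        else:
            typed[name] = values
    return typed, null_cols


def to_rows(data: dict[str, list[Any]]) -> list[tuple[Any, ...]]:
    """Turn a dict of equal-length lists into a list of row tuples."""
    return list(zip(*data.values()))


def spark_df_sketch(data: dict[str, list[Any]], num_partitions: int = 2):
    """Build a Spark DataFrame from a dict of lists (hypothetical helper)."""
    # Imported lazily so the pure-Python helpers above work without pyspark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import lit

    typed, null_cols = split_null_columns(data)
    if not typed:
        raise ValueError("at least one column must contain a non-null value")

    # Assumption: a throwaway local session is good enough for tests.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("spark-df-sketch")
        .getOrCreate()
    )

    # Create the frame from the columns whose types Spark can infer.
    df = spark.createDataFrame(to_rows(typed), schema=list(typed))

    # Add the all-null columns back; the string type is an arbitrary choice.
    for name in null_cols:
        df = df.withColumn(name, lit(None).cast("string"))

    # Restore the caller's column order, then spread rows across partitions.
    return df.select(*data.keys()).repartition(num_partitions)
```

Repartitioning makes the test frame span more than one partition, which is closer to how Spark data is laid out in practice. It also means row order is not guaranteed afterwards, so tests that compare rows may need an explicit sort.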

Type Validation Enhancements:

  • Updated validoopsie/validation_catalogue/TypeValidation/type_check.py to import pyarrow and use it in the __call__ method for type validation, replacing the previous approach based on the narwhals native namespace.

@akmalsoliev akmalsoliev self-assigned this Jan 30, 2025
@akmalsoliev akmalsoliev linked an issue Jan 30, 2025 that may be closed by this pull request
@akmalsoliev akmalsoliev added the bug (Something isn't working) and enhancement (New feature or request) labels Jan 30, 2025
@akmalsoliev akmalsoliev merged commit df2f2d4 into main Jan 30, 2025
@akmalsoliev akmalsoliev deleted the 5-add-pyspark-to-test-suite branch January 30, 2025 16:18
Development

Successfully merging this pull request may close these issues.

Add PySpark to test suite
1 participant