Add PySpark support #8

akmalsoliev · 2025-01-30T16:17:25Z

This pull request updates the project dependencies and extends the test data frame creation utilities and the type validation to support PySpark.

Added pyarrow>=11.0.0 to the main dependencies in pyproject.toml.
Added pyspark>=3.5.4 and marimo>=0.10.17 to the development dependencies in pyproject.toml.

Updated tests/utils/create_frames.py to import generate_temporary_column_name from narwhals, and DataFrame, SparkSession, and lit from pyspark.sql.
Added a new function, spark_df, that creates a Spark DataFrame from a dictionary of lists; it handles null columns and repartitions the result.
Modified the create_frame_fixture function to include pyspark in the list of data frame types.

Updated validoopsie/validation_catalogue/TypeValidation/type_check.py to import pyarrow and use it for type validation in the __call__ method, replacing the previous approach that relied on the narwhals native namespace.

akmalsoliev added 3 commits January 23, 2025 20:39

NEW: PySpark test suite

8df6617

fix: Fix PySpark null values #7

0f6e005

new: full PySpark support #5

869fb02

akmalsoliev self-assigned this Jan 30, 2025

akmalsoliev linked an issue Jan 30, 2025 that may be closed by this pull request

Add PySpark to test suite #5

Closed

akmalsoliev added the bug and enhancement labels Jan 30, 2025

Merge branch 'main' into 5-add-pyspark-to-test-suite

203bba7

akmalsoliev merged commit df2f2d4 into main Jan 30, 2025

akmalsoliev deleted the 5-add-pyspark-to-test-suite branch January 30, 2025 16:18
