
Conversation

@LuciferYang
Contributor

What changes were proposed in this pull request?

This PR refines the docstrings of `from_csv`/`schema_of_csv`/`to_csv` and adds some new examples.

Why are the changes needed?

To improve the PySpark documentation.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GitHub Actions.

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang marked this pull request as draft January 9, 2024 11:49
@LuciferYang
Contributor Author

Error: Internal server error occurred while resolving "actions/cache@v3". Internal server error occurred while resolving "actions/checkout@v4". Internal server error occurred while resolving "actions/setup-java@v4". Internal server error occurred while resolving "actions/upload-artifact@v3"

It seems there are some issues with GitHub Actions; we need to wait until they are resolved to continue testing.

@github-actions github-actions bot removed the INFRA label Jan 10, 2024
return _invoke_function("schema_of_csv", col, _options_to_str(options))


# TODO(SPARK-46654) Re-enable the `Example 2` test after fixing the display
Contributor Author

Example 2 (converting a complex StructType to a CSV string) displays different results between regular Spark and Spark Connect, so this PR skips that test and adds a TODO(SPARK-46654):

**********************************************************************
File "/__w/spark/spark/python/pyspark/sql/connect/functions/builtin.py", line 2232, in pyspark.sql.connect.functions.builtin.to_csv
Failed example:
    df.select(sf.to_csv(df.value)).show(truncate=False)
Expected:
    +-----------------------+
    |to_csv(value)          |
    +-----------------------+
    |2,Alice,"[100,200,300]"|
    +-----------------------+
Got:
    +--------------------------------------------------------------------------+
    |to_csv(value)                                                             |
    +--------------------------------------------------------------------------+
    |2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f|
    +--------------------------------------------------------------------------+
    <BLANKLINE>
**********************************************************************
   1 of  18 in pyspark.sql.connect.functions.builtin.to_csv
***Test Failed*** 1 failures.
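In PySpark a doctest example like this is typically disabled with a `# doctest: +SKIP` directive until the underlying issue is fixed. The mechanism can be sketched with the stdlib `doctest` module alone; `to_csv_demo` below is a hypothetical stand-in, not Spark's `to_csv`:

```python
import doctest


def to_csv_demo(row):
    """Join a row of fields into a CSV-like string (illustrative helper).

    >>> to_csv_demo([2, "Alice"])
    '2,Alice'

    The next example is disabled, as the PR does for Example 2 of to_csv:

    >>> to_csv_demo([2, "Alice", [100, 200, 300]])  # doctest: +SKIP
    '2,Alice,"[100,200,300]"'
    """
    return ",".join(str(field) for field in row)


# Run only this function's doctests, passing globs explicitly so the
# examples can resolve the name regardless of how the script is executed.
runner = doctest.DocTestRunner(verbose=False)
for test in doctest.DocTestFinder().find(
    to_csv_demo, "to_csv_demo", globs={"to_csv_demo": to_csv_demo}
):
    runner.run(test)

# Only the first example runs; the skipped one is neither tried nor failed.
print(runner.tries, runner.failures)  # → 1 0
```

Because `+SKIP` is checked before an example is attempted, the skipped example does not count toward the `1 of 18` tally shown in the failure report above.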

Contributor Author

Python 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/01/10 13:56:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/01/10 13:56:18 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/01/10 13:56:18 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Python version 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022 15:24:06)
Spark context Web UI available at http://localhost:4042
Spark context available as 'sc' (master = local[*], app id = local-1704866178807).
SparkSession available as 'spark'.
>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show(truncate=False)
+-----------------------+                                                       
|to_csv(value)          |
+-----------------------+
|2,Alice,"[100,200,300]"|
+-----------------------+

Contributor Author

./bin/pyspark --remote "sc://localhost"

Python 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0.dev0
      /_/

Using Python version 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022 15:24:06)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.
>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show(truncate=False)
+--------------------------------------------------------------------------+
|to_csv(value)                                                             |
+--------------------------------------------------------------------------+
|2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f|
+--------------------------------------------------------------------------+
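The expected output `2,Alice,"[100,200,300]"` follows standard minimal CSV quoting: a field whose stringified value contains the delimiter is wrapped in double quotes. The stdlib `csv` module shows the same behavior (this only illustrates the quoting rule, not Spark's actual serializer):

```python
import csv
import io

buf = io.StringIO()
# The default QUOTE_MINIMAL dialect quotes only fields containing
# the delimiter, a quote character, or a line terminator.
writer = csv.writer(buf)

# Once the nested scores array is rendered as a string, it contains
# commas, so the writer must quote it to keep the row at three fields.
writer.writerow([2, "Alice", "[100,200,300]"])

print(buf.getvalue().strip())  # → 2,Alice,"[100,200,300]"
```

This is why the regular-Spark result above is the correct one: Spark Connect's `org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@...` output is the unconverted internal array representation rather than a quoted CSV field.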

@LuciferYang LuciferYang marked this pull request as ready for review January 10, 2024 07:05
@LuciferYang
Contributor Author

Merged into master. Thanks @HyukjinKwon.
