[SPARK-46635][PYTHON][DOCS] Refine docstring of from_csv/schema_of_csv/to_csv
#44639
It seems there are some issues with GitHub Actions (GA); we need to wait until they are resolved before continuing testing.
    return _invoke_function("schema_of_csv", col, _options_to_str(options))

    # TODO(SPARK-46654) Re-enable the `Example 2` test after fixing the display
Example 2 (converting a complex StructType to a CSV string) displays different results in regular Spark and in Spark Connect, so this PR skips testing it and adds TODO(SPARK-46654):
**********************************************************************
File "/__w/spark/spark/python/pyspark/sql/connect/functions/builtin.py", line 2232, in pyspark.sql.connect.functions.builtin.to_csv
Failed example:
    df.select(sf.to_csv(df.value)).show(truncate=False)
Expected:
    +-----------------------+
    |to_csv(value)          |
    +-----------------------+
    |2,Alice,"[100,200,300]"|
    +-----------------------+
Got:
    +--------------------------------------------------------------------------+
    |to_csv(value)                                                             |
    +--------------------------------------------------------------------------+
    |2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f|
    +--------------------------------------------------------------------------+
    <BLANKLINE>
**********************************************************************
1 of 18 in pyspark.sql.connect.functions.builtin.to_csv
***Test Failed*** 1 failures.
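Python's standard `doctest` module lets a single docstring example be disabled with the `+SKIP` directive while the other examples in the same docstring still run. The stdlib-only sketch below illustrates that mechanism. `to_csv_demo` and `run_examples` are hypothetical names, the function is a naive stand-in rather than PySpark's `to_csv`, and whether this PR disables Example 2 with exactly this directive is an assumption.

```python
import doctest


def to_csv_demo(values):
    """Hypothetical stand-in for a docstring with two examples.

    Example 1 runs and is checked as usual:

    >>> to_csv_demo([2, "Alice"])
    '2,Alice'

    Example 2 is marked +SKIP, so doctest neither runs it nor compares
    its output (compare TODO(SPARK-46654)):

    >>> to_csv_demo([2, "Alice", [100, 200, 300]])  # doctest: +SKIP
    '2,Alice,"[100,200,300]"'
    """
    # Naive join with no CSV quoting: Example 2 would fail if it ran.
    return ",".join(str(v) for v in values)


def run_examples():
    """Run the docstring examples of to_csv_demo and return TestResults."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    tests = finder.find(to_csv_demo, "to_csv_demo",
                        globs={"to_csv_demo": to_csv_demo})
    for test in tests:
        runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    # Reports zero failures: the skipped example is never executed.
    print(run_examples())
```

Skipping keeps the failing example visible in the rendered docs while the test suite stays green, which is why a tracking TODO is useful next to it.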
Python 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/01/10 13:56:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/01/10 13:56:18 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
24/01/10 13:56:18 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Python version 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022 15:24:06)
Spark context Web UI available at http://localhost:4042
Spark context available as 'sc' (master = local[*], app id = local-1704866178807).
SparkSession available as 'spark'.
>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show(truncate=False)
+-----------------------+
|to_csv(value)          |
+-----------------------+
|2,Alice,"[100,200,300]"|
+-----------------------+
./bin/pyspark --remote "sc://localhost"
Python 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0.dev0
      /_/

Using Python version 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022 15:24:06)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.
>>> from pyspark.sql import Row, functions as sf
>>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))]
>>> df = spark.createDataFrame(data, ("key", "value"))
>>> df.select(sf.to_csv(df.value)).show(truncate=False)
+--------------------------------------------------------------------------+
|to_csv(value)                                                             |
+--------------------------------------------------------------------------+
|2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f|
+--------------------------------------------------------------------------+
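In the regular-Spark output, only the array field is quoted: its text `[100,200,300]` contains the comma delimiter, while `2` and `Alice` do not. In the Spark Connect output, the third field instead appears to be the default string form of an internal `UnsafeArrayData` object, which contains no comma and is therefore left unquoted. The sketch below reproduces this minimal-quoting rule with Python's stdlib `csv` module as an analogy only; it assumes Spark's CSV writer quotes comparably and is not Spark's implementation. `row_to_csv_line` is a hypothetical helper.

```python
import csv
import io


def row_to_csv_line(fields):
    """Hypothetical helper: serialize one row to a single CSV line.

    csv.QUOTE_MINIMAL (the csv module's default) quotes a field only when
    it contains special characters such as the delimiter, the quote
    character, or line-break characters.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(fields)
    # writerow ends the line with "\r\n" by default; drop it for display.
    return buf.getvalue().rstrip("\r\n")


# Format the array without spaces, as in the regular-Spark output.
scores = "[" + ",".join(str(s) for s in [100, 200, 300]) + "]"
print(row_to_csv_line([2, "Alice", scores]))
# 2,Alice,"[100,200,300]"

# An object-style string has no comma, so it stays unquoted.
obj_str = "org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f"
print(row_to_csv_line([2, "Alice", obj_str]))
# 2,Alice,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@99c5e30f
```

This suggests the discrepancy lies in how the array value is converted to text, not in the quoting step itself.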
Merged into master. Thanks @HyukjinKwon
What changes were proposed in this pull request?
This PR refines the docstrings of from_csv/schema_of_csv/to_csv and adds some new examples.
Why are the changes needed?
To improve the PySpark documentation.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass GitHub Actions.
Was this patch authored or co-authored using generative AI tooling?
No