
@anthonywainer (Contributor) commented Oct 21, 2023

What changes were proposed in this pull request?
Allow a schema to be read from JSON in which a field's nullable and metadata keys are omitted.

Why are the changes needed?
So that a schema can be read from JSON without having to spell out values that already have defaults.

Does this PR introduce any user-facing change?
Yes. Users no longer need to fill the JSON with nullable and metadata values that have defaults.

How was this patch tested?
Unit tests.

To create a StructType from a Python dictionary, you use StructType.fromJson (in Scala, DataType.fromJson).

One would expect a schema to be created with the code below, but it fails unless the JSON also contains nullable and metadata. This is inconsistent, because the DataType classes already give these fields defaults: a StructField is nullable by default and has empty metadata.

from pyspark.sql.types import StructField

schema = {
    "name": "name",
    "type": "string"
}

StructField.fromJson(schema)
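The defaulting behaviour this PR asks for can be sketched in plain Python, without PySpark. SketchStructField and its from_json method are hypothetical names, not the real pyspark.sql.types.StructField; the defaults (nullable=True, empty metadata) are assumed to mirror the ones the StructField constructor already uses.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchStructField:
    """Hypothetical stand-in for pyspark.sql.types.StructField (not the real class)."""

    name: str
    data_type: str
    nullable: bool = True  # assumed default, mirroring StructField(nullable=True)
    metadata: Dict[str, Any] = field(default_factory=dict)  # assumed default: empty

    @classmethod
    def from_json(cls, obj: Dict[str, Any]) -> "SketchStructField":
        # "name" and "type" stay mandatory: indexing raises KeyError if they are absent.
        # "nullable" and "metadata" fall back to the defaults when omitted.
        return cls(
            name=obj["name"],
            data_type=obj["type"],
            nullable=obj.get("nullable", True),
            metadata=dict(obj.get("metadata") or {}),
        )


fld = SketchStructField.from_json({"name": "name", "type": "string"})
print(fld)  # SketchStructField(name='name', data_type='string', nullable=True, metadata={})
```

Keeping name and type mandatory while defaulting only nullable and metadata matches the scope described in this PR: the fields that have no sensible default still fail loudly.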

Python Error:

from pyspark.sql.types import StructField

schema = {
    "name": "c1",
    "type": "string"
}
StructField.fromJson(schema)

>>
Traceback (most recent call last):
  File "code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "pyspark/sql/types.py", line 583, in fromJson
    json["nullable"],
KeyError: 'nullable' 
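The KeyError comes from subscripting the dictionary with a key it does not contain. A minimal plain-Python illustration (field_json is just the example dictionary above; this is not the PySpark source):

```python
field_json = {"name": "c1", "type": "string"}

# Subscript access fails when the key is absent, as in the traceback above.
try:
    field_json["nullable"]
except KeyError as exc:
    print("KeyError:", exc)  # prints: KeyError: 'nullable'

# dict.get returns a fallback instead, which is one way to make the key optional.
nullable = field_json.get("nullable", True)
print(nullable)  # prints: True
```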

Scala Error:

import org.apache.spark.sql.types.DataType

val schema =
  """
    |{
    |    "type": "struct",
    |    "fields": [
    |        {
    |            "name": "c1",
    |            "type": "string"
    |        }
    |    ]
    |}
    |""".stripMargin
DataType.fromJson(schema)

>>
Failed to convert the JSON string '{"name":"c1","type":"string"}' to a field.
java.lang.IllegalArgumentException: Failed to convert the JSON string '{"name":"c1","type":"string"}' to a field.
	at org.apache.spark.sql.types.DataType$.parseStructField(DataType.scala:268)
	at org.apache.spark.sql.types.DataType$.$anonfun$parseDataType$1(DataType.scala:225)
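The Scala failure is the same problem at the struct level: each entry in "fields" is rejected unless it has every expected key. The sketch below parses a whole struct schema string in Python and fills the defaults per field. parse_struct_json is a hypothetical helper, the defaults are assumed to match StructField's, and nested types are not handled: each "type" value is passed through unchanged.

```python
import json
from typing import Any, Dict, List


def parse_struct_json(schema_json: str) -> List[Dict[str, Any]]:
    """Hypothetical parser: returns one dict per field with defaults filled in."""
    root = json.loads(schema_json)
    if root.get("type") != "struct":
        raise ValueError('expected a JSON object with "type": "struct"')
    fields = []
    for raw in root.get("fields", []):
        if "name" not in raw or "type" not in raw:
            # Only name and type are required; the message imitates Spark's wording.
            compact = json.dumps(raw, separators=(",", ":"))
            raise ValueError(f"Failed to convert the JSON string '{compact}' to a field.")
        fields.append(
            {
                "name": raw["name"],
                "type": raw["type"],  # passed through as-is (no nested parsing)
                "nullable": raw.get("nullable", True),  # assumed default
                "metadata": raw.get("metadata", {}),  # assumed default
            }
        )
    return fields


schema = """
{
    "type": "struct",
    "fields": [
        {"name": "c1", "type": "string"}
    ]
}
"""
print(parse_struct_json(schema))
# [{'name': 'c1', 'type': 'string', 'nullable': True, 'metadata': {}}]
```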

@anthonywainer changed the title [SPARK-40820][PYTHON] Creating StructType from Json [SPARK-40820][PYTHON&SCALA] Creating StructType from Json Oct 22, 2023
@anthonywainer marked this pull request as ready for review October 22, 2023 15:42
@anthonywainer (Contributor, Author):

@HyukjinKwon I have reopened the PR. Could you take a look, please?

@HyukjinKwon changed the title [SPARK-40820][PYTHON&SCALA] Creating StructType from Json [SPARK-40820][PYTHON][SQL] Creating StructType from Json Oct 23, 2023

Member (review comment):

Let's move the tests to DataTypeSuite.scala.

Contributor (Author):

Moved!

Member (review comment):

Suggested change:

- def test_struct_field_from_json(self):
+ def test_struct_field_from_json(self):
+     # SPARK-40820: fromJson with only name and type

Contributor (Author):

Changed!

@anthonywainer (Contributor, Author):

@HyukjinKwon could you take a look at this, please?

@HyukjinKwon (Member):

Merged to master.
