Skip to content

Conversation

@WweiL
Copy link
Contributor

@WweiL WweiL commented Feb 18, 2023

What changes were proposed in this pull request?

Showing the essential information when throwing InvalidUnsafeRowException. Including where the check failed, and status of the unsafeRow and expctedSchema

Example output:

[UnsafeRowStatus] expectedSchema: StructType(StructField(key1,IntegerType,false),StructField(key2,IntegerType,false),StructField(sum(key1),IntegerType,false),StructField(sum(key2),IntegerType,false)), expectedSchemaNumFields: 4, numFields: 4, bitSetWidthInBytes: 8, rowSizeInBytes: 40 
fieldStatus: 
[UnsafeRowFieldStatus] index: 0, expectedFieldType: IntegerType, isNull: false, isFixedLength: true, offset: -1, size: -1
[UnsafeRowFieldStatus] index: 1, expectedFieldType: IntegerType, isNull: false, isFixedLength: true, offset: -1, size: -1
[UnsafeRowFieldStatus] index: 2, expectedFieldType: IntegerType, isNull: false, isFixedLength: true, offset: -1, size: -1
[UnsafeRowFieldStatus] index: 3, expectedFieldType: IntegerType, isNull: false, isFixedLength: true, offset: -1, size: -1

Why are the changes needed?

Right now if such error happens, it's hard to track where it errored, and what the misbehaved row & schema looks like. With this change these information are more clear.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

@WweiL WweiL changed the title [SPARK-42484] UnsafeRowUtils better error message [SPARK-42484] [SQL] UnsafeRowUtils better error message Feb 18, 2023
@WweiL WweiL force-pushed the SPARK-42484-better-log-unsaferowUtil branch from 5623f25 to 7cb2853 Compare February 21, 2023 18:05
@WweiL
Copy link
Contributor Author

WweiL commented Feb 21, 2023

Some tests always fail with

112 did not equal 104 Invalid stopIndex of a query context. Actual:SQLQueryContext(Some(1),Some(15),Some(15),Some(112),Some(select id from hive.`/home/runner/work/oss-spark/oss-spark/target/tmp/spark-9ff647b3-c1b5-449d-ae54-19a7f3baff9d`),None,None)

Length of string "select id from hive./home/runner/work/oss-spark/oss-spark/target/tmp/spark-9ff647b3-c1b5-449d-ae54-19a7f3baff9d" is 113, so the last index is 112.

I guess that's because my folder name is oss-spark. The two oss- contains 8 chars, that's equal to 112 - 104.

https://github.com/WweiL/oss-spark/actions/runs/4235566927/jobs/7359363624

So I changed the hard-coded length here to a variable-length one.

@MaxGekk @srielau @itholic I found that this is related to the PR (#39977) you pushed / reviewed. Can you guys also take a look? Thanks!

@itholic
Copy link
Contributor

itholic commented Feb 22, 2023

Thanks for catching out, @WweiL !

@WweiL
Copy link
Contributor Author

WweiL commented Feb 22, 2023

@cloud-fan Can you merge this to master when you get a chance? Thank you!

@WweiL WweiL requested a review from cloud-fan February 22, 2023 06:05
@cloud-fan
Copy link
Contributor

cloud-fan commented Feb 23, 2023

thanks, merging to master!

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Feb 27, 2023

Was this merged only to Spark 3.5 (master branch)? The JIRA ticket is not properly marked for fix version as well as status, and we need to make it clear to determine the version range to apply SPARK-42572.

@WweiL
Copy link
Contributor Author

WweiL commented Feb 27, 2023

Was this merged only to Spark 3.5 (master branch)? The JIRA ticket is not properly marked for fix version as well as status, and we need to make it clear to determine the version range to apply SPARK-42572.

Yes this was only merged to master. I've updated the version in SPARK-42572. BTW is there a way to quickly decide what's the version of the current master branch? I thought it was 3.4...

@HeartSaVioR
Copy link
Contributor

Sorry I seem to look at different JIRA ticket. SPARK-42484 contains the fixed version, 3.5.0. That said, SPARK-42572 only needs to be applied to master branch.

HeartSaVioR pushed a commit that referenced this pull request Feb 28, 2023
…ateRowFormat

### What changes were proposed in this pull request?

#40073 accidentally changed the relationship of the two `if` statement in `StateStoreProvider.validateStateRowFormat`. Before they were inclusive, i.e.
```
if (a) {
  // <code>
  if (b) {
    // <code>
  }
}
```

It was changed to parallel, i.e.
```
if (a) {
  // <code>
}
if (b) {
    // <code>
}
```

This PR change it back to the original behavior and add a unit test to prevent it in the future.

### Why are the changes needed?

As above.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test

Closes #40187 from WweiL/SPARK-42572-stateStore-logic-test.

Authored-by: Wei Liu <wei.liu@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants