
@WweiL (Contributor) commented May 9, 2024

What changes were proposed in this pull request?

Fix the 3.5 <> 4.0 compatibility test. This is a test-only issue. The failure may be caused by how pickle serializes the listener.
On branch 3.5 there is a listener named "TestListener", but on 4.0 it was renamed to "TestListenerSpark". In the test, the listener is supposed to be serialized and sent to the server. However, on the 4.0 server we see this error:

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/runner/work/oss-spark/oss-spark/python/lib/pyspark.zip/pyspark/sql/connect/streaming/worker/listener_worker.py", line 115, in <module>
  File "/home/runner/work/oss-spark/oss-spark/python/lib/pyspark.zip/pyspark/sql/connect/streaming/worker/listener_worker.py", line 73, in main
  File "/home/runner/work/oss-spark/oss-spark/python/lib/pyspark.zip/pyspark/worker_util.py", line 64, in read_command
  File "/home/runner/work/oss-spark/oss-spark/python/lib/pyspark.zip/pyspark/serializers.py", line 173, in _read_with_length
  File "/home/runner/work/oss-spark/oss-spark/python/lib/pyspark.zip/pyspark/serializers.py", line 473, in loads
AttributeError: Can't get attribute 'TestListener' on <module 'pyspark.sql.tests.connect.streaming.test_parity_listener' from '/home/runner/work/oss-spark/oss-spark/python/lib/pyspark.zip/pyspark/sql/tests/connect/streaming/test_parity_listener.py'>
```

The 4.0 server cannot find `TestListener`. This indicates that the server is trying to look up `TestListener` in its own local copy of the module `<module 'pyspark.sql.tests.connect.streaming.test_parity_listener' from '/home/runner/work/oss-spark/oss-spark/python/lib/pyspark.zip/pyspark/sql/tests/connect/streaming/test_parity_listener.py'>`, where the class no longer exists. If the class definition itself were serialized and streamed to the server for deserialization, the server would not throw this error. So it could be that pickle takes a shortcut here: a class that lives in an importable module can be pickled by reference (module path plus class name) instead of by value, and the server then resolves that name against its own version of the module. I'm not fully sure about this theory, but either way an easy fix is to rename the listener on 3.5.
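The by-reference theory can be reproduced in isolation with a minimal sketch. It uses only the standard-library `pickle` module, not PySpark's own serializer, and the module name `fake_test_module` is a hypothetical stand-in for `pyspark.sql.tests.connect.streaming.test_parity_listener`. Renaming the class in that module after pickling mimics a 3.5 client talking to a 4.0 server.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the importable test module (an assumption,
# not the real pyspark module).
mod = types.ModuleType("fake_test_module")
sys.modules["fake_test_module"] = mod


class TestListener:
    """Plain class standing in for the streaming query listener."""


# Pretend the class is defined in the importable module, so pickle can
# resolve it as fake_test_module.TestListener when dumping.
TestListener.__module__ = "fake_test_module"
mod.TestListener = TestListener

# Pickle an instance. The payload records the module path and the class
# name, not the class body.
payload = pickle.dumps(TestListener())
print(b"fake_test_module" in payload, b"TestListener" in payload)

# Simulate the 4.0 side: same module path, but the class was renamed.
del mod.TestListener
mod.TestListenerSpark = TestListener

error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    # Similar to the server-side error: the name is looked up in the
    # receiver's copy of the module and is not found there.
    error = exc
    print(type(exc).__name__, exc)
```

Under this theory, renaming the 3.5 class to `TestListenerSpark` makes the name lookup succeed against the 4.0 module, which is the fix this PR applies.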

Also cherry-picked 4d9dbb3 to remove the tables after testing.

Why are the changes needed?

Backward compatibility test fix for Spark Connect

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested on my own branch: https://github.com/WweiL/oss-spark/actions/runs/9021444850

Was this patch authored or co-authored using generative AI tooling?

No

@WweiL (Contributor, Author) commented May 9, 2024

cc @HyukjinKwon PTAL!

@HyukjinKwon (Member)

Merged to branch-3.5.

HyukjinKwon added a commit that referenced this pull request May 9, 2024
…atibility test

Closes #46513 from WweiL/wweil-3.5.

Lead-authored-by: Wei Liu <wei.liu@databricks.com>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon closed this May 9, 2024