[SPARK-54619][PYTHON] Add a sanity check for configuration numbers#53359
[SPARK-54619][PYTHON] Add a sanity check for configuration numbers#53359gaogaotiantian wants to merge 2 commits intoapache:masterfrom
Conversation
| def load(self, infile): | ||
| num_conf = read_int(infile) | ||
| for i in range(num_conf): | ||
| if num_conf < 0 or num_conf > 10000: |
There was a problem hiding this comment.
10,000 seems to be too small though, @gaogaotiantian .
There was a problem hiding this comment.
Hm, don't we already only send the allowed confs only? e.g., ArrowPythonRunner.getPythonRunnerConfMap
There was a problem hiding this comment.
Ah, okay so this is adding a sanity check
There was a problem hiding this comment.
Instead of introducing a hard-coded magic number, please provide an environment variable to control this, @gaogaotiantian . Also, it would be great if we can have more higher default values to make it sure.
if num_conf < 0 or num_conf > 10000:
|
Hi @dongjoon-hyun , I don't think this should be controllable by an env var because:
If you really hate the magic number, we can only check the positivity of the number. However, that leaves plenty of unreasonable space for sanity check. Thanks! |
|
Got it. Thank you for the details. |
|
|
||
| def load(self, infile): | ||
| num_conf = read_int(infile) | ||
| for i in range(num_conf): |
There was a problem hiding this comment.
Let's add a bit of background here in a comment. I was also wondering why we need this as we're already controlling the confs to send.
|
I updated the comments, let's see if it's clearer. The rational behind this is - when we change something on the protocol (passing an extra integer at a random place, which is what we often do now), it's common to just "stuck" at somewhere. The test hangs and we don't know what happened - we don't even know where the message might go wrong. Having sanity check in different places in our protocols can stop the communication early so we know the message is already wrong at this place. It's helpful for debugging. More than that, there could be communication errors during production (rare, but possible). There could be dark corners that we forgot to test. Raising an error explicitly is always better than hanging there. That's why I think we should introduce more sanity check and real runtime validation checks on our data passed in. Of course, eventually we might just want a more dedicated RPC, but for now this is helpful. |
|
Merged to master for Apache Spark 4.2.0. Thank you, @gaogaotiantian and all. |
What changes were proposed in this pull request?
Add sanity check for number of configurations being passed.
Why are the changes needed?
This is helpful to recognize malformed message - avoid potential deadlock when the message does not conform to protocol.
Does this PR introduce any user-facing change?
No
How was this patch tested?
This error should not happen and it should not break CI either.
Was this patch authored or co-authored using generative AI tooling?
No.