
[Bug Report] Data Flow Interruption with Function Parameters and Variable Arguments in Python #17753

Open
gravingPro opened this issue Oct 14, 2024 · 4 comments

Comments

@gravingPro

I've encountered issues in CodeQL regarding data flow interruption. Here are the details:

1. Function Parameter Passing Interruption

In the code below:

```python
def read_sql(sql):
    spark.sql()  # sink custom

def process(func, args):
    func(*args)

sql = request.json['data']  # Source
process(func=read_sql, args=[sql])  # args wrapped in a list so *args unpacks a single argument
```

CodeQL fails to detect that the tainted variable sql is passed into read_sql when using the process function to handle the function call and its argument. This shows an interruption in data flow tracking during function parameter passing and subsequent invocation with variable arguments.
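At runtime the tainted value does reach `read_sql`'s parameter, which is the flow one would expect a query to report. A self-contained sketch of the same pattern, with stand-ins for the source and sink (the original example's `request.json` and `spark.sql` are not available here):

```python
seen = []

def read_sql(sql):
    seen.append(sql)  # stands in for spark.sql(sql), the custom sink

def process(func, args):
    func(*args)       # variable positional arguments

tainted = "'; DROP TABLE users; --"   # stands in for request.json['data'], the source
process(func=read_sql, args=[tainted])  # args must be iterable for *args
print(seen[0])
```

Running this confirms the tainted string arrives at the sink stand-in, so the step `process(*args) -> read_sql(sql)` is the edge the analysis would need to model.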

2. *args and **kwargs Interruption

When `*args` (variable positional arguments) or `**kwargs` (variable keyword arguments) carry data that matters to the flow, CodeQL cannot track it accurately. In the example above, the use of `*args` inside `process` causes the data flow for `sql` to be lost. The same interruption appears in similar scenarios involving these constructs.
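For completeness, here is a hypothetical `**kwargs` variant of the same pattern; a list stands in for the real sink so the sketch is self-contained, and `dispatch` is an illustrative name, not part of the original report:

```python
captured = []

def read_sql(sql):
    captured.append(sql)  # stands in for spark.sql(sql), the custom sink

def dispatch(func, kwargs):
    func(**kwargs)  # variable keyword arguments; taint should flow through

payload = "' OR 1=1 --"  # stands in for request.json['data'], the source
dispatch(func=read_sql, kwargs={"sql": payload})
print(captured[0])
```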

Moreover, the same problems occur with multithreading and multiprocessing APIs such as threading.Thread, multiprocessing.Process, concurrent.futures.ThreadPoolExecutor, and concurrent.futures.ProcessPoolExecutor.
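The taint step through those APIs is the same shape: a callback plus an argument tuple. A hypothetical, self-contained sketch (a list stands in for the real sink; `multiprocessing.Process` is omitted because a child process would need shared state to demonstrate it):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

captured = []

def read_sql(sql):
    captured.append(sql)  # stands in for the custom sink

payload = "user-controlled"  # stands in for request.json['data']

# Taint passes through Thread's args tuple into the target's parameter.
t = threading.Thread(target=read_sql, args=(payload,))
t.start()
t.join()

# The same pattern through an executor's submit(fn, *args).
with ThreadPoolExecutor(max_workers=1) as ex:
    ex.submit(read_sql, payload).result()

print(captured)
```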

I hope this description helps in identifying and resolving these problems. Looking forward to a timely fix or further guidance on handling such complex data flow tracking scenarios.

Best regards

@gravingPro gravingPro added the question Further information is requested label Oct 14, 2024
@rvermeulen
Contributor

Hi @gravingPro,

Thanks for the bug report.
We will inform the Python team and get back to you on possible further guidance.

@rvermeulen
Contributor

Hi @gravingPro,

A quick follow-up question. How is your custom sink defined?
In read_sql the argument sql is currently unused.

@gravingPro
Author

> Hi @gravingPro,
>
> A quick follow-up question. How is your custom sink defined? In read_sql the argument sql is currently unused.

It's just a simple example; any sink could be used here, whether SQL injection or SSRF.

@rvermeulen
Contributor

Hi @gravingPro,

Is read_sql itself your sink? The comment suggests that spark.sql() # sink custom is the sink. If spark.sql() is the sink, then the flow will stop at read_sql's parameter sql, which is unused: it will never reach your sink, and I would like to exclude that possibility.
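An illustrative sketch of the distinction (hypothetical names; `sink` stands in for spark.sql): flow continues past the parameter only when the parameter is actually used.

```python
calls = []

def sink(query):
    calls.append(query)   # stands in for spark.sql(...)

def read_sql_unused(sql):
    sink("SELECT 1")      # 'sql' never reaches the sink; flow stops here

def read_sql_used(sql):
    sink(sql)             # 'sql' reaches the sink; flow continues

read_sql_unused("tainted")
read_sql_used("tainted")
print(calls)
```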

@rvermeulen rvermeulen added the Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish label Oct 17, 2024