Create a `pydough.from_string` API so that PyDough code can be run programmatically
#236
Labels: documentation, effort - medium, enhancement, user feature
Goal: add a new PyDough API, `from_string`, that takes in one or more lines of Python PyDough code (which can be programmatically generated by an LLM) and runs an equivalent of the `%%pydough` cell magic (without `to_sql` or `to_df`) to obtain a PyDough unqualified node corresponding to the code. Once this API has been called, other APIs such as `to_sql` or `to_df` can be called on its result.

Specifications:
- `python_code` can be a multiline string of Python statements. The last line should store the PyDough object for the final answer being sought in a variable.
- `answer_variable` is a string indicating the name that was used by `python_code` to store the result. The default value should be `"result"`.
- `python_code` has any indentation of its lines fixed, and should be verified (e.g. by checking that `answer_variable` gets defined in the last line). The strictness of these checks, and some of the formatting changes, can perhaps be controlled by additional keyword arguments to `from_string`.
- `python_code` is transformed using the AST visitor, in a manner similar to the `transform_cell` API used by the `%%pydough` Jupyter extension. The appropriate global/local namespaces should be used so that PyDough can figure out what is/isn't an undefined variable in the code.
- The transformed code is run with `exec`, using the real `globals()`, and either the real `locals()` or a new dictionary to mimic it (which could start out empty, or start as a shallow copy of `locals()`). The behavior with the local namespace argument is a key design consideration:
  - Option 1: with the real `locals()`, all local variables are accessible, but all variables defined in the code will remain defined afterward.
  - Option 2: with an empty dictionary instead of `locals()`, no persistent mutation to the namespace occurs, but no variables from the local namespace can be used in the PyDough code.
  - Option 3: with a shallow copy of `locals()`, no persistent mutation to the namespace occurs, but all variables from the existing local namespace can still be accessed by the PyDough code.
  - Option 4: as in Option 3, but the final answer (`answer_variable`) is then copied into the real `locals()`. This means that the final answer can be accessed later in the namespace, but no intermediary variables mutate the namespace.
  - The choice could be exposed through a `namespace` keyword argument to `from_string` (with one of the options chosen as the default): Option 1 = `"update"`, Option 2 = `"empty"`, Option 3 = `"snapshot"`, Option 4 = `""`. Alternatively, users can pass in a dictionary directly. This can be one of the dictionaries from options 1-3, or their own, which they can selectively re-use between calls; that allows a separation between Python variables and PyDough variables.
- The result is looked up in the namespace dictionary using the `answer_variable` argument (e.g. `result`, by default). This variable can be accessed by indexing into the dictionary, and it should be an unqualified node.
- The unqualified node is returned by `from_string`. Afterwards, other operations (`explain`, `to_sql`, `to_df`) can be called on the unqualified node.

This API needs to be clearly & thoroughly documented in the user documentation.
Example of what this would look like:
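A minimal, runnable sketch of the namespace behavior described in the specifications, under several assumptions. `run_snippet` is a hypothetical stand-in for the proposed `from_string`, not the PyDough implementation. It skips the AST transformation, uses plain integers where unqualified nodes would be, and takes the caller's namespace as an explicit dictionary (`caller_ns`), because writes to a function's `locals()` are not guaranteed to take effect in CPython. The issue leaves the name of Option 4 blank, so `"copy_answer"` is a placeholder.

```python
def run_snippet(python_code, caller_ns, answer_variable="result", namespace="snapshot"):
    """Hypothetical stand-in for the proposed from_string.

    Skips the PyDough AST transformation; `caller_ns` plays the role of locals().
    """
    if isinstance(namespace, dict):
        env = namespace  # caller-supplied dictionary, used as-is
    elif namespace == "update":
        env = caller_ns  # option 1: everything the snippet defines persists
    elif namespace == "empty":
        env = {}  # option 2: isolated, sees no caller variables
    elif namespace in ("snapshot", "copy_answer"):
        env = dict(caller_ns)  # options 3 and 4: shallow copy
    else:
        raise ValueError(f"unknown namespace mode: {namespace!r}")

    exec(python_code, globals(), env)
    answer = env[answer_variable]  # KeyError if the snippet never defined it
    if namespace == "copy_answer":
        caller_ns[answer_variable] = answer  # option 4: only the answer leaks out
    return answer


code = "doubled = limit * 2\nresult = doubled + 1"
caller = {"limit": 3}

snapshot_answer = run_snippet(code, caller, namespace="snapshot")
keys_after_snapshot = sorted(caller)  # caller untouched: ['limit']

run_snippet(code, caller, namespace="copy_answer")
keys_after_copy = sorted(caller)  # only the answer added: ['limit', 'result']

run_snippet(code, caller, namespace="update")
keys_after_update = sorted(caller)  # intermediates persist: ['doubled', 'limit', 'result']

try:
    run_snippet(code, caller, namespace="empty")
    empty_mode_error = None
except NameError as exc:
    empty_mode_error = str(exc)  # `limit` is not visible in an empty namespace
```

In the real API the returned object would be an unqualified node, on which `explain`, `to_sql`, or `to_df` could then be called.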
And an example where a name other than `result` is used:
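A hedged sketch of a non-default answer name, combined with the option of passing in a dictionary that is re-used between calls. `run_in` is a hypothetical stand-in for the proposed API, the variable names are invented for illustration, and plain integers stand in for unqualified nodes.

```python
def run_in(python_code: str, env: dict, answer_variable: str = "result"):
    """Hypothetical stand-in: run a snippet inside `env` and return the named answer."""
    exec(python_code, env)
    return env[answer_variable]


# A dictionary owned by the caller keeps snippet variables apart from Python variables.
pydough_vars = {}

first = run_in(
    "threshold = 10\nbig_orders = threshold * 5",
    pydough_vars,
    answer_variable="big_orders",
)

# The second call re-uses the same dictionary, so earlier names are still visible.
second = run_in(
    "top_orders = big_orders + threshold",
    pydough_vars,
    answer_variable="top_orders",
)
```

Because the snippets only ever write into `pydough_vars`, none of `threshold`, `big_orders`, or `top_orders` becomes a variable in the surrounding Python code.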