Implement specialized get_first and get_records methods in OracleHook #61144
…nhanced handlers to make sure OracleDB types like CLOB's are correctly deserialized before return the results.
henry3260
left a comment
I have a small concern about performance here. Iterating through rows and calling .read() on each LOB object might trigger N+1 network round-trips to the database.
Have we considered using outputtypehandler to avoid this?
Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
jscheffl
left a comment
Not an Oracle expert, so I'm approving on the assumption that you know what you are doing and will consider the feedback from @henry3260. These methods load all data into memory, so they can be slow and may cause an OOM compared with passing back a generator. Maybe a note should be added to the pydocs.
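To make the memory trade-off above concrete, here is a minimal sketch that contrasts materializing every row with yielding rows lazily from a DB-API cursor. The helper name `iter_rows` is hypothetical and not part of OracleHook, and `sqlite3` stands in for Oracle only so the example runs without a database server.

```python
import sqlite3
from collections.abc import Iterator
from typing import Any


def iter_rows(cursor: Any, batch_size: int = 100) -> Iterator[tuple]:
    """Yield rows lazily, fetching at most ``batch_size`` rows per call."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch


# sqlite3 is an assumption made purely for portability of the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [(i, f"body-{i}") for i in range(250)],
)

# Materialized: every row is held in memory at once.
cursor = conn.execute("SELECT id, body FROM docs ORDER BY id")
all_rows = cursor.fetchall()

# Lazy: nothing is fetched until the generator is consumed.
cursor = conn.execute("SELECT id, body FROM docs ORDER BY id")
lazy_rows = iter_rows(cursor, batch_size=100)
first = next(lazy_rows)
```

A generator like this cannot be pushed to XCom as-is, which is one reason hook methods that feed operators return fully materialized results.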
Thanks for raising this. The concern is valid, and it is a good point. Calling .read() on … Using an outputtypehandler would shift when the LOB is read (during fetch rather than post-processing), but it would not avoid the underlying cost of transferring and materializing the LOB data. If you don't want to materialize the LOBs, you can always use the run method, as no handler is specified there by default. Since these methods (e.g. … That said, I'm open to it, but I personally think it should be done in a separate PR if we go that way.
Good point on the pydocs, I will add it. See my reply about the handlers for why materializing those LOBs is needed anyway.
Thanks for the explanation! I agree that materialization is necessary for XCom serialization regardless. Just a small clarification: my concern regarding N+1 was about network latency (round-trips) rather than data volume. Using outputtypehandler allows the driver to prefetch LOB data within the same fetch round-trips, whereas explicit .read() calls usually force separate network packets for each row. However, I fully agree that this optimization is out of scope for this PR, as the priority here is correctness and fixing the serialization crash. Let's merge this to fix the immediate bug, and we can look into optimizing get_conn with outputtypehandler in a future PR.
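For reference, the output type handler approach mentioned above might look roughly like the sketch below. It assumes the `(cursor, metadata)` handler signature and the `DB_TYPE_*` constants of recent python-oracledb releases. The factory `make_lob_output_type_handler` and the fake objects are hypothetical and exist only so the sketch runs without the driver installed.

```python
from types import SimpleNamespace
from typing import Any, Callable


def make_lob_output_type_handler(db_types: Any) -> Callable[[Any, Any], Any]:
    """Build a handler that fetches LOB columns as str/bytes.

    ``db_types`` is expected to expose DB_TYPE_* constants the way the
    ``oracledb`` module does (assumption; pass ``oracledb`` itself in real use).
    """
    conversions = [
        (db_types.DB_TYPE_CLOB, db_types.DB_TYPE_LONG),
        (db_types.DB_TYPE_NCLOB, db_types.DB_TYPE_LONG_NVARCHAR),
        (db_types.DB_TYPE_BLOB, db_types.DB_TYPE_LONG_RAW),
    ]

    def handler(cursor: Any, metadata: Any) -> Any:
        for lob_type, long_type in conversions:
            if metadata.type_code is lob_type:
                # Fetch the column inline instead of returning a LOB locator.
                return cursor.var(long_type, arraysize=cursor.arraysize)
        return None  # keep the driver's default for every other column type

    return handler


# Stand-ins for the driver, used only to exercise the handler here.
fake_types = SimpleNamespace(
    DB_TYPE_CLOB=object(),
    DB_TYPE_NCLOB=object(),
    DB_TYPE_BLOB=object(),
    DB_TYPE_NUMBER=object(),
    DB_TYPE_LONG="LONG",
    DB_TYPE_LONG_NVARCHAR="LONG_NVARCHAR",
    DB_TYPE_LONG_RAW="LONG_RAW",
)


class FakeCursor:
    arraysize = 100

    def var(self, typ: Any, arraysize: int) -> tuple:
        return (typ, arraysize)


handler = make_lob_output_type_handler(fake_types)
clob_var = handler(FakeCursor(), SimpleNamespace(type_code=fake_types.DB_TYPE_CLOB))
number_var = handler(FakeCursor(), SimpleNamespace(type_code=fake_types.DB_TYPE_NUMBER))
```

With the real driver, the handler would be attached with something like `connection.outputtypehandler = make_lob_output_type_handler(oracledb)`. python-oracledb also documents a `defaults.fetch_lobs` setting with a similar effect. Whether either fits OracleHook.get_conn is left open here, as in the discussion.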
Hello @henry3260, thank you for clarifying this as I wasn't exactly aware of the …
…ed to return a tuple
…to avoid serialization issues with XCom's (apache#61144)
* refactor: Override the get_first and get_records method with Oracle enhanced handlers to make sure OracleDB types like CLOB's are correctly deserialized before return the results.
* refactor: Added unit test for get_first and get_records method
* refactor: Reformatted get_first method in OracleHook
* refactor: Fixed test_test_connection_use_dual_table
* Update providers/oracle/src/airflow/providers/oracle/hooks/oracle.py Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
* refactor: Moved Oracle handlers in separate module like in common.sql
* refactor: Fixed return type of fetch_one_handler in common.sql provider
* refactor: Fixed return type of fetch_one_handler in common.sql provider
* refactor: Added handlers in python-module of Oracle hook
* refactor: Added handlers test for Oracle provider
* refactor: Updated hooks in provider info
* refactor: Reformatted test files
* refactor: Updated test_fetch_one_handler as fetchone method is supposed to return a tuple
---------
Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
This PR overrides get_first and get_records in OracleHook with Oracle-specific handlers to ensure OracleDB types such as CLOB are fully deserialized before being returned.
This is required when querying Oracle-specific column types using operators like GenericTransfer and SQLExecuteQueryOperator. The default DBAPI handlers return raw Oracle LOB objects, which are not XCom-serializable. As a result, tasks that return query results containing (C)LOB columns may fail when the result is pushed to XCom.
By eagerly reading the LOB contents in the hook, this PR guarantees that get_first and get_records only return plain Python types, making the results safe to serialize and pass through XCom.
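The eager-read idea described above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in the PR: the names `fetch_all_reading_lobs`, `fetch_one_reading_lobs`, `FakeLob` and `FakeCursor` are hypothetical, LOB values are detected simply by the presence of a `read()` method, and `json.dumps` stands in for XCom serialization.

```python
import json
from typing import Any


class FakeLob:
    """Stand-in for a driver LOB object: content is only available via read()."""

    def __init__(self, data: str) -> None:
        self._data = data

    def read(self) -> str:
        return self._data


class FakeCursor:
    """Minimal DB-API-like cursor over an in-memory list of rows."""

    def __init__(self, rows: list) -> None:
        self._rows = list(rows)

    def fetchall(self) -> list:
        rows, self._rows = self._rows, []
        return rows

    def fetchone(self) -> Any:
        return self._rows.pop(0) if self._rows else None


def _read_lob(value: Any) -> Any:
    # Anything exposing a callable read() is treated as a LOB (assumption).
    reader = getattr(value, "read", None)
    return reader() if callable(reader) else value


def fetch_all_reading_lobs(cursor: Any) -> list:
    """Return all rows with LOB values replaced by their contents."""
    return [tuple(_read_lob(v) for v in row) for row in cursor.fetchall()]


def fetch_one_reading_lobs(cursor: Any) -> Any:
    """Return the next row with LOB values read, or None if there is none."""
    row = cursor.fetchone()
    return None if row is None else tuple(_read_lob(v) for v in row)


rows = [(1, FakeLob("hello")), (2, FakeLob("world"))]

# Raw LOB objects cannot be serialized.
try:
    json.dumps(rows)
    raw_serializable = True
except TypeError:
    raw_serializable = False

records = fetch_all_reading_lobs(FakeCursor(rows))
first_record = fetch_one_reading_lobs(FakeCursor(rows))
payload = json.dumps(records)  # succeeds: only plain Python types remain
```

Each `read()` call here is per value, which is the source of the round-trip concern raised in the review.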
This change is closely related to PR #60152, which addressed triggerer crashes caused by unserializable objects in TriggerEvent payloads. Together, these changes ensure that Oracle (C)LOB values are handled correctly both in the triggerer and in operators relying on OracleHook, preventing serialization errors throughout the execution flow.