Implement specialized get_first and get_records method in OracleHook#61144

Merged
dabla merged 20 commits into apache:main from
dabla:feature/oracle-hook-resolve-oracledb-types-handler
Feb 1, 2026

@dabla
Contributor

@dabla dabla commented Jan 27, 2026


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

This PR overrides get_first and get_records in OracleHook with Oracle-specific handlers to ensure OracleDB types such as CLOB are fully deserialized before being returned.

This is required when querying Oracle-specific column types using operators like GenericTransfer and SQLExecuteQueryOperator. The default DBAPI handlers return raw Oracle LOB objects, which are not XCom-serializable. As a result, tasks that return query results containing (C)LOB columns may fail when the result is pushed to XCom.

By eagerly reading the LOB contents in the hook, this PR guarantees that get_first and get_records only return plain Python types, making the results safe to serialize and pass through XCom.
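
As a rough illustration of the eager-reading approach, here is a minimal sketch of such handlers. It is not the code merged in this PR: the names materialize_value, materialize_row, oracle_fetch_all_handler and oracle_fetch_one_handler are hypothetical, and the sketch assumes that any value exposing a callable read() is a LOB (python-oracledb's LOB objects do expose read()). Returning None when the cursor has no result set follows the same convention as common.sql's fetch_all_handler.

```python
from typing import Any, Optional


def materialize_value(value: Any) -> Any:
    """Return LOB contents as str/bytes; leave other values untouched.

    Assumption: any object with a callable read() is treated as a LOB
    (python-oracledb's LOB objects have one; plain str/bytes do not).
    """
    read = getattr(value, "read", None)
    if callable(read):
        # One read() call per LOB; each may cost an extra network round-trip.
        return read()
    return value


def materialize_row(row: Optional[tuple]) -> Optional[tuple]:
    """Materialize every LOB in a single row, keeping the tuple shape."""
    if row is None:
        return None
    return tuple(materialize_value(value) for value in row)


def oracle_fetch_all_handler(cursor) -> Optional[list]:
    """Hypothetical handler: fetch all rows and read their LOBs eagerly."""
    if cursor.description is None:
        # No result set (e.g. DML), same convention as common.sql.
        return None
    return [materialize_row(row) for row in cursor.fetchall()]


def oracle_fetch_one_handler(cursor) -> Optional[tuple]:
    """Hypothetical handler: fetch one row and read its LOBs eagerly."""
    if cursor.description is None:
        return None
    return materialize_row(cursor.fetchone())
```

LOB locators can only be read while the connection is still open, which is likely why the reading has to happen inside the handler rather than after the hook has returned.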

This change is closely related to PR #60152, which addressed triggerer crashes caused by unserializable objects in TriggerEvent payloads. Together, these changes ensure that Oracle (C)LOB values are handled correctly both in the triggerer and in operators relying on OracleHook, preventing serialization errors throughout the execution flow.


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

Contributor

@henry3260 henry3260 left a comment

I have a small concern regarding performance here. Iterating through rows and calling .read() on each LOB object might trigger N+1 network round-trips to the database.
Have we considered using outputtypehandler to avoid this?

Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
Contributor

@jscheffl jscheffl left a comment

Not an Oracle expert; approving on the assumption that you know what you are doing and that you take the feedback of @henry3260 into account. Using these methods really loads all data into memory, so it can be slow and can cause an OOM compared to passing back a generator. Maybe a note should be added to the pydocs.
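
To illustrate the generator alternative mentioned here, a minimal sketch assuming a DB-API cursor with fetchmany(); iter_materialized_rows is a hypothetical helper, not part of OracleHook. It reads LOBs one batch at a time, so only batch_size rows are held in memory at once instead of the full fetchall() result.

```python
from typing import Iterator


def iter_materialized_rows(cursor, batch_size: int = 500) -> Iterator[tuple]:
    """Yield rows lazily, reading LOB values one batch at a time.

    Assumption: any value with a callable read() is treated as a LOB.
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            # An empty batch means the result set is exhausted.
            break
        for row in batch:
            yield tuple(
                value.read() if callable(getattr(value, "read", None)) else value
                for value in row
            )
```

A generator itself is not XCom-serializable, so this only helps when the rows are consumed inside the task while the connection is still open (for example when writing them to another system); operators that push results to XCom still need the fully materialized list.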

@dabla
Contributor Author

dabla commented Jan 28, 2026

> I have a small concern regarding performance here. Iterating through rows and calling .read() on each LOB object might trigger N+1 network round-trips to the database.
> Have we considered using outputtypehandler to avoid this?

Thanks for raising this; the concern is definitely valid and a good point.

Calling .read() on LOBs can indeed trigger additional round-trips, depending on driver configuration and LOB size. However, when (C)LOB columns are selected and the result is returned from get_first / get_records, the LOB contents must be fully materialized anyway in order to be XCom-serializable.

Using an outputtypehandler would shift when the LOB is read (during fetch rather than post-processing), but it would not avoid the underlying cost of transferring and materializing the LOB data. If you don't want to materialize them, you can always use the run method, since no handler is specified there by default.

Since these methods (e.g. get_records and get_first) are used by operators that return results (e.g. GenericTransfer, SQLExecuteQueryOperator), returning raw Oracle LOB objects is not a viable option. This PR ensures correctness by guaranteeing that only serializable Python types are returned.

That said, I'm open to it, but I personally think that should be done in a separate PR if we go that way.

@dabla
Contributor Author

dabla commented Jan 28, 2026

> Not an Oracle expert; approving on the assumption that you know what you are doing and that you take the feedback of @henry3260 into account. Using these methods really loads all data into memory, so it can be slow and can cause an OOM compared to passing back a generator. Maybe a note should be added to the pydocs.

Good point on the pydocs, I will add it. See my reply regarding the handlers for why materializing those LOBs is needed anyway.

@henry3260
Contributor

Thanks for the explanation! I agree that materialization is necessary for XCom serialization regardless.

Just a small clarification: my concern regarding N+1 was about network latency (round-trips) rather than data volume. Using outputtypehandler allows the driver to prefetch LOB data within the same fetch round-trips, whereas explicit .read() calls usually force separate network packets for each row.

However, I fully agree that this optimization is out of scope for this PR, as the priority here is correctness and fixing the serialization crash. Let's merge this to fix the immediate bug, and we can look into optimizing get_conn with outputtypehandler in a future PR.

@dabla
Contributor Author

dabla commented Jan 29, 2026

> Just a small clarification: my concern regarding N+1 was about network latency (round-trips) rather than data volume. Using outputtypehandler allows the driver to prefetch LOB data within the same fetch round-trips, whereas explicit .read() calls usually force separate network packets for each row.

Hello @henry3260, thank you for clarifying this. I wasn't aware of the outputtypehandler, and it's a valid point. It is worth considering, as it would indeed be a nice optimization.

Contributor

@jscheffl jscheffl left a comment

LGTM!

@dabla dabla merged commit 0a5c9c4 into apache:main Feb 1, 2026
90 checks passed
morelgeorge pushed a commit to morelgeorge/airflow that referenced this pull request Feb 1, 2026
…to avoid serialization issues with XCom's (apache#61144)

* refactor: Override the get_first and get_records methods with Oracle-enhanced handlers to make sure OracleDB types like CLOBs are correctly deserialized before returning the results.

* refactor: Added unit test for get_first and get_records method

* refactor: Reformatted get_first method in OracleHook

* refactor: Fixed test_test_connection_use_dual_table

* Update providers/oracle/src/airflow/providers/oracle/hooks/oracle.py

Co-authored-by: Henry Chen <henryhenry0512@gmail.com>

* refactor: Moved Oracle handlers in separate module like in common.sql

* refactor: Fixed return type of fetch_one_handler in common.sql provider

* refactor: Fixed return type of fetch_one_handler in common.sql provider

* refactor: Added handlers in python-module of Oracle hook

* refactor: Added handlers test for Oracle provider

* refactor: Updated hooks in provider info

* refactor: Reformatted test files

* refactor: Updated test_fetch_one_handler as fetchone method is supposed to return a tuple

---------

Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
shashbha14 pushed a commit to shashbha14/airflow that referenced this pull request Feb 2, 2026
jason810496 pushed a commit to abhijeets25012-tech/airflow that referenced this pull request Feb 3, 2026
jhgoebbert pushed a commit to jhgoebbert/airflow_Owen-CH-Leung that referenced this pull request Feb 8, 2026
choo121600 pushed a commit to choo121600/airflow that referenced this pull request Feb 22, 2026

3 participants