Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] JdbcFieldInfo is insufficient for proper type inference #35916

Closed
4 tasks
lidavidm opened this issue Jun 5, 2023 · 0 comments · Fixed by #37123
Closed
4 tasks

[Java] JdbcFieldInfo is insufficient for proper type inference #35916

lidavidm opened this issue Jun 5, 2023 · 0 comments · Fixed by #37123

Comments

@lidavidm
Copy link
Member

lidavidm commented Jun 5, 2023

Describe the enhancement requested

The APIs only let us infer the type based on the data type, precision, and scale, effectively. But in the wild, there are always exceptions; for example, the JDBC driver for PostgreSQL maps both TIMESTAMP WITHOUT TIME ZONE and TIMESTAMP WITH TIME ZONE to just Types.TIMESTAMP, so arrow-jdbc dutifully maps them both to naïve timestamps (wrong!).

  • JdbcFieldInfo should expose all the available information it gets.
  • For convenience, it should also work with DatabaseMetaData#getColumns.
  • We need a way to override how JdbcConsumers are constructed. (ArrowVectorIterator hardcodes calls to JdbcToArrowUtils#getConsumer.)
  • We should integration test with popular databases (PostgreSQL, MySQL, SQL Server, SQLite, DuckDB, others?)

Component(s)

Java

@lidavidm lidavidm self-assigned this Jun 5, 2023
lidavidm added a commit to apache/arrow-adbc that referenced this issue Jun 7, 2023
- Adds a custom option to the ADBC JDBC adapter to provide a custom type
mapper from JDBC column info to ArrowType
- Adds a customized JdbcFieldInfo that exposes more fields that are
necessary (e.g. PostgreSQL's JDBC driver exposes `TIMESTAMP WITHOUT TIME
ZONE` and `TIMESTAMP WITH TIME ZONE` as `Types.TIMESTAMP`, so you have
to look at the type name instead)
- Future work: #728
- Future work: apache/arrow#35916
- Future work: #727
- Add (failing) tests demonstrating that the values read are
inconsistent with the assumed types

Fixes #720.
lidavidm pushed a commit that referenced this issue Aug 14, 2023
### Rationale for this change

Existing info is not enough to make completely accurate conversion decisions.

### What changes are included in this PR?

Include the type name reported by the database and the max number of characters as part of the `JdbcFieldInfo`.
* Closes: #35916

Authored-by: Diego Fernandez <aiguo.fernandez@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@lidavidm lidavidm added this to the 14.0.0 milestone Aug 14, 2023
@lidavidm lidavidm removed their assignment Aug 14, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…pache#37123)

### Rationale for this change

Existing info is not enough to make completely accurate conversion decisions.

### What changes are included in this PR?

Include the type name reported by the database and the max number of characters as part of the `JdbcFieldInfo`.
* Closes: apache#35916

Authored-by: Diego Fernandez <aiguo.fernandez@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant