Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-4142: [Java] JDBC Array -> Arrow ListVector #3294

Closed
wants to merge 1 commit into from
Closed

ARROW-4142: [Java] JDBC Array -> Arrow ListVector #3294

wants to merge 1 commit into from

Conversation

mikepigott
Copy link
Contributor

https://issues.apache.org/jira/browse/ARROW-4142

This adds support for reading JDBC arrays and converting them to Arrow ListVectors. JDBC does not provide a great way to get the array type; there is a ResultSet object for walking the array, but its ResultSetMetaData may not contain the right value type (H2, for example, returns a JDBC type of NULL).

This is based on #3134, which includes ARROW-3965 and ARROW-3966.

I found Arrow arrays to be very confusing, and I am not sure if I am using them correctly here.  One thing I noticed was if I added a null array to a ListVector of VarCharVectors, the next value in the VarCharVector would be empty.  I would appreciate any help on why!  The ListVector unit tests weren't very helpful.

For all other cases, this code seems to work.  I look forward to your review!

@codecov-io
Copy link

codecov-io commented Jan 2, 2019

Codecov Report

Merging #3294 into master will increase coverage by 1.24%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3294      +/-   ##
==========================================
+ Coverage   87.79%   89.03%   +1.24%     
==========================================
  Files         689      463     -226     
  Lines       84209    55085   -29124     
  Branches     1081        0    -1081     
==========================================
- Hits        73929    49045   -24884     
+ Misses      10165     6040    -4125     
+ Partials      115        0     -115
Impacted Files Coverage Δ
cpp/src/arrow/csv/reader.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/array/builder_union.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/array/builder_union.cc 4.16% <0%> (-95.84%) ⬇️
cpp/src/arrow/csv/reader.cc 0.53% <0%> (-90.38%) ⬇️
cpp/src/arrow/adapters/orc/adapter.cc 0.26% <0%> (-73.04%) ⬇️
cpp/src/arrow/csv/options.h 66.66% <0%> (-33.34%) ⬇️
cpp/src/parquet/column_scanner.cc 59.09% <0%> (-22.73%) ⬇️
cpp/src/arrow/vendored/xxhash/xxhash.c 51.26% <0%> (-22.34%) ⬇️
cpp/src/arrow/memory_pool.cc 68.8% <0%> (-16.52%) ⬇️
cpp/src/arrow/io/interfaces.cc 40% <0%> (-15%) ⬇️
... and 275 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 240c469...66376dd. Read the comment docs.

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to the java codebase so don't take my questions/suggestions as gospel.


/**
* Builds a <code>JdbcFieldInfo</code> using only the {@link java.sql.Types} type. Do not use this constructor
* if the field type is {@link java.sql.Types#DECIMAL} or {@link java.sql.Types#NUMERIC}; the precision and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe put a check for this in the constructor for this condition, in case someone doesn't read the javadoc?


/**
* Builds a <code>JdbcFieldInfo</code> from the {@link java.sql.Types} type, precision, and scale.
* Use this constructor for {@link java.sql.Types#DECIMAL} and {@link java.sql.Types#NUMERIC} types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe warn if this is used with other types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in, log a warning? I don't think this project logs much of anything.

public JdbcFieldInfo(ResultSetMetaData rsmd, int column) throws SQLException {
Preconditions.checkNotNull(rsmd, "ResultSetMetaData cannot be null.");
Preconditions.checkArgument(column > 0, "ResultSetMetaData columns have indices starting at 1.");
Preconditions.checkArgument(column <= rsmd.getColumnCount(), "The index must be within the number of columns");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for these two (probably more for the second one, it might pay to use the templating feature of Preconditions.checkArgument to pass through the bad values

throws SQLException, IOException {
Preconditions.checkNotNull(connection, "JDBC connection object can not be null");
Preconditions.checkArgument(query != null && query.length() > 0, "SQL query can not be null or empty");
Preconditions.checkNotNull(config, "The configuration cannot be null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style consistency. it looks like you combined null and a validity check but here you split them for config

*/
public static VectorSchemaRoot sqlToArrow(ResultSet resultSet, JdbcToArrowConfig config)
throws SQLException, IOException {
Preconditions.checkNotNull(resultSet, "JDBC ResultSet object can not be null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to the java code base, but given all the nullness checks it might be nice to follow a convention things are not null by default and use "@nullable" annotations when they can be null (might be worth discussing on the mailing list)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just trying to follow the style in the existing code. I think stylistic changes like these should be a different PR ...

break;
final Map<String, String> metadata;
if (config.getIncludeMetadata()) {
metadata = new HashMap<String, String>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use a Map here vs POJO?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One advantage of a POJO might be you can move this logic to the constructor of that object

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it is getting passed through to the existing schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Also, this seems to be a merge conflict that was resolved the wrong way; the metadata logic was implemented in another PR.


final ArrowType arrowType = getArrowTypeForJdbcField(new JdbcFieldInfo(rsmd, i), config.getCalendar());
if (arrowType != null) {
final FieldType fieldType = new FieldType(true, arrowType, null, metadata);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a /* parameter name */ for null

JdbcToArrowConfig config)
throws SQLException, IOException {

final JdbcFieldInfo fieldInfo = getJdbcFieldInfoForArraySubType(resultSet.getMetaData(), arrayIndex, config);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to move this to be a method on config?


final JdbcFieldInfo fieldInfo = getJdbcFieldInfoForArraySubType(resultSet.getMetaData(), arrayIndex, config);
if (fieldInfo == null) {
throw new IllegalStateException("Column " + arrayIndex + " is an array of unknown type.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems more like config wasn't setup correctly so illegal argument exception might be better.


while (rs.next()) {
// The second column contains the actual data.
jdbcToFieldVector(rs, 2, fieldInfo.getJdbcType(), valueCount + arrayRowCount, fieldVector, config);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make 2, a named constant?

* @return The corresponding <code>ArrowType</code>.
* @throws NullPointerException if either <code>fieldInfo</code> or <code>calendar</code> are <code>null</code>.
*/
public static ArrowType getArrowTypeForJdbcField(JdbcFieldInfo fieldInfo, Calendar calendar) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could potentially move this to a method on fieldInfo

Copy link
Contributor Author

@mikepigott mikepigott Feb 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an idea. This logic, in some form, has always been in this file, which is why I left it there.

break;
default:
// no-op, shouldn't get here
arrowType = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to throw an exception instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or at least log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't trying to stray too far from what the original author was doing, since his work did pass code review ...

The diff is a little confusing, but the original logic is on the old logic is on the old (red) line 249.

}
rowCount++;
}
root.setRowCount(rowCount);
}

private static void jdbcToFieldVector(
ResultSet rs,
int i,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a name indicating this is the column index might be a little clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point.

/* Uses the configuration to determine what the array sub-type JdbcFieldInfo is.
* If no sub-type can be found, returns null.
*/
private static JdbcFieldInfo getJdbcFieldInfoForArraySubType(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to JdbcToArrowConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to avoid making JdbcToArrowConfig too "smart." This logic can certainly be moved there, but I wanted the configuration to be a POJO that the JdbcToArrow logic pulled from, rather than define parts of the JdbcToArrow behavior.

@mikepigott
Copy link
Contributor Author

Thanks for the feedback! I'll cycle back to this once ARROW-3966 (#3134) and ARROW-3923 (#3066) are approved and merged. For now, do you know enough about Arrow ListVectors to know why the Varchar ListVector doesn't quite work?

@mikepigott
Copy link
Contributor Author

I tried to document my issue in the PR description, but it may be easiest to see with a test. I'm in my phone so I can't look it up, but one of the unit tests I wrote builds a VarChar ListVector. It passes now because I add the null value last. If you change the test to add the null earlier in the test, it will fail. The value after the null will not be found. I'm not sure why.

@kou kou changed the title ARROW-4142 [Java] JDBC Array -> Arrow ListVector ARROW-4142: [Java] JDBC Array -> Arrow ListVector Feb 4, 2019
@mikepigott
Copy link
Contributor Author

mikepigott commented Feb 7, 2019

I just merged with the latest master, and got all of the tests to pass. (I haven't responded to any of the feedback on this PR yet, but I will this week.)

My last commit modified the unit test to show the error I'm getting when using a VarChar ListVector with a null value. There are three elements in the VarChar ListVector:

  • "4"
  • null
  • "5"

All of these are written to the ListVector successfully, but only the first two elements come back out again. "5" comes back out as an empty string. I'm not sure why.

[INFO] Running org.apache.arrow.adapter.jdbc.h2.JdbcToArrowArrayTest
[ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.12 s <<< FAILURE! - in org.apache.arrow.adapter.jdbc.h2.JdbcToArrowArrayTest
[ERROR] testJdbcToArrowWithNulls(org.apache.arrow.adapter.jdbc.h2.JdbcToArrowArrayTest)  Time elapsed: 0.04 s  <<< FAILURE!
java.lang.AssertionError: array lengths differed, expected.length=1 actual.length=0
	at org.apache.arrow.adapter.jdbc.h2.JdbcToArrowArrayTest.testStringList(JdbcToArrowArrayTest.java:283)
	at org.apache.arrow.adapter.jdbc.h2.JdbcToArrowArrayTest.testJdbcToArrowWithNulls(JdbcToArrowArrayTest.java:218)

Thanks for your help!

@praveenbingo
Copy link
Contributor

@mikepigott Thanks for the test case, there seems to be a bug in the way ListVector handles the value counts for its child vector. Will discuss with @pravindra and get back on this.

@xhochy
Copy link
Member

xhochy commented Feb 8, 2019

@mikepigott Can you rebase this on master or do you need help with that?

@mikepigott
Copy link
Contributor Author

mikepigott commented Feb 8, 2019

Hi @xhochy - I can merge master into this branch if you need; rebasing always gets me into trouble. Should I expect my unit test to work now?

@xhochy
Copy link
Member

xhochy commented Feb 8, 2019

As we sadly rebase master on release tags, we also need to rebase this on master.
This won't make a difference on the unittests though.

@wesm
Copy link
Member

wesm commented Feb 8, 2019

Rebased. @mikepigott please avoid git merge in this project -- it makes things harder on us maintainers because of the way we maintain master. If you ever get into a pickle with rebasing please let us know and we can help you

@mikepigott
Copy link
Contributor Author

Sure thing. Sorry for the trouble.

@wesm
Copy link
Member

wesm commented Feb 8, 2019

@xhochy this branch looks right to me, compare with your PR

@wesm
Copy link
Member

wesm commented Feb 8, 2019

Ugh. I'm going to force push the branch again

@wesm
Copy link
Member

wesm commented Feb 8, 2019

@mikepigott I just edited the branch again. Please git fetch $YOUR_REMOTE and git reset --hard $YOUR_REMOTE/jdbc-array-field

@mikepigott
Copy link
Contributor Author

OK, I have your changes now.

@wesm
Copy link
Member

wesm commented Feb 8, 2019

Cool sorry for the cross-fire =)

@mikepigott
Copy link
Contributor Author

No worries! I'll try to respond to the code review feedback this weekend.

@mikepigott
Copy link
Contributor Author

I think I've either responded to or implemented all of the code-review feedback. Please let me know if I missed anything!

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my feedback. I left a few more comments (most of the m can likely be done as part of follow-up CLs).

I assume the failing unit tests are still due to the issue @praveenbingo brought up, don't know if there has been an update there.

* @param scale The field's numeric scale.
*/
public JdbcFieldInfo(int jdbcType, int precision, int scale) {
this.jdbcType = jdbcType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to add a similar check to the one above in here (only reversed). Or comment why the check isn't there (maybe it makes reflection or something else easier?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no real reason not to use this constructor for non-numeric & non-decimal types; the precision & scale are just ignored. No harm done.

* @param calendar The calendar to use when constructing Timestamp fields and reading time-based results.
*/
public JdbcToArrowConfig(BaseAllocator allocator, Calendar calendar) {
Preconditions.checkNotNull(allocator, "Memory allocator cannot be null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

* @param index The {@link java.sql.ResultSetMetaData} column name of an {@link java.sql.Types#ARRAY} type.
* @return The {@link JdbcFieldInfo} for that array's sub-type, or <code>null</code> if not defined.
*/
public JdbcFieldInfo getArraySubTypeByColumnName(String name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not for this review. I think having a clear convention around nulls is important (and my preference but I'm sure there are others) is to assume things are non-null by default and mark things that can be null explcitly. I'll go through the mailing list archive to see if this was discussed before and if not post to the mailing list.

* @return The {@link JdbcFieldInfo} for that array's sub-type, or <code>null</code> if not defined.
*/
public JdbcFieldInfo getArraySubTypeByColumnName(String name) {
if (arraySubTypesByColumnName == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a style thing also, so not relevant for this review, but in the past I've had a preference for validating up front (again I'm not familiar with conventions in this code base though).

@@ -30,6 +31,8 @@
private Calendar calendar;
private BaseAllocator allocator;
private boolean includeMetadata;
private Map<Integer, JdbcFieldInfo> arraySubTypesByColumnIndex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not worth changing for this review, but given this would be a relatively dense map, it might be slightly more performant to have this as an JdbcFieldInfo[]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the number of fields returned by the query. If a query returns 10 fields and the ninth is an array, the map wouldn't be very dense at all. I'd prefer to leave this as-is until proven otherwise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with leaving as is. You are right not quite dense, but still a few number of elements in general vs boxing/unboxing cost for integers on every call (like I said probably not a performance issue at all.

}
}

private void testIntegerList(ListVector listVector, int rowCount, Integer[][] expectedValues) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't rowCount determined by expectedValues.length?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it would look like:

int rowCount = (expectedValues == null) ? 0 : expectedValues.length;

I don't think that's clearer.

}

private void testFloatList(ListVector listVector, int rowCount, Float[][] expectedValues) {
Float4Vector vector = (Float4Vector) listVector.getDataVector();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there seems to be a lot of redundant code in these test methods is it possible to refactor to use a lambda to avoid the duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redundancy is due to the different vectors having different types. I'm not at all familiar with Java lambdas, but from what I'm reading in https://medium.freecodecamp.org/learn-these-4-things-and-working-with-lambda-expressions-b0ab36e0fffc it looks like I can't use it to hide the types?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No they can't really hide types, but if the function took two Object values you could have a specific one that does the casts internally for the listVector and expectedValues.

No need to implement this just a thought.

public void testJdbcToArrowWithNulls() throws Exception {
int rowCount = 3;
Integer[][] intArrays = new Integer[rowCount][];
intArrays[0] = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in addition to nulls it might be worth testing a non-null empty list as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

int offset = offsetBuffer.getInt((row + 1) * ListVector.OFFSET_WIDTH);

if (expectedValues[row] == null) {
assertEquals(0, offset - prevOffset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i might be missing it but shouldn't there be a test against the validity bitmap as well for each row?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vector.get(i) throws an exception if trying to get an invalid item (which is determined by checking the validity bitmap). So I think it's covered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking on listVector not vector, is that case still covered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, nice catch, I wasn't indicating values as NULL when I should have been. I fixed that.

@praveenbingo
Copy link
Contributor

@mikepigott @emkornfield
Fixed the varchar issue in #3625

@mikepigott
Copy link
Contributor Author

Thanks @praveenbingo!

@emkornfield - I'll try to respond to your feedback by Friday.

@mikepigott
Copy link
Contributor Author

@emkornfield: I believe I've responded to and/or implemented all of your feedback.

That said, I don't think I follow your comment #3294 (comment) ... where did you want to see it replicated?

Thanks,
Mike

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On using length instead of passing rowCount (#3294 (comment)) I didn't think you ever passed in nulls so the code would need to change.

Overall, other then the issues I responded to, I don't think I have other comments. So its up to a committer to take a pass. Thanks for bearing with me.

@@ -30,6 +31,8 @@
private Calendar calendar;
private BaseAllocator allocator;
private boolean includeMetadata;
private Map<Integer, JdbcFieldInfo> arraySubTypesByColumnIndex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with leaving as is. You are right not quite dense, but still a few number of elements in general vs boxing/unboxing cost for integers on every call (like I said probably not a performance issue at all.

} catch (Exception e) {
e.printStackTrace();
throw e;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, might be worth leaving a comment reflecting that. I wonder if it is a configuration parameter (I would think at least in an IDE it would be possible to get the full strack trace without this).

}

private void testFloatList(ListVector listVector, int rowCount, Float[][] expectedValues) {
Float4Vector vector = (Float4Vector) listVector.getDataVector();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No they can't really hide types, but if the function took two Object values you could have a specific one that does the casts internally for the listVector and expectedValues.

No need to implement this just a thought.

public void testJdbcToArrowWithNulls() throws Exception {
int rowCount = 4;

Integer[][] intArrays = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed this before but why the boxes type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm when I get home, but I believe the JDBC function to build a JDBC Array requires an Object[]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm when I get home, but I believe the JDBC function to build a JDBC Array requires an Object[]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed.

Array intArray = conn.createArrayOf("INT", integerArray); - the createArrayOf field requires an Object[] as its second parameter.

int offset = offsetBuffer.getInt((row + 1) * ListVector.OFFSET_WIDTH);

if (expectedValues[row] == null) {
assertEquals(0, offset - prevOffset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking on listVector not vector, is that case still covered?

@mikepigott
Copy link
Contributor Author

Sorry, I won't be able to dive into your comments tonight. I'll try again tomorrow. Have a good night!

@mikepigott
Copy link
Contributor Author

OK! I think my last push covers the last of the feedback.

@mikepigott
Copy link
Contributor Author

@praveenbingo - I saw #3625 was merged, so I rebased master and retried the failing unit test from before. It passes now! Thank you so much for fixing that.

I think I've responded to all of the feedback on this PR?

@praveenbingo
Copy link
Contributor

praveenbingo commented Feb 18, 2019

@mikepigott - Sure no problem :)

Something seems off in the PR..did you merge from master by any chance? PR seems to include a lot of existing commits..

@mikepigott
Copy link
Contributor Author

@praveenbingo - I rebased from apache/arrow:master using the instructions in https://thoughtbot.com/blog/keeping-a-github-fork-updated ... I'm sorry if I did that wrong. But yes, that's where the extra commits came from.

@wesm
Copy link
Member

wesm commented Feb 18, 2019

@mikepigott you probably had not fetched from the apache remote. Would it be OK if I fixed your branch?

@wesm
Copy link
Member

wesm commented Feb 18, 2019

There's a merge commit here "[Merge branch 'jdbc-array-field' of](/apache/arrow/pull/3294/commits/7015472394a3582699990953486aafb80ff1f97f) [https://github.com/ironponyexpress…](/apache/arrow/pull/3294/commits/7015472394a3582699990953486aafb80ff1f97f) -- that could not have arisen from a use of git rebase

@mikepigott
Copy link
Contributor Author

Absolutely - sorry for the trouble!

@mikepigott
Copy link
Contributor Author

mikepigott commented Feb 18, 2019

After I performed the rebase, git told me I needed to merge from ironponyexpress/arrow:jdbc-array-field because the branches had diverged. I listened to it.

@wesm
Copy link
Member

wesm commented Feb 18, 2019

That's not right. The next step is to force push your remote branch

@mikepigott
Copy link
Contributor Author

Oh. Sorry.

Change-Id: Id6788671f5b30ed2f33f4317d00b5e6dc92ee889
@wesm
Copy link
Member

wesm commented Feb 18, 2019

Fixed. Please make sure to

git fetch ironponyexpress
git reset --hard ironponyexpress/jdbc-array-field

@mikepigott
Copy link
Contributor Author

Done. Sorry again.

@wesm
Copy link
Member

wesm commented Feb 18, 2019

No worries, git workflows are always an exercise in blood, sweat, and tears :)

@wesm
Copy link
Member

wesm commented Feb 18, 2019

Looks like the build is passed, who can have a final look and sign off on this PR?

Copy link
Contributor

@praveenbingo praveenbingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikepigott - thanks for pushing this through :)

@pravindra @wesm - could you please help merge.

@pravindra pravindra closed this in 5aaf700 Feb 19, 2019
tanyaschlusser pushed a commit to tanyaschlusser/arrow that referenced this pull request Feb 21, 2019
https://issues.apache.org/jira/browse/ARROW-4142

This adds support for reading JDBC arrays and converting them to Arrow ListVectors.  JDBC does not provide a great way to get the array type; there is a ResultSet object for walking the array, but its ResultSetMetaData may not contain the right value type (H2, for example, returns a JDBC type of NULL).

This is based on apache#3134, which includes ARROW-3965 and ARROW-3966.

I found Arrow arrays to be very confusing, and I am not sure if I am using them correctly here.  One thing I noticed was if I added a null array to a ListVector of VarCharVectors, the next value in the VarCharVector would be empty.  I would appreciate any help on why!  The ListVector unit tests weren't very helpful.

For all other cases, this code seems to work.  I look forward to your review!

Author: Mike Pigott <mpigott@gmail.com>

Closes apache#3294 from mikepigott/jdbc-array-field and squashes the following commits:

66376dd <Mike Pigott> Support for reading Array records to ListVector from JDBC
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants