feat(go/adbc/driver/snowflake): support parameter binding #1808

Merged 3 commits on May 7, 2024

@lidavidm
Member

lidavidm commented May 3, 2024

Fixes #1144.

@lidavidm
Member Author

lidavidm commented May 3, 2024

@zeroshade could you help figure out the ASAN error? Reducing SqlPrepareUpdateStream to this first part still seems to trigger it, so to me it seems like a GC issue again:

void StatementTest::TestSqlPrepareUpdateStream() {
  if (!quirks()->supports_bulk_ingest(ADBC_INGEST_OPTION_MODE_CREATE) ||
      !quirks()->supports_dynamic_parameter_binding()) {
    GTEST_SKIP();
  }

  ASSERT_THAT(AdbcStatementNew(&connection, &statement, &error), IsOkStatus(&error));
  ASSERT_THAT(quirks()->DropTable(&connection, "bulk_ingest", &error),
              IsOkStatus(&error));
  struct ArrowError na_error;

  const std::vector<SchemaField> fields = {{"ints", NANOARROW_TYPE_INT64}};

  // Create table
  {
    Handle<struct ArrowSchema> schema;
    Handle<struct ArrowArray> array;

    ASSERT_THAT(AdbcStatementSetOption(&statement, ADBC_INGEST_OPTION_TARGET_TABLE,
                                       "bulk_ingest", &error),
                IsOkStatus(&error));
    ASSERT_THAT(MakeSchema(&schema.value, fields), IsOkErrno());
    ASSERT_THAT((MakeBatch<int64_t>(&schema.value, &array.value, &na_error, {})),
                IsOkErrno(&na_error));
    ASSERT_THAT(AdbcStatementBind(&statement, &array.value, &schema.value, &error),
                IsOkStatus(&error));
    ASSERT_THAT(AdbcStatementExecuteQuery(&statement, nullptr, nullptr, &error),
                IsOkStatus(&error));
  }
}

Direct leak of 160 byte(s) in 2 object(s) allocated from:
    #0 0x7f50bac9561c in __interceptor_malloc ../../../../libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x7f50b95c1339 in get_arr /home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v17@v17.0.0-20240430043840-e4f31462dbd6/arrow/cdata/cdata.go:31
    #2 0x7f50b95c14d3 in _cgo_1f0435ccdf41_Cfunc_get_arr /tmp/go-build/cgo-gcc-prolog:144
    #3 0x7f50b7bb64e4 in runtime.asmcgocall /usr/lib/go-1.21/src/runtime/asm_amd64.s:872

Indirect leak of 584 byte(s) in 4 object(s) allocated from:
    #0 0x7f50bac9561c in __interceptor_malloc ../../../../libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x55763b631bab in PrivateArrowMalloc /home/lidavidm/Code/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:186

Indirect leak of 33 byte(s) in 2 object(s) allocated from:
    #0 0x7f50bac94df6 in __interceptor_realloc ../../../../libsanitizer/asan/asan_malloc_linux.cpp:85
    #1 0x55763b631bba in PrivateArrowRealloc /home/lidavidm/Code/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:188

SUMMARY: AddressSanitizer: 777 byte(s) leaked in 8 allocation(s).

@zeroshade
Member

I tracked down the cause of the leak; it's not a GC issue. I filed an upstream PR to fix the leak in the cdata package of Go Arrow, which is linked in this thread.

lidavidm pushed a commit to apache/arrow that referenced this pull request May 3, 2024
### What changes are included in this PR?
If `imp.alloc.bufCount` is 0, indicating that we did not import any buffers from the provided C ArrowArray object, then we are free not only to call the release callback (which we already do), but we also need to free the temporary ArrowArray that we allocated to move the source into.

This was uncovered by apache/arrow-adbc#1808
* GitHub Issue: #41534

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
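
As a rough illustration of the fix described in that commit message, here is a pure-Go model of the cleanup rule. Only `bufCount` comes from the commit text; `tempArray`, `importer`, and `finish` are hypothetical names, and the real code operates on a C-allocated struct through cgo rather than on Go values.

```go
package main

import "fmt"

// tempArray is a hypothetical stand-in for the temporary C ArrowArray that
// the importer allocates to move the source array into.
type tempArray struct {
	released bool // the release callback has been called
	freed    bool // the allocation itself has been freed
}

// importer is a simplified model; only bufCount mirrors a name from the
// commit message.
type importer struct {
	bufCount int
	arr      *tempArray
}

// finish models the end of an import. In this model, when no buffers were
// imported there is nothing left that could trigger cleanup later, so the
// temporary array is both released and freed right away.
func (imp *importer) finish() {
	if imp.bufCount == 0 {
		imp.arr.released = true
		imp.arr.freed = true // the step the commit message says was missing
	}
}

func main() {
	imp := &importer{bufCount: 0, arr: &tempArray{}}
	imp.finish()
	fmt.Println("released:", imp.arr.released, "freed:", imp.arr.freed)
}
```

The model leaves the non-zero `bufCount` case untouched on purpose, since the commit message only describes the zero-buffer path.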
@lidavidm
Member Author

lidavidm commented May 4, 2024

Hmm.

=================================================================
==8722==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 80 byte(s) in 1 object(s) allocated from:
    #0 0x7fa643a955f8 in __interceptor_malloc ../../../../libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x7fa642104139 in get_arr /home/runner/go/pkg/mod/github.com/apache/arrow/go/v17@v17.0.0-20240503231747-7cd9c6fbd313/arrow/cdata/cdata.go:31
    #2 0x7fa6421042d3 in _cgo_7084d99dab99_Cfunc_get_arr /tmp/go-build/cgo-gcc-prolog:144
    #3 0x7fa6406f8884 in runtime.asmcgocall /opt/hostedtoolcache/go/1.21.8/x64/src/runtime/asm_amd64.s:872

Indirect leak of 584 byte(s) in 4 object(s) allocated from:
    #0 0x7fa643a955f8 in __interceptor_malloc ../../../../libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x5586872db7b0 in PrivateArrowMalloc /home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:186

Indirect leak of 33 byte(s) in 2 object(s) allocated from:
    #0 0x7fa643a94dd2 in __interceptor_realloc ../../../../libsanitizer/asan/asan_malloc_linux.cpp:85
    #1 0x5586872db7bf in PrivateArrowRealloc /home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:188

SUMMARY: AddressSanitizer: 697 byte(s) leaked in 7 allocation(s).

@zeroshade
Member

Still?? Wtf?

@lidavidm
Member Author

lidavidm commented May 4, 2024

I think I tracked it down to a missing release in one case, let's see :P

@lidavidm lidavidm marked this pull request as ready for review May 4, 2024 12:54
@lidavidm
Member Author

lidavidm commented May 6, 2024

@zeroshade can I just get a final look here once you're up?

Member

@zeroshade zeroshade left a comment

LGTM, just a couple of nitpicks

Comment on lines +40 to +44
case *array.Boolean:
	params[i].Value = sql.NullBool{
		Bool:  column.Value(index),
		Valid: column.IsValid(index),
	}
Member

Something for us to think about: Go 1.22 added sql.Null[T], which might let us write a more generic implementation of this later on. Since we still officially support Go 1.21, we can't use it yet.
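
To make the suggestion concrete, here is a minimal sketch of what a generic wrapper around `sql.Null[T]` could look like. It requires Go 1.22 or later; the `nullable` helper is a hypothetical name, and whether the Snowflake Go driver accepts `sql.Null[T]` as a bind value is not established in this thread.

```go
package main

import (
	"database/sql"
	"fmt"
)

// nullable is a hypothetical helper: it wraps a value and its validity flag
// in the generic sql.Null[T] type, replacing per-type wrappers such as
// sql.NullBool or sql.NullInt64.
func nullable[T any](v T, valid bool) sql.Null[T] {
	return sql.Null[T]{V: v, Valid: valid}
}

func main() {
	b := nullable(true, true)      // a valid boolean
	n := nullable(int64(0), false) // a NULL int64
	fmt.Println(b.V, b.Valid)
	fmt.Println(n.V, n.Valid)
}
```

With a helper like this, each case of the type switch would differ only in the Arrow array type it matches, which is what would make a generic implementation possible.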

Comment on lines +32 to +34
// technically, snowflake can bind an array of values at once, but
// only for INSERT, so we can't take advantage of that without
// analyzing the query ourselves
Member

I hate this, but there's nothing we can do right now, I guess.
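
The consequence of the code comment above is that parameters are bound one row at a time. A minimal sketch of that shape, using plain Go slices in place of Arrow columns, might look as follows. `rowParams` and the column layout are hypothetical; only `driver.NamedValue` is a real standard-library type.

```go
package main

import (
	"database/sql/driver"
	"fmt"
)

// rowParams builds the positional parameters for a single row of a
// column-oriented batch. columns[i][row] is the value of column i in that
// row. Ordinals are 1-based, as database/sql/driver expects.
func rowParams(columns [][]any, row int) []driver.NamedValue {
	params := make([]driver.NamedValue, len(columns))
	for i, col := range columns {
		params[i] = driver.NamedValue{Ordinal: i + 1, Value: col[row]}
	}
	return params
}

func main() {
	columns := [][]any{
		{int64(1), int64(2), int64(3)},
		{"a", "b", "c"},
	}
	// One parameter set (and so one statement execution) per row, because
	// array binding is not used.
	for row := 0; row < 3; row++ {
		fmt.Println(rowParams(columns, row))
	}
}
```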

go/adbc/driver/snowflake/binding.go
params, err := r.NextParams()
if err != nil {
	return nil, err
} else if params == nil {
Member

No need for the else, since we return in the previous branch.

	return nil, err
} else if params == nil {
	// end-of-stream
	return nil, nil
Member

So a nil result marks end-of-stream, instead of using io.EOF?
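
For comparison, this is roughly what the `io.EOF` alternative raised in the question would look like, written with early returns instead of `else`. `paramReader` and `drain` are hypothetical stand-ins, and the thread does not say which convention the merged code ended up using.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// paramReader is a hypothetical stand-in for the bind-parameter reader.
type paramReader struct {
	rows [][]any
	pos  int
}

// NextParams returns io.EOF at end of stream rather than (nil, nil).
func (r *paramReader) NextParams() ([]any, error) {
	if r.pos >= len(r.rows) {
		return nil, io.EOF
	}
	row := r.rows[r.pos]
	r.pos++
	return row, nil
}

// drain consumes the reader and counts the rows it saw. Each check returns
// early, so no else branches are needed.
func drain(r *paramReader) (int, error) {
	n := 0
	for {
		_, err := r.NextParams()
		if errors.Is(err, io.EOF) {
			return n, nil // end of stream is not an error for the caller
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

func main() {
	r := &paramReader{rows: [][]any{{int64(1)}, {int64(2)}}}
	n, err := drain(r)
	fmt.Println(n, err)
}
```

The trade-off is the usual one: a nil sentinel keeps the error value for real failures only, while `io.EOF` matches the convention of the standard library's readers.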

Comment on lines 106 to 108
if r.currentBatch != nil {
	r.currentBatch.Release()
}
Member

We might still end up with a double release here, because we don't set r.currentBatch to nil after calling stream.Next() and hitting the end of the stream. We should probably always set r.currentBatch = nil before calling stream.Next() to ensure we don't double release. You can test for excess releases by running with -tags assert, which turns on the debug asserts in Arrow.
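
A small model of the pattern suggested here, using a hand-rolled reference count instead of Arrow's. `countedBatch` and `batchReader` are hypothetical; the panic only imitates the kind of check that debug asserts perform.

```go
package main

import "fmt"

// countedBatch is a hypothetical reference-counted record batch. Release
// panics on an excess release, imitating a debug assertion.
type countedBatch struct{ refs int }

func (b *countedBatch) Release() {
	if b.refs <= 0 {
		panic("too many releases")
	}
	b.refs--
}

// batchReader is a hypothetical stand-in for the reader under review.
type batchReader struct {
	pending      []*countedBatch
	currentBatch *countedBatch
}

// next releases the current batch and clears the field before advancing,
// so reaching end of stream cannot leave a stale pointer behind.
func (r *batchReader) next() bool {
	if r.currentBatch != nil {
		r.currentBatch.Release()
		r.currentBatch = nil
	}
	if len(r.pending) == 0 {
		return false // end of stream; currentBatch is already nil
	}
	r.currentBatch, r.pending = r.pending[0], r.pending[1:]
	return true
}

func main() {
	r := &batchReader{pending: []*countedBatch{{refs: 1}}}
	fmt.Println(r.next()) // advances to the only batch
	fmt.Println(r.next()) // releases it and reports end of stream
	fmt.Println(r.next()) // safe to call again: nothing left to release
}
```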

Comment on lines 109 to 111
if r.stream != nil {
	r.stream.Release()
}
Member

We should probably also set both r.currentBatch and r.stream to nil during release, just to prevent issues if release is somehow called multiple times.
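
An idempotent release along these lines could be sketched as below. `resource` and `bindReader` are hypothetical stand-ins for the Arrow record and stream types; the counter exists only so the effect can be observed.

```go
package main

import "fmt"

// resource is a hypothetical releasable object that counts its releases.
type resource struct{ releases int }

func (x *resource) Release() { x.releases++ }

// bindReader is a hypothetical stand-in for the reader under review.
type bindReader struct {
	currentBatch *resource
	stream       *resource
}

// Release drops both references and clears the fields, so a second call
// finds nothing to release.
func (r *bindReader) Release() {
	if r.currentBatch != nil {
		r.currentBatch.Release()
		r.currentBatch = nil
	}
	if r.stream != nil {
		r.stream.Release()
		r.stream = nil
	}
}

func main() {
	batch, stream := &resource{}, &resource{}
	r := &bindReader{currentBatch: batch, stream: stream}
	r.Release()
	r.Release() // no-op: both fields are already nil
	fmt.Println(batch.releases, stream.releases)
}
```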

@lidavidm lidavidm requested a review from zeroshade May 7, 2024 08:12
Member

@zeroshade zeroshade left a comment

LGTM

@lidavidm lidavidm merged commit 8f0bdb3 into main May 7, 2024
92 of 96 checks passed
@lidavidm lidavidm deleted the gh-1144 branch May 7, 2024 14:11
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…41535)
kou pushed a commit to apache/arrow-go that referenced this pull request Aug 30, 2024
Development

Successfully merging this pull request may close these issues.

go/adbc/driver/snowflake: Parameterized SQL queries -> NotSupportedError