[BUG]: ArgumentOutOfRangeException thrown when trying to write column (interop type narrowing) #393

Open
itayfisz opened this issue Sep 3, 2023 · 1 comment

Comments

Contributor

itayfisz commented Sep 3, 2023

Library Version

4.16.2

OS

Linux

OS Architecture

64 bit

How to reproduce?

It's hard to reproduce. It happens rarely, when I write a column with many rows of very long string values. After calling WriteColumnAsync, I get an ArgumentOutOfRangeException (see the full error below).
It seems that in IronCompress\Iron.cs the output variable "len" comes back negative. It is then used as the size of an array to rent, which causes the ArgumentOutOfRangeException.

bool ok = Native.compress(
    compressOrDecompress,
    (int)codec, inputPtr, input.Length, null, &len, level);
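
The managed side of the failure can be shown without the native library. The sketch below is an illustration of the suspected failure mode, not the actual IronCompress code: `RentChecked` is a hypothetical helper that validates a length reported by native code before renting a buffer, so a negative value would surface as a descriptive error rather than as the ArgumentOutOfRangeException raised by the array pool.

```csharp
using System;
using System.Buffers;

// Hypothetical helper (not part of IronCompress): validate a length reported
// by native code before using it to rent a buffer.
static byte[] RentChecked(int reportedLength)
{
    if (reportedLength < 0)
    {
        throw new InvalidOperationException(
            $"Native call reported an invalid output length ({reportedLength}); " +
            "the size probably overflowed a 32-bit integer.");
    }
    return ArrayPool<byte>.Shared.Rent(reportedLength);
}

// Renting with a negative length reproduces the exception from the report.
try
{
    ArrayPool<byte>.Shared.Rent(-1);
}
catch (ArgumentOutOfRangeException e)
{
    Console.WriteLine(e.ParamName); // minimumLength
}

// With the guard, a valid length still rents a buffer of at least that size.
byte[] buffer = RentChecked(1024);
Console.WriteLine(buffer.Length >= 1024); // True
ArrayPool<byte>.Shared.Return(buffer);
```

A guard like this would not fix the underlying overflow; it would only make the error easier to diagnose.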

This is probably because snappy::RawCompress, called in api.cpp, has an integer overflow. There is a Snappy bug report about it, which has already been fixed.
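
To illustrate the kind of overflow suspected here: when a 64-bit size is cast to a 32-bit signed integer, values above int.MaxValue wrap around and can become negative. This is a minimal sketch, not the IronCompress or Snappy code, and the 3 GiB figure is an arbitrary example value.

```csharp
using System;

// An arbitrary size above int.MaxValue (2,147,483,647): 3 GiB.
long requiredBytes = 3L * 1024 * 1024 * 1024;

// Unchecked narrowing keeps only the low 32 bits, so the value wraps negative.
int narrowed = unchecked((int)requiredBytes);
Console.WriteLine(narrowed); // -1073741824

// Checked narrowing fails loudly instead of producing a bogus length.
try
{
    int safe = checked((int)requiredBytes);
    Console.WriteLine(safe);
}
catch (OverflowException)
{
    Console.WriteLine("size does not fit in Int32");
}
```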

I assume the solution would be to upgrade the Snappy version, but I'm not sure if it would just return a more informative error.

Exception Details
===================================
Exception Type: System.ArgumentOutOfRangeException
Message: Specified argument was out of the range of valid values. (Parameter 'minimumLength')
Actual Value: 
Param Name: minimumLength
Target Site: T[] Rent(Int32)
Help Link: 
Source: System.Private.CoreLib
HResult: -2146233086

Stack Trace Details 
-----------------------------------
   at System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1.Rent(Int32 minimumLength)
   at IronCompress.Iron.NativeCompressOrDecompress(Boolean compressOrDecompress, Codec codec, ReadOnlySpan`1 input, CompressionLevel compressionLevel, Nullable`1 outputLength)
   at IronCompress.Iron.Compress(Codec codec, ReadOnlySpan`1 input, Nullable`1 outputLength, CompressionLevel compressionLevel)
   at Parquet.File.DataColumnWriter.CompressAndWriteAsync(PageHeader ph, MemoryStream data, ColumnSizes cs, CancellationToken cancellationToken)
   at Parquet.File.DataColumnWriter.WriteColumnAsync(ColumnChunk chunk, DataColumn column, SchemaElement tse, CancellationToken cancellationToken)
   at Parquet.File.DataColumnWriter.WriteAsync(FieldPath fullPath, DataColumn column, CancellationToken cancellationToken)
   at Parquet.ParquetRowGroupWriter.WriteColumnAsync(DataColumn column, Dictionary`2 customMetadata, CancellationToken cancellationToken)

Failing test

No response

itayfisz changed the title from "[BUG]:" to "[BUG]: ArgumentOutOfRangeException thrown when trying to write column" on Sep 4, 2023
@aloneguid
Owner

IronCompress is already on the latest Snappy version, but there is a bug in narrowing the data type, as you mentioned. Thanks for reporting this. I'll try to reproduce it and get some fixes in.

In the meantime, you can try writing in batches (row groups): the columns look massive anyway, and readers will have trouble decompressing them if RAM is limited.
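
The batching workaround could be sketched as follows. `ToBatches` is a hypothetical helper, and the batch size is an assumption to tune so that each compressed page stays well below 2 GB. In real code each batch would be written through its own ParquetRowGroupWriter (the type shown in the stack trace); that step is left as a comment so the example runs without the Parquet.Net package.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split column values into batches of at most batchSize rows.
// Being an iterator, it validates batchSize when enumeration starts.
static IEnumerable<string[]> ToBatches(IReadOnlyList<string> values, int batchSize)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
    for (int start = 0; start < values.Count; start += batchSize)
    {
        int count = Math.Min(batchSize, values.Count - start);
        var batch = new string[count];
        for (int i = 0; i < count; i++) batch[i] = values[start + i];
        yield return batch;
    }
}

string[] column = { "a", "b", "c", "d", "e" };
foreach (string[] batch in ToBatches(column, 2))
{
    // In real code, each batch would be written to its own row group here.
    Console.WriteLine(string.Join(",", batch));
}
// Prints three lines: "a,b", "c,d", "e"
```

Smaller row groups also keep the memory needed by readers bounded, which is the second concern raised above.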

aloneguid changed the title from "[BUG]: ArgumentOutOfRangeException thrown when trying to write column" to "[BUG]: ArgumentOutOfRangeException thrown when trying to write column (interop type narrowing)" on Jan 17, 2024