[BUG]: Deserialize nullable LIST, ArgumentOutOfRange exception #502

Closed
akaloshych84 opened this issue Apr 22, 2024 · 4 comments

@akaloshych84

Library Version

4.23.5

OS

MacOS

OS Architecture

64 bit

How to reproduce?

Hello,

I am working on a POC and trying to process ROS-generated data that is converted to Parquet in GCS. This .NET package is really cool, with amazing performance. While it works for some topics (those without nested repeated structs), it fails on large, complex data structures with multi-level lists/structs.
I found multiple error types when deserializing to C# classes. One is the same as an issue that was closed last year ("destination is too short"). The other is ArgumentOutOfRange.
I prepared a small test that reproduces the second error type. The first one is more difficult to reproduce; I will try to prepare another test dataset and submit a separate ticket.
Here is an example file with the schema truncated to just a few fields that still reproduces the ArgumentOutOfRange error:
000000000000.parquet.zip

Error: (screenshot of the exception; image not included)

Failing test

Class to deserialize to:

    public class HeaderStamp
    {
        public Int64? secs { get; set; }
        public Int64? nsecs { get; set; }
    }

    public class Header
    {
        public Int64? seq { get; set; }
        public HeaderStamp stamp { get; set; }
        public String frame_id { get; set; }
    }

    public class TrackedObjectsTestListElement
    {
        public long? track_id { get; set; }
        public Double? existence_probability { get; set; }
        public Boolean? moving { get; set; }
    }

    public class TrackedObjectsTest
    {
        public Header header { get; set; }
        public List<TrackedObjectsTestListElement> tracked_objects { get; set; }
        public String _launch_id { get; set; }
    }

And the deserialize command:

    var r = await ParquetSerializer.DeserializeAsync<TrackedObjectsTest>("000000000000.parquet");

@aloneguid
Owner

Just looking at this now.

@aloneguid
Owner

Seems like the issue is in second row group, in column "tracked_objects/list/element/track_id". More to follow.

@aloneguid
Owner

Got to the bottom of it: the issue is schema compatibility. The error is really confusing and uninformative, and I've done some work to improve it for the next release. The actual problem is that the list in the class definition is optional (the list itself can be null), but in the file the list is required, so the deserializer gets confused. I'm looking at possible solutions to this.
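
For anyone who wants to check this kind of mismatch on their own file, here is a minimal sketch that prints the top-level fields of the file schema next to those of the schema derived from the class. It assumes the `TrackedObjectsTest` class from the issue is defined in the same project, that `Field.IsNullable` is exposed in the installed version, and that `GetParquetSchema` is available as an extension on `Type`. The `Dump` helper is a hypothetical name used only for illustration.

```csharp
using System;
using Parquet;
using Parquet.Schema;
using Parquet.Serialization;

// Schema as stored in the file from the issue.
using ParquetReader reader = await ParquetReader.CreateAsync("000000000000.parquet");
ParquetSchema fileSchema = reader.Schema;

// Schema the serializer derives from the class (false = not for writing).
// Assumption: TrackedObjectsTest from the issue is in scope.
ParquetSchema classSchema = typeof(TrackedObjectsTest).GetParquetSchema(false);

Dump("File schema", fileSchema);
Dump("Class schema", classSchema);

// Hypothetical helper: prints name, kind and nullability of each top-level field.
static void Dump(string title, ParquetSchema schema)
{
    Console.WriteLine(title);
    foreach (Field f in schema.Fields)
    {
        Console.WriteLine($"  {f.Name}: {f.SchemaType}, nullable={f.IsNullable}");
    }
}
```

If the `tracked_objects` line differs in nullability between the two listings, that would match the optional-versus-required mismatch described above.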

@aloneguid aloneguid changed the title [BUG]: Deserialize, ArgumentOutOfRange exception [BUG]: Deserialize nullable LIST, ArgumentOutOfRange exception Oct 3, 2024
@aloneguid aloneguid added this to the 5.0.1 milestone Oct 3, 2024
@aloneguid
Owner

Hi. In 5.0.1-pre.1 you can fix the schema by marking the list property as required:

    [ParquetRequired, ParquetListElementRequired]
    public List<TrackedObjectsTestListElement> tracked_objects { get; set; }
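
For context, here is a sketch of how the classes from the original report might look with that change applied, followed by the same deserialize call. It assumes version 5.0.1-pre.1 or later and that both attributes live in the `Parquet.Serialization.Attributes` namespace. Only `tracked_objects` is changed; the other properties are copied from the report.

```csharp
using System.Collections.Generic;
using Parquet.Serialization;
using Parquet.Serialization.Attributes; // assumed namespace for both attributes

// Same call as in the report, now against the annotated class.
IList<TrackedObjectsTest> rows =
    await ParquetSerializer.DeserializeAsync<TrackedObjectsTest>("000000000000.parquet");

public class HeaderStamp
{
    public long? secs { get; set; }
    public long? nsecs { get; set; }
}

public class Header
{
    public long? seq { get; set; }
    public HeaderStamp stamp { get; set; }
    public string frame_id { get; set; }
}

public class TrackedObjectsTestListElement
{
    public long? track_id { get; set; }
    public double? existence_probability { get; set; }
    public bool? moving { get; set; }
}

public class TrackedObjectsTest
{
    public Header header { get; set; }

    // The file declares the list (and its element) as required,
    // so the class is annotated to match.
    [ParquetRequired, ParquetListElementRequired]
    public List<TrackedObjectsTestListElement> tracked_objects { get; set; }

    public string _launch_id { get; set; }
}
```

Whether other fields in a given file need similar annotations depends on that file's schema, so it is worth checking each nested list the same way.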
