Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: non null string fails with InvalidDataException and is wrongly flagged as nullable #535

Open
ErikApption opened this issue Jul 22, 2024 · 3 comments

Comments

@ErikApption
Copy link
Contributor

ErikApption commented Jul 22, 2024

Library Version

4.25.0-pre.2

OS

Windows

OS Architecture

64 bit

How to reproduce?

  1. Deserialize simple parquet file with schema (obtained from Tad - the parquet file also crashes floor)
    image

  2. "System.IO.InvalidDataException: 'property 'Value' is declared as 'value (System.String?)' but source data has it as 'value (System.String)''"

with

public class MyClass 
{
    [JsonPropertyName("id")]
    public long Id { get; set; }

    [JsonPropertyName("value")]
    public string Value { get; set; } = null!;

    [JsonPropertyName("frequency")]
    public double Frequency { get; set; }
}

The important part to note is that the exception is the same wherever Value is defined as nullable or not.

    [JsonPropertyName("value")]
    public string? Value { get; set; } = null!;

ends up with exactly same error message.

Failing test

var records = await ParquetSerializer.DeserializeAllAsync<MyClass>(stream).ToListAsync();
@aloneguid
Copy link
Owner

I'm not sure how to reproduce this to be honest, are you trying to deserialise a particular file with class serializer? Do you have a sample?

@ErikApption
Copy link
Contributor Author

ErikApption commented Jul 26, 2024

I'm not sure how to reproduce this to be honest, are you trying to deserialise a particular file with class serializer? Do you have a sample?

Yes I am trying to deserialize a class similar to PocoClassNotNullable using the class serializer.

I added unit tests in my PR to show the issue.

        class PocoClassNotNullable {
            [JsonPropertyName("id")]
            public long Id { get; set; }

            [JsonPropertyName("value")]
            public string Value { get; set; } = null!;

            [JsonPropertyName("frequency")]
            public double Frequency { get; set; }
        }

        [Fact]
        public void Strings_NotNullable() {
            ParquetSchema s = typeof(PocoClassNotNullable).GetParquetSchema(true);
            Assert.False(s.DataFields[0].IsNullable);
            Assert.False(s.DataFields[1].IsNullable);
            Assert.False(s.DataFields[2].IsNullable);
        }

This fails with current version and Value will be marked as IsNullable. The current code will consider any string as nullable (and Value above), so I don't think you can deserialize a non nullable string in parquet to a class unless you add [ParquetRequired] to that field. Without Roslyn annotations, there is no difference in metadata between string and string? - both are technically nullable because it is a class.

@ErikApption
Copy link
Contributor Author

@aloneguid bumping this up - any chances you could look at my PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants