Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: 4.25 breaks binary compatibility with files serialized using 4.24 and De/Serialize<T> #555

Closed
NeilMacMullen opened this issue Sep 30, 2024 · 5 comments
Assignees

Comments

@NeilMacMullen
Copy link

Library Version

4.25.0

OS

windows

OS Architecture

64 bit

How to reproduce?

Repro:

  1. Build the console application below using Parquet.net 4.24
  2. Run vcheck.exe w test_424.parquet
  3. Rebuild the application using Parquet.net 4.25
  4. Run vcheck.exe w test_425.parquet
  5. Run vcheck.exe r test_424.parquet

Note that the following exception is thrown...
Unhandled exception. System.IO.InvalidDataException: property 'TestProperty' is declared as 'TestProperty (System.String?)' but source data has it as 'TestProperty (System.String)'

This appears to because the default for "IsNullable" has changed between versions...

image

It may be that the new behaviour is more "correct" but it's unfortunate that we have a whole load of files which are suddenly unreadable without dto/code changes :-( It would be desirable if there was a ParquetOptions property that could be set to revert to the old behaviour and allow 'nullable' columns to be mapped to non-nullable properties with an exception being thrown at runtime if a null were ever encountered. (Or even to allow some null-to-value function to be supplied)

Code

using Parquet;
using Parquet.Serialization;

if (args.Length < 2)
{
    Console.WriteLine("wrong number of parameters - use vcheck CMD FILE where CMD is r/w/s");
    return;
}
   
var cmd = args[0];
var file = args[1];

if (cmd.ToLowerInvariant().StartsWith("w"))
{
    Console.WriteLine($"writing {file}...");
    var dataOut = Enumerable.Range(0, 1000).Select(i => new TestClass { TestProperty = $"Test {i}" });
    await ParquetSerializer.SerializeAsync(dataOut, file);
}

if (cmd.ToLowerInvariant().StartsWith("r"))
{
    Console.WriteLine($"reading {file}");
    var dataIn = await ParquetSerializer.DeserializeAsync<TestClass>(file);
    Console.WriteLine("Done");
}

if (cmd.ToLowerInvariant().StartsWith("s"))
{
    using(ParquetReader reader = await ParquetReader.CreateAsync(file))
    {
        Console.WriteLine($"reading schema for {file}");
        var schema = reader.Schema;
        foreach (var f in schema.DataFields)
        {
            Console.WriteLine($"type:{f.ClrType} nullable:{f.IsNullable}");
        }
    }
}

public class TestClass
{
    public string TestProperty { get; set; } = string.Empty;
}

Failing test

No response

@aloneguid aloneguid self-assigned this Sep 30, 2024
@aloneguid
Copy link
Owner

My bad, this is caused by #537 which I'm trying to revert now.

@NeilMacMullen
Copy link
Author

No worries - really great library by the way :-)

aloneguid added a commit that referenced this issue Sep 30, 2024
@aloneguid
Copy link
Owner

Please try 5.0.0-pre.2

@ErikApption
Copy link
Contributor

ErikApption commented Sep 30, 2024

Imo there is no reason this fails with an exception, this should be a warning (you can technically put a null into a string when it it is not marked as nullable) but removing nullable support makes it very unintuitive to load external files. Parquet. Net doesn't let you map a non nullable string column otherwise.

@aloneguid
Copy link
Owner

I'd like to integrate it back from the #537 PR manually asap, but I'll have to write some test cases covering edge case scenarios first. Rest assured this is not boing to be lost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants