Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3631: SerDe support for Value::Fixed(). #1893

Closed

Conversation

RikHeijdens
Copy link
Contributor

What is the purpose of the change

This Pull Request adds tests that reproduces the issue reported in AVRO-3631.

I don't have a good solution for this issue yet, and by opening this as a draft PR I'm hoping we could use this PR as a discussion medium for a solution.

Verifying this change

We're adding automated tests here (that currently fail).

Documentation

  • Does this pull request introduce a new feature? No
  • If yes, how is the feature documented? Not applicable

This test-case mainly demonstrates the issue reported in AVRO-3631. It
is unclear to me whether we should actually expect the serializer to
serialize to a Value::Fixed right away given that Schema information is
not available at this stage.
@github-actions github-actions bot added the Rust label Sep 29, 2022
@martin-g
Copy link
Member

martin-g commented Oct 3, 2022

@RikHeijdens Please 7b1cdd8
It is based on serde-rs/bytes#28 which is not yet approved and merged!

But as you explained at JIRA since there is no schema available at serialization time it is not easy to decide whether to produce Value::Fixed or Value::Array. I'll see whether this could be provided by a serde attribute, e.g. https://serde.rs/field-attrs.html#with

@RikHeijdens
Copy link
Contributor Author

Thanks for taking a look into this @martin-g. If I am understanding the solution correctly, we would need in addition to this change a change to rsgen-avro to store Value::Fixed fields in structs as a serde_bytes::ByteArray rather than a Rust array in the case where the Rust struct is generated from the following schema:

{
    "type": "record",
    "name": "TestStructFixedField",
    "fields": [
        {
            "name": "field",
            "type": {
                "name": "field",
                "type": "fixed",
                "size": 6
            }
        }
    ]
}

Is this being tracked in lerouxrgd/rsgen-avro#38?

@martin-g
Copy link
Member

martin-g commented Oct 3, 2022

@RikHeijdens Please see 3e3f125
There is no need to leak the implementation detail ByteArray.

I want to add some more tests related to serializing &[u8], [u8; N] and Vec<u8>. I have the feeling it would be much more complicated than this. I hope I am wrong!

@RikHeijdens
Copy link
Contributor Author

It's great that we can store the field in the struct as an array of u8 and have the SerDe framework handle the annotation for us if the #[serde(with = "serde_bytes")] attribute is set.

However, since I'm generating the example struct that I used from an Avro schema in the first place, using rsgen-avro, I believe we'd still need support in rsgen-avro to set that attribute when generating a struct from a Value::Fixed field. My understanding is that it will only set the attribute for Value::Bytes at this moment.

@martin-g
Copy link
Member

martin-g commented Oct 5, 2022

Correct!
rsgen-avro does not support [u8; N] at the moment because serde_bytes does not support it - serde-rs/bytes#28

I've spent more time reading serde docs and code and I think we don't really need serde_bytes. Its documentation says that it does some optimizations but actually it just delegates to Serializer::serialize_bytes() (https://github.com/serde-rs/bytes/blob/master/src/ser.rs).

So the real implementation is here -

fn serialize_bytes(self, v: &[u8]) -> Result<Self::Ok, Self::Error> {
Ok(Value::Bytes(v.to_owned()))
}
.
But the problem here is still the same - there is no information about the Avro Schema and we don't know whether to use Value::Bytes or Value::Fixed.

The only solution I see is to use custom #[serde(serialize_with="avro_serialize_fixed")] annotation on such fields and then the implementation of this method could export a thread local that will provide the needed information to serialize_bytes().

@RikHeijdens
Copy link
Contributor Author

I'm not quite sure I understand what you mean by:

rsgen-avro does not support [u8; N]

The structs that I used in my examples were generated through rsgen-avro. I suppose it might not support it for Bytes fields? It does seem to generate the struct properly from what I can tell, but the one change that we need in rsgen-avro is that it should annotate the array associated with the fixed field with the #[serde(serialize_with="") attribute. The serialize_with attribute should then refer to a function to serialize a Value::Fixed, for which we should add support in the avro library.

@martin-g
Copy link
Member

martin-g commented Oct 5, 2022

I meant that rsgen-avro does not put #[serde(with = serde_bytes)] for fixed schema.

And yes, since apache-avro does not provide (yet) logic to serialize [u8; N] to Fixed, rsgen-avro does not annotate the field with #[serde(serialize_with="") either.

@martin-g martin-g closed this Oct 6, 2022
@martin-g
Copy link
Member

martin-g commented Oct 6, 2022

For some reason I was not able to push to your branch, so I opened a new PR: #1900

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants