Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Pulsar client AUTO_CONSUME "double" as "BigDecimal" in JSON schemas #20092

Open
2 tasks done
aymkhalil opened this issue Apr 14, 2023 · 8 comments
Open
2 tasks done
Labels
Stale type/bug The PR fixed a bug or issue reported a bug

Comments

@aymkhalil
Copy link
Contributor

Search before asking

  • I searched in the issues and found nothing similar.

Version

Pulsar client 2.10.3, 2.10.2, 2.10.1. Please note the behavior was correct with 2.8.3 and 2.10.0 and regressed starting with 2.10.1

Minimal reproduce step

Run the below program to create a topic with a JSON schema that has an optional double field. Please note the behavior is correct if the field is not optional

public static void main(String[] args) throws PulsarClientException {
        final PulsarClient client = PulsarClient.builder()
                //.serviceUrl("pulsar://localhost:65093")
                .serviceUrl("pulsar://localhost:6650")
                .build();

        RecordSchemaBuilder schemaBuilder = SchemaBuilder.record("myrecord");
        schemaBuilder.field("xdouble").type(SchemaType.DOUBLE).optional();
        SchemaInfo schemaInfo = schemaBuilder.build(SchemaType.JSON);
        GenericSchema<GenericRecord> schema = Schema.generic(schemaInfo);
        GenericRecordBuilder builder = schema.newRecordBuilder();
        builder.set("xdouble", 1.0d);
        GenericRecord record = builder.build();

        Producer<GenericRecord> producer = client.newProducer(schema)
                .topic("persistent://public/default/json-topic3")
                .create();
        producer.send(record);

        Consumer<GenericRecord> consumer = client.newConsumer(Schema.AUTO_CONSUME())
                .topic("persistent://public/default/json-topic3")
                .subscriptionName("my-subscription2")
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                .subscribe();

        var message = consumer.receive();
        var value = ((JsonNode)message.getValue().getNativeObject()).get("xdouble").numberValue();
        assertTrue("expected double, got " + value.getClass() , value instanceof Double);
        //consumer.acknowledge(message);

        client.close();
    }

What did you expect to see?

The assertion should pass

What did you see instead?

Exception in thread "main" java.lang.AssertionError: expected double, got class java.math.BigDecimal
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.example.SimpleConsumer.main(SimpleConsumer.java:63)

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@tisonkun
Copy link
Member

tisonkun commented Apr 14, 2023

Thanks for your report @aymkhalil!

Can you try out 2.11.0 and build from source? I cannot reproduce this issue with current master 091ee25.

Sorry, it is still valid on master.

@tisonkun
Copy link
Member

I think it's this PR change the behavior: #15687

Note that we now use:

ObjectMapperFactory.getMapper().reader().with(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS)

... for deserializing.

cc @codelipenghui @shibd

@tisonkun
Copy link
Member

Basically I don't think it's bad but we should adjust to correct schema using schema info.

@tisonkun
Copy link
Member

I create a patch to fix your use case #20104. You can give it a try and help in review.

I don't know if it affects other use case but if it's good to go, I can add your test case :)

Thanks for your reporting again especially the reproduce steps.

@aymkhalil
Copy link
Contributor Author

aymkhalil commented Apr 14, 2023

Thanks @tisonkun - I verified your patch locally and it works well.

I think it is reasonable because IIUC the USE_BIG_DECIMAL_FOR_FLOATS works for deserialization and used internally to fix the referenced SQL issue, but your patch makes the JsonNode respect the schema which is what an external user would expect (in between, there wouldn't be any precision loss).

If it makes sense, one potential test case is to make sure the GenericJsonObject and the JsonNode available via NativeObject return the same node types. For example, before your patch

((JsonNode)message.getValue().getNativeObject()).get("xdouble").numberValue() --> BigDecimal
((GenericJsonRecord)message.getValue()).getField("xdouble") --> Double

The reason why the latter would work before your fix is because it uses the constructor variant that accepts the schema info. Actually, it is because of this.

A test to detect a mismatch would've been one early indicator.

@congbobo184
Copy link
Contributor

congbobo184 commented Apr 19, 2023

for user issues:

((JsonNode)message.getValue().getNativeObject()).get("xdouble").numberValue() --> BigDecimal
((GenericJsonRecord)message.getValue()).getField("xdouble") --> Double

I think we don't need to think about JsonNode.get("xxxx") it's not the API of Pulsar and the user knows the field name and the user also knows it's the type and how the field encoding.

for the need to improve:


now we use GenericJsonRecord.getField(), if is the FloatingPointNumber we return double type directly if the field schema is decimal, we should return decimal. its Pulsar API needs to guarantee.

@aymkhalil
Copy link
Contributor Author

I think we don't need to think about JsonNode.get("xxxx") it's not the API of Pulsar

Ideally yes, but my understanding is that the public API is limited and hence nativeObject was exposed (as far as end user is concerned this is part of the API no?).

With that in mind, I think it is safer if the native object and the public API behavior match (or I can't think of a good reason why they wouldn't).

if is the FloatingPointNumber we return double type directly if the field schema is decimal, we should return decimal

This actually demonstrates the confusion because the generic pulsar schema API doesn't have a DECIMAL type. As a user, I think of nativeObject as my source of truth. Let me know if this is the wrong expectation.

@github-actions
Copy link

The issue had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label May 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants