Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning for float->string conversions. #212

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Add warning for float->string conversions. #212

wants to merge 1 commit into from

Conversation

ryanleh
Copy link
Contributor

@ryanleh ryanleh commented Apr 19, 2021

Closes #211

@@ -21,6 +21,8 @@ Out of the existing `Spark SQL types <https://spark.apache.org/docs/latest/sql-r
- ``TimestampTime``, ``DateType``
- ``ArrayType``, ``MapType``

*Warning:* Conversions from ``FloatType`` to ``StringType`` may produce incorrect results in the least significant portion of the floating point value. See `this issue <https://github.com/mc2-project/opaque/issues/211>`_ for further details.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think "incorrect" is the right word to use here; after reading the issue, it seems that they're both the same float representation. Maybe something like "may produce a correct result rounded to a different decimal place than the one shown by FloatType"? But I also kind of don't think this warning is really necessary, since that should be pretty obvious when comparing show and collect results for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that "incorrect" is the wrong word. Maybe the rephrase could be:

"Converting a FloatType to StringType may display a result rounded to a different decimal place than the one shown by FloatType even though the internal representations are identical."

I don't know if it's good to assume a user would run both show and collect. Even if they did, I it's very confusing if there was a discrepancy between the two. Thoughts @wzheng ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point to consider is that show only has "inconsistent behavior" when a user explicitly compares the results to Spark SQL. Opaque's behavior is correct as a standalone SQL engine. So I'd add this point as well to specify what we're comparing to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error deserializing float to string
3 participants