Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #438 - Corrections to BigDecimal consistency #440

Merged
merged 5 commits into from
Oct 6, 2018

Conversation

johnjaylward
Copy link
Contributor

@johnjaylward johnjaylward commented Oct 2, 2018

What problem does this code solve?

  • Fixes opt/getBigDecimal to be consistent
  • Updates get<NumberType> operations to return the result of getNumber for consistency and less copy/paste of code
  • Updates opt<NumberType> operations to return the result of optNumber for consistency and less copy/paste of code
  • Performance: Updates JSONWriter to use a regex to decide if writing as a number is best.

Risks
Due to the incorrect behavior of getBigDecimal, some code may now return different values than previously. The new behavior tries to keep the underlying value as close as possible, same as the optBigDecimal implementation. This means that values like 1.234f will no longer return as new BigDecimal("1.234") and will instead return as new BigDecimal(1.234f). The caller of get/optBigDecimal can then choose the number of significant digits they need by using the BigDecimal.setScale and choosing a rounding mode.

Changes to the API?
no.

Will this require a new release?
yes.

Should the documentation be updated?
Updated JavaDoc to reflect the possible inconsistencies callers may see when using get/optBigDecimal

Does it break the unit tests?
No. There was no test case that covered verifying that opt/getBigDecimal values matched. New test cases were added. See stleary/JSON-Java-unit-test#88

Was any code refactored in this commit?
Yes,

  • JSONObject.writeValue and JSONWriter.valueToString. I updated how numbers are validated before outputting them. We now use a regex that properly matches JSON Numbers instead of the BigDecimal constructor. Performance testing between the two implementations shows the regex offers significant speed gains over the BigDecimal constructor in most use cases. (see the new test case in the test case PR). For the most common cases, the RegEx offers about 80% less runtime (9355ms for BigDecimal vs 1902ms RegEx on my machine). Adding in the worst case scenario (an extremely long string that looks like a number until the last digit) all the test cases ran at approximately 12417ms for BigDecimal and 10039ms for the RegEx. The savings are still there, but not nearly as significant.
  • The get<numberType> and opt<number type> methods have been changed to use the common getNumber and optNumber under the hood. Functionality should be identical, but there is now less duplicated code and a single (two really...) place where numbers are possibly coerced from other types instead of it happening in many code points.

Review status
ACCEPTED

@johnjaylward
Copy link
Contributor Author

forgot to update JSONArray. I'll do that quick now.

@johnjaylward johnjaylward force-pushed the FixForBigDecimal branch 2 times, most recently from cdb3bfb to 6661986 Compare October 2, 2018 17:44
@johnjaylward johnjaylward changed the title Fixes #438 Fixes #438 - Corrections to BigDecimal consistency Oct 2, 2018
@johnjaylward johnjaylward force-pushed the FixForBigDecimal branch 5 times, most recently from 46e7e2f to afcd9ab Compare October 2, 2018 19:23
John J. Aylward added 3 commits October 2, 2018 15:28
* Performance: Updates JSONWriter to use a regex to decide if writing as a number is best.
@stleary
Copy link
Owner

stleary commented Oct 4, 2018

Accepted, starting 3 day comment window. Thanks @johnjaylward, changes look good.

@johnjaylward
Copy link
Contributor Author

@stleary can you review this latest commit too. fine if you want to restart the 3 day window. I was making sure we didn't have more cases of inconsistencies with numbers and decided to unify the Number methods a little more.

@stleary
Copy link
Owner

stleary commented Oct 5, 2018

No worries, changes look reasonable, will verify the unit tests tonight.

@stleary stleary merged commit 1a811f1 into stleary:master Oct 6, 2018
Copy link

@guidomedina guidomedina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another drawback of this code clean up is, say you have the following Json, a number represented as String which I have seen in some APIs:

{ "a" : "3"}

because native calls to Xxx.parseXxx were removed for the following case an intermediary instance of Number will always be created generating more garbage to be GCed, in other words; you are always "boxing" to Number and unboxing to the request native value.

JSONArray.java Show resolved Hide resolved
@johnjaylward
Copy link
Contributor Author

Another drawback of this code clean up is, say you have the following Json, a number represented as String which I have seen in some APIs:

{ "a" : "3"}

because native calls to Xxx.parseXxx were removed for the following case an intermediary instance of Number will always be created generating more garbage to be GCed, in other words; you are always "boxing" to Number and unboxing to the request native value.

The values are all stored in a Map object, which means they are all boxed anyway. I don't think I added any extra boxing in that wasn't present before. Are there any areas you can point out specifically?

@guidomedina
Copy link

guidomedina commented Aug 14, 2019

In the example I put a = "3" which is a String, in your case it is first wrapped into a Number and then the native value is returned, let's assume the number is Integer which is the extra instance.

In the old code String "3" would be passed to Integer.parseInt("3") which returns the a native int without any object allocation.

Please note that I'm talking about the Json {"a" : "3"} and not {"a" : 3}, the 1st example will store a String in the Map, the 2nd example which is the standard and most common it will be indeed a subclass of Number, because number as a String is supported by this API it is only natural to take this case into consideration as well.

@johnjaylward
Copy link
Contributor Author

Thanks, I'll take a look. That was probably an over site on my part.

@johnjaylward johnjaylward deleted the FixForBigDecimal branch August 14, 2019 16:09
@johnjaylward
Copy link
Contributor Author

johnjaylward commented Aug 14, 2019

@guidomedina , I looked closer at the full implementation and I believe the reason I made for going this route was as follows:

  • A JSON Text of {"key":"3"} will usually be passed to a JSON object like so: jo = New JSONObject("{\"key\":\"3\"}");. The JOSN parser will parse the String 3 into an Integer 3 and store the Integer in the map.

The only way we would see the conversion happening in the get/opt and making extra Objects for GC, would be something like these cases:
Conversion from XML:

String xml = "<root><key>3</key></root>";
JSONObject jo = XML.ToJSONObject(xml, true); // keep all values as strings
jo.getInt(key); // creates the intermediate Number which then calls Number.intValue()

Custom built up JSONObject/array:

JSONObject jo = new JSONObject();
jo.put("key", "3"); // explicitly put the value as a string
jo.getInt("key"); // creates the intermediate Number which then calls Number.intValue()

The first one could be avoided by pre-processing the string using XML.ToJSONObject(xml), but that comes with it's own issues.

The second one seems totally avoidable altogether if the coder had planed out ahead of time what types go into the keys instead of lazily putting in a String and depending on the conversions. The "parse" of the sting will likely take up much more CPU time than the Number intermediary creation and GC, but that would probably be best to actually write some performance tests against instead of making generalizations.

@guidomedina
Copy link

guidomedina commented Aug 14, 2019

This is not always the case, in the utility industry in general IoT providers come in many flavors, we cannot control how a third party provides us their Json.

Creating an intermediary instance of any of the Number sub-classes has to parse first and then wrap the native value within a reference.

See, Number is an interface:

public static Integer valueOf(String s) throws NumberFormatException  {
    return new Integer(parseInt(s, 10));
}

@johnjaylward
Copy link
Contributor Author

I double checked the implementation, and I misspoke. A call like this JSONObject jo = new JSONObject("{\"key\":\"3\"}"); will always store the "3" as a String. The parser only trys finding/coercing values that are unquoted, and if that fails, uses it as a string. So JSONObject jo = new JSONObject("{\"key\":3a}"); where 3a is unquoted in the JSON would try to be parsed as a Number first but fails, then will be stored as a String.

I'll look into updating the calls to get the extra Number instances removed. I don't really like so much copy/paste code, but I'm not seeing a more generic way of doing it.

@guidomedina
Copy link

There isn't really a generic way to do this unfortunately, see this other implementation for example: https://github.com/eclipse-vertx/vert.x/blob/master/src/main/java/io/vertx/core/json/JsonObject.java

@johnjaylward
Copy link
Contributor Author

@guidomedina I finally remembered my final reasoning for changing the functions to always use the get/optNumber functions.

Previously, the parser for get/optInt would call Integer.parseInt(string). This would fail for things like {"key":"3.14"}. However, the same request for opt/getBigInteger would return a BigInteger with a value of 3. By switching to the more generic method and using the Number interface to get the underlying values, I was able to make these calls consistent.

jo = new JSONObject("{\"key\":\"3.14\"}");
jo.optInt("key", 0); // before the change, `0`, after the change `3`
jo.optLong("key", 0); // before the change, `0`, after the change `3`
jo.optBigInteger("key", 0); // before the change, `3`, after the change `3`

The end result was that we ended up with more GC, but more consistent functionality.

@johnjaylward
Copy link
Contributor Author

Would you prefer the get<Number> methods to have the harder checks (i.e. no coercion from float/double to int) while the opt<Number> methods retain their more lax approach?

The changes in this PR aren't quite what I mentioned above (it was a few PRs to get where the current code is). This one, the opt<Number> methods where specifically changed from using a BigDecimal parser then down casting to an int via BigDecimal.intValue. In both cases (BigDecimal call directly, or the optNumber call, we end up with a short lived intermediate object to get the double/float/long/int.

Personally, I feel that the opt/get methods should use the same type conversions in the background, so using the same example {"key":"3.14"}, a call to getInt and optInt should return the same value of 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants