Skip to content

Conversation

@wenshao
Copy link
Contributor

@wenshao wenshao commented Sep 11, 2023

improve date toString performance, includes:

java.util.Date.toString
java.util.Date.toGMTString
java.time.Instant.toString
java.time.LocalDate.toString
java.time.LocalDateTime.toString
java.time.LocalTime.toString


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8315999: Improve Date toString performance (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15658/head:pull/15658
$ git checkout pull/15658

Update a local copy of the PR:
$ git checkout pull/15658
$ git pull https://git.openjdk.org/jdk.git pull/15658/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15658

View PR using the GUI difftool:
$ git pr show -t 15658

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15658.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 11, 2023

👋 Welcome back wenshao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 11, 2023
@openjdk
Copy link

openjdk bot commented Sep 11, 2023

@wenshao The following labels will be automatically applied to this pull request:

  • core-libs
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added security security-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Sep 11, 2023
@mlbridge
Copy link

mlbridge bot commented Sep 11, 2023

@wenshao
Copy link
Contributor Author

wenshao commented Sep 11, 2023

@liach can you help me to review this PR?

@wenshao
Copy link
Contributor Author

wenshao commented Sep 11, 2023

@cl4es can you help me to review this PR?

@liach
Copy link
Member

liach commented Sep 11, 2023

I think you should make this a dependent of #15651, so you can change the base branch of pr to openjdk:pr/15651 and you can base your work off that pull request too.

@wenshao
Copy link
Contributor Author

wenshao commented Sep 11, 2023

I think you should make this a dependent of #15651, so you can change the base branch of pr to openjdk:pr/15651 and you can base your work off that pull request too.

After PR #15651 is merged, changes will be made based on the new API. Until then, you can help review other changes.

Copy link
Member

@cl4es cl4es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @liach says this should be done on top of #15651. As it stands now a thorough reviews seems a bit premature.

Correct me if I'm wrong but the gains come from inlining code into the various toString methods to reduce need to go through DateTimeFormatters - which allocate StringBuilder and might go through chains of CompositePrinterParser and such.. It seems that before inlining and duplicating logic - making the code overall more fragile and harder to maintain - that we ought to see if we can get close enough by optimizing DateTimeFormatter and the corresponding builders to produce types that may optimize better.

*/
@Override
public String toString() {
return DateTimeFormatter.ISO_INSTANT.format(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered potentially more generalizable optimizations to DateTimeFormatter.ISO_INSTANT.format(this) here?

Hand-rolling a fixed-length buffer, skipping the StringBuilder .. understandably this can have a performance edge, but perhaps a DateTimeFormatter like ISO_INSTANT can be optimized to get closer to whatever speed-up this gets you - with broader implications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance of optimizing DateTimeFormatter cannot be as fast as using ixed-length buffer directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, the optimization of DateTimeFormatter is more general, and we can spend time doing it later. The format of toString is fixed, we can not use DateTimeFormatter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered potentially more generalizable optimizations to DateTimeFormatter.ISO_INSTANT.format(this) here?

Hand-rolling a fixed-length buffer, skipping the StringBuilder .. understandably this can have a performance edge, but perhaps a DateTimeFormatter like ISO_INSTANT can be optimized to get closer to whatever speed-up this gets you - with broader implications.

I submitted an optimization for the commonly used format of DateTimeFormatter::format

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a link to that PR? Is there an RFE filed for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimization of DateTimeFormatter::format should be another PR, I created a branche but the work is unfinished.

if (yearValue > 9999) {
buf.append('+');
if (year > 9999) {
buf[off] = '+';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buf[off++] = '+';?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this place doesn't need off++

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, though it's a bit opaque that yearSize includes room for the '+' that gets added on years > 9999 but that jla.getChars won't print that. This makes the logic somewhat fragile, which I think could be improved.

int yearAbs = Math.abs(year);
if (yearAbs < 1000) {
if (year < 0) {
buf[off] = '-';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buf[off++] = '-';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this place doesn't need off++

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a suggestion, implicitly paired with removing + (year < 0 ? 1 : 0) from line 2188.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for not using off++ is that it can be executed out of order, which may result in better performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that would outweigh the extra branch and the increased code size. I'd opt for simplicity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've simplified LocalDate::getChars based on your suggestion

@liach
Copy link
Member

liach commented Sep 11, 2023

After PR #15651 is merged, changes will be made based on the new API. Until then, you can help review other changes.

I mean, OpenJDK's GitHub bots has a special handling system that allows you to include code from #15651 in this PR:

  1. Merge your local reduce_duplicate branch into your optimization date to string branch
  2. Edit PR title image
  3. Change PR branch image

Then OpenJDK bot will automatically add that you have a prerequisite in the issue body, and after merging #15651, this PR will automatically target against master again.

The main advantage is that PR's "File changed" on GitHub and webrevs will not include changes from #15651, so we can save review time and you don't have to update this PR after #15651 is merged.


OpenJDK的GitHub机器人能自动处理有前置依赖的补丁:
1、先把 #15651reduce_duplicateGit分支合并到你现在这个optim_for_date_to_string_2分支里面,上传
2、按Pull Request的标题边上的"Edit"修改按钮,可以修改对应分支
3、把目标分支修改为pr/15651

这样下来OpenJDK机器人会自动标注你这个补丁的前置依赖,然后依赖合并(integrated)后会自动把目标分支重新改为master。

这个的优点是"Files changed"文件改动页面不会显示前置依赖的变动,review更方便,同时你也不用在前置合并后重新改代码。

@wenshao
Copy link
Contributor Author

wenshao commented Sep 11, 2023

benchmark script

1. Script

bash configure
make images
sh make/devkit/createJMHBundle.sh
bash configure --with-jmh=build/jmh/jars
make test TEST="micro:java.time.ToStringBench.*

2. MacBookPro M1 Pro Benchmark

-Benchmark                                 Mode  Cnt    Score    Error   Units (baseline)
-ToStringBench.testInstantToString        thrpt   15    3.900 ?  0.289  ops/ms
-ToStringBench.testLocalDateTimeToString  thrpt   15   11.081 ?  0.141  ops/ms
-ToStringBench.testLocalDateToString      thrpt   15   18.566 ?  1.852  ops/ms
-ToStringBench.testLocalTimeToString      thrpt   15    8.428 ?  1.066  ops/ms
-ToStringBench.testZoneOffsetOffHours     thrpt   15  115.342 ? 13.978  ops/ms
-ToStringBench.testZonedDateTimeToString  thrpt   15    4.669 ?  1.986  ops/ms

+Benchmark                                 Mode  Cnt    Score   Error   Units (PR cc82e8c1)
+ToStringBench.testInstantToString        thrpt   15   47.971 ? 0.131  ops/ms (+1130.02%)
+ToStringBench.testLocalDateTimeToString  thrpt   15   71.310 ? 0.633  ops/ms (+543.53%)
+ToStringBench.testLocalDateToString      thrpt   15  108.236 ? 1.613  ops/ms (+482.97%)
+ToStringBench.testLocalTimeToString      thrpt   15  128.264 ? 1.052  ops/ms (+1421.87%)
+ToStringBench.testZoneOffsetOffHours     thrpt   15  694.463 ? 2.671  ops/ms (+502.29%)
+ToStringBench.testZonedDateTimeToString  thrpt   15   46.975 ? 0.850  ops/ms (+906.10%)

@wenshao wenshao changed the base branch from master to pr/15651 September 11, 2023 16:28
@openjdk
Copy link

openjdk bot commented Sep 11, 2023

⚠️ @wenshao This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@openjdk-notifier openjdk-notifier bot changed the base branch from pr/15651 to master September 12, 2023 16:38
@openjdk-notifier
Copy link

The parent pull request that this pull request depends on has now been integrated and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork:

git checkout optim_for_date_to_string_2
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push

# Conflicts:
#	src/java.base/share/classes/java/lang/StringUTF16.java
#	src/java.base/share/classes/jdk/internal/util/DecimalDigits.java
Copy link

@ExE-Boss ExE-Boss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LocalTime::getNanoChars(byte[], int, int) can use DecimalDigits::digitTriple(int) instead of a local copy of DecimalDigits.DIGITS_K:

Copy link
Member

@cl4es cl4es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's still too much going on in this PR, and it should probably be split up into multiple PRs.

I think some of this is going a bit overboard. For starters I don't want to see us adding a 1000-element lookup table for "DecimalDigits::digitTriple" without significant evidence that that is worth it at application level. As has been pointed out elsewhere lookup tables are prone to look great on microbenchmarks but may behave poorly and even regress real world applications by competing for cache. Any further additions needs to be justified with a thorough examination and benchmarks that better simulate noisy real world applications.

*/
@Override
public String toString() {
return DateTimeFormatter.ISO_INSTANT.format(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a link to that PR? Is there an RFE filed for it?

@wenshao
Copy link
Contributor Author

wenshao commented Sep 13, 2023

Do you have a link to that PR? Is there an RFE filed for it?

@cl4es Optimization of DateTimeFormatter::format should be another PR, I created a branche but the work is unfinished.

@AlanBateman
Copy link
Contributor

/reviewers 2

@openjdk
Copy link

openjdk bot commented Sep 13, 2023

@AlanBateman
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

@RogerRiggs
Copy link
Contributor

Based on the use cases cited, if your library needs specific performance improvements for specific formats, they should be done in that library.

@RogerRiggs
Copy link
Contributor

As a consideration to core-libs-dev readers, push commits when you are convinced are ready for review and you don't intend to make more changes. It will improve the signal to noise ratio. Thanks

@wenshao
Copy link
Contributor Author

wenshao commented Sep 13, 2023

Based on the use cases cited, if your library needs specific performance improvements for specific formats, they should be done in that library.

I already do this in fastjson2 for now, but more libraries would benefit if improved in jdk.

@wenshao
Copy link
Contributor Author

wenshao commented Sep 13, 2023

As a consideration to core-libs-dev readers, push commits when you are convinced are ready for review and you don't intend to make more changes. It will improve the signal to noise ratio. Thanks

As a consideration to core-libs-dev readers, push commits when you are convinced are ready for review and you don't intend to make more changes. It will improve the signal to noise ratio. Thanks

Sorry, I will make sure to do more preparation before submitting any PRs in the future.

@wenshao
Copy link
Contributor Author

wenshao commented Sep 13, 2023

Based on the use cases cited, if your library needs specific performance improvements for specific formats, they should be done in that library.

Not only JSON libraries, toString optimization of Date/Instant/LocalDateTime and other classes will benefit in many places, such as logging in business systems, etc.

Instant bizTime = ...;
LOG.log(Level.INFO, "bizDate {0}", bizTime);

@wenshao
Copy link
Contributor Author

wenshao commented Sep 16, 2023

/label remove security

@openjdk openjdk bot removed the security security-dev@openjdk.org label Sep 16, 2023
@openjdk
Copy link

openjdk bot commented Sep 16, 2023

@wenshao
The security label was successfully removed.

@RogerRiggs
Copy link
Contributor

RogerRiggs commented Sep 19, 2023

Lots of adhoc changes bulk up the source and make it less maintainable.
Given the common element of time and date formatting strings as a domain specific language, there would be more benefit to leveraging one of several existing mechanisms to generate optimized formatters from application or library format strings.
For example, the general purpose JEP 430: String Templates could be used to compose the string and leverage its optimizations of formatting the components.
Making improvements there either to performance or functionality would benefit more applications.

This also applies to PR#15722.

@wenshao
Copy link
Contributor Author

wenshao commented Sep 24, 2023

Lots of adhoc changes bulk up the source and make it less maintainable. Given the common element of time and date formatting strings as a domain specific language, there would be more benefit to leveraging one of several existing mechanisms to generate optimized formatters from application or library format strings. For example, the general purpose JEP 430: String Templates could be used to compose the string and leverage its optimizations of formatting the components. Making improvements there either to performance or functionality would benefit more applications.

This also applies to PR#15722.

I submitted PR #15776 (Regex-free parsing of Formatter and FormatProcessor specifiers) to optimize the parse performance of String.format and StringFormatter. I submitted another PR #15699, the refactored code is related to String Template.
After these two PRs are completed, I will continue to optimize java.util.Formatter.print and improve the performance of String.format and StringTemplate.

However, String.format and StringTemplate are different scenarios from this PR and PR#15722. Date/LocalDateTime/Instant#toString and DateTimeFormatter#format are very widely used, more widely used than String.format(Date/LocalDateTme/Instant). These Usage scenarios usually don't use String.format and StringTemplate instead.

The performance of using String.format and StringTemplate is not as good as using formatter and toString directly.

If you feel that the maintainability of these codes is not good enough, for example, we should reduce the use of ByteArrayLittleEndian, I can continue to improve these.

Because Date/LocalDateTime/Instant#toString and DateTimeFormatter#format are so widely used, I have also seen these methods taking a lot of time in flame graphs in business systems. I submitted this PR based on my business system optimization experience, so The optimization will directly improve the performance of the business system in certain scenarios.

@liach
Copy link
Member

liach commented Sep 24, 2023

For DateTimeFormatter, it first isn't effectively constant since its composite printerparser isn't immutable/stable. Also we can probably try if we can pre-allocate like for string concat to see if eliminated allocations help. For StringBuilder allocation, if JIT cannot eliminate it, we can probably switch to writing byte array + index since the interface in Formatter seems to be internal.

@wenshao
Copy link
Contributor Author

wenshao commented Sep 24, 2023

For DateTimeFormatter's parse and format, an effective way to improve performance is to perform fast-path optimization on commonly used patterns. PR #15658 is the fast-path of DateTimeFormatter#format for commonly used patterns.

I also found a way to find the ZoneOffset faster for ZoneIds without daylight saving time, for example Asia/Shanghai.

The combination of these optimizations can make the performance of DateTime's parse and format very fast.

In the baseline code, only Instant#toString uses DateTimeFormatter. The toString Method of Date/LocalDate/LocalDateTime does not use DateTimeFormatter. The current implementation of DateTimeFormatter is not fast enough. After we improve the performance of DateTimeFormatter, we can implement the toString methods of these classes based on DateTimeFormatter.

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 22, 2023

@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 19, 2023

@wenshao This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Nov 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs core-libs-dev@openjdk.org rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

6 participants