
Conversation

@JackStouffer
Contributor

DateTime's from-string methods use countUntil to find the separator between the date
and time portions of the string. That is fine until the result is used to slice the
original string, which can produce incorrect results: countUntil returns the number of
code points before the needle, whereas slicing operates on code units.

Normally this doesn't pose a real problem, because valid time strings contain only ASCII
characters, but this change fixes two things:

  1. No more auto decoding in the function, so it's faster
  2. Better error messages if there are non-ASCII characters in the string because the break-up is correct
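To make the code-point/code-unit mismatch concrete, here is a minimal sketch (not code from the patch itself) showing how the index from countUntil disagrees with a slice index as soon as a non-ASCII character precedes the needle:

```d
import std.algorithm.searching : countUntil;
import std.utf : byCodeUnit;

void main()
{
    string s = "é-x";                        // 'é' is 1 code point, but 2 UTF-8 code units
    auto cp = s.countUntil('-');             // 1: counts auto-decoded code points
    auto cu = s.byCodeUnit.countUntil('-');  // 2: counts code units
    assert(s[cu] == '-');                    // a code-unit index is a valid slice index
    assert(s[cp] != '-');                    // the code-point index lands inside 'é'
}
```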

@dlang-bot
Contributor

Thanks for your pull request, @JackStouffer!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@JackStouffer JackStouffer requested a review from jmdavis as a code owner January 26, 2018 15:26
Member

@quickfur quickfur left a comment

LGTM, and good catch!

In any case, countUntil on autodecoded strings is really the wrong tool to use. That's what std.string.indexOf is for. But since we're killing autodecoding as well, let's just go with that.
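For reference, a quick sketch of why std.string.indexOf is the right tool here: it reports a code-unit index even on non-ASCII input, so the result is always safe to slice with directly:

```d
import std.string : indexOf;

void main()
{
    string s = "é-x";
    auto i = s.indexOf('-');    // 2: indexOf counts code units, not code points
    assert(s[i .. $] == "-x");  // safe to use directly as a slice index
}
```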

@quickfur
Member

P.S. perhaps a unittest might be in order, to prevent future regressions.

@quickfur
Member

Off-topic remark: I wonder how many more nails we need to drive into autodecoding before it tips the balance towards actively killing it with fire... Just a wishful thought. :-P

@JackStouffer
Contributor Author

I remember convincing Andrei that the right way forward on that front was to have his RCStr type not provide a range interface by default, but to require people to explicitly choose the iteration mode when passing it into functions, e.g. func(myrcstr.byGrapheme) or func(myrcstr.byCodePoint) instead of func(myrcstr).

@JackStouffer
Contributor Author

On my benchmarks, byCodeUnit.countUntil is way faster on LDC

This is going to look like a mistake, but I made sure that the code was actually executing:

$ ldc2 -O -release test.d && ./test
testing execution 400000000
index	24 ms, 247 μs, and 8 hnsecs
count	0 hnsecs

import std.algorithm.searching : countUntil;
import std.conv : to;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.string : indexOf;
import std.utf : byCodeUnit;
import core.time : Duration;

enum testCount = 20_000_000;
__gshared immutable a = "2010-07-04T07:06:12";

void main()
{
    long res;
    auto result = to!Duration(benchmark!(() => res += a.indexOf('T'))(testCount)[0]);
    auto result2 = to!Duration(benchmark!(() => res += a.byCodeUnit.countUntil('T'))(testCount)[0]);
    writeln("testing execution ", res);

    writeln("index", "\t", result);
    writeln("count", "\t", result2);
}

@JackStouffer
Contributor Author

$ dmd -O -inline -release test.d && ./test
testing execution 400000000
index	159 ms, 157 μs, and 7 hnsecs
count	310 ms, 567 μs, and 3 hnsecs

@JackStouffer
Contributor Author

This is probably because countUntil uses find under the hood when length is defined, which can be inlined, while the memchr call inside indexOf can't be.

@quickfur
Member

@JackStouffer When benchmarks show zero measurements, it means you're not running enough iterations for the cost to show through. :-P Either that, or the code actually isn't executing, e.g., if you made a careless mistake.

@JackStouffer
Contributor Author

@quickfur I showed that the code was running with the writeln("testing execution ", res) call. Going all the way up to uint.max iterations still yielded 0.

@JackStouffer JackStouffer deleted the datetime-autodecoding branch January 26, 2018 17:35
@quickfur
Member

Hmm. That's fishy. Did you look at the disassembly? Perhaps LDC cheated and precomputed stuff at compile-time and just inlined the result. :-P

@JackStouffer
Contributor Author

Perhaps LDC cheated and precomputed stuff at compile-time and just inlined the result. :-P

That's exactly what happened. Because it was a module-level immutable, LDC precomputed the result and substituted the whole operation with the constant 10. Wow, I didn't think LDC was that smart!

Anyway, indexOf has a tiny edge on byCodeUnit.countUntil, but I had to increase the iterations to one billion to see it.

@quickfur
Member

Yeah, highly-optimizing compilers sometimes go to great lengths to optimize the heck out of something. That's why sometimes you see benchmarks containing code explicitly designed to foil the optimizer, because otherwise you won't be able to measure what you think you're measuring. :-P
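One common way to foil constant folding (a sketch, not code from this thread) is to build the benchmark input from runtime data such as argv, so the compiler cannot precompute the search, and to print the accumulated result afterwards so the work isn't eliminated as dead code:

```d
import std.algorithm.searching : countUntil;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.utf : byCodeUnit;

void main(string[] args)
{
    // Built from argv at run time, so the compiler cannot precompute the search result.
    auto s = "2010-07-04" ~ (args.length > 1 ? args[1] : "T") ~ "07:06:12";

    long sink; // printed afterwards, so the additions can't be discarded as dead code
    auto dur = benchmark!(() => sink += s.byCodeUnit.countUntil('T'))(10_000_000)[0];
    writeln("sink ", sink, "\t", dur);
}
```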
