
Conversation

@JackStouffer
Contributor

DateTime's from-string methods use countUntil to find the separator between the date
and time portions of the string. That is fine until the result is used to slice the
original string, which can produce incorrect results: countUntil returns the number of
code points before the needle, whereas slicing operates on code units.

Normally this doesn't pose a real problem, because valid time strings contain only ASCII
characters, but this change fixes two things:

  1. No more auto decoding in the function, so it's faster
  2. Better error messages if there are non-ASCII characters in the string because the break-up is correct
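To make the code-point/code-unit mismatch concrete, here is a minimal sketch (not code from the patch itself) showing how the index from countUntil disagrees with a slice index as soon as a non-ASCII character precedes the needle:

```d
import std.algorithm.searching : countUntil;
import std.utf : byCodeUnit;

void main()
{
    string s = "é-x";                        // 'é' is 1 code point, but 2 UTF-8 code units
    auto cp = s.countUntil('-');             // 1: counts auto-decoded code points
    auto cu = s.byCodeUnit.countUntil('-');  // 2: counts code units
    assert(s[cu] == '-');                    // a code-unit index is a valid slice index
    assert(s[cp] != '-');                    // the code-point index lands inside 'é'
}
```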

@dlang-bot
Contributor

Thanks for your pull request, @JackStouffer!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@JackStouffer JackStouffer requested a review from jmdavis as a code owner January 26, 2018 15:26
Member

@quickfur quickfur left a comment

LGTM, and good catch!

In any case, countUntil on autodecoded strings is really the wrong tool to use. That's what std.string.indexOf is for. But since we're killing autodecoding as well, let's just go with that.
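For reference, a quick sketch of why std.string.indexOf is the right tool here: it reports a code-unit index even on non-ASCII input, so the result is always safe to slice with directly:

```d
import std.string : indexOf;

void main()
{
    string s = "é-x";
    auto i = s.indexOf('-');    // 2: indexOf counts code units, not code points
    assert(s[i .. $] == "-x");  // safe to use directly as a slice index
}
```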

@quickfur
Member

P.S. perhaps a unittest might be in order, to prevent future regressions.

@quickfur
Member

Off-topic remark: I wonder how many more nails we need to drive into autodecoding before it tips the balance towards actively killing it with fire... Just a wishful thought. :-P

@JackStouffer
Contributor Author

I remember convincing Andrei that the right way forward on that front was to have his RCStr type not provide a range interface by default, but to require people to explicitly choose the iteration mode when passing it into functions, e.g. func(myrcstr.byGrapheme) or func(myrcstr.byCodePoint) instead of func(myrcstr).

@JackStouffer
Contributor Author

On my benchmarks, byCodeUnit.countUntil is way faster on LDC

This is going to look like a mistake, but I made sure that the code was actually executing:

$ ldc2 -O -release test.d && ./test
testing execution 400000000
index	24 ms, 247 μs, and 8 hnsecs
count	0 hnsecs

import std.algorithm.searching : countUntil;
import std.conv : to;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.string : indexOf;
import std.utf : byCodeUnit;
import core.time : Duration;

enum testCount = 20_000_000;
__gshared immutable a = "2010-07-04T07:06:12";

void main()
{
    long res;
    auto result = to!Duration(benchmark!(() => res += a.indexOf('T'))(testCount)[0]);
    auto result2 = to!Duration(benchmark!(() => res += a.byCodeUnit.countUntil('T'))(testCount)[0]);
    writeln("testing execution ", res);

    writeln("index", "\t", result);
    writeln("count", "\t", result2);
}

@JackStouffer
Contributor Author

$ dmd -O -inline -release test.d && ./test
testing execution 400000000
index	159 ms, 157 μs, and 7 hnsecs
count	310 ms, 567 μs, and 3 hnsecs

@JackStouffer
Contributor Author

This is probably because countUntil uses find under the hood when length is defined, which can be inlined, while the memchr call inside indexOf can't be.

@quickfur
Member

@JackStouffer When benchmarks show zero measurements, it means you're not running enough iterations for the cost to show through. :-P Either that, or the code actually isn't executing, e.g., if you made a careless mistake.

@JackStouffer
Contributor Author

@quickfur I showed that the code was running with the writeln("testing execution ", res) call. Going all the way up to uint.max iterations still yielded 0.

@JackStouffer JackStouffer deleted the datetime-autodecoding branch January 26, 2018 17:35
@quickfur
Member

Hmm. That's fishy. Did you look at the disassembly? Perhaps LDC cheated and precomputed stuff at compile-time and just inlined the result. :-P

@JackStouffer
Contributor Author

Perhaps LDC cheated and precomputed stuff at compile-time and just inlined the result. :-P

That's exactly what happened. Because it was a module-level immutable, LDC precomputed the result and substituted the whole operation with the constant 10. Wow, I didn't think LDC was that smart!

Anyway, indexOf has a tiny edge on byCodeUnit.countUntil, but I had to increase the iterations to one billion to see it.

@quickfur
Member

Yeah, highly-optimizing compilers sometimes go to great lengths to optimize the heck out of something. That's why sometimes you see benchmarks containing code explicitly designed to foil the optimizer, because otherwise you won't be able to measure what you think you're measuring. :-P
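One common way to foil constant folding (a sketch, not code from this thread) is to build the benchmark input from runtime data such as argv, so the compiler cannot precompute the search, and to print the accumulated result afterwards so the work isn't eliminated as dead code:

```d
import std.algorithm.searching : countUntil;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.utf : byCodeUnit;

void main(string[] args)
{
    // Built from argv at run time, so the compiler cannot precompute the search result.
    auto s = "2010-07-04" ~ (args.length > 1 ? args[1] : "T") ~ "07:06:12";

    long sink; // printed afterwards, so the additions can't be discarded as dead code
    auto dur = benchmark!(() => sink += s.byCodeUnit.countUntil('T'))(10_000_000)[0];
    writeln("sink ", sink, "\t", dur);
}
```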
