Fix Auto-Decoding Issue With DateTime From String Methods #6073
DateTime's from-string methods use countUntil to find the separator between the date and time portions of the string. That is fine until the result is used to slice the original string, which can give incorrect results: countUntil returns the number of code points before the needle, not the number of code units, and slicing works in code units. Normally this doesn't pose a problem, because valid time strings contain only ASCII characters, but this change does fix two things:
1. No more auto-decoding in the function, so it's faster.
2. Better error messages when the string contains non-ASCII characters, because the string is split at the correct position.
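To make the mismatch concrete, here is a minimal sketch, not the code from this PR. The helper names `codePointOffset` and `codeUnitOffset` and the malformed input string are assumptions chosen for illustration. It compares the offset countUntil reports on an auto-decoded string with the offset that slicing actually needs.

```d
import std.algorithm.searching : countUntil;
import std.string : indexOf;
import std.utf : byCodeUnit;

/// Hypothetical helper: offset of `sep` counted in code points. countUntil on
/// a plain string auto-decodes, so a multi-byte character counts as one.
ptrdiff_t codePointOffset(string s, char sep)
{
    return s.countUntil(sep);
}

/// Hypothetical helper: offset of `sep` counted in UTF-8 code units, which is
/// the unit that string slicing uses.
ptrdiff_t codeUnitOffset(string s, char sep)
{
    return s.byCodeUnit.countUntil(sep);
}

void main()
{
    // Assumed malformed input: U+00E9 occupies two UTF-8 code units.
    string s = "\u00E92010-07-04T07:06:12";

    assert(codePointOffset(s, 'T') == 11);
    assert(codeUnitOffset(s, 'T') == 12);
    assert(s.indexOf('T') == 12); // indexOf also reports a code-unit index

    // Slicing with the code-point count cuts the date part one unit short.
    assert(s[0 .. codePointOffset(s, 'T')] == "\u00E92010-07-0");
    assert(s[0 .. codeUnitOffset(s, 'T')] == "\u00E92010-07-04");
}
```

For pure ASCII input both offsets agree, which is why the difference only shows up in the error path.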
|
Thanks for your pull request, @JackStouffer!
Bugzilla references
Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
quickfur
left a comment
LGTM, and good catch!
In any case, countUntil on autodecoded strings is really the wrong tool to use. That's what std.string.indexOf is for. But since we're killing autodecoding as well, let's just go with that.
|
P.S. perhaps a unittest might be in order, to prevent future regressions.
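A regression test along these lines might look like the sketch below. It is not the test that was added to Phobos; the malformed inputs are assumptions picked for illustration, and the checks only require that some exception is thrown rather than a specific message.

```d
import std.datetime.date : DateTime;
import std.exception : assertThrown;

void main()
{
    // Well-formed strings must still parse exactly as before.
    assert(DateTime.fromISOExtString("2010-07-04T07:06:12")
           == DateTime(2010, 7, 4, 7, 6, 12));
    assert(DateTime.fromISOString("20100704T070612")
           == DateTime(2010, 7, 4, 7, 6, 12));

    // Assumed malformed input: a non-ASCII character ahead of the separator
    // should be rejected with an exception rather than parsed.
    assertThrown(DateTime.fromISOExtString("\u00E92010-07-04T07:06:12"));

    // A string without the 'T' separator is rejected as well.
    assertThrown(DateTime.fromISOExtString("2010-07-04 07:06:12"));
}
```

In Phobos itself this would more naturally live in a `unittest` block next to the functions; `main` is used here only so the sketch runs on its own.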
|
Off-topic remark: I wonder how many more nails we need to drive into autodecoding before it tips the balance towards actively killing it with fire... Just a wishful thought. :-P
|
I remember convincing Andrei that the right way forward on that front was to have his |
|
On my benchmarks, This is going to look like a mistake, but I made sure that the code was actually executing

    import std.stdio;
    import std.algorithm;
    import std.conv;
    import std.ascii;
    import std.range;
    import std.traits;
    import std.string;
    import std.datetime.date;
    import std.datetime.stopwatch;
    import std.utf;

    enum testCount = 20_000_000;

    // Module-level immutable haystack searched by both candidates.
    __gshared immutable a = "2010-07-04T07:06:12";

    void main()
    {
        // Accumulate the results so the searches have an observable effect.
        long res;

        auto result = to!Duration(benchmark!(() => res += a.indexOf('T'))(testCount)[0]);
        auto result2 = to!Duration(benchmark!(() => res += a.byCodeUnit.countUntil('T'))(testCount)[0]);

        writeln("testing execution ", res);
        writeln("index", "\t", result);
        writeln("count", "\t", result2);
    }
|
|
This is probably due to |
|
@JackStouffer When benchmarks show zero measurements, it means you're not running enough iterations for the cost to show through. :-P Either that, or the code actually isn't executing, e.g., if you made a careless mistake.
|
@quickfur I showed that the code was running with the |
|
Hmm. That's fishy. Did you look at the disassembly? Perhaps LDC cheated and precomputed stuff at compile-time and just inlined the result. :-P |
That's exactly what happened. Because it was a module-level immutable, LDC precomputed the result and substituted the operation with Anyway, |
|
Yeah, highly-optimizing compilers sometimes go to great lengths to optimize the heck out of something. That's why sometimes you see benchmarks containing code explicitly designed to foil the optimizer, because otherwise you won't be able to measure what you think you're measuring. :-P
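One common way to keep the optimizer from folding the search at compile time is to make the haystack a runtime value. The sketch below is an assumption-laden variant of the benchmark in this thread, not a measurement anyone posted: `runBench` is a hypothetical helper, the iteration count is arbitrary, and an optimizer may still hoist loop-invariant work, so checking the disassembly remains the real test.

```d
import std.algorithm.searching : countUntil;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.string : indexOf;
import std.utf : byCodeUnit;

/// Hypothetical helper: runs both searches `n` times over `hay` and returns
/// the summed offsets, so the results are observable and not dead code.
long runBench(string hay, uint n)
{
    long res;
    auto times = benchmark!(
        () => res += hay.indexOf('T'),
        () => res += hay.byCodeUnit.countUntil('T'))(n);
    writeln("index\t", times[0]);
    writeln("count\t", times[1]);
    return res;
}

void main(string[] args)
{
    // The haystack comes from the command line when one is given, so its
    // contents are not a constant known while compiling main.
    string hay = args.length > 1 ? args[1] : "2010-07-04T07:06:12";
    writeln("testing execution ", runBench(hay, 1_000_000));
}
```

Returning the accumulated sum gives a cheap sanity check: for the default input every call finds 'T' at offset 10, so the total should be 20 times the iteration count.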