-
-
Notifications
You must be signed in to change notification settings - Fork 700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std.regex: major internal redesign, also fixes issue 13532 #5722
std.regex: major internal redesign, also fixes issue 13532 #5722
Conversation
Thanks for your pull request, @DmitryOlshansky! Bugzilla references
|
b3ca451
to
dc32bbe
Compare
671a5f7
to
b6bfd87
Compare
Sadly AFAIK CodeCov doesn't support this. I was working on querying the API from dlang bot and sending a similar CI status which is in our control. |
4f8f3e3
to
542d1b8
Compare
Wo-ho. Jenkins test succeeded. With auto-tester tackled, I think I'm done here. UPDATE: still has bugs on Win32.. working on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I am not very familiar with std.regex
, but most changes are rather trivial.
One a first pass, I only found a couple of nits, but it would be great if someone experienced with std.regex
could have an eye on this overall big change as well.
std/regex/internal/backtracking.d
Outdated
@@ -716,7 +718,7 @@ template BacktrackingMatcher(bool CTregex) | |||
debug(std_regex_matcher) writeln("pop array SP= ", lastState); | |||
} | |||
|
|||
static if (!CTregex) | |||
static if (true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What prevents you from removing the static if
entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ehm. Will revisit this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's the only thing missing. If you don't feel like trying to push this we could, of course, also remove this afterwards in a separate PR
std/regex/internal/backtracking.d
Outdated
{ | ||
_refCount = 1; | ||
re = program; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving these two lines to initialize
? They are repeated in all constructors ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re is const so can't assign to it outside of constructor. A very unpleasant limitation
charsets = re.charsets; | ||
foreach (ref set; re.charsets) | ||
{ | ||
charsets ~= set.intervals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@UplinkCoder will be happy about all this allocation at CTFE ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's the number of character classes in the regex pattern. I hope it's in range of 10s-100s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about
auto oldLength = charsets.length;
charsets.length += re.charsets.length
charsets[oldLength .. $] = re.charsets[];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stefan-koch-sociomantic Not bad, honestly this piece of code isn't anywhere near top of performance sensitive functions. Trie construction is.
See code around this line:
https://github.com/dlang/phobos/blob/master/std/uni.d#L3871
std/regex/internal/tests.d
Outdated
import std.conv : to; | ||
enum re1 = ctRegex!`[0-9][0-9]`; | ||
immutable static re2 = ctRegex!`[0-9][0-9]`; | ||
immutable iterations = 1000_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: 1_000_000
std/regex/internal/tests.d
Outdated
assert(result1 == result2); | ||
auto ratio = 1.0 * enumTime.total!"usecs" / staticTime.total!"usecs"; | ||
// enum is faster or the diff is less < 30% | ||
assert(ratio < 1.0 || abs(ratio - 1.0) < 0.3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing that you experimented quite a bit with this, this still seems a bit fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it is. It seems I can't provide even basic performance guarantees in the unittest.
This is a problem
return ++m.refCount; | ||
} | ||
|
||
override size_t decRef(Matcher!Char m) const @trusted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use postblit this(this)
and destructor ~this()
instead of rolling your own mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ploymorphism! structs are not polymorphic, this class is then wrapped in a proper struct that does the inc/dec
// This only maintains internal ref-count, | ||
// deallocation happens inside MatcherFactory | ||
@property ref size_t refCount() @safe; | ||
// Copy internal state to another engine, using memory arena 'memory' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arena? Did you mean "zone" or "region"?
https://en.wikipedia.org/wiki/Region-based_memory_management
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really do not care.. a chunk of memory it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wilzbach area or region it's all the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but "arena" sounded strange. Note that I don't care about it, it just struck me while reading the diff
std/regex/internal/parser.d
Outdated
// check if we have backreferences, if so - use backtracking | ||
if (__ctfe) factory = null; // allows us to use the awful enum re = regex(...); | ||
else | ||
if (re.backrefed.canFind!"a != 0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Common Phobos style is to have else if
in the same line
{ | ||
auto r = cast() this; | ||
r.factory = factory; | ||
return r; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is a nice pattern, you completely work around const
here.
At least it's internal ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I feel like a compiler can prove this is legal iff all member of struct are value types or const references.
std/regex/package.d
Outdated
@@ -1530,15 +1468,15 @@ private: | |||
@trusted this(Range input, RegEx separator) | |||
{//@@@BUG@@@ generated opAssign of RegexMatch is not @trusted | |||
_input = input; | |||
separator.flags |= RegexOption.global; | |||
auto re = separator.withFlags(separator.flags | RegexOption.global); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/auto/const
634883f
to
5b3e698
Compare
Disable "benchmark" in unittest, it's too volatile with different compiler flags Also use GC.addRange/GC.removeRange
5b3e698
to
41c2296
Compare
Nay, sadly this random segfault has been appearing in the last two weeks. I nor anyone else hasn't had time to dive into. However, it only happens for every third run ... |
Alright, I think I addressed all of comments. @CyberShadow @wilzbach @ZombineDev |
So why does this fail jenkins and codecov? |
I'd say because it sees comments as uncovered code ? |
Jenkin passes just fine. Codecov is a mess and doesn't work with std.regex package structure. |
Jup sadly that's true, but it's only intended as tool to help reviewers anyhow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should let @DmitryOlshansky move forward with this. The only thing I saw was the static if (true)
, but than one can be addressed later as well...
Should somehow indicate that it's optional. The common wisdom is red cross - no merge. |
std/regex/internal/backtracking.d
Outdated
tmp.initExternalMemory(memory); | ||
return tmp; | ||
auto backtracking = cast(BacktrackingMatcher) m; | ||
backtracking.s = s; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with (backtracking)
comes to mind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then
with(backtracking) {
s = s;
}
???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah one name would need to be changed :)
886d99c
to
13ea5f9
Compare
@wilzbach
The code looks like this: @trusted bool func(BacktrackingMatcher!Char matcher)
{
debug(std_regex_ctr) pragma(msg, source);
mixin(source);
} I'm f**king tired of optimistic checks landing into our CI infrastructure. |
0f86e7d
to
a292141
Compare
a292141
to
f4c963b
Compare
Great stuff, looks like it regressed our funky dlang-bot regex-split-joiner again. |
There is also a reported performance regression @DmitryOlshansky. |
Finally I pulled this one off :
More importantly this opens up future enhancements: