Enhance std.string.indexOf() to work with ranges #3172

WalterBright · 2015-04-10T06:20:51Z

http://dlang.org/phobos/std_string.html#.indexOf

Currently it only works with string inputs.

It still throws on invalid UTF.

burner · 2015-04-10T07:54:42Z

std/string.d

Search"es" for "a" character in "a" range.

burner · 2015-04-10T08:02:31Z

I'm not sure about the merit of this function. I wonder why you would search for an index of a char in a range, maybe even a temporary one. Also, this function is then basically the end of every range chain.

Performance: have you checked the performance before and after? I hate to be pushy here but #2995

WalterBright · 2015-04-10T08:37:36Z

I updated it to remove the throwing and memory allocation. The documentation now says it assumes correct UTF. It assumed that before, throwing only on some invalid UTF.

The merit of this function is kind of irrelevant, it exists and we can't remove it. All the algorithms need to work with ranges.

I haven't benchmarked it, but I know it does less work because it doesn't decode unless necessary, whereas the former version always decoded.
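A minimal sketch of the "doesn't decode unless necessary" idea, for readers following along. This is not the code from the PR; `indexOfSketch` is a hypothetical name, and the function assumes valid UTF-8 input in a plain `const(char)[]`. For an ASCII needle it compares raw code units, and only for a non-ASCII needle does it fall back to the decoding `foreach` (which, unlike the PR's goal, can still throw on invalid UTF).

```d
// Hypothetical helper for illustration; not the Phobos implementation.
ptrdiff_t indexOfSketch(const(char)[] s, dchar c)
{
    if (c < 0x80)
    {
        // ASCII needle: in valid UTF-8, a code unit below 0x80 is always
        // a complete character, so no decoding is needed.
        foreach (i, char unit; s)
        {
            if (unit == c)
                return i;
        }
        return -1;
    }

    // Non-ASCII needle: let foreach decode; i is the index of the first
    // code unit of each decoded code point.
    foreach (i, dchar d; s)
    {
        if (d == c)
            return i;
    }
    return -1;
}

unittest
{
    assert(indexOfSketch("h\u00E9llo", 'l') == 3);      // 'é' occupies 2 code units
    assert(indexOfSketch("h\u00E9llo", '\u00E9') == 1);
    assert(indexOfSketch("abc", 'z') == -1);
}
```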

burner · 2015-04-10T08:50:04Z

std/string.d

what is going on here? is there no std.(uni|utf) function to do this?

I suppose this qualifies as magic numbers. I would at least like to see a comment about what happens here.

nope

you'll see those numbers all over std.utf :-)

that is just sad

so what actually happens here

burner · 2015-04-10T08:51:24Z

> The merit of this function is kind of irrelevant, it exists and we can't remove it. All the algorithms need to work with ranges.

This should be written in the vision document

Maybe it doesn't always decode, but you added control-flow branches. You never know until you benchmark.

WalterBright · 2015-04-10T09:01:02Z

Those branches exist in the auto-decoder.

WalterBright · 2015-04-10T09:05:10Z

Updated with near 100% test coverage.

burner · 2015-04-10T09:18:09Z

Maybe I'm not clear. I'm not trying to argue that your code is bad or slow, or that you didn't pay attention. What I'm trying to say is that you can only be sure about the performance if you benchmark, which is the reason I created #2995. Even if you looked at the asm, you never know what the branch prediction of the CPU does.

Rant On

100% test coverage is a lie. Have you tested all possible combinations of all possible branches? If not, this is not 100% test coverage. This is maybe: "100% of the code was executed by the tests"

Rant Off

WalterBright · 2015-04-10T09:33:17Z

> This should be written in the vision document

"We aim to make the standard library usable in its entirety without a garbage collector."
http://wiki.dlang.org/Vision/2015H1

We've been jawboning for years about getting this done and nothing has happened. I'm getting it done now.

WalterBright · 2015-04-10T09:40:42Z

> What I'm trying to do is to say that you can only be sure about the performance if you benchmark.

100% sure, you are correct. But I've been around the block enough to know that less work == faster, nearly always. Also, this PR is not about making it faster. It's about that vision thing.

> 100% test coverage is a lie. Have you tested all possible combinations of all possible branches? If not, this is not 100% test coverage. This is maybe: "100% of the code was executed by the tests"

I use -cov to check the coverage. I know that doesn't prove all the logic is correct. It only proves that all lines of code were executed. That's what I meant by "coverage". It's only possible to test a function for all possible input for trivial functions.

On the other hand, my experience for decades with 100% test coverage (of lines executed) correlates strongly with very few bugs discovered later. It's a practical proxy.

If you run the Phobos unittests with -cov, you'll also discover that the line coverage is rather poor. I suspect I'm the only Phobos developer using -cov.
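To make the two meanings of "coverage" in this exchange concrete, here is a small sketch with a hypothetical function `scale` (not from Phobos). The two assertions execute every line, so a `-cov` run would be expected to report full line coverage, yet two of the four branch combinations are never exercised.

```d
// Hypothetical function with two independent branches.
int scale(int x, bool doubleIt, bool negate)
{
    int result = x;
    if (doubleIt)
        result *= 2;
    if (negate)
        result = -result;
    return result;
}

unittest
{
    // Together these two cases execute every line of scale()...
    assert(scale(3, true, false) == 6);
    assert(scale(3, false, true) == -3);
    // ...but the (true, true) and (false, false) paths are never run.
}
```

Saved as a standalone file, this can be built and run with the same flags quoted later in the thread (`-unittest -main -cov`); the file name is up to the reader.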

MartinNowak · 2015-04-10T10:42:30Z

> I suspect I'm the only Phobos developer using -cov.

We have to make it more accessible, e.g. dmd -main -unittest -defaultlib= generated/linux/.../libphobos2.a -run std/string isn't something people will use. The other important part is that pull requests should only improve coverage. This could be enforced by the autotester.
If you want that to happen, it needs to be a mechanical requirement.

MartinNowak · 2015-04-10T10:47:55Z

The bugzilla ticket to make this happen is blocked by a back end bug.
Issue 14063 - Add coverage enforcement for Phobos' posix.mak.

burner · 2015-04-10T11:03:43Z

@WalterBright I know all that and it is all true, I only have a problem with the wording "test coverage".
My theoretical CS Prof would kill me, and rightfully so, if I called -cov test coverage. How about "100% of the code gets executed by the unittests"? Or, for short, "100%CE".

JakobOvrum · 2015-04-10T11:11:34Z

> I suspect I'm the only Phobos developer using -cov.

Like I said before, you're not.

WalterBright · 2015-04-10T18:40:34Z

@burner BTW, it is unnecessarily provocative to call people liars because your professor disagrees with usage of a term. I've been calling execution of lines "test coverage" for 30 years, and the tools that do it are called "coverage analyzers".

https://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro

You're the first to complain about it. I think you're going to have an uphill battle with it. I'm not going to change the way I use the term, even if your professor sends me a strongly worded letter :-)

WalterBright · 2015-04-10T18:42:20Z

> e.g. dmd -main -unittest -defaultlib= generated/linux/.../libphobos2.a -run std/string isn't something people will use.

I use:

dmd std/string -unittest -main -cov

Not that bad.

WalterBright · 2015-04-10T18:45:30Z

> Like I said before, you're not.

That's good to hear. Please help me in changing the culture on this in Phobos - for a PR for a new function, ask if it has 100% unittest coverage.

MartinNowak · 2015-04-10T19:12:44Z

> dmd std/string -unittest -main -cov

Tried that, doesn't work most of the time, I'll work on a posix.mak integration.

dmd std/typecons -unittest -main -cov
typecons.o:__main.d:function _D3std8typecons34__T8NullableTS3std4json9JSONValueZ8Nullable11__xopEqualsFKxS3std8typecons34__T8NullableTS3std4json9JSONValueZ8NullableKxS3std8typecons34__T8NullableTS3std4json9JSONValueZ8NullableZb: error: undefined reference to '_D3std4json9JSONValue8opEqualsMxFKxS3std4json9JSONValueZb'
typecons.o:__main.d:function _D3std4conv52__T7emplaceTC3std8typecons19__unittestL5606_102FZ1AZ7emplaceFAvZC3std8typecons19__unittestL5606_102FZ1A: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
typecons.o:__main.d:function _D3std4conv54__T7emplaceTC3std8typecons19__unittestL5606_102FZ1ATiZ7emplaceFAvKiZC3std8typecons19__unittestL5606_102FZ1A: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
typecons.o:__main.d:function _D3std4conv53__T7emplaceTC3std8typecons19__unittestL5650_103FZ2C0Z7emplaceFNaNbNiAvZC3std8typecons19__unittestL5650_103FZ2C0: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
typecons.o:__main.d:function _D3std4conv53__T7emplaceTC3std8typecons19__unittestL5650_103FZ2C1Z7emplaceFNaNbNiAvZC3std8typecons19__unittestL5650_103FZ2C1: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
collect2: error: ld returned 1 exit status
--- errorlevel 1

MartinNowak · 2015-04-10T19:17:04Z

std/string.d

How about using codeLength?

It's an internal function, not part of the api. Please, let's not bikeshed trivia.

I don't quite agree that this is trivial. Your code duplicates an artifact which we already have in Phobos, complete with its own unit tests (although they are not very extensive). Adding another copy of it increases maintenance cost for no good reason, especially since the function in question already uses symbols from std.utf.

And it's a public function, click on the documentation link.

Sorry, I had thought you meant renaming it.
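For reference, `std.utf.codeLength!C(c)` returns the number of code units of character type `C` needed to encode the code point `c`. The sketch below compares it with a hand-written UTF-8 version; `utf8Units` is a hypothetical stand-in, not the PR's internal function, whose body is not shown in this thread.

```d
import std.utf : codeLength;

// Hypothetical hand-rolled equivalent, spelling out the UTF-8 thresholds.
ubyte utf8Units(dchar c)
{
    if (c <= 0x7F)   return 1; // ASCII
    if (c <= 0x7FF)  return 2;
    if (c <= 0xFFFF) return 3;
    return 4;                  // up to U+10FFFF
}

unittest
{
    // One code point from each UTF-8 length class.
    foreach (dchar c; "a\u00E9\u20AC\U0001F600"d)
        assert(utf8Units(c) == codeLength!char(c));
}
```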

WalterBright · 2015-04-10T19:22:08Z

> Tried that,

I just tried it, too (on Windows). Worked fine. Coverage report is 89% coverage - rather poor.

MartinNowak · 2015-04-10T19:22:51Z

std/string.d

Usually the fastest loop is while (i < s.length) if (decode(s, i) == c) return i;, because it avoids the codeLength part, but it is only usable with randomly indexable strings. Maybe we can improve byDchar to provide optional iteration with index/counter.
Anyhow, you replaced the druntime based foreach decoding, so this is going to be faster.

I did not want to require the input range to be indexable.

I considered improving byDchar, but it's rather awkward, and gave up on the idea.
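The trade-off discussed here can be sketched as follows. Both functions are hypothetical illustrations, not the merged code. The first needs an indexable array and relies on `std.utf.decode` advancing the index (so the start of each code point has to be remembered); the second accepts any input range of `dchar` and accumulates the UTF-8 offset with `codeLength`, which is why the "codeLength part" appears.

```d
import std.range.primitives : ElementType, isInputRange;
import std.utf : codeLength, decode;

// Array-only version: decode() advances i past the code point it read,
// so the starting index is saved before the call.
ptrdiff_t indexOfByDecode(const(char)[] s, dchar c)
{
    size_t i = 0;
    while (i < s.length)
    {
        immutable start = i;
        if (decode(s, i) == c)
            return start;
    }
    return -1;
}

// Input-range version: no indexing available, so the offset (measured in
// UTF-8 code units) is accumulated one code point at a time.
ptrdiff_t indexOfByCount(R)(R r, dchar c)
    if (isInputRange!R && is(ElementType!R == dchar))
{
    size_t offset = 0;
    foreach (dchar d; r)
    {
        if (d == c)
            return offset;
        offset += codeLength!char(d);
    }
    return -1;
}

unittest
{
    import std.algorithm.iteration : filter;

    assert(indexOfByDecode("h\u00E9llo", 'l') == 3);
    assert(indexOfByCount("h\u00E9llo", 'l') == 3);
    // A lazily filtered range cannot be indexed, but counting still works.
    assert(indexOfByCount("h\u00E9llo".filter!(ch => ch != 'h'), 'l') == 2);
}
```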

MartinNowak · 2015-04-10T19:28:58Z

Acceptance list

replace numberCodeUnits with std.utf.codeLength

MartinNowak · 2015-04-11T04:29:11Z

Auto-merge toggled on

MartinNowak · 2015-04-11T04:29:30Z

Thx


schuetzm · 2015-04-15T14:07:31Z

This introduced a regression in vibe.d: vibe-d/vibe.d#1071

Fixed-size arrays aren't accepted any longer, because they don't match isInputRange. IMO it's better anyway to slice such an array explicitly to make it clear that a reference could escape, but it's still a regression...
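A short sketch of the situation described above, using only `isInputRange` and `std.string.indexOf`. The buffer name and contents are made up for illustration. It shows why a static array fails the range constraint and that an explicit slice is accepted.

```d
import std.range.primitives : isInputRange;
import std.string : indexOf;

unittest
{
    char[5] buf = "hello";

    // A static array cannot shrink, so popFront is unavailable and the
    // type does not satisfy isInputRange; its slice type does.
    static assert(!isInputRange!(char[5]));
    static assert( isInputRange!(char[]));

    // Slicing yields a char[] view of the same memory, which is a range.
    assert(buf[].indexOf('l') == 2);
}
```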

burner · 2015-04-15T15:27:02Z

@MartinNowak add -cov "temporarily" to the build flags of the *.test target. That is quite convenient.

dlang/phobos#3172

WalterBright · 2015-04-15T16:58:19Z

Fixed size arrays as template arguments also lead to template bloat, as every different sized array results in a different instantiation.

yebblies · 2015-04-15T18:55:33Z

> Fixed size arrays as template arguments also lead to template bloat, as every different sized array results in a different instantiation.

But not in this case.

MartinNowak · 2015-04-15T20:52:18Z

It needs to be fixed anyhow, should be possible without template bloat.

WalterBright · 2015-04-16T07:07:13Z

> It needs to be fixed anyhow

Sounds like a job for Bugzilla!

MartinNowak · 2015-04-17T13:09:28Z

There you go, Issue 14455 – [Reg 2.068-devel] std.string.indexOf no longer accepts static arrays. Please follow up on this.

schuetzm · 2015-04-17T13:42:33Z

@MartinNowak:
#3191 is already merged. Should it have been made against stable instead of master? Do you want me to create a cherry-pick PR?

MartinNowak · 2015-04-18T00:32:43Z

No, the bug isn't in stable, so the fix doesn't go there either.

burner reviewed Apr 10, 2015

WalterBright force-pushed the indexOf-Range branch from 399177c to 9d06e20 Compare April 10, 2015 08:32

burner reviewed Apr 10, 2015

WalterBright force-pushed the indexOf-Range branch from 9d06e20 to 6577e1e Compare April 10, 2015 08:59

MartinNowak reviewed Apr 10, 2015

WalterBright force-pushed the indexOf-Range branch from 6577e1e to 9ab3acc Compare April 10, 2015 22:28

Enhance std.string.indexOf() to work with ranges

9ab3acc

MartinNowak added a commit that referenced this pull request Apr 11, 2015

Merge pull request #3172 from WalterBright/indexOf-Range

fab8708

Enhance std.string.indexOf() to work with ranges

MartinNowak merged commit fab8708 into dlang:master Apr 11, 2015

WalterBright deleted the indexOf-Range branch April 11, 2015 07:55

schuetzm mentioned this pull request Apr 15, 2015

More compatibility fixes for changes in DMD & Phobos vibe-d/vibe.d#1071

Merged

schuetzm added a commit to schuetzm/vibe.d that referenced this pull request Apr 15, 2015

Workaround for regression in Phobos

d1c753f

dlang/phobos#3172

schuetzm mentioned this pull request Apr 16, 2015

Re-add overload for fixed-size arrays to std.string.indexOf #3191

Merged

MartinNowak added the changelog_v2.068 label Aug 9, 2015
