Skip to content

Conversation

@WalterBright
Copy link
Member

http://dlang.org/phobos/std_string.html#.indexOf

Currently it only works with string inputs.

It still throws on invalid UTF.

std/string.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Search"es" for "a" character in "a" range.

@burner
Copy link
Member

burner commented Apr 10, 2015

I'm not sure about the merit of this function. I wonder why you would search for a index of a char in a ,maybe even temporary, range. Also this function is then basically the stop for every range chain.

Performance: have you checked the performance before and after? I hate to be pushy here but #2995

@WalterBright
Copy link
Member Author

I updated it to remove the throwing and memory allocation. The documentation now says it assumes correct UTF. It assumed that before, throwing only on some invalid UTF.

The merit of this function is kind of irrelevant, it exists and we can't remove it. All the algorithms need to work with ranges.

I haven't benchmarked it, but I know it does less work because it doesn't decode unless necessary, whereas the former version always decoded.

std/string.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is going on here? is there no std.(uni|utf) function to do this?

I suppose this qualifies as magic numbers. I would at least like to see a comment about what happens here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. nope
  2. you'll see those numbers all over std.utf :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. that is just sad

so what actually happens here

@burner
Copy link
Member

burner commented Apr 10, 2015

The merit of this function is kind of irrelevant, it exists and we can't remove it. All the algorithms need to work with ranges.

This should be written in the vision document

maybe it doesn't always decode, but you added control-flow branches. you never know before you didn't benchmark

@WalterBright
Copy link
Member Author

Those branches exist in the auto-decoder.

@WalterBright
Copy link
Member Author

Updated with near 100% test coverage.

@burner
Copy link
Member

burner commented Apr 10, 2015

Maybe I'm not clear. I'm not trying to argue that your code is bad, slow or didn't pay attention. What I'm trying to do is to say that you can only be sure about the performance if you benchmark. Which is the reason I created #2995. Even if you looked at the asm, you never know what the branch prediction of the cpu does.

Rant On

100% test coverage is a lie. Have you tested all possible combinations of all possible branches. If not, this is not 100% test coverage. This is maybe: "100% of the code was executed by the tests"

Rant Off

@WalterBright
Copy link
Member Author

This should be written in the vision document

"We aim to make the standard library usable in its entirety without a garbage collector."
http://wiki.dlang.org/Vision/2015H1

We've been jawboning for years about getting this done and nothing has happened. I'm getting it done now.

@WalterBright
Copy link
Member Author

What I'm trying to do is to say that you can only be sure about the performance if you benchmark.

100% sure, you are correct. But I've been around the block enough to know that less work == faster, nearly always. Also, this PR is not about making it faster. It's about that vision thing.

100% test coverage is a lie. Have you tested all possible combinations of all possible branches. If not, this is not 100% test coverage. This is maybe: "100% of the code was executed by the tests"

I use -cov to check the coverage. I know that doesn't prove all the logic is correct. It only proves that all lines of code were executed. That's what I meant by "coverage". It's only possible to test a function for all possible input for trivial functions.

On the other hand, my experience for decades with 100% test coverage (of lines executed) correlates strongly with very few bugs discovered later. It's a practical proxy.

If you run the Phobos unittests with -cov, you'll also discover that the line coverage is rather poor. I suspect I'm the only Phobos developer using -cov.

@MartinNowak
Copy link
Member

I suspect I'm the only Phobos developer using -cov.

We have to make it more accessible, e.g. dmd -main - unittest -defaultlib= generated/linux/.../libphobos2.a -run std/string isn't something people will use. The other important part it, that pull requests should only improve coverage. This could be enforced by the autotester.
If you want that to happen, it needs to be a mechanical requirement.

@MartinNowak
Copy link
Member

The bugzilla ticket to make this happen is blocked by a back end bug.
Issue 14063 - Add coverage enforcement for Phobos' posix.mak.

@burner
Copy link
Member

burner commented Apr 10, 2015

@WalterBright I know all that and it is all true, I only have a problem with the wording "test coverage".
My theoretical CS Prof would kill me, and rightfully so, if I would call -cov test coverage. How about "100% of the code gets executed by the unittests." Or short "100%CE"

@JakobOvrum
Copy link
Contributor

I suspect I'm the only Phobos developer using -cov.

Like I said before, you're not.

@WalterBright
Copy link
Member Author

@burner BTW, it is unnecessarily provocative to call people liars because your professor disagrees with usage of a term. I've been calling execution of lines "test coverage" for 30 years, and the tools that do it are called "coverage analyzers".

https://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro

You're the first to complain about it. I think you're going to have an uphill battle with it. I'm not going to change the way I use the term, even if your professor sends me a strongly worded letter :-)

@WalterBright
Copy link
Member Author

e.g. dmd -main - unittest -defaultlib= generated/linux/.../libphobos2.a -run std/string isn't something people will use.

I use:

dmd std/string -unittest -main -cov

Not that bad.

@WalterBright
Copy link
Member Author

Like I said before, you're not.

That's good to hear. Please help me in changing the culture on this in Phobos - for a PR for a new function, ask if it has 100% unittest coverage.

@MartinNowak
Copy link
Member

dmd std/string -unittest -main -cov

Tried that, doesn't work most of the time, I'll work on a posix.mak integration.

dmd std/typecons -unittest -main -cov
typecons.o:__main.d:function _D3std8typecons34__T8NullableTS3std4json9JSONValueZ8Nullable11__xopEqualsFKxS3std8typecons34__T8NullableTS3std4json9JSONValueZ8NullableKxS3std8typecons34__T8NullableTS3std4json9JSONValueZ8NullableZb: error: undefined reference to '_D3std4json9JSONValue8opEqualsMxFKxS3std4json9JSONValueZb'
typecons.o:__main.d:function _D3std4conv52__T7emplaceTC3std8typecons19__unittestL5606_102FZ1AZ7emplaceFAvZC3std8typecons19__unittestL5606_102FZ1A: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
typecons.o:__main.d:function _D3std4conv54__T7emplaceTC3std8typecons19__unittestL5606_102FZ1ATiZ7emplaceFAvKiZC3std8typecons19__unittestL5606_102FZ1A: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
typecons.o:__main.d:function _D3std4conv53__T7emplaceTC3std8typecons19__unittestL5650_103FZ2C0Z7emplaceFNaNbNiAvZC3std8typecons19__unittestL5650_103FZ2C0: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
typecons.o:__main.d:function _D3std4conv53__T7emplaceTC3std8typecons19__unittestL5650_103FZ2C1Z7emplaceFNaNbNiAvZC3std8typecons19__unittestL5650_103FZ2C1: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
collect2: error: ld returned 1 exit status
--- errorlevel 1

std/string.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using codeLength?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an internal function, not part of the api. Please, let's not bikeshed trivia.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite agree that this is trivial. Your code duplicates an artifact which we already have in Phobos, complete with its own unit tests (although they are not very extensive). Adding another copy of it increases maintenance cost for no good reason, especially since the function in question already uses symbols from std.utf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's a public function, click on the documentation link.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I had thought you meant rename it.

@WalterBright
Copy link
Member Author

Tried that,

I just tried it, too (on Windows). Worked fine. Coverage report is 89% coverage - rather poor.

std/string.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually the fastest loop is while (i < s.length) if (decode(s, i) == c) return i;, because it avoids the codeLength part, but it is only useable with random index strings. Maybe we can improve byDchar to provide optional iteration with index/counter.
Anyhow, you replaced the druntime based foreach decoding, so this is going to be faster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not want to require the input range to be indexable.

I considered improving byDchar, but it's rather awkward, and gave up on the idea.

@MartinNowak
Copy link
Member

Acceptance list

  • replace numberCodeUnits with std.utf.codeLength

@MartinNowak
Copy link
Member

Auto-merge toggled on

@MartinNowak
Copy link
Member

Thx

MartinNowak added a commit that referenced this pull request Apr 11, 2015
Enhance std.string.indexOf() to work with ranges
@MartinNowak MartinNowak merged commit fab8708 into dlang:master Apr 11, 2015
@WalterBright WalterBright deleted the indexOf-Range branch April 11, 2015 07:55
@schuetzm
Copy link
Contributor

This introduced a regression in vibe.d: vibe-d/vibe.d#1071

Fixed-sized arrays aren't accepted any longer, because they don't match isInputRange. IMO it's anyway better to slice such an array explicitly to make it clear that a reference could escape, but it's still a regression...

@burner
Copy link
Member

burner commented Apr 15, 2015

@MartinNowak add -cov "temporally" to the build flags of the *.test target. that is quite convenient.

schuetzm added a commit to schuetzm/vibe.d that referenced this pull request Apr 15, 2015
@WalterBright
Copy link
Member Author

Fixed size arrays as template arguments also lead to template bloat, as every different sized array results in a different instantiation.

@yebblies
Copy link
Contributor

Fixed size arrays as template arguments also lead to template bloat, as every different sized array results in a different instantiation.

But not in this case.

@MartinNowak
Copy link
Member

It needs to be fixed anyhow, should be possible without template bloat.

@WalterBright
Copy link
Member Author

It needs to be fixed anyhow

Sounds like a job for Bugzilla!

@MartinNowak
Copy link
Member

There you go, Issue 14455 – [Reg 2.068-devel] std.string.indexOf no longer accepts static arrays. Please follow up on this.

@schuetzm
Copy link
Contributor

@MartinNowak:
#3191 is already merged. Should it have been made against stable instead of master? Do you want me to create a cherry-pick PR?

@MartinNowak
Copy link
Member

No, the bug isn't in stable, so the fix doesn't go there either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants