-
-
Notifications
You must be signed in to change notification settings - Fork 746
Enhance std.string.indexOf() to work with ranges #3172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
std/string.d
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Search"es" for "a" character in "a" range.
|
I'm not sure about the merit of this function. I wonder why you would search for a index of a char in a ,maybe even temporary, range. Also this function is then basically the stop for every range chain. Performance: have you checked the performance before and after? I hate to be pushy here but #2995 |
399177c to
9d06e20
Compare
|
I updated it to remove the throwing and memory allocation. The documentation now says it assumes correct UTF. It assumed that before, throwing only on some invalid UTF. The merit of this function is kind of irrelevant, it exists and we can't remove it. All the algorithms need to work with ranges. I haven't benchmarked it, but I know it does less work because it doesn't decode unless necessary, whereas the former version always decoded. |
std/string.d
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is going on here? is there no std.(uni|utf) function to do this?
I suppose this qualifies as magic numbers. I would at least like to see a comment about what happens here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- nope
- you'll see those numbers all over std.utf :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- that is just sad
so what actually happens here
This should be written in the vision document maybe it doesn't always decode, but you added control-flow branches. you never know before you didn't benchmark |
9d06e20 to
6577e1e
Compare
|
Those branches exist in the auto-decoder. |
|
Updated with near 100% test coverage. |
|
Maybe I'm not clear. I'm not trying to argue that your code is bad, slow or didn't pay attention. What I'm trying to do is to say that you can only be sure about the performance if you benchmark. Which is the reason I created #2995. Even if you looked at the asm, you never know what the branch prediction of the cpu does. Rant On 100% test coverage is a lie. Have you tested all possible combinations of all possible branches. If not, this is not 100% test coverage. This is maybe: "100% of the code was executed by the tests" Rant Off |
"We aim to make the standard library usable in its entirety without a garbage collector." We've been jawboning for years about getting this done and nothing has happened. I'm getting it done now. |
100% sure, you are correct. But I've been around the block enough to know that less work == faster, nearly always. Also, this PR is not about making it faster. It's about that vision thing.
I use -cov to check the coverage. I know that doesn't prove all the logic is correct. It only proves that all lines of code were executed. That's what I meant by "coverage". It's only possible to test a function for all possible input for trivial functions. On the other hand, my experience for decades with 100% test coverage (of lines executed) correlates strongly with very few bugs discovered later. It's a practical proxy. If you run the Phobos unittests with -cov, you'll also discover that the line coverage is rather poor. I suspect I'm the only Phobos developer using -cov. |
We have to make it more accessible, e.g. dmd -main - unittest -defaultlib= generated/linux/.../libphobos2.a -run std/string isn't something people will use. The other important part it, that pull requests should only improve coverage. This could be enforced by the autotester. |
|
The bugzilla ticket to make this happen is blocked by a back end bug. |
|
@WalterBright I know all that and it is all true, I only have a problem with the wording "test coverage". |
Like I said before, you're not. |
|
@burner BTW, it is unnecessarily provocative to call people liars because your professor disagrees with usage of a term. I've been calling execution of lines "test coverage" for 30 years, and the tools that do it are called "coverage analyzers". https://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro You're the first to complain about it. I think you're going to have an uphill battle with it. I'm not going to change the way I use the term, even if your professor sends me a strongly worded letter :-) |
I use: Not that bad. |
That's good to hear. Please help me in changing the culture on this in Phobos - for a PR for a new function, ask if it has 100% unittest coverage. |
Tried that, doesn't work most of the time, I'll work on a posix.mak integration. dmd std/typecons -unittest -main -cov
typecons.o:__main.d:function _D3std8typecons34__T8NullableTS3std4json9JSONValueZ8Nullable11__xopEqualsFKxS3std8typecons34__T8NullableTS3std4json9JSONValueZ8NullableKxS3std8typecons34__T8NullableTS3std4json9JSONValueZ8NullableZb: error: undefined reference to '_D3std4json9JSONValue8opEqualsMxFKxS3std4json9JSONValueZb'
typecons.o:__main.d:function _D3std4conv52__T7emplaceTC3std8typecons19__unittestL5606_102FZ1AZ7emplaceFAvZC3std8typecons19__unittestL5606_102FZ1A: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
typecons.o:__main.d:function _D3std4conv54__T7emplaceTC3std8typecons19__unittestL5606_102FZ1ATiZ7emplaceFAvKiZC3std8typecons19__unittestL5606_102FZ1A: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
typecons.o:__main.d:function _D3std4conv53__T7emplaceTC3std8typecons19__unittestL5650_103FZ2C0Z7emplaceFNaNbNiAvZC3std8typecons19__unittestL5650_103FZ2C0: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
typecons.o:__main.d:function _D3std4conv53__T7emplaceTC3std8typecons19__unittestL5650_103FZ2C1Z7emplaceFNaNbNiAvZC3std8typecons19__unittestL5650_103FZ2C1: error: undefined reference to '_D3std4conv16testEmplaceChunkFNaNbNiAvmmAyaZv'
collect2: error: ld returned 1 exit status
--- errorlevel 1 |
std/string.d
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using codeLength?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an internal function, not part of the api. Please, let's not bikeshed trivia.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite agree that this is trivial. Your code duplicates an artifact which we already have in Phobos, complete with its own unit tests (although they are not very extensive). Adding another copy of it increases maintenance cost for no good reason, especially since the function in question already uses symbols from std.utf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it's a public function, click on the documentation link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I had thought you meant rename it.
I just tried it, too (on Windows). Worked fine. Coverage report is 89% coverage - rather poor. |
std/string.d
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually the fastest loop is while (i < s.length) if (decode(s, i) == c) return i;, because it avoids the codeLength part, but it is only useable with random index strings. Maybe we can improve byDchar to provide optional iteration with index/counter.
Anyhow, you replaced the druntime based foreach decoding, so this is going to be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not want to require the input range to be indexable.
I considered improving byDchar, but it's rather awkward, and gave up on the idea.
|
Acceptance list
|
6577e1e to
9ab3acc
Compare
|
Auto-merge toggled on |
|
Thx |
Enhance std.string.indexOf() to work with ranges
|
This introduced a regression in vibe.d: vibe-d/vibe.d#1071 Fixed-sized arrays aren't accepted any longer, because they don't match |
|
@MartinNowak add -cov "temporally" to the build flags of the *.test target. that is quite convenient. |
|
Fixed size arrays as template arguments also lead to template bloat, as every different sized array results in a different instantiation. |
But not in this case. |
|
It needs to be fixed anyhow, should be possible without template bloat. |
Sounds like a job for Bugzilla! |
|
There you go, Issue 14455 – [Reg 2.068-devel] std.string.indexOf no longer accepts static arrays. Please follow up on this. |
|
@MartinNowak: |
|
No, the bug isn't in stable, so the fix doesn't go there either. |
http://dlang.org/phobos/std_string.html#.indexOf
Currently it only works with string inputs.
It still throws on invalid UTF.