RFC: changing code idiom for more accurate coverage information #11792

ScottPJones · 2015-06-20T23:13:10Z

I've run into issues with the way the coverage tools only work on lines, and not basic blocks,
so that the very common idioms in Julia code of condition && action or condition || action mean that you won't be able to tell if the action part is covered.
Inspired by the tireless work of @kshyatt, which will help make Julia more reliable, I was thinking of
going through the code in Base (with a simple file-query-regex-replace) and changing all occurrences of:

    condition && statement # esp. things like throw, return, etc.
or
    condition || statement

to either

    condition &&
        statement
and respectively
    condition ||
        statement
or
    !(condition) &&
        statement

or

    if condition
        statement
    end
and respectively
    if !(condition)
        statement
    end

I think we need to talk about what the best idiom for use in Julia code is, taking into account the issues of coverage accuracy, conciseness, and ease of understanding the code (esp. for newcomers).
See #11782, and #11735

The text was updated successfully, but these errors were encountered:

tkelman · 2015-06-21T03:18:12Z

This would likely end up being a large PR that would be highly likely to go conflict-stale quickly, but I do think now that we have regular coverage measurements it would be a good idea to make it easier to see which error paths are or are not currently tested throughout base. Would be nice to have better coverage tools that can be more sophisticated than line-based ones, but that's a harder problem than doing a mass aesthetics-only refactoring. My preference is for the if - end form, it's simpler to understand for people who aren't used to Julia idioms yet and the ends aren't a big deal.

quinnj · 2015-06-21T03:20:12Z

Maybe we need another push for the if x then y idiom to be actually
implemented as a better shorthand to x && y.

On Sat, Jun 20, 2015 at 9:18 PM, Tony Kelman notifications@github.com
wrote:

This would likely end up being a large PR that would be highly likely to
go conflict-stale quickly, but I do think now that we have regular coverage
measurements it would be a good idea to make it easier to see which error
paths are or are not currently tested throughout base. Would be nice to
have better coverage tools that can be more sophisticated than line-based
ones, but that's a harder problem than doing a mass aesthetics-only
refactoring. My preference is for the if - end form, it's simpler to
understand for people who aren't used to Julia idioms yet and the ends
aren't a big deal.

—
Reply to this email directly or view it on GitHub
#11792 (comment).

ScottPJones · 2015-06-21T03:22:31Z

So far, 3 votes for if - end form, @kshyatt, you, and myself, none for a 2-line && or || form.
Why would it go conflict-stale quickly? It's the sort of thing I'd do during a single TV show...
it goes into a PR, gets tested, and hopefully somebody merges it quickly.

ScottPJones · 2015-06-21T03:23:28Z

@quinnj if x then y wouldn't help the problem I'm trying to address here... inaccurate coverage info.

ScottPJones · 2015-06-21T03:24:25Z

(I don't have anything against it, just wouldn't use it until there were a basic block coverage tool)

tkelman · 2015-06-21T03:25:15Z

Just because it would touch a large number of lines of code (in a trivial way sure, but git doesn't care about that), and because of the

hopefully

It should still be looked over manually just to be absolutely sure your automated refactoring worked correctly. Maybe better as separate small chunks instead of massively all at once. Definitely split it into a few separate commits at least, even if you put them in the same PR.

ScottPJones · 2015-06-21T03:26:15Z

Yes, I was thinking of doing it as a bunch of separate commits, or even PRs... right now, I just want to know which idiom people would prefer.

carnaval · 2015-06-21T03:33:36Z

Well, it seems a bit strange to me to butcher the codebase because coverage tools had this dumb idea that line numbers are a decent approximation of program control flow. I mean, I'm all for being practical and make this change where needed, but let's not make all the code ugly just because the tooling sucks.

It would seem better to me to put this effort into making a better coverage tool, e.g., based on expressions evaluated instead of lines.

Again, in the meantime, if it helps people make tests better I'm fine with it being applied sparsely, I just don't want for it to be a systematic change. It strikes me as strange to have constructs in the language that are "considered bad". Either you have it and it's great, or you don't.

IainNZ · 2015-06-21T03:52:27Z

As an aside, I've learnt that not all coverage reporting tools have this line-equals-meaning problem. So Coveralls, the one we are using essentially just because it was relatively popular and I was the first one to bother to create the interface to link our coverage results to it, doesn't support it AFAIK. However https://codecov.io/ does support partial line coverage, which suggests its achievable with tooling on our end. Of course, we don't really need any of these services anyway - they just provide a nice way to automatically process the raw output in a more easy-to-read form. I'm in Camp @carnaval, don't think there should be a systematic change for this right now but doing it as other changes happen seem reasonable, and work on partial line coverage reporting would be even better.

ScottPJones · 2015-06-21T04:38:31Z

Here's the problem: The current condition && xxx idiom (which I don't have a problem with), combined with whatever is producing the .cov files when you use --code-coverage, does not show the coverage for what are probably some of the most important parts of the code... error cases that should be exercised via the unit tests.
@carnaval - @kshyatt is already systematically changing things like that, at least in the linalg code, so that the coverage info is accurate (and also adding much better error messages).

ScottPJones · 2015-06-21T04:40:31Z

@IainNZ I don't think Coveralls has anything to do with it... it's not producing the .cov files.

timholy · 2015-06-21T14:43:19Z

The point still stands. We currently use Coveralls to visualize (and quantify) the results, and it's limited to lines. But if there are services that are more flexible, then the main bottleneck is on our side and therefore something we could fix. It would be interesting to hear what @Keno and others think of the feasibility of doing something like this---does any of the newer LLVM goodness make this easier? Here's a chunk of relevant code:

julia/src/codegen.cpp

Lines 1138 to 1158 in cb77503

    
           static void coverageVisitLine(std::string filename, int line) 
        
           { 
        
               if (filename == "" || filename == "none" || filename == "no file") 
        
                   return; 
        
               logdata_t::iterator it = coverageData.find(filename); 
        
               if (it == coverageData.end()) { 
        
                   coverageData[filename] = std::vector<GlobalVariable*>(0); 
        
               } 
        
               std::vector<GlobalVariable*> &vec = coverageData[filename]; 
        
               if (vec.size() <= (size_t)line) 
        
                   vec.resize(line+1, NULL); 
        
               if (vec[line] == NULL) { 
        
                   vec[line] = addComdat(new GlobalVariable(*jl_Module, T_int64, false, 
        
                                                            GlobalVariable::InternalLinkage, 
        
                                                            ConstantInt::get(T_int64,0), "lcnt")); 
        
               } 
        
               GlobalVariable *v = prepare_global(vec[line]); 
        
               builder.CreateStore(builder.CreateAdd(builder.CreateLoad(v), 
        
                                                     ConstantInt::get(T_int64,1)), 
        
                                   v); 
        
           }

carnaval · 2015-06-21T14:48:24Z

As I said, when this change is practically useful (i.e., when you actually add the tests that go with it, and probably not tests just for this case unless the function is well covered already), I'm all for it. Hell, if @kshyatt said that a single character per line makes writing tests easier I'd be fine with it (although I hope it won't come to this :-)).

However, it does not mean we cannot admit that this is a failure of the tooling and not the language construct itself. Changing the whole base without even adding the tests that go with it seems like going overboard (and declaring defeat) to me. My nightmare scenario goes something likes this : we change the whole base, when random contributor 133 comes in with this construct in the PR, someone (rightfully) says "please don't use this to stay consistent with the rest of the code", and this happens for a few PR. Then, someone, (rightfully again), being bored of repeating this puts it in the documentation somewhere "the && style of control flow should not be used because random coverage mumbling". There you go, you now have a perfectly valid language construct that is second-class and "you should not use it because best practice" although it's only really a failure of implementation.

The problem is that those things have a tendency to be long lived, and I'd bet you a beer of your choice that a year later you'd still have people somewhere not using && for errors because they read somewhere it was "bad".

Ok, aside from my day dreaming, what is really needed here is to use coverage at the lowered level of the code : regular (= non exceptional) control flow is linearized so at least you have this level of precision because you are closer to the actual program semantic, not this text lines things we petty humans use to input code :-)

There would still be some issues where f(may_throw()) would be reported as tested even if it threw the exception but that's still better than what we have now since it handles && (or any tricks like putting everything on a single lines with semicolons).

I don't know how our coverage stuff works so maybe it's already operating at that level and there is a bug, or the output is just not taking it into account fully.

ScottPJones · 2015-06-21T14:49:40Z

@timholy I had raised the issue earlier, and had been told that line coverage was all that could be done.
I'd much prefer having the tools changed, because it isn't just in these easy to change situations with throw and return, but rather any conditional code on a single line.

carnaval · 2015-06-21T15:00:36Z

Ok so looking at the code (

julia/src/codegen.cpp

Line 4752 in cb77503

coverageVisitLine(filename, lno);

), it is emitting a coverage slot only when encountering a line number ast node. It should probably emit a slot for every statement in the function, linking those back to the current line number (= last line node encountered). Then, if the coverage tool supports it, you say that a line was only visited X% if only X% of the slots referring to that line are visited by the program.

I think experimenting in that direction is a much better use of energy that going around changing all the base error checking at once :-)

ScottPJones · 2015-06-21T15:01:47Z

@carnaval That was precisely my point on when I commented on @kshyatt's #11518.
Note, that PR was merged without any controversy, and it basically does the same thing as this PR,
and doesn't add any tests. I don't see why I (or Kate) should be responsible for adding all the tests, that really should be the responsibility of the author(s) of the code in question.
@StefanKarpinski made these comments:

+1 – even though I like the terse one-line idiom, having clear, accurate coverage metrics is more important. Also these error messages tend to be long, so having them on a separate line is helpful.

and

There's nothing we can do about coverage tools having that problem – as long as the external services are line-based, they're going to have that problem. I would very much like to have basic-block-based coverage analysis, however.

ScottPJones · 2015-06-21T15:11:16Z

@carnaval

I'm concerned about how well covered the code in base is (just as Kate seems to be)
I don't have the knowledge (yet) to do anything about the lack of basic block support in the coverage output (it seems like you do though!) (although it shouldn't be a slot per statement, it needs to have a slot per basic block [and I thought with a few exceptions, such as return, using, and import julia was just expressions, not statements (or so Stefan IIRC said)
Multiple people have said that using an if was actually more understandable (see Stefan's comment above, a few others have also said the same)
Experimenting in the direction of improving coverage analysis would be a LOT more energy for me than simply using file-query-regex-replace while I'm watching the TV with my wife! 😀 If you (or somebody else with experience in the area) have the energy to do so, that would absolutely fantastic,
and that is what I originally asked for in RFC: prettify single line throws/returns and add more descriptive errors for linalg #11518. However, that doesn't detract from the value of getting better coverage information NOW, especially in parts of the code that have been implicated in performance issues/regressions. The more information we have, the better, IMO.
Who said anything about changing all of the base error checking at once??? 😀 I don't watch that much TV!

ScottPJones · 2015-06-21T15:15:42Z

Also, I agree totally with you that this shouldn't ever be put in the documentation as the recommended way to do things... if the question comes up (before the coverage tools are improved), I'd just say clearly,
for now only, you need to do this to get better coverage info, but that that situation will be changing hopefully soon.

carnaval · 2015-06-21T15:22:38Z

Don't make me say something I didn't, I'm not saying anyone was responsible for writing tests
Changing all with a regexp is precisely what I don't want. I would want this to only be done where we (you, @kshyatt, me, whoever wants to) actually plan to improve testing coverage.
The if being more understandable is unrelated, wherever it is more understandable it should be used, if it happens to be more understandable everywhere then we simply remove short circuit (but I'm sure that's not the case).
We do need a slot per statement because of exceptions which are not modeled by llvm basic blocks. So a();b();c();d() is a single BB but can abort in the middle. return is a statement, everything have it's own line in code_lowered is a statement.
If it is what it takes to avoid a full blown polemic on whether short circuit is a good thing in itself I'll try to find time to make this work better, say at the juliacon hackathon or something

ScottPJones · 2015-06-21T15:48:19Z

OK, but you had said this:

when this change is practically useful (i.e., when you actually add the tests that go with it, and probably not tests just for this case unless the function is well covered already), I'm all for it.

It sounded like, in contrast to RFC: prettify single line throws/returns and add more descriptive errors for linalg #11518, you felt this change to abstractarray.jl should only go in if I added tests for all of the cases... that does seem like you felt I should be responsible for it (at least, if I want to have this get merged). I did use this immediately to test coverage of abstractarray.jl, and found that it's almost not covered at all... I don't know just how to get those methods called to improve the testing myself. Do you want to see my .cov file?
The if being more understandable was related, in that I also could have used a 2-line short circuit just as well, which is why I created an RFC issue to sound out which one people liked better.
OK, in the COS compiler I worked on, do a(),b(),c(),d() would be considered 4 BBs, for the same reason, you can't tell what the side effects are, like exceptions, changing global variables, etc.
I'm surprised that it is considered a single BB here, for the same reason.
Now you see my evil plan, I've managed to annoy a really smart person into fixing the problem I'd raised! 😀

mbauman · 2015-06-21T18:13:33Z

+1 for not doing this systematically, and instead doing it on a case by case basis driven by active work in improving coverage. There's hope that coverage tooling may improve in the future so let's not churn the waters unnecessarily.

JeffBezanson · 2015-06-21T19:12:43Z

Then, if the coverage tool supports it, you say that a line was only visited X% if only X% of the slots referring to that line are visited by the program.

Sounds pretty good to me. I'd much rather do this than start picking bits of language syntax to effectively ban.

ScottPJones added a commit to ScottPJones/julia that referenced this issue Jun 21, 2015

Address JuliaLang#11792 in AbstractArray.jl

54e7713

JeffBezanson closed this as completed Jun 21, 2015

carnaval mentioned this issue Jun 21, 2015

WIP More precise coverage #11802

Closed

ScottPJones mentioned this issue Jun 21, 2015

Improve error messages in abstractarray.jl #11797

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: changing code idiom for more accurate coverage information #11792

RFC: changing code idiom for more accurate coverage information #11792

ScottPJones commented Jun 20, 2015

tkelman commented Jun 21, 2015

quinnj commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

tkelman commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

carnaval commented Jun 21, 2015

IainNZ commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

timholy commented Jun 21, 2015

carnaval commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

carnaval commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

carnaval commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

mbauman commented Jun 21, 2015

JeffBezanson commented Jun 21, 2015

RFC: changing code idiom for more accurate coverage information #11792

RFC: changing code idiom for more accurate coverage information #11792

Comments

ScottPJones commented Jun 20, 2015

tkelman commented Jun 21, 2015

quinnj commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

tkelman commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

carnaval commented Jun 21, 2015

IainNZ commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

timholy commented Jun 21, 2015

carnaval commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

carnaval commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

carnaval commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

mbauman commented Jun 21, 2015

JeffBezanson commented Jun 21, 2015