Improve quality of coverage reports by parsing source files for functions #36
Conversation
You would also have to turn off inlining, but this might be a step forward. At least it would give a more honest coverage percentage.
We can just do that with …
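(For context, since the comment above is cut off: current Julia provides the `--inline=no` command-line flag to disable the inliner, so a coverage run might look like `julia --inline=no --code-coverage=user test/runtests.jl`; whether that is the exact option being referred to here is an assumption on my part.)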
@timholy, with the second bullet point, does that apply only to functions that were not caught before, or to all functions?
I'd like to try this out on JuMP; it seems like a great step forward.
Looking at it a bit more, unfortunately the line numbering is borked. I think this needs to be fixed in JuliaParser.
OK, I think I've fixed both of the problems above. I hacked around the line-number problem by first finding the file position corresponding to the beginning of each new source line, and then using that to keep track of where I am in the file. And it seems that with proper line-number accounting, my concerns about …
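A minimal sketch of that offset-table workaround, assuming the simplest possible data structure (the names and code here are illustrative, not the actual patch):

```julia
# Record the byte offset at which each source line starts; a stream
# position can then be mapped back to a line number by binary search.
function line_starts(src::AbstractString)
    starts = Int[1]                 # line 1 begins at byte offset 1
    for i in eachindex(src)
        src[i] == '\n' && push!(starts, nextind(src, i))
    end
    return starts
end

# The last recorded line start at or before `pos` identifies the line.
pos_to_line(starts::Vector{Int}, pos::Integer) = searchsortedlast(starts, pos)

starts = line_starts("a = 1\nb = 2\nc = 3\n")
pos_to_line(starts, 8)              # offset 8 falls inside line 2 ("b = 2")
```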
One remaining problem is what to do about whole files that lack tests. In that case, there is no …

EDIT: I decided to include them in the coverage results.
This also limits testing to *.jl files, and disables the `rethrow` in case of error. The problem is that if the user has a broken *.jl file sitting in that directory, JuliaParser will complain about it.
AFAICT, this will never pass Travis; it seems that Pkg uses the old …
+1000. This is a vastly better state of affairs than previously. I'd stopped looking at coverage results, since adding a single test gave you a 90% coverage statistic.
@timholy, couldn't you (as a band-aid) `Pkg.add` JuliaParser in the .travis.yml? This is going to be ace!! The …
@hayd, once this gets merged that change wouldn't be necessary anymore, so I'm not sure it's worth it.

Yes, there absolutely are subtleties (or really, not-so-subtleties...) that this does not handle well. I don't currently have any great ideas about how to take the next step, though. It seems you want some kind of summary number.
Getting good coverage results, or any notion of "coverage percentage", is going to be difficult in any language that does runtime code generation. I think we just have to look at this as a flawed but useful metric. Even Stefan's suggestion, instrumenting basic blocks, will only work for code that has actually been generated. This is complicated by things that generate code at runtime, an example being stagedfunctions.

@timholy, I don't know what packages you have used, but maybe we should agree on a non-trivial one to test on.

Glancing at the code, doesn't this have issues with functions that are not at the toplevel of the file? E.g.

```julia
begin
    func() = ...
end
```

or

```julia
module Foo
module Bar
func() = .....
end
end
```

I'll look into making JuliaParser easier to use for this application. There have been some regressions in terms of compatibility with Base during the 0.4 dev period.
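A hedged sketch of one way to handle that nesting concern (this uses Base's parser rather than JuliaParser.jl, and the names are mine): walk the expression tree recursively so function definitions inside `begin` blocks or inner modules are still found.

```julia
# Collect long-form (`function f(...) ... end`) and short-form
# (`f(...) = ...`) definitions at any nesting depth.
function find_functions!(found::Vector{Expr}, ex)
    ex isa Expr || return found
    isshort = ex.head === :(=) && ex.args[1] isa Expr && ex.args[1].head === :call
    (ex.head === :function || isshort) && push!(found, ex)
    foreach(arg -> find_functions!(found, arg), ex.args)
    return found
end

ex = Meta.parse("module Foo\nmodule Bar\nfunc() = 1\nend\nend")
find_functions!(Expr[], ex)   # finds `func() = 1` despite the nesting
```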
My earlier worries about non-toplevel functions were erased by later commits; basically I think this is perfect 😄. Aside from generated code, the one exception I know of is this construction: … The reason is that because of the …
Oh, regarding your question about testing on a package: my email to julia-dev tested two "mystery" packages, but both have a nontrivial amount of code (>2k lines each; one was over 4k).
BTW, I've removed the "RFC"; I'm not planning on pushing any more patches. I do wonder about hooking this into PackageEval and seeing if we can come up with coverage fractions for all registered packages. That could happen either before or after merging this.
LGTM
I think getting coverage numbers in PackageEval is definitely possible.
@timholy, can you tag a new version of Coverage when you are ready? I have limited internet access.
Hmm, I just did a push on Images, and indeed it modified my coverage stats: https://coveralls.io/builds/1701691. (Alternative: JuliaImages/Images.jl#243 (comment)) I expected it to go down, but not to 0%; something seems wrong. The part I can't understand is that the first element in that array of Dicts looks like this: … which seems entirely reasonable. Any insights about why that file is nevertheless scored as having 0% coverage on Coveralls?
Here's a gist of the JSON-encoded output: https://gist.githubusercontent.com/timholy/e5c42bf97adee7ee4333/raw/36edd536817030d278d840976936d11e327504bd/JSON-parsed%20coverage%20stats |
Oh, actually, Images has had broken coverage data for a long time: the last good build I found dated from August 21, although there must have been a more recent one, because the badge shows a different number. https://coveralls.io/r/timholy/Images.jl?page=14 Dunno what's going on; anyone have a clue?
OK, I'm getting reports for HDF5, so clearly it's something specific to Images. It would be interesting to debug, but I'm not sure how to go about that. |
This alters---and perhaps improves---our ability to quantify coverage. It leverages @jakebolewski's JuliaParser.jl to extract the beginning/end line range of "simple" function expressions. It then amends the coverage information by inserting 0 instead of `nothing` for functions that were never compiled.

This is not perfect. Here are the deficits I'm aware of currently:

- Functions defined within a `for f in (:min, :max, ...) @eval begin... end` style block are not detected (this error will contribute to overly-optimistic estimates of coverage).
- `end` statements and the initial `function f(...)` line in a multi-line function are flagged as uncovered (this error will contribute to overly-pessimistic estimates of coverage).

There is also the genuine risk of the line count getting "out of sync", since the logic for extracting line numbers is not entirely trivial. I haven't seen it yet, but frankly I've only tested against a single source file.

I suspect both of these deficits may be fixable, so I am not certain this is yet worth merging. OTOH, I am not entirely certain I want to dump a huge amount of effort into this; if this is viewed as a step forward, perhaps we should go for it. But thanks to Jake it was surprisingly easy to get this far; maybe it wouldn't be so much harder to fix one or both of these problems (but I don't have any good ideas yet).
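A minimal sketch of the amendment step described above (the function name and data layout are assumptions for illustration, not the PR's actual code):

```julia
# Hypothetical sketch: per-line counts use `nothing` for "never compiled"
# and an Int for an execution count. Given the (first, last) line ranges of
# parsed function definitions, convert `nothing` to 0 inside those ranges
# so that uncompiled functions count against coverage.
function amend_coverage(counts::Vector, func_ranges::Vector{UnitRange{Int}})
    amended = copy(counts)
    for r in func_ranges, ln in r
        amended[ln] === nothing && (amended[ln] = 0)
    end
    return amended
end

counts = Union{Nothing,Int}[nothing, 1, nothing, nothing, 2]
amend_coverage(counts, [2:4])   # lines 3-4 become 0; line 1 stays `nothing`
```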