Skip to content

"java.lang.NumberFormatException" Race condition in coverage file #19

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RichardBradley opened this issue Mar 11, 2014 · 37 comments
Closed

Comments

@RichardBradley
Copy link
Contributor

I'm seeing intermittent errors like the following when running coverage reports with "scoverage" on Windows:

java.lang.NumberFormatException: For input string: "143911841188188"

I think this is because of a race condition in Invoker.invoked which is causing the "id" and the ";" to be written non-atomically.

This may only be a problem on Windows. Perhaps two adjacent small appends are atomic on POSIX but not on Windows?

I will submit a patch to consolidate the two writes into one. This seems to fix the issue on Windows, and it seems like a good idea on POSIX as well.

@RichardBradley
Copy link
Contributor Author

Oh, actually that hasn't fixed it. :(
That's the problem with intermittent bugs -- you never know for sure when they're fixed. I just got the following error from sbt scoverage:test with this patch applied (I'm reasonably sure this patch is applied):

[error] (scoverage-test:executeTests) java.lang.NumberFormatException: For input string: "14523368368"
...
at scoverage.IOUtils$.invoked(IOUtils.scala:13)     

I take it this doesn't happen on unix?

I'll look into options for atomic file writes on Windows...

@sksamuel
Copy link
Member

Might be windows new lines breaking it. Try stripping \r
On 11 Mar 2014 09:41, "Richard Bradley" notifications@github.com wrote:

Oh, actually that hasn't fixed it. :(
That's the problem with intermittent problems. I just got the following
error from sbt scoverage:test with this patch applied (I'm reasonably
sure this patch is applied):

error java.lang.NumberFormatException: For input string: "14523368368"
...
at scoverage.IOUtils$.invoked(IOUtils.scala:13)

I take it this doesn't happen on unix?

I'll look into options for atomic file writes on Windows...

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37277670
.

@RichardBradley
Copy link
Contributor Author

That's not it -- there's no whitespace in the file, just numbers and ";"s

On my machine, sometimes the numbers are interleaved with the ";"s:

Here's an extract:

 1452;14523368368;368;368;368;367;460;367;369;369;369;369;393;369;393;462;393;393;393;1025;1024;495;182;183;190;192;191;195;193;194;1134;1133;1023;1023;1023;1023;581;581;539;538;538;537;538;538;535;537;534;535;534;1023;535;;1023

Note the missing semicolons near the start, and the double semicolon near the end.

@sksamuel
Copy link
Member

Are you running tests in parallel ?
On 11 Mar 2014 09:58, "Richard Bradley" notifications@github.com wrote:

That's not it -- there's no whitespace in the file, just numbers and ";"s

On my machine, sometimes the numbers are interleaved with the ";"s:

Here's an extract:

1452;14523368368;368;368;368;367;460;367;369;369;369;369;393;369;393;462;393;393;393;1025;1024;495;182;183;190;192;191;195;193;194;1134;1133;1023;1023;1023;1023;581;581;539;538;538;537;538;538;535;537;534;535;534;1023;535;;1023

Note the missing semicolons near the start, and the double semicolon near
the end.

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37279047
.

@RichardBradley
Copy link
Contributor Author

Yes

@sksamuel
Copy link
Member

It wasn't designed to be thread safe. I guess we need to synchronize on
something in the writing code. I assume tests run in the same jvm at least.

On 11 March 2014 10:11, Richard Bradley notifications@github.com wrote:

Yes

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37280088
.

@RichardBradley
Copy link
Contributor Author

It wasn't designed to be thread safe.

I thought you might have been relying on short file appends being atomic.

I think we should be able to do something like that. I don't see why they are not atomic. Probably the java "FileWriter" isn't using the win32 api in the right way to get atomic appends.

I guess we need to synchronize on something in the writing code.

Yes, locking should also do it. It might kill my test throughput.

I assume tests run in the same jvm at least.

Yes (but if we got atomic file appends working, we wouldn't even need to rely on that).

@sksamuel
Copy link
Member

No one on linux has had issues but I don't know anyone who runs tests in
parallel. Most people disable it because of other issues they get with
other plugins. I had a google and apparently PrintWriter is atomic in the
code but its not guaranteed in the docs so might work might not. Do you
want to try that and if it works we can commit it.

On 11 March 2014 10:46, Richard Bradley notifications@github.com wrote:

It wasn't designed to be thread safe.

I thought you might have been relying on short file appends being atomic.

I think we should be able to do something like that. I don't see why they
are not atomic. Probably the java "FileWriter" isn't using the win32 api in
the right way to get atomic appends.

I guess we need to synchronize on something in the writing code.

Yes, locking should also do it. It might kill my test throughput.

I assume tests run in the same jvm at least.

Yes (but if we got atomic file appends working, we wouldn't even need to
rely on that).

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37282716
.

@RichardBradley
Copy link
Contributor Author

I just got the following error from sbt scoverage:test with this patch applied (I'm reasonably sure this patch is applied):

I was wrong. I changed the version number and did a "sbt publishLocal", but it seems it wasn't enough to get this patch to be used in my other project.

I'm trying various things to get this patch to be picked up so I can test this...

@RichardBradley
Copy link
Contributor Author

Right -- I've chased down all the different version numbers which need changing to test this patch, and I can confirm that neither the patch proposed here, nor a version using PrintWriter fix this on Windows.

I'm going to do some stress testing of Invoker.invoked in isolation and come up with a proper fix. It may need Java-level locking, if I can't get atomic appends to work on Windows.

@RichardBradley
Copy link
Contributor Author

I can't see any way around locking at the moment.
Despite http://stackoverflow.com/questions/3032482/is-appending-to-a-file-atomic-with-windows-ntfs it seems that write(byte[]) can still be interleaved on Windows:

val writer = new FileOutputStream(path, true)
val bytes = (id.toString + ';').getBytes(Charset.defaultCharset())
writer.write(bytes)
writer.close()

I'll ask on StackOverflow if anyone knows how to achieve atomic file append in Java and submit a pull request with a lock.

@RichardBradley
Copy link
Contributor Author

We should update the docs to note that using scoverage will introduce a global per-statement lock, so multi-threaded apps should test twice: once with coverage and once without (this is a sensible precaution anyway).

@sksamuel
Copy link
Member

Yeah might be better to do deferred writing somehow. A lock per statement
might be insane.
On 11 Mar 2014 13:34, "Richard Bradley" notifications@github.com wrote:

We should update the docs to note that using scoverage will introduce a
global per-statement lock, so multi-threaded apps should test twice: once
with coverage and once without (this is a sensible precaution anyway).

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37295253
.

@RichardBradley
Copy link
Contributor Author

How about a version which spawns a Future for each write and doesn't wait for the result?
I'm not 100% familiar with Scala's behaviour in that case.
Would the test runner need to wait for all the Futures to finish before generating the coverage report? How?

This version passes my load test:

def invoked(id: Int, path: String): Unit = {
  future {
    Invoker.synchronized {
      val writer = new FileWriter(path, true)
      writer.write(id.toString + ';')
      writer.close()
    }
  }
}

@RichardBradley
Copy link
Contributor Author

I suppose the main alternative is a threadsafe queue, with a worker thread pulling "invoke"s off the queue and writing them to disk. That sounds like a fair bit of functionality to write and test (and race conditions to consider etc.).

I had a look at log4j to see how it deals with this sort of issue -- it has a global lock on the logger file appender.

@sksamuel
Copy link
Member

The thing with log4j is the number of calls is a lot less. If we lock in the invoker then a statement like if (x) y else z we'll have 3 locks for just that one statement. I wonder if each thread can just write to its own file, and then the aggregator later can read them all in. What do you think.

@RichardBradley
Copy link
Contributor Author

I wonder if each thread can just write to its own file, and then the aggregator later can read them all in. What do you think.

That sounds pretty sensible. I'll see if I can code it up over the next few days.

Or how about a global concurrent Set[Statement] per JVM and write it out on process exit? Would that be better?

What do other coverage frameworks do?

@sksamuel
Copy link
Member

SCCT is the only other scala one, and that's not thread safe either IIRC. Things like jacoco instrument the bytecode by attaching to the JVM.

@sksamuel
Copy link
Member

We could do something like
val writer = new FileWriter(path + '-' + Thread.currentThread.getId, true)
Then have the reader look for all files with the given prefix. That would work.
Really though, the append says its atomic, everyone on stack overflow says its atomic, so why the hell is it not atomic 👍

@sksamuel
Copy link
Member

Writing out on exit should work too, although I'd have to think about how it knows when its finished. It won't be enough to use shutdown hooks because you might run the whole thing in a single process and then the report generators will fail because the file won't have been written.

@sksamuel
Copy link
Member

This is what I have so far

def invoked(id: Int, path: String) = {
val dir = new File(path)
dir.mkdirs()
val file = new File(path + "/" + Thread.currentThread.getId)
val writer = new FileWriter(file, true)
writer.append(id.toString + ';')
writer.close()
}

And the corresponding load

// loads all the invoked statement ids
def invoked(dir: File): Seq[Int] = {
dir.listFiles.map(file => {
val reader = new BufferedReader(new FileReader(file))
val line = reader.readLine()
reader.close()
line.split(";").filterNot(.isEmpty).map(.toInt)
}).flatten.toSeq
}

sksamuel added a commit that referenced this issue Mar 11, 2014
@RichardBradley
Copy link
Contributor Author

Things like jacoco instrument the bytecode by attaching to the JVM.

... but at some level jacoco must still have to deal with lots of "invoked" reports coming in from multiple threads. Do you know how they deal with that?

I notice that jacoco has better Scala support than it used to (see e.g. this pull request). Have you tried it lately?

Is this article still accurate? Maybe it would be easier to port statement-level coverage into jacoco rather than trying to port all of the features of a mature coverage suite into scoverage?

@sksamuel
Copy link
Member

I think since jacoco uses the agent its not writing anything at all, so it won't need locks. It's just a big list of invoked statements. We have to write out because maven won't keep the jvm alive between phases.

Jacoco's support will never be as good as scoverage. The reason is with Scala the mapping from source to bytecode is not bijective, whereas with Java its closer to being bijective. If you do a pattern match in scala, then you might have things of interest for example (did you call the match, did you match case 1, did you match case 2, ....). The bytecode generated might be 30 ops or it might be 1 op. It's hard to come back from the bytecode and map into the source level statements.

That's why I believe its better to do AST manipulation.

@RichardBradley
Copy link
Contributor Author

Jacoco's support will never be as good as scoverage

Fair enough. I've had a closer look since writing the above, and jacoco isn't as full-featured or as mature as I had first thought.

Maybe it would be easier to port statement-level coverage into jacoco rather than trying to port all of the features of a mature coverage suite into scoverage?

Looks like the latter then ;-)
Sorry for the derail.

I think since jacoco uses the agent its not writing anything at all, so it won't need locks. It's just a big list of invoked statements.

I'm still not convinced that this isn't worth looking into -- jacoco's "big list of invoked statements" is the same as scoverages, in many ways. The list must be appended to by multiple threads. The list must be exported somehow when the JVM exits for out-of-process testing, and exported somehow when the tests finish for in-process testing. They must have had all the same problems we are having. We might be able to copy their solutions (both are open source). I'll look into it.

@sksamuel
Copy link
Member

I agree that there are probably good solutions in jacoco that we can borrow.
On 12 Mar 2014 10:20, "Richard Bradley" notifications@github.com wrote:

Jacoco's support will never be as good as scoverage

Fair enough. I've had a closer look since writing the above, and jacoco
isn't as full-featured or as mature as I had first thought.

Maybe it would be easier to port statement-level coverage into jacoco
rather than trying to port all of the features of a mature coverage suite
into scoverage?

Looks like the latter then ;-)
Sorry for the derail.

I think since jacoco uses the agent its not writing anything at all, so it
won't need locks. It's just a big list of invoked statements.

I'm still not convinced that this isn't worth looking into -- jacoco's
"big list of invoked statements" is the same as scoverages, in many ways.
The list must be appended to by multiple threads. The list must be exported
somehow when the JVM exits for out-of-process testing, and exported somehow
when the tests finish for in-process testing. They must have had all the
same problems we are having. We might be able to copy their solutions (both
are open source). I'll look into it.

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37393189
.

@RichardBradley
Copy link
Contributor Author

I had a look in Jacoco. I’m not familiar with the codebase, but it appears to me that they use a single global JVM lock on each instrumented instruction.
There is a class org.jacoco.core.runtime.RuntimeData which allows you to look up an “ExecutionData” by “id”, “name” and “probeCount”, which appears to be the data structure which holds the run/not-run status for a line. This access is synchronized on the RuntimeData object, which is a singleton (it is a singleton as it is a final field of the org.jacoco.agent.rt.internal.Agent class, which is explicitly a singleton).
I find this fairly surprising. Won’t this mean that apps instrumented with Jacoco will be very slow and somewhat single-threaded? Maybe I’ll ping their dev team and ask.

Using one file per thread works well for me (my highly concurrent test suite now passes). It also seems likely to be more efficient and more similar to uninstrumented code than a single global lock. I think this solution is much better than a global lock.

I have made a few minor tweaks in this pull request (#25): I added that concurrency test for Invoker/IOUtils, as this is a good record that the file-per-thread solution works, and a good starting point for experimenting on alternatives in case we revisit this in the future; and I clarified the comments on Invoker.

@sksamuel
Copy link
Member

I'm very surprised that my humble change is better than what jacoco have done. I think if you ping them they may come up with some scenarios we didn't think of that render my file per thread incorrect. At the very least it will validate what we are doing here.

@sksamuel
Copy link
Member

In addition. I guess at the end of the day it doesn't really matter if a test takes 1ms or 10ms. It's only important in production. So I think maybe the lack of efficiency in the jacoco style lock across the entire JVM is not really an issue.

@RichardBradley
Copy link
Contributor Author

I think it's also quite likely that I have misunderstood the code and that they have a per-class coverage lock, rather than a global coverage lock. I think the "probes" get locked, for writing, which is less than global.

@RichardBradley
Copy link
Contributor Author

at the end of the day it doesn't really matter if a test takes 1ms or 10ms. It's only important in production

Not entirely -- some of my integration tests failed with coverage turned on because the instrumentation slowed them down so much that they timed out waiting for something to happen and reported failure. Operations which take 100ms un-instrumented now take 2s+ with coverage turned on. I now have to have two sets of timeouts configured: one for normal test runs and one for instrumented test runs.

@sksamuel
Copy link
Member

Ok fair enough
On 20 Mar 2014 10:03, "Richard Bradley" notifications@github.com wrote:

at the end of the day it doesn't really matter if a test takes 1ms or
10ms. It's only important in production

Not entirely -- some of my integration tests failed with coverage turned
on because the instrumentation slowed them down so much that they timed out
waiting for something to happen and reported failure. Operations which take
100ms un-instrumented now take 2s+ with coverage turned on. I now have to
have two sets of timeouts configured: one for normal test runs and one for
instrumented test runs.

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-38150521
.

sksamuel added a commit that referenced this issue Mar 20, 2014
…ition-file-per-thread

#19 Fix the Windows race condition in the measurement file by using one file per thread.
@sksamuel
Copy link
Member

Can you run your tests single threaded with the lock and without the locks. Doesn't matter if the output is garbled, I just want to see the different in timings. Basically I'm curious to know - is it the locking mechanism taking the time or is there actual contention going on.

@sksamuel
Copy link
Member

Is 0.98.0 working ok on windows now ?

@sksamuel
Copy link
Member

0.98.2 handles all this well.

@RichardBradley
Copy link
Contributor Author

Thanks, yes, scoverage works fine multi-threaded on Windows now.

There are some performance issues which I have been working on for this thread, but I'll raise those as separate pull requests.

@RichardBradley
Copy link
Contributor Author

FYI: it looks like Java's file writer implementation doesn't set the native "FILE_APPEND_DATA" or "O_APPEND" flag when appending to files, so it looks like it won't support atomic writes on either platform: http://stackoverflow.com/a/24620026/8261

@sksamuel
Copy link
Member

sksamuel commented Jul 8, 2014

Oh right. Well for our use case it's not important as each thread writes to its own file now anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants