"java.lang.NumberFormatException" Race condition in coverage file #19

RichardBradley · 2014-03-11T09:22:31Z

I'm seeing intermittent errors like the following when running coverage reports with "scoverage" on Windows:

java.lang.NumberFormatException: For input string: "143911841188188"

I think this is because of a race condition in Invoker.invoked which is causing the "id" and the ";" to be written non-atomically.

This may only be a problem on Windows. Perhaps two adjacent small appends are atomic on POSIX but not on Windows?

I will submit a patch to consolidate the two writes into one. This seems to fix the issue on Windows, and it seems like a good idea on POSIX as well.

The text was updated successfully, but these errors were encountered:

RichardBradley · 2014-03-11T09:41:35Z

Oh, actually that hasn't fixed it. :(
That's the problem with intermittent bugs -- you never know for sure when they're fixed. I just got the following error from sbt scoverage:test with this patch applied (I'm reasonably sure this patch is applied):

[error] (scoverage-test:executeTests) java.lang.NumberFormatException: For input string: "14523368368"
...
at scoverage.IOUtils$.invoked(IOUtils.scala:13)

I take it this doesn't happen on unix?

I'll look into options for atomic file writes on Windows...

sksamuel · 2014-03-11T09:51:27Z

Might be windows new lines breaking it. Try stripping \r
On 11 Mar 2014 09:41, "Richard Bradley" notifications@github.com wrote:

Oh, actually that hasn't fixed it. :(
That's the problem with intermittent problems. I just got the following
error from sbt scoverage:test with this patch applied (I'm reasonably
sure this patch is applied):

error java.lang.NumberFormatException: For input string: "14523368368"
...
at scoverage.IOUtils$.invoked(IOUtils.scala:13)

I take it this doesn't happen on unix?

I'll look into options for atomic file writes on Windows...

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37277670
.

RichardBradley · 2014-03-11T09:58:19Z

That's not it -- there's no whitespace in the file, just numbers and ";"s

On my machine, sometimes the numbers are interleaved with the ";"s:

Here's an extract:

 1452;14523368368;368;368;368;367;460;367;369;369;369;369;393;369;393;462;393;393;393;1025;1024;495;182;183;190;192;191;195;193;194;1134;1133;1023;1023;1023;1023;581;581;539;538;538;537;538;538;535;537;534;535;534;1023;535;;1023

Note the missing semicolons near the start, and the double semicolon near the end.

sksamuel · 2014-03-11T10:02:20Z

Are you running tests in parallel ?
On 11 Mar 2014 09:58, "Richard Bradley" notifications@github.com wrote:

That's not it -- there's no whitespace in the file, just numbers and ";"s

On my machine, sometimes the numbers are interleaved with the ";"s:

Here's an extract:

1452;14523368368;368;368;368;367;460;367;369;369;369;369;393;369;393;462;393;393;393;1025;1024;495;182;183;190;192;191;195;193;194;1134;1133;1023;1023;1023;1023;581;581;539;538;538;537;538;538;535;537;534;535;534;1023;535;;1023

Note the missing semicolons near the start, and the double semicolon near
the end.

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37279047
.

RichardBradley · 2014-03-11T10:11:35Z

Yes

sksamuel · 2014-03-11T10:15:54Z

It wasn't designed to be thread safe. I guess we need to synchronize on
something in the writing code. I assume tests run in the same jvm at least.

On 11 March 2014 10:11, Richard Bradley notifications@github.com wrote:

Yes

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37280088
.

RichardBradley · 2014-03-11T10:46:15Z

It wasn't designed to be thread safe.

I thought you might have been relying on short file appends being atomic.

I think we should be able to do something like that. I don't see why they are not atomic. Probably the java "FileWriter" isn't using the win32 api in the right way to get atomic appends.

I guess we need to synchronize on something in the writing code.

Yes, locking should also do it. It might kill my test throughput.

I assume tests run in the same jvm at least.

Yes (but if we got atomic file appends working, we wouldn't even need to rely on that).

sksamuel · 2014-03-11T10:49:38Z

No one on linux has had issues but I don't know anyone who runs tests in
parallel. Most people disable it because of other issues they get with
other plugins. I had a google and apparently PrintWriter is atomic in the
code but its not guaranteed in the docs so might work might not. Do you
want to try that and if it works we can commit it.

On 11 March 2014 10:46, Richard Bradley notifications@github.com wrote:

It wasn't designed to be thread safe.

I thought you might have been relying on short file appends being atomic.

I think we should be able to do something like that. I don't see why they
are not atomic. Probably the java "FileWriter" isn't using the win32 api in
the right way to get atomic appends.

I guess we need to synchronize on something in the writing code.

Yes, locking should also do it. It might kill my test throughput.

I assume tests run in the same jvm at least.

Yes (but if we got atomic file appends working, we wouldn't even need to
rely on that).

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37282716
.

RichardBradley · 2014-03-11T12:03:42Z

I just got the following error from sbt scoverage:test with this patch applied (I'm reasonably sure this patch is applied):

I was wrong. I changed the version number and did a "sbt publishLocal", but it seems it wasn't enough to get this patch to be used in my other project.

I'm trying various things to get this patch to be picked up so I can test this...

RichardBradley · 2014-03-11T12:21:28Z

Right -- I've chased down all the different version numbers which need changing to test this patch, and I can confirm that neither the patch proposed here, nor a version using PrintWriter fix this on Windows.

I'm going to do some stress testing of Invoker.invoked in isolation and come up with a proper fix. It may need Java-level locking, if I can't get atomic appends to work on Windows.

RichardBradley · 2014-03-11T12:59:31Z

I can't see any way around locking at the moment.
Despite http://stackoverflow.com/questions/3032482/is-appending-to-a-file-atomic-with-windows-ntfs it seems that write(byte[]) can still be interleaved on Windows:

val writer = new FileOutputStream(path, true)
val bytes = (id.toString + ';').getBytes(Charset.defaultCharset())
writer.write(bytes)
writer.close()

I'll ask on StackOverflow if anyone knows how to achieve atomic file append in Java and submit a pull request with a lock.

RichardBradley · 2014-03-11T13:34:26Z

We should update the docs to note that using scoverage will introduce a global per-statement lock, so multi-threaded apps should test twice: once with coverage and once without (this is a sensible precaution anyway).

sksamuel · 2014-03-11T13:45:26Z

Yeah might be better to do deferred writing somehow. A lock per statement
might be insane.
On 11 Mar 2014 13:34, "Richard Bradley" notifications@github.com wrote:

We should update the docs to note that using scoverage will introduce a
global per-statement lock, so multi-threaded apps should test twice: once
with coverage and once without (this is a sensible precaution anyway).

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37295253
.

RichardBradley · 2014-03-11T14:33:06Z

How about a version which spawns a Future for each write and doesn't wait for the result?
I'm not 100% familiar with Scala's behaviour in that case.
Would the test runner need to wait for all the Futures to finish before generating the coverage report? How?

This version passes my load test:

def invoked(id: Int, path: String): Unit = {
  future {
    Invoker.synchronized {
      val writer = new FileWriter(path, true)
      writer.write(id.toString + ';')
      writer.close()
    }
  }
}

RichardBradley · 2014-03-11T14:35:27Z

I suppose the main alternative is a threadsafe queue, with a worker thread pulling "invoke"s off the queue and writing them to disk. That sounds like a fair bit of functionality to write and test (and race conditions to consider etc.).

I had a look at log4j to see how it deals with this sort of issue -- it has a global lock on the logger file appender.

sksamuel · 2014-03-11T20:02:38Z

The thing with log4j is the number of calls is a lot less. If we lock in the invoker then a statement like if (x) y else z we'll have 3 locks for just that one statement. I wonder if each thread can just write to its own file, and then the aggregator later can read them all in. What do you think.

RichardBradley · 2014-03-11T20:41:10Z

I wonder if each thread can just write to its own file, and then the aggregator later can read them all in. What do you think.

That sounds pretty sensible. I'll see if I can code it up over the next few days.

Or how about a global concurrent Set[Statement] per JVM and write it out on process exit? Would that be better?

What do other coverage frameworks do?

sksamuel · 2014-03-11T20:42:22Z

SCCT is the only other scala one, and that's not thread safe either IIRC. Things like jacoco instrument the bytecode by attaching to the JVM.

sksamuel · 2014-03-11T20:45:17Z

We could do something like
val writer = new FileWriter(path + '-' + Thread.currentThread.getId, true)
Then have the reader look for all files with the given prefix. That would work.
Really though, the append says its atomic, everyone on stack overflow says its atomic, so why the hell is it not atomic 👍

sksamuel · 2014-03-11T20:48:01Z

Writing out on exit should work too, although I'd have to think about how it knows when its finished. It won't be enough to use shutdown hooks because you might run the whole thing in a single process and then the report generators will fail because the file won't have been written.

sksamuel · 2014-03-11T21:04:51Z

This is what I have so far

def invoked(id: Int, path: String) = {
val dir = new File(path)
dir.mkdirs()
val file = new File(path + "/" + Thread.currentThread.getId)
val writer = new FileWriter(file, true)
writer.append(id.toString + ';')
writer.close()
}

And the corresponding load

// loads all the invoked statement ids
def invoked(dir: File): Seq[Int] = {
dir.listFiles.map(file => {
val reader = new BufferedReader(new FileReader(file))
val line = reader.readLine()
reader.close()
line.split(";").filterNot(.isEmpty).map(.toInt)
}).flatten.toSeq
}

RichardBradley · 2014-03-11T22:39:22Z

Things like jacoco instrument the bytecode by attaching to the JVM.

... but at some level jacoco must still have to deal with lots of "invoked" reports coming in from multiple threads. Do you know how they deal with that?

I notice that jacoco has better Scala support than it used to (see e.g. this pull request). Have you tried it lately?

Is this article still accurate? Maybe it would be easier to port statement-level coverage into jacoco rather than trying to port all of the features of a mature coverage suite into scoverage?

sksamuel · 2014-03-12T08:15:24Z

I think since jacoco uses the agent its not writing anything at all, so it won't need locks. It's just a big list of invoked statements. We have to write out because maven won't keep the jvm alive between phases.

Jacoco's support will never be as good as scoverage. The reason is with Scala the mapping from source to bytecode is not bijective, whereas with Java its closer to being bijective. If you do a pattern match in scala, then you might have things of interest for example (did you call the match, did you match case 1, did you match case 2, ....). The bytecode generated might be 30 ops or it might be 1 op. It's hard to come back from the bytecode and map into the source level statements.

That's why I believe its better to do AST manipulation.

RichardBradley · 2014-03-12T10:20:24Z

Jacoco's support will never be as good as scoverage

Fair enough. I've had a closer look since writing the above, and jacoco isn't as full-featured or as mature as I had first thought.

Maybe it would be easier to port statement-level coverage into jacoco rather than trying to port all of the features of a mature coverage suite into scoverage?

Looks like the latter then ;-)
Sorry for the derail.

I think since jacoco uses the agent its not writing anything at all, so it won't need locks. It's just a big list of invoked statements.

I'm still not convinced that this isn't worth looking into -- jacoco's "big list of invoked statements" is the same as scoverages, in many ways. The list must be appended to by multiple threads. The list must be exported somehow when the JVM exits for out-of-process testing, and exported somehow when the tests finish for in-process testing. They must have had all the same problems we are having. We might be able to copy their solutions (both are open source). I'll look into it.

sksamuel · 2014-03-12T10:23:33Z

I agree that there are probably good solutions in jacoco that we can borrow.
On 12 Mar 2014 10:20, "Richard Bradley" notifications@github.com wrote:

Jacoco's support will never be as good as scoverage

Fair enough. I've had a closer look since writing the above, and jacoco
isn't as full-featured or as mature as I had first thought.

Maybe it would be easier to port statement-level coverage into jacoco
rather than trying to port all of the features of a mature coverage suite
into scoverage?

Looks like the latter then ;-)
Sorry for the derail.

I think since jacoco uses the agent its not writing anything at all, so it
won't need locks. It's just a big list of invoked statements.

I'm still not convinced that this isn't worth looking into -- jacoco's
"big list of invoked statements" is the same as scoverages, in many ways.
The list must be appended to by multiple threads. The list must be exported
somehow when the JVM exits for out-of-process testing, and exported somehow
when the tests finish for in-process testing. They must have had all the
same problems we are having. We might be able to copy their solutions (both
are open source). I'll look into it.

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-37393189
.

…y using one file per thread.

RichardBradley · 2014-03-19T18:03:25Z

I had a look in Jacoco. I’m not familiar with the codebase, but it appears to me that they use a single global JVM lock on each instrumented instruction.
There is a class org.jacoco.core.runtime.RuntimeData which allows you to look up an “ExecutionData” by “id”, “name” and “probeCount”, which appears to be the data structure which holds the run/not-run status for a line. This access is synchronized on the RuntimeData object, which is a singleton (it is a singleton as it is a final field of the org.jacoco.agent.rt.internal.Agent class, which is explicitly a singleton).
I find this fairly surprising. Won’t this mean that apps instrumented with Jacoco will be very slow and somewhat single-threaded? Maybe I’ll ping their dev team and ask.

Using one file per thread works well for me (my highly concurrent test suite now passes). It also seems likely to be more efficient and more similar to uninstrumented code than a single global lock. I think this solution is much better than a global lock.

I have made a few minor tweaks in this pull request (#25): I added that concurrency test for Invoker/IOUtils, as this is a good record that the file-per-thread solution works, and a good starting point for experimenting on alternatives in case we revisit this in the future; and I clarified the comments on Invoker.

sksamuel · 2014-03-19T23:25:13Z

I'm very surprised that my humble change is better than what jacoco have done. I think if you ping them they may come up with some scenarios we didn't think of that render my file per thread incorrect. At the very least it will validate what we are doing here.

sksamuel · 2014-03-19T23:28:19Z

In addition. I guess at the end of the day it doesn't really matter if a test takes 1ms or 10ms. It's only important in production. So I think maybe the lack of efficiency in the jacoco style lock across the entire JVM is not really an issue.

RichardBradley · 2014-03-20T09:33:27Z

I think it's also quite likely that I have misunderstood the code and that they have a per-class coverage lock, rather than a global coverage lock. I think the "probes" get locked, for writing, which is less than global.

RichardBradley · 2014-03-20T10:03:08Z

at the end of the day it doesn't really matter if a test takes 1ms or 10ms. It's only important in production

Not entirely -- some of my integration tests failed with coverage turned on because the instrumentation slowed them down so much that they timed out waiting for something to happen and reported failure. Operations which take 100ms un-instrumented now take 2s+ with coverage turned on. I now have to have two sets of timeouts configured: one for normal test runs and one for instrumented test runs.

sksamuel · 2014-03-20T10:09:54Z

Ok fair enough
On 20 Mar 2014 10:03, "Richard Bradley" notifications@github.com wrote:

at the end of the day it doesn't really matter if a test takes 1ms or
10ms. It's only important in production

Not entirely -- some of my integration tests failed with coverage turned
on because the instrumentation slowed them down so much that they timed out
waiting for something to happen and reported failure. Operations which take
100ms un-instrumented now take 2s+ with coverage turned on. I now have to
have two sets of timeouts configured: one for normal test runs and one for
instrumented test runs.

Reply to this email directly or view it on GitHubhttps://github.com//issues/19#issuecomment-38150521
.

…ition-file-per-thread #19 Fix the Windows race condition in the measurement file by using one file per thread.

sksamuel · 2014-03-20T18:13:03Z

Can you run your tests single threaded with the lock and without the locks. Doesn't matter if the output is garbled, I just want to see the different in timings. Basically I'm curious to know - is it the locking mechanism taking the time or is there actual contention going on.

sksamuel · 2014-04-18T00:24:55Z

Is 0.98.0 working ok on windows now ?

sksamuel · 2014-04-18T10:36:44Z

0.98.2 handles all this well.

RichardBradley · 2014-04-18T14:46:42Z

Thanks, yes, scoverage works fine multi-threaded on Windows now.

There are some performance issues which I have been working on for this thread, but I'll raise those as separate pull requests.

RichardBradley · 2014-07-08T10:37:31Z

FYI: it looks like Java's file writer implementation doesn't set the native "FILE_APPEND_DATA" or "O_APPEND" flag when appending to files, so it looks like it won't support atomic writes on either platform: http://stackoverflow.com/a/24620026/8261

sksamuel · 2014-07-08T11:17:44Z

Oh right. Well for our use case it's not important as each thread writes to its own file now anyway.

RichardBradley mentioned this issue Mar 11, 2014

Fix race condition on Windows by consolidating two file writes #20

Merged

RichardBradley mentioned this issue Mar 11, 2014

Fix race condition on Windows by locking measurement file writes #21

Closed

sksamuel added a commit that referenced this issue Mar 11, 2014

#19 added multiple file output

e155117

RichardBradley added a commit to RichardBradley/scalac-scoverage-plugin that referenced this issue Mar 19, 2014

scoverage#19 Fix the Windows race condition in the measurement file b…

af0c4b0

…y using one file per thread.

RichardBradley mentioned this issue Mar 19, 2014

#19 Fix the Windows race condition in the measurement file by using one file per thread. #25

Merged

sksamuel added a commit that referenced this issue Mar 20, 2014

Merge pull request #25 from RichardBradley/issue-19-windows-race-cond…

c4f34cb

…ition-file-per-thread #19 Fix the Windows race condition in the measurement file by using one file per thread.

This was referenced Mar 21, 2014

fix unit test now that coverage files are in a dir #29

Merged

Add support for coverage exclusion comments #26

Merged

sksamuel closed this as completed Apr 18, 2014

jdbxwyx mentioned this issue Feb 14, 2023

NumberFormatException when running coverageReport or coverageAggregated #476

Open

"java.lang.NumberFormatException" Race condition in coverage file #19

"java.lang.NumberFormatException" Race condition in coverage file #19

Comments

RichardBradley commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

sksamuel commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

sksamuel commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

sksamuel commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

sksamuel commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

sksamuel commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

sksamuel commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

sksamuel commented Mar 11, 2014

sksamuel commented Mar 11, 2014

sksamuel commented Mar 11, 2014

sksamuel commented Mar 11, 2014

RichardBradley commented Mar 11, 2014

sksamuel commented Mar 12, 2014

RichardBradley commented Mar 12, 2014

sksamuel commented Mar 12, 2014

RichardBradley commented Mar 19, 2014

sksamuel commented Mar 19, 2014

sksamuel commented Mar 19, 2014

RichardBradley commented Mar 20, 2014

RichardBradley commented Mar 20, 2014

sksamuel commented Mar 20, 2014

sksamuel commented Mar 20, 2014

sksamuel commented Apr 18, 2014

sksamuel commented Apr 18, 2014

RichardBradley commented Apr 18, 2014

RichardBradley commented Jul 8, 2014

sksamuel commented Jul 8, 2014