Skip to content
This repository has been archived by the owner on Jun 20, 2023. It is now read-only.

Handle failure to persist GSON serialized JSON data (EXPOSUREAPP-3777) #1604

Merged
merged 3 commits into from
Nov 16, 2020

Conversation

d4rken
Copy link
Member

@d4rken d4rken commented Nov 13, 2020

@ralfgehrer and me found the crash issue.

The file that should contain the json is empty.

When GSON reads it and deserializes it, it doesn't care that the file is empty, it will just create a new object that looks non-null, but is actually null. You only find out about that though when you try to access it.

In this case it was triggered by the mutate call here and the mutate extension function would call toMutableMap() where Kotlin would then notice that the object is actually null.

The symptoms for affected users are fixed by forcing an evaluation via require(it.size >= 0) and this triggers an exception, and we delete the corrupt data file.

We are not 100% certain what leads to the empty file though, suspicions:

  • While the documentation states that close() calls flush() implicitly, the underlying implementation does not?
  • GSON Serialization fails, but at that point the file writer already created a new empty file. Why does it fail random though? A wrong serialization config would fail every time.
  • Process death at a really unlucky point in time, but this seems very unlikely, the timing would really have to been just right. Currently we didn't use the buffered version of the writer, so I'd expect the file to contain garbage, not just be empty, but the stacktraces show that the file must have been empty.
  • A storage/IO/Permission issue that allowed file creation, but not writing. We thought about low storage space, but the block size on Android is 4KB, and the json data is much smaller than that. So why would we be able to create a 4KB block on the file system but then not be able to write to it...
java.lang.NullPointerException:
  at de.rki.coronawarnapp.nearby.modules.calculationtracker.DefaultCalculationTracker$calculationStates$2$setupTimeoutEnforcer$1$1$1.invokeSuspend (DefaultCalculationTracker.kt:5)
  at de.rki.coronawarnapp.nearby.modules.calculationtracker.DefaultCalculationTracker$calculationStates$2$setupTimeoutEnforcer$1$1$1.invoke (DefaultCalculationTracker.kt:2)
  at de.rki.coronawarnapp.util.flow.HotDataFlow$internalFlow$1$invokeSuspend$$inlined$collect$1.emit (Collect.kt:6)
  at kotlinx.coroutines.flow.SharedFlowImpl.collect (SharedFlow.kt:12)
  at de.rki.coronawarnapp.util.flow.HotDataFlow$internalFlow$1.invokeSuspend (HotDataFlow.kt:15)
  at de.rki.coronawarnapp.util.flow.HotDataFlow$internalFlow$1.invoke (HotDataFlow.kt:2)
  at kotlinx.coroutines.flow.ChannelFlowBuilder.collectTo (Builders.kt:1)
  at kotlinx.coroutines.flow.internal.ChannelFlow$collectToFun$1.invokeSuspend (ChannelFlow.kt:2)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith (ContinuationImpl.kt:3)
  at kotlinx.coroutines.DispatchedTask.run (DispatchedTask.kt:15)
  at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely (CoroutineScheduler.kt:1)
  at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run (CoroutineScheduler.kt:13)

  java.lang.NullPointerException:
  at kotlin.collections.ArraysKt___ArraysKt.toMutableMap (_Arrays.kt)
  at de.rki.coronawarnapp.nearby.modules.calculationtracker.DefaultCalculationTracker$trackNewCalaculation$1.invoke (DefaultCalculationTracker.kt:6)
  at de.rki.coronawarnapp.util.flow.HotDataFlow$internalFlow$1$invokeSuspend$$inlined$collect$1.emit (Collect.kt:6)
  at kotlinx.coroutines.flow.SharedFlowImpl.collect (SharedFlow.kt:12)
  at de.rki.coronawarnapp.util.flow.HotDataFlow$internalFlow$1.invokeSuspend (HotDataFlow.kt:15)
  at de.rki.coronawarnapp.util.flow.HotDataFlow$internalFlow$1.invoke (HotDataFlow.kt:2)
  at kotlinx.coroutines.flow.ChannelFlowBuilder.collectTo (Builders.kt:1)
  at kotlinx.coroutines.flow.internal.ChannelFlow$collectToFun$1.invokeSuspend (ChannelFlow.kt:2)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith (ContinuationImpl.kt:3)
  at kotlinx.coroutines.DispatchedTask.run (DispatchedTask.kt:15)
  at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely (CoroutineScheduler.kt:1)
  at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run (CoroutineScheduler.kt:13)

  java.lang.NullPointerException:
  at kotlin.collections.ArraysKt___ArraysKt.toMutableMap (Unknown Source:2)
  at de.rki.coronawarnapp.nearby.modules.calculationtracker.DefaultCalculationTracker$finishCalculation$1.invokeSuspend (DefaultCalculationTracker.kt:2)
  at de.rki.coronawarnapp.nearby.modules.calculationtracker.DefaultCalculationTracker$finishCalculation$1.invoke (DefaultCalculationTracker.kt:2)
  at de.rki.coronawarnapp.util.flow.HotDataFlow$internalFlow$1$invokeSuspend$$inlined$collect$1.emit (Collect.kt:6)
  at kotlinx.coroutines.flow.SharedFlowImpl.collect (SharedFlow.kt:12)
  at de.rki.coronawarnapp.util.flow.HotDataFlow$internalFlow$1.invokeSuspend (HotDataFlow.kt:15)
  at de.rki.coronawarnapp.util.flow.HotDataFlow$internalFlow$1.invoke (HotDataFlow.kt:2)
  at kotlinx.coroutines.flow.ChannelFlowBuilder.collectTo (Builders.kt:1)
  at kotlinx.coroutines.flow.internal.ChannelFlow$collectToFun$1.invokeSuspend (ChannelFlow.kt:2)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith (ContinuationImpl.kt:3)
  at kotlinx.coroutines.DispatchedTask.run (DispatchedTask.kt:15)
  at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely (CoroutineScheduler.kt:1)
  at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run (CoroutineScheduler.kt:13)

  
  java.lang.NullPointerException:
  at kotlin.collections.ArraysKt___ArraysKt.toMutableMap (Unknown Source:2)
  at de.rki.coronawarnapp.nearby.modules.calculationtracker.DefaultCalculationTracker$trackNewCalaculation$1.invoke (DefaultCalculationTracker.kt:6)
  at de.rki.coronawarnapp.util.flow.HotDataFlow$internalFlow$1$invokeSuspend$$inlined$collect$1.emit (Collect.kt:6)
  at kotlinx.coroutines.flow.SharedFlowImpl.collect (SharedFlow.kt:12)
  at kotlinx.coroutines.flow.SharedFlowImpl$collect$1.invokeSuspend (Unknown Source:12)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith (ContinuationImpl.kt:3)
  at kotlinx.coroutines.DispatchedTask.run (DispatchedTask.kt:15)
  at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely (CoroutineScheduler.kt:1)
  at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run (CoroutineScheduler.kt:13)

…es, we end up with an empty file.

GSON deserializes this into a null object that looks like a non-null object until it is evaluated some time later.

Force an evaluation via `require(it.size >= 0)` and delete the corrupt data file when catching the exception.
@d4rken d4rken added bug Something isn't working maintainers Tag pull requests created by maintainers prio PRs to review first. labels Nov 13, 2020
@d4rken d4rken added this to the 1.6.0 milestone Nov 13, 2020
@d4rken d4rken requested a review from a team November 13, 2020 16:56
@d4rken d4rken changed the title Flush json data after writing it (EXPOSUREAPP-3777 Flush json data after writing it (EXPOSUREAPP-3777) Nov 13, 2020
@d4rken
Copy link
Member Author

d4rken commented Nov 13, 2020

The flush() should not be necessary as the close() call via .use { } would also internally call flush(). 🤔

This would point to the app process having died mid-write.

@d4rken
Copy link
Member Author

d4rken commented Nov 13, 2020

Still think it's the cause, but I don't get how this could be so common. While our user numbers are high, hitting this exact execution point, is like getting hit by lightning 🤔 .

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 0 Security Hotspots to review)
Code Smell A 0 Code Smells

55.6% 55.6% Coverage
0.0% 0.0% Duplication

Copy link
Contributor

@kolyaopahle kolyaopahle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Looks good to me, sadly i was not able to reproduce the error on any of my devices. I'm also quite confused about the frequency of this issue.

@d4rken
Copy link
Member Author

d4rken commented Nov 13, 2020

Upon further research the issue could also appear if GSON crashes while serializing the data, as the file writer we use will create an empty file.

So in the timeline:

  • Open file writer -> Empty file is created
  • GSON crashes for some reason

We end up with an empty data file too. I don't see how GSON could just crash randomly though.

@d4rken d4rken changed the title Flush json data after writing it (EXPOSUREAPP-3777) Handle failure to persist GSON serialized JSON data (EXPOSUREAPP-3777) Nov 13, 2020
@d4rken
Copy link
Member Author

d4rken commented Nov 13, 2020

As the issue is most likely to occur during writing, we've added .report for the exception that can be caught during writing. Our favorite dialog will show a message and the stacktrace that will help us understand the underlying issue.

@harambasicluka harambasicluka modified the milestones: 1.6.0, 1.6.1 Nov 16, 2020
@harambasicluka
Copy link
Contributor

This PR requires 3 to 4 reviews as it can't be tested and wasn't catched by the testers during 1.6.0 testing.

Copy link
Contributor

@chris-cwa chris-cwa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look conclusive in theory

Copy link
Contributor

@harambasicluka harambasicluka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, also @thomasaugsten tested it (s. jira comments)

@d4rken
Copy link
Member Author

d4rken commented Nov 16, 2020

Will follow up with another PR to rule out one more edge case.

@d4rken d4rken merged commit a6335e7 into release/1.6.x Nov 16, 2020
@d4rken d4rken deleted the fix/3777-flush-data-to-prevent-faulty-json branch November 16, 2020 11:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working maintainers Tag pull requests created by maintainers prio PRs to review first.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants