PRIO - ENS Timeout? Outdated exposure checks: CWA uses the same diagnosis key package several times in ProvideDiagnosisKeys to ENS #2880
Comments
Thanks! We have reported it to the development team as EXPOSUREAPP-6654. Let's see what they can do about this. Corona-Warn-App Open Source Team |
Some additional information: it looks like something is going wrong within CWA itself. Apparently, after ENS notifies CWA that it has finished, the subsequent flow does not correctly persist the successful completion of diagnosis key matching or the result of the risk calculation, so CWA tries again later with the previously downloaded diagnosis key package (instead of downloading and providing a new one). |
And some more additional information: after updating from CWA 1.15.2 to CWA 2.0.3 and an accidental, nearly simultaneous update of ENS to v1.8 (211213000), the difference between the timestamps of the CWA risk card and the ENS check log was 20 minutes last night! Before, it was usually 1-2 minutes. I'll try to gather new information from logcat as soon as I find time. |
As the apps now work correctly, I guess I cannot do that. |
@rugk The question is whether the app really runs correctly now. With 1.15.2 it had not been working correctly for me since around April 1st, but the failure card showed up only 2 or 3 times. The ENS log holds the entries of the last 16 days, so it would really help if you looked up the checksums there to verify that you have not been affected. Thank you in advance! :) |
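For anyone who wants to do this checksum lookup over a longer log, here is a minimal Python sketch of the idea: group the exposure checks by key file hash and report every hash that appears in more than one check. The `(timestamp, hash)` pair layout, the function name `find_repeated_hashes` and the sample values are assumptions made up for illustration; they are not the ENS export format.

```python
from collections import defaultdict


def find_repeated_hashes(checks):
    """Return {hash: [timestamps]} for hashes that occur in more than one check.

    `checks` is an iterable of (timestamp, key_file_hash) pairs. This layout
    is a hypothetical simplification, not the real ENS export format.
    """
    seen = defaultdict(list)
    for timestamp, key_hash in checks:
        seen[key_hash].append(timestamp)
    # Keep only hashes that were provided to ENS at least twice.
    return {h: ts for h, ts in seen.items() if len(ts) > 1}


# Made-up sample data, for illustration only.
sample = [
    ("2021-04-05 08:10", "aaa111"),
    ("2021-04-06 08:12", "aaa111"),
    ("2021-04-07 09:01", "bbb222"),
]
repeated = find_repeated_hashes(sample)
```

An empty result would suggest that every check used a different key package; any entry in the result is a candidate "double check".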
Short update: |
I'm wondering whether this is related to your #2880 ticket (likely fixed by #2934) 🤔. The Within the Lines 171 to 185 in de207de
I would expect
val nextForcedDetectionAt = lastSuccessfulDetection?.startedAt?.plus(Duration.standardDays(1))
val hasRecentDetection = nextForcedDetectionAt != null && now.isBefore(nextForcedDetectionAt)
Only if the last |
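To make the quoted expression easier to reason about, here is a small Python model of the same check. It is only a sketch of the two quoted lines, not the CWA implementation; the function name `has_recent_detection` and the `max_age` parameter are assumptions introduced for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def has_recent_detection(
    last_successful_start: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=1),
) -> bool:
    """Model of the quoted check: is `now` still before the next forced detection?"""
    if last_successful_start is None:
        # No successful detection recorded, so nothing counts as recent.
        return False
    next_forced_detection_at = last_successful_start + max_age
    # Strictly before, mirroring now.isBefore(nextForcedDetectionAt).
    return now < next_forced_detection_at
```

With this model, a missing last successful detection never counts as recent, and a detection that started exactly one day ago no longer counts either, because the comparison is strict.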
Okay, I went there and checked some keys around the time of the error, and the hashes are all different. So my guess is: no, it is not related. And, as I said, so far I can only confirm that the issue is gone. I have never seen it again. If I do, I'll let you know. |
@rugk
I now also think that the "Risiko-Überprüfung fehlgeschlagen" card is not related to the outdated diagnosis keys that are provided to ENS, and that it's a different issue. So with your help, we could track down the bug a little bit better. |
Short update:
If I'm not wrong, until (including) CWA 1.15.2, there were two workers for downloading and provision of DKs to ENS: the
I'm quite sure that this is the case.
Valid from CWA 2.0.3 on
now: 20210401-27_assembled_all-exposure-checks.txt
I guess there may be unexpected inconsistencies in these vals. I wonder, if it could be, that the worker was interrupted/stopped unexpectedly, and the persisted vals used in the worker became inconsistent in that case (and should have been reset probably in such a case). |
Here's also a logcat from April 23rd, 08:06am - 08:09am (checking exposures again with same keys) with CWA's pid:10658 and uid:10200, related CWA workers, network requests and ENS responses (I tried not to disclose personal data ;). Interestingly, there was obviously another DNS request from CWA that led to nothing, after everything was finished. Wonder, if the workers are a little bit confused.
|
@vaubaehn please retest when 2.2 is released. |
Dear all, it looks like Google has completely stopped logging from ENS to syslog (ExposureNotification) in recent days (logcat won't show any useful information anymore). This means CWA's debug log feature seems to be the only thing we can rely on in the future. Triangulation (using CWA's logs together with the ExposureNotification syslog) won't be possible anymore. |
Just as some unrelated background info: This could be related to a vulnerability they had, where they logged too much to that log, see https://www.cvedetails.com/cve/CVE-2021-31815/ (if I got the correct CVE here). It seems they decided to just drop the log completely. |
It's reasonable to assume that the change in logging is due to the vulnerability response. Google doesn't seem to have published any direct public response to CVE-2021-31815. Google's release notes are quite high level and don't cover this. So we can only make assumptions. Google seems to have informed COVID-19 app development partners though, as there is a CWA FAQ article about it under https://www.coronawarn.app/en/faq/#google_security_vulnerability. |
UPDATE on this issue The error is like Heisenberg's Uncertainty Principle: You can either observe the bug, or you log without any bug. After CWA 2.2.1 arrived on my phone last Friday (🍔 time!), I immediately turned on debug logging. The error has not occurred since. @d4rken and @harambasicluka I'm offering to upload my debug log so far, and provide the ticket number for further analysis. Please let me know whether you want to look into it. If you don't, I'd reset the current log, as space on my internal storage is tight. In general, the question is whether it's worth inspecting this bug further, as it seems to
So we can (still) expect an impact of the bug for
The impact would mean that affected people might get warned with "fresh" diagnosis keys with an unwanted long delay (of > 48 hours in my case, see April 5th to April 7th in my exposure logs, or 42 hours on April 17/18). Please let me know how to proceed, thank you, |
@vaubaehn Thanks for detailed analysis. Forwarded your info to the internal ticket. (On a side note: Maybe this helps you to circumvent the uncertainty principle ;-) https://www.eurekalert.org/pub_releases/2021-05/au-etu050521.php and https://www.nist.gov/news-events/news/2021/05/nist-team-directs-and-measures-quantum-drum-duet ) |
If you can reproduce the problematic behavior, i.e. unexpectedly skipped exposure checks or duplicate key submissions (according to ENS hashes), please upload a log. I suspect a relationship with #3093, where the logs that I checked show various timeouts when communicating with the ENS. Not sure whether the hourly packages make the issue more likely (more
Hi @dsarkar ,
thanks for providing these links, they were indeed rather helpful! I was able to turn my old phone into a quantum computing mobile by inserting aluminum membranes that were cut out from some old aluminium foil into the phone. I put the phone into a black box, vacuumed and supra-cooled it. To create quantum states in the aluminium membranes, I simply used our kitchen's microwave oven.
all at the same time! That's fantastic! Hi @d4rken ,
Despite what I wrote above, I was not able to reproduce the issue under the current conditions (low incidence rate and "my" CWA now doing hourly updates also on non-metered connection).
I'm rather sure that the issue is more likely with bigger packages only, as I could only observe it when incidence rates were high.
Yes, that's very likely and fits well with my previous observations: most of the processor time ENS consumes goes into validating the DK packages against the signature (see logcat: #2880 (comment)). If this blocks ENS for long enough, communication timeouts are very likely. The time ENS spends on validating the signature is probably proportional to DK package size: higher incidence rates -> more keys -> more time needed for validating -> timeout more likely. But if that's the case, there is still the question of why the times at which CWA retries providing DKs are so weird. It should be one hour later (next worker start time), but sometimes it's just a couple of minutes later and sometimes 6 hours later... In the last two weeks the number of DKs doubled because of the delta variant in Europe; we're now at around 100.000 keys again. If the R value stays the same, in about 5 weeks we will be back at the level where this issue occurred (DKs > 500.000). |
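The "about 5 weeks" estimate can be checked with a short calculation. The sketch below assumes plain exponential growth with a constant doubling time of two weeks, starting from roughly 100,000 keys, as stated in the comment above; the function name `weeks_until` is made up for illustration.

```python
import math


def weeks_until(target: float, current: float, doubling_weeks: float) -> float:
    """Weeks until `current` reaches `target`, assuming a constant doubling time."""
    # current * 2 ** (t / doubling_weeks) == target, solved for t.
    return doubling_weeks * math.log2(target / current)


# Figures taken from the comment above: ~100,000 keys now, doubling in 2 weeks.
weeks = weeks_until(500_000, 100_000, 2.0)
```

Under these assumptions the result is 2 · log2(5), a bit more than 4.6 weeks, which is consistent with the rough figure of 5 weeks.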
Unbelievable what you all do (to your phone), just to improve the Corona-Warn-App 😉 Maybe the issues with your phone you mention in #3692 (comment) don't happen because you switch between languages but because you put aluminum foil into it? 🤔😉 |
The number of diagnosis keys is currently increasing considerably. We'll probably reach >500k keys within a few days. Sometime in the next few days I will run my phone on mobile data only, then let's see if the issue here is still present. |
... not knowing anything about the Google / Android / ENS specifications ... @vaubaehn |
Real time, but if there are devices that take more than 8 min, we can increase the timeout |
Thanks for taking care, but after thinking a second time, I'm not sure if increasing the timeout is necessary.
So, if CWA can reliably get the Exposure Windows at any time, nothing would need to be changed here. At least the 2 different timeouts that I know would not change anything in the case of my reported time difference of 12 minutes between providing DKs and ENS finishing check. |
Hi everybody, a short intermediate result: |
Hard to guess, but an improvement may be possible. A proof of what actually caused CWA to provide exactly the same DK packages several times again has never been found. Unfortunately I currently can't let my phone run on mobile connection solely for at least a week to confirm whether the issue is fixed by #4929. I may try at a later time. For now I'm closing this issue. |
Avoid duplicates
Technical details
Describe the bug
On some days, CWA provides the same previously downloaded diagnosis key package to ENS again, resulting in outdated exposure checks. This already happened to me last year, and is now occurring again.
My assumption is that an ENS timeout combined with insufficient error handling in CWA is the reason, because I'm experiencing the problem only in times of high incidence rates, and at the moment the DK packages contain 500k+ diagnosis keys.
So many people are probably affected at the moment without knowing it (unless they check the logs in the ENS UI carefully).
As you can see from my attached ENS logs (see below), this happened on 10 days since April 1st.
Also, on April 18th at around 9:00 am, CWA correctly indicated that exposure matching had failed (for more than 36 hours):
When tapping on "Erneut Starten", the spinning wheel of downloading was visible for <1s, then the green card appeared, but exposure matching didn't actually start (as you can see from the logs).
Steps to reproduce the issue
Expected behaviour
Possible Fix
Re-evaluate and fix error handling and reporting for timeouts.
Additional context
Here are the latest logs of exposure matchings from the ENS UI. I tried to separate the "double checks" and marked them with an asterisk to make them easier to find. (The logs have been compiled from two exports on different dates!)
20210401-20210518_assembled_all-exposure-checks.txt
(Edit: updated exposure check logs 2021-05-18)
Internal Tracking-ID: EXPOSUREAPP-6654