Exposure risk calculation algorithm #24

Merged: 5 commits, Jul 21, 2020

Stypox
Contributor

@Stypox Stypox commented Jul 4, 2020

For now I have added javadocs to risk calculation parameters and functions, containing references to Google and Apple's developer websites.
I used the androidx @Nullable and @NonNull annotations; is that ok? The JetBrains annotations were red, since we are not using a JetBrains library afaik.
I also added default values for minimumRiskScore and durationAtAttenuationThresholds in ExposureConfiguration, as described in Apple documentation.

@Stypox
Contributor Author

Stypox commented Jul 4, 2020

Writing documentation helped me organize ideas, and I realised there is something strange: apparently the ExposureConfiguration class contains a transmissionRiskScores field that contains values with user-defined meaning. So I am not sure how they have to be handled by the risk calculation algorithm, since they are user-defined.

@Stypox Stypox force-pushed the risk branch 2 times, most recently from 45b23d0 to 16a9c3e Compare July 4, 2020 22:23
@BjoernPetersen
Contributor

Using AndroidX nullability annotations is perfectly fine. I think that was actually the only file in this project where the JetBrains annotations were used.

As for the transmissionRiskScores: I'm not sure yet how the right transmissionRiskScore is selected for each encounter, but it seems we'll just multiply the scores in the end, so we don't need to understand the semantics of the individual values.

RiskScore = attenuationScore * daysSinceLastExposureScore * durationScore * transmissionRiskScore
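
A minimal Java sketch of the product above, just to make the arithmetic concrete. The class and method names are hypothetical and not taken from this PR; each argument is assumed to be a bucket score that has already been looked up for the encounter.

```java
// Illustrative sketch only; RiskScoreSketch and riskScore are hypothetical names.
final class RiskScoreSketch {
    private RiskScoreSketch() {}

    /** Multiplies the four per-category scores, as in the formula above. */
    static int riskScore(int attenuationScore,
                         int daysSinceLastExposureScore,
                         int durationScore,
                         int transmissionRiskScore) {
        return attenuationScore
                * daysSinceLastExposureScore
                * durationScore
                * transmissionRiskScore;
    }

    public static void main(String[] args) {
        // Arbitrary example inputs: 1 * 5 * 1 * 4 = 20
        System.out.println(riskScore(1, 5, 1, 4));
    }
}
```

Because the result is a plain product, a score of 0 in any category makes the whole risk score 0, and the meaning of the individual transmission risk values never has to be interpreted here.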

@Stypox
Contributor Author

Stypox commented Jul 6, 2020

I added the risk calculation algorithm: it is surely not in the correct place and with the correct semantics, but it isn't difficult to move code if needed ;-)

@theScrabi
Contributor

theScrabi commented Jul 7, 2020

How do we know the algorithm works correctly? We don't have test vectors and can't simply extract them from the original app like we did with Sk and RPIs for the CryptoModul.

Or can we? Would it be possible?

@BjoernPetersen
Contributor

Please change the ExposureConfiguration class to be immutable.

  • Change the visibility of all fields to private
  • Copy any arrays before returning them in public getters
  • Don't modify an existing configuration in the ExposureConfigurationBuilder, but use mutable fields in the builder instead
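
The three points above could look roughly like the following Java sketch. It is not the code from this PR: the class name, the reduced set of fields, and the default values are placeholders chosen for illustration.

```java
// Illustrative sketch only; names and defaults are hypothetical.
final class ImmutableConfigSketch {
    private final int minimumRiskScore;
    private final int[] attenuationScores;

    private ImmutableConfigSketch(int minimumRiskScore, int[] attenuationScores) {
        this.minimumRiskScore = minimumRiskScore;
        // Copy on the way in, so later changes to the builder's array have no effect.
        this.attenuationScores = attenuationScores.clone();
    }

    int getMinimumRiskScore() {
        return minimumRiskScore;
    }

    int[] getAttenuationScores() {
        // Copy on the way out, so callers cannot mutate internal state.
        return attenuationScores.clone();
    }

    /** The builder holds the mutable state; the built object never changes. */
    static final class Builder {
        private int minimumRiskScore = 1;             // placeholder default
        private int[] attenuationScores = new int[8]; // placeholder default

        Builder setMinimumRiskScore(int value) {
            this.minimumRiskScore = value;
            return this;
        }

        Builder setAttenuationScores(int... scores) {
            this.attenuationScores = scores.clone();
            return this;
        }

        ImmutableConfigSketch build() {
            return new ImmutableConfigSketch(minimumRiskScore, attenuationScores);
        }
    }
}
```

If the score arrays are never exposed through public getters, the copy on the way out is unnecessary; the copy in the constructor is still what keeps a reused builder from affecting configurations it has already built.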

@Stypox
Contributor Author

Stypox commented Jul 7, 2020

@BjoernPetersen done! ;-)
I did not create public getters for score arrays, since those should in theory only be used inside the class, so I didn't need to copy any array.

@Stypox
Contributor Author

Stypox commented Jul 8, 2020

I found the ApplicationConfiguration used by CWA at this URL: https://svc90.main.px.t-online.de/version/v1/configuration/country/DE/app_config (obtained from coronawarnapp.service.diagnosiskey.DiagnosisKeyConstants.COUNTRY_APPCONFIG_DOWNLOAD_URL). I converted the provided binary file to an ApplicationConfiguration object by calling the autogenerated coronawarnapp.server.protocols.ApplicationConfigurationOuterClass.ApplicationConfiguration.parseFrom with the bytes of the file.

ApplicationConfiguration

app_version {
  android {
    latest {
      major: 1
      patch: 4
    }
    min {
      major: 1
      patch: 4
    }
  }
  ios {
    latest {
      minor: 8
      patch: 2
    }
    min {
      minor: 5
    }
  }
}
attenuation_duration {
  risk_score_normalization_divisor: 25
  thresholds {
    lower: 55
    upper: 63
  }
  weights {
    low: 1.0
    mid: 0.5
  }
}
exposure_config {
  attenuation {
    gt10_le15_dbm: LOWEST
    gt10_le15_dbm_value: 1
    gt15_le27_dbm: LOWEST
    gt15_le27_dbm_value: 1
    gt27_le33_dbm: LOWEST
    gt27_le33_dbm_value: 1
    gt33_le51_dbm: LOWEST
    gt33_le51_dbm_value: 1
    gt51_le63_dbm: LOWEST
    gt51_le63_dbm_value: 1
    gt63_le73_dbm: LOWEST
    gt63_le73_dbm_value: 1
    lt10_dbm: LOWEST
    lt10_dbm_value: 1
  }
  attenuation_weight: 50.0
  days_since_last_exposure {
    ge0_lt2_days: MEDIUM_HIGH
    ge0_lt2_days_value: 5
    ge10_lt12_days: MEDIUM_HIGH
    ge10_lt12_days_value: 5
    ge12_lt14_days: MEDIUM_HIGH
    ge12_lt14_days_value: 5
    ge14_days: MEDIUM_HIGH
    ge14_days_value: 5
    ge2_lt4_days: MEDIUM_HIGH
    ge2_lt4_days_value: 5
    ge4_lt6_days: MEDIUM_HIGH
    ge4_lt6_days_value: 5
    ge6_lt8_days: MEDIUM_HIGH
    ge6_lt8_days_value: 5
    ge8_lt10_days: MEDIUM_HIGH
    ge8_lt10_days_value: 5
  }
  days_weight: 20.0
  duration {
    gt10_le15_min: LOWEST
    gt10_le15_min_value: 1
    gt15_le20_min: LOWEST
    gt15_le20_min_value: 1
    gt20_le25_min: LOWEST
    gt20_le25_min_value: 1
    gt25_le30_min: LOWEST
    gt25_le30_min_value: 1
    gt30_min: LOWEST
    gt30_min_value: 1
  }
  duration_weight: 50.0
  transmission {
    app_defined1: LOWEST
    app_defined1_value: 1
    app_defined2: LOW
    app_defined2_value: 2
    app_defined3: LOW_MEDIUM
    app_defined3_value: 3
    app_defined4: MEDIUM
    app_defined4_value: 4
    app_defined5: MEDIUM_HIGH
    app_defined5_value: 5
    app_defined6: HIGH
    app_defined6_value: 6
    app_defined7: VERY_HIGH
    app_defined7_value: 7
    app_defined8: HIGHEST
    app_defined8_value: 8
  }
  transmission_weight: 50.0
}
min_risk_score: 11
risk_score_classes {
  risk_classes {
    label: "LOW"
    max: 15
    url: "https://www.coronawarn.app"
  }
  risk_classes {
    label: "HIGH"
    max: 72
    min: 15
    url: "https://www.coronawarn.app"
  }
}


From here an ExposureConfiguration object can be built (I copied the code in coronawarnapp.service.applicationconfiguration.ApplicationConfigurationService.mapRiskScoreToExposureConfiguration to do this):

ExposureConfiguration<
  minimumRiskScore: 11,
  attenuationScores: [0, 1, 1, 1, 1, 1, 1, 1],
  attenuationWeight: 50,
  daysSinceLastExposureScores: [5, 5, 5, 5, 5, 5, 5, 5],
  daysSinceLastExposureWeight: 50,
  durationScores: [0, 0, 0, 1, 1, 1, 1, 1],
  durationWeight: 50,
  transmissionRiskScores: [1, 2, 3, 4, 5, 6, 7, 8],
  transmissionRiskWeight: 50,
  durationAtAttenuationThresholds: [55, 63]
>

As you can see there is something wrong with the values: attenuationScores is 0 and then all 1s, daysSinceLastExposureScores is all 5, transmissionRiskScores is numbers 1 to 8. So either the CWA developers are not using the risk calculation algorithm to its full extent, or I am not requesting the data correctly (all other fields seem reasonable, though). What do you think?


Kotlin code to obtain the above results

        // Placeholder: the bytes of the `export.bin` file contained in the downloaded zip
        val exportBinary: ByteArray = TODO("load the bytes of export.bin")

        // Parse the protobuf payload with the autogenerated class
        val appConfig: ApplicationConfigurationOuterClass.ApplicationConfiguration =
            ApplicationConfigurationOuterClass.ApplicationConfiguration.parseFrom(exportBinary)

        println(appConfig.toString())

        // Map the protobuf values onto an ExposureConfiguration, bucket by bucket
        val config: ExposureConfiguration = ExposureConfiguration
            .ExposureConfigurationBuilder()
            .setTransmissionRiskScores(
                appConfig.exposureConfig.transmission.appDefined1Value,
                appConfig.exposureConfig.transmission.appDefined2Value,
                appConfig.exposureConfig.transmission.appDefined3Value,
                appConfig.exposureConfig.transmission.appDefined4Value,
                appConfig.exposureConfig.transmission.appDefined5Value,
                appConfig.exposureConfig.transmission.appDefined6Value,
                appConfig.exposureConfig.transmission.appDefined7Value,
                appConfig.exposureConfig.transmission.appDefined8Value
            )
            .setDurationScores(
                appConfig.exposureConfig.duration.eq0MinValue,
                appConfig.exposureConfig.duration.gt0Le5MinValue,
                appConfig.exposureConfig.duration.gt5Le10MinValue,
                appConfig.exposureConfig.duration.gt10Le15MinValue,
                appConfig.exposureConfig.duration.gt15Le20MinValue,
                appConfig.exposureConfig.duration.gt20Le25MinValue,
                appConfig.exposureConfig.duration.gt25Le30MinValue,
                appConfig.exposureConfig.duration.gt30MinValue
            )
            .setDaysSinceLastExposureScores(
                appConfig.exposureConfig.daysSinceLastExposure.ge14DaysValue,
                appConfig.exposureConfig.daysSinceLastExposure.ge12Lt14DaysValue,
                appConfig.exposureConfig.daysSinceLastExposure.ge10Lt12DaysValue,
                appConfig.exposureConfig.daysSinceLastExposure.ge8Lt10DaysValue,
                appConfig.exposureConfig.daysSinceLastExposure.ge6Lt8DaysValue,
                appConfig.exposureConfig.daysSinceLastExposure.ge4Lt6DaysValue,
                appConfig.exposureConfig.daysSinceLastExposure.ge2Lt4DaysValue,
                appConfig.exposureConfig.daysSinceLastExposure.ge0Lt2DaysValue
            )
            .setAttenuationScores(
                appConfig.exposureConfig.attenuation.gt73DbmValue,
                appConfig.exposureConfig.attenuation.gt63Le73DbmValue,
                appConfig.exposureConfig.attenuation.gt51Le63DbmValue,
                appConfig.exposureConfig.attenuation.gt33Le51DbmValue,
                appConfig.exposureConfig.attenuation.gt27Le33DbmValue,
                appConfig.exposureConfig.attenuation.gt15Le27DbmValue,
                appConfig.exposureConfig.attenuation.gt10Le15DbmValue,
                appConfig.exposureConfig.attenuation.lt10DbmValue
            )
            .setMinimumRiskScore(appConfig.minRiskScore)
            .setDurationAtAttenuationThresholds(
                appConfig.attenuationDuration.thresholds.lower,
                appConfig.attenuationDuration.thresholds.upper
            )
            .build()

        println(config)
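
The bucket order used for the attenuation scores above (gt73 first, lt10 last) can be sketched as a small lookup in Java. This is only an illustration: the class and method names are hypothetical, the boundaries are read off the protobuf field names, and treating a value of exactly 10 as part of the last bucket is an assumption.

```java
// Illustrative sketch only; names are hypothetical, boundaries come from the field names.
final class AttenuationBucketSketch {
    private AttenuationBucketSketch() {}

    // Exclusive lower bounds of buckets 0..6; everything else falls into bucket 7.
    private static final int[] LOWER_BOUNDS_DB = {73, 63, 51, 33, 27, 15, 10};

    /** Returns the index (0..7) of the bucket the attenuation value falls into. */
    static int bucketIndex(int attenuationDb) {
        for (int i = 0; i < LOWER_BOUNDS_DB.length; i++) {
            if (attenuationDb > LOWER_BOUNDS_DB[i]) {
                return i;
            }
        }
        return LOWER_BOUNDS_DB.length; // last bucket (assumed to include exactly 10)
    }

    /** Looks up the configured score for an attenuation value. */
    static int attenuationScore(int[] attenuationScores, int attenuationDb) {
        return attenuationScores[bucketIndex(attenuationDb)];
    }
}
```

With the attenuationScores printed above, `[0, 1, 1, 1, 1, 1, 1, 1]`, only the first bucket (attenuation above 73) scores 0; every other bucket scores 1.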

@BjoernPetersen
Contributor

I think those values seem about right. At least from a technical standpoint, the values in your ExposureConfiguration object match the data in the ApplicationConfiguration you provided. While it strikes me as odd that exposures that happened more than 14 days ago are scored the same as others, I suppose those won't happen if old contacts are regularly purged from the database.

All in all your guess about the CWA devs not fully utilizing the nuances and possibilities of the framework is probably correct, but as long as we can handle their config, that doesn't seem like a problem to me.

@Stypox
Contributor Author

Stypox commented Jul 10, 2020

This is the ExposureConfiguration for Immuni (the Italian app), and it looks as strange as the CWA one, so I guess governments decided not to fine-tune risk score calculation. (Obtained from https://get.immuni.gov.it/v1/settings?platform=android&build=1300000 )

"exposure_configuration": {
    "attenuation_thresholds": [ 50, 70 ],
    "attenuation_bucket_scores": [ 0, 0, 5, 5, 5, 5, 5, 5 ],
    "attenuation_weight": 1,
    "days_since_last_exposure_bucket_scores": [ 1, 1, 1, 1, 1, 1, 1, 1 ],
    "days_since_last_exposure_weight": 1,
    "duration_bucket_scores": [ 0, 0, 0, 0, 5, 5, 5, 5 ],
    "duration_weight": 1,
    "transmission_risk_bucket_scores": [ 1, 1, 1, 1, 1, 1, 1, 1 ],
    "transmission_risk_weight": 1,
    "minimum_risk_score": 1
},

@Stypox
Contributor Author

Stypox commented Jul 10, 2020

What I still don't understand is why GMS can calculate an ExposureSummary based on the data from an exposure even though it never asks the framework user to provide a definition for the various transmissionRiskScore values. Since those are user-defined values, there should be some kind of way to provide GMS with a function/lambda to determine the transmissionRiskLevel corresponding to an exposure, that could then be used by GMS to access the transmissionRiskScore array at the correct index and thus calculate the final risk score.
Instead, at least from what I see here (I keep referencing the Italian app since everything seems clearer to me, since basically all GMS calls are in one file), the transmissionRiskLevel is provided directly by GMS in an ExposureInformation object from the client.getExposureInformation(token) call. So apparently this value should be calculated by the framework, not by its user, even though the documentation suggests the opposite. I can't find any documentation online about how this transmissionRiskLevel should be calculated; does anyone know where to find such info?

@mh-

mh- commented Jul 10, 2020

@Stypox TRL is an input as well as an output to the matching.
As an input it is attached to each Diagnosis Key. If one of the keys matches, the TRL of that key is used as an index into the 1st row of the ExposureConfiguration table.
So it is an input into the calculation of the Exposure Risk Value.
The TRL of the matching key is also returned as an output to the app, so that the app could use the information for its own additional risk calculation.
Does this answer your question?
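
As a rough Java sketch of the lookup described above: the TRL attached to the matching key selects one entry of the transmissionRiskScores array. The names are hypothetical, and the mapping of levels 1..8 onto array indices 0..7 is an assumption made for illustration, not something confirmed by the framework documentation quoted in this thread.

```java
// Illustrative sketch only; names and the 1-based level convention are assumptions.
final class TransmissionRiskLookupSketch {
    private TransmissionRiskLookupSketch() {}

    /**
     * Returns the configured score for the TRL of a matching diagnosis key.
     * Assumes level k (1..8) maps to transmissionRiskScores[k - 1].
     */
    static int transmissionRiskScore(int[] transmissionRiskScores, int transmissionRiskLevel) {
        if (transmissionRiskLevel < 1 || transmissionRiskLevel > transmissionRiskScores.length) {
            throw new IllegalArgumentException(
                    "transmissionRiskLevel out of range: " + transmissionRiskLevel);
        }
        return transmissionRiskScores[transmissionRiskLevel - 1];
    }
}
```

The level itself stays available to the caller, which matches the point that the TRL is also returned to the app for its own additional calculation.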

@mh-

mh- commented Jul 10, 2020

Note however that Google is modifying this, see https://developers.google.com/android/exposure-notifications/exposure-notifications-api#data-structures, google/exposure-notifications-server#663, and https://developers.google.com/android/exposure-notifications/exposure-key-file-format
TRL is deprecated as of v1.5, which means that the calculation will be done differently than explained above (which covers v1.0).
I have no idea however if / when RKI plans to switch to v1.5. Well, maybe I should simply ask...

BTW: Apple might also be modifying this, but it's not publicly visible - however there are strange artefacts:
(image not preserved)

@haitrec
Member

haitrec commented Jul 10, 2020

I just did some more reading. Hopefully the following information helps.

First, some terminology from the EN framework docs:

TransmissionRiskLevel(s):

  • states defined in natural language
  • example: "Confirmed test - High transmission risk level"
  • this information is added to my keys before I upload them to the server

TemporaryExposureKey:

  • has a field transmissionRiskLevel, filled before uploading, used after downloading the key

TransmissionRiskScore(s):

  • an array of int values mapped to the different TransmissionRiskLevels
  • used in the actual risk calculation
  • the mapping from TransmissionRiskLevels is provided to the EN framework by the user/app (details for the German CWA below)
  • the user-provided mapping from TransmissionRiskLevels is evaluated inside the EN framework
  • each score has an alias, e.g. RISK_SCORE_LOWEST for transmissionRiskScores[0]

Now implementation details from the CWA code:

There exists an ApplicationConfigurationService class.
Inside, an ApplicationConfiguration is acquired from a web server (the official server I guess) via asyncGetApplicationConfigurationFromServer() from the WebRequestBuilder.

In the code, the ApplicationConfiguration type is imported via:
import de.rki.coronawarnapp.server.protocols.ApplicationConfigurationOuterClass.ApplicationConfiguration
This (probably) means that it is the ApplicationConfiguration defined in the applicationConfiguration.proto file.

After retrieving this ApplicationConfiguration object, the ApplicationConfigurationService uses it to build an ExposureConfiguration object. That object is then used for RetrieveDiagnosisKeysTransactions. More precisely, during such transactions, the list of diagnosis keys is fetched from the server, then passed to the client wrapper via its asyncRetrieveExposureConfiguration(...) call, together with the ExposureConfiguration object created before. The client wrapper passes both to the EN framework.

TL;DR:

The application configuration with the mapping between risk levels and scores is fetched from the server in the ApplicationConfigurationService. An ExposureConfiguration is built out of this and later on passed to the EN framework.

@mh-

mh- commented Jul 10, 2020

@haitrec the TRL is not used by CWA like it’s recommended in the spec you mentioned. Instead a profile is used that maps a value (1..8) to each key, based on its age.

@mh-

mh- commented Jul 11, 2020

ExposureConfiguration<
  minimumRiskScore: 11,
  attenuationScores: [0, 1, 1, 1, 1, 1, 1, 1],
  attenuationWeight: 50,
  daysSinceLastExposureScores: [5, 5, 5, 5, 5, 5, 5, 5],
  daysSinceLastExposureWeight: 50,
  durationScores: [0, 0, 0, 1, 1, 1, 1, 1],
  durationWeight: 50,
  transmissionRiskScores: [1, 2, 3, 4, 5, 6, 7, 8],
  transmissionRiskWeight: 50,
  durationAtAttenuationThresholds: [55, 63]
>

As you can see there is something wrong with the values: attenuationScores is 0 and then all 1s, daysSinceLastExposureScores is all 5, transmissionRiskScores is numbers 1 to 8. So either the CWA developers are not using the risk calculation algorithm to its full extent, or I am not requesting the data correctly (all other fields seem reasonable, though). What do you think?

I think the values are correct, and the CWA developers simply implemented the RKI risk estimation concept.

This concept places importance on the Transmission Risk - therefore the (currently) 13 uploaded Diagnosis Keys get TRL assigned based on their age. This TRL is just mapped 1->1, 2->2, 3->3 etc using the values above.
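
To make the effect of these values concrete, here is a small Java sketch that plugs the CWA numbers quoted above into the product formula from earlier in the thread. The class and method names are hypothetical, and it assumes an encounter whose attenuation and duration fall into buckets that score 1 rather than 0. How the framework applies minimumRiskScore is not spelled out in this thread, so the comparison below is an assumption; with these numbers the score never equals 11, so the choice between ">" and ">=" does not change the outcome.

```java
// Illustrative sketch only; names are hypothetical, values are copied from the config above.
final class CwaScoreExample {
    private CwaScoreExample() {}

    static final int MINIMUM_RISK_SCORE = 11;
    static final int[] TRANSMISSION_RISK_SCORES = {1, 2, 3, 4, 5, 6, 7, 8};
    static final int ATTENUATION_SCORE = 1; // any bucket that does not score 0
    static final int DURATION_SCORE = 1;    // any bucket that does not score 0
    static final int DAYS_SCORE = 5;        // identical for every bucket

    /** Product of the four scores for a key with the given TRL (assumed 1..8). */
    static int score(int transmissionRiskLevel) {
        int transmissionRiskScore = TRANSMISSION_RISK_SCORES[transmissionRiskLevel - 1];
        return ATTENUATION_SCORE * DAYS_SCORE * DURATION_SCORE * transmissionRiskScore;
    }

    /** Whether the score reaches the configured minimum (comparison is an assumption). */
    static boolean reachesMinimum(int transmissionRiskLevel) {
        return score(transmissionRiskLevel) >= MINIMUM_RISK_SCORE;
    }

    public static void main(String[] args) {
        for (int trl = 1; trl <= 8; trl++) {
            System.out.println("TRL " + trl + " -> score " + score(trl)
                    + ", reaches minimum: " + reachesMinimum(trl));
        }
    }
}
```

Under these assumptions the score reduces to 5 × TRL, so only keys with a TRL of 3 or more reach the minimum of 11, and the highest possible value is 40.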

Attenuation and duration are also important inputs, but they are handled inside the CWA app. This is explained here. So the app does not make the framework do the most important parts of the calculations for this, but does them itself.

I think version v1.5 of the API with the new "ExposureWindow" concept goes in this direction: let the apps do the complete risk calculations however they want to do them, and keep only the privacy-preserving parts (like hiding exact timestamps) in the framework.

@Stypox
Contributor Author

Stypox commented Jul 13, 2020

Thank you @mh- and @haitrec for your explanations, now my ideas are clearer :-D
I guess that until we find an app that uses v1.5 we can't implement it, since we wouldn't know the details. The approach used in this PR should work fine for now and be flexible for later changes, since the risk calculation algorithm takes as parameters only the values contained in an ExposureInformation and every field is gettable if an app wants to manually calculate things. I think we can leave the ExposureWindow implementation for later (when we'll have more information) and only focus on v1 for now.
In the latest commit I added a test for the algorithm with the only "test data" I could find (i.e. the example in Apple's documentation), and the test succeeds. Apart from the fact that more tests would be needed, this PR is ready in my opinion.

Contributor

@BjoernPetersen BjoernPetersen left a comment

While this certainly isn't perfect yet, partially due to lack of info from Google/Apple, I think this is good enough for now. I'll merge this now so we can start developing the SDK parts that rely on it; we can improve on it later.

@Stypox Thank you for the contribution, especially on such a hard-to-get-your-head-around topic.
Also thanks to everyone who provided the very valuable input in this thread!

@BjoernPetersen BjoernPetersen merged commit 340e8db into CoraLibre:master Jul 21, 2020
@ljl-covid

Since @Stypox already mentioned the Italian Immuni app, I thought it may be worth pointing out that Ireland's contact tracing app that uses the Google/Apple API was also released as open source and donated to the Linux Foundation, with code available on GitHub, with, AIUI, separate code to communicate with the APIs.
