This repository contains our continous monitoring infrasturcture (based on Android-MOD, a customized Android system that records system-level traces upon the occurrence of suspicious cellular failure events) for capturing cellular failures in the wild, as well as our efforts for improving cellular reliability on Android devices. Our Android-MOD system is built upon vanilla Andorid 9/10. Therefore, you'll be able to run codes in this repo by patching these modifications to proper framework components.
[The entire codebase and sample data are available in our Github repo.]
Our modifications mainly involve the telephony component in the framework layer (whose location in AOSP tree is frameworks/opt/telephony/src/java/com/android/internal/telephony
, denoted later as TELEPHONY_SRC
).
Specifically, we modify the DcTracker.java
and DefaultPhoneNotifier.java
to instrument concerned failure points (currently we only list those related to the three major cellular failtures--Data_Stall, Data_Setup_Error, and Out_of_Service), as shown in Monitor:
Class | Failure Point | Purpose | Location in AOSP |
---|---|---|---|
DcTracker |
onDataStallAlarm |
Tracking Data_Stall events | TELEPHONY_SRC/dataconnection/DcTracker.java |
DcTracker |
onDataSetupComplete |
Tracking Data_Setup_Error events | TELEPHONY_SRC/dataconnection/DcTracker.java |
DefaultPhoneNotifier |
notifyServiceState |
Tracking Out_of_Service events | TELEPHONY_SRC/DefaultPhoneNotifier.java |
Upon cellular failures, we then notify our dedicated event logging service CellularStateProcessor and CellularReliability to log critical cellular and device information:
Information | Description |
---|---|
UID |
Unique ID generated to identify a user (cannot be related to the user's true indentity) |
TIME |
UNIX timestamp |
RAT |
Current radio access technology |
RSSI |
Signal strength in dBM |
CELL |
Base Station ID (MCC+MNC+LAC+CID) |
OS |
Android vesion |
MODEL |
Device model |
CAUSE |
Error code of Data_Setup_Error defined in DataFailCause |
APN |
Current access point names |
For event recovery, we provide similar tracing to record recovery events. In particular, for Data_Stall events we probe the network to more accurately monitor event recovery in DataStallDiagnostics.
Upon RAT transitions, our control policy would kick in to check whether current system and network states are suitable for transitions. It currently runs as a daemon thread along side the telephony service, as shown in RATTransition.
We currently provide our time-inhomogeneous Markov process (TIMP) that formalizes the Data_Stall recovery process and find proper triggers for entering each recovery stage.
We implement the TIMP model in Python (timp_model) which can automatically search in the time trigger space so as to find triggers that can minimize the expected recovery time.
For Android-related modifications, currently our code is run and tested in Android 9 and Android 10 (AOSP). Note that despite quite a number of changes have been made in Android 10 since Android 9, our code is applicable to both given that concerned tracing points remain unchanged.
For other code component such as our TIMP model, they can be run on Linux with proper Python supports.
We have released a portion of data (with proper anonymization) for references here. As to the full dataset, we are still in discussion with the authority to what extend can it be released.
Our code is licensed under Apache 2.0 in accordance with AOSP's license. Please adhere to the corresponding open source policy when applying modifications and commercial uses. Also, some of our code is currently not available but will be relased soon once we have obatained permissions.