-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic server-side implementation of configurable expectation proposal #828
Conversation
+ Adds pipeline stage EXPECTATION_POPULATION, which will compute the user input expectation for each trip, between LABEL_INFERENCE and CREATE_CONFIRMED_OBJECTS - The stage currently uses a placeholder algorithm that marks every trip as not expected Tested by confirming that confirmed trips are as desired and that client behavior is unchanged
+ expectation_notification_config.py: * Loads the configuration file * Implements rule- and schedule-matching algorithms * Ultimately exposes methods to get trip- and collection mode-level settings + TestExpectationNotificationConfig.py: Tests all of that. Currently passing. Schema changes: - Removes draftDelay option + Notes that omitting a time zone is valid and has a specific meaning None of this is yet connected to the pipeline.
The bug was due to the fact that the test datasets shankari_2015-07-22 and shankari_2015-jul-22 both exist and are different things. I was loading shankari_2015-jul-22 and it was not actually loading any trips; switching to shankari_2015-07-22 solved this. + Added a test case so if this happens in the future the tests actually fail
+ The expectation stage of the pipeline now consults the configuration to determine a trip's expectation status - For now, the only expect types implemented are "all" and "none"; the others default to "all" - Still no verification against the schema + Includes extensive unit tests Tested by running the unit tests and confirming that client-side behavior is as expected.
The everlasting UI proposal, for reference — though it's a bit out of date and could use a new draft. A better reference for the config file is the schema itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any code changes required; these are primarily requests for clarification/additional documentation to improve long-term maintainability.
"enabled": { | ||
"description": "Whether or not to enable this mode; mode is enabled if this property is omitted", | ||
"type": "boolean" | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do you anticipate deployers will/should use this v/s setting up a schedule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just thought I'd add a way to "comment out" a mode. It's not strictly necessary. Do you think it adds excessive complexity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, that's fine. Just update the documentation in a future commit to highlight this.
"type": "string" | ||
}, | ||
"confidenceThreshold": { | ||
"description": "Only display yellow labels with confidence greater than this threshold (0 to display all, 1 to display none)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this "only display" or "only mark"?
It looks like if the confidence is below the threshold, we always show it.
Concretely, I have a trip with confidence 0.6.
Am I correct that in the "intensive" mode, this will be marked ("trigger": -1,
) and in the "relaxed" mode, this will be marked "yellow"? An inline example will help because, as you might have figured out, nobody RTFMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually something I haven't figured out yet. If the confidence for a given label is below the threshold, the phone should always display that label as red and blank; that part I'm sure of (and have implemented accordingly on the client side). The current implementation has these below-threshold labels also counting as red for the purposes of rule matching as well, but I'm not sure what the best choice is there.
If we wanted to be maximally flexible, we would not have below-threshold labels match "trigger": -1
because there is a semantic difference between very low confidence inferences and no inferences at all, and if a study admin wanted to treat them the same they could just write one more rule to explicitly match these trips. The argument in favor of having below-threshold labels match "trigger": -1
(the status quo) is that it forces expectation/notification behavior to match what the user actually sees: to the user, red labels are red labels, if some trips that display red labels are appearing in To Label
and some aren't (and it's not because of some random sample option), they might get confused. It's also simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is complex. Let's discuss it after tomorrow. I really want to deploy this today so we can walk through the new screen at tomorrow's deployer/beta tester meeting.
"type": "object", | ||
"properties": { | ||
"startDate": { | ||
"description": "The base date from which to measure time; omit time zone to signify user's local time", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timezones are indeed tricky (remind me to tell you about the Brazilian timezone problem sometime), but I am concerned that this is over-engineering.
I anticipate that the related code is also tricky and brittle (now I understand why you had to do complicated time management), and it also introduces ambiguity.
Why do you think it is important to have "the user's local time" as opposed to using UTC, for example? Again:
- we are designing this for deployers, not end users, AND
- hopefully, we can auto-detect individual schedules in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for this part is actually relatively simple; the phone's time zone is embedded in the trip data object, so the server just inserts it if we're missing a time zone. (The complicated part is the schedule matching, regardless of time zones, and in particular figuring out how many months/weeks/days there are between two dates — see expectation_notification_config.py:64
).
Let's say we want to do primary data collection for one week, Monday to Sunday, every eight weeks. For simplicity, say we want the user to label all trips at the end of the day under primary data collection and none under secondary data collection. Under the existing schema, we can implement this as a schedule that begins at 00:00
user's local time and lasts 7
days, and specify a startDate
that is a Monday. Then the user will have trips appear on their To Label
screen for exactly 7 days every 56 days; if they label trips at the end of each day as they are supposed to, they will be interacting with the app 1 in 8 days.
Say we remove the ability to specify a time in the user's local time zone. We implement the schedule as beginning at 00:00
UTC. If the user is in a US time zone, primary mode will begin on Sunday evening and last until the next Sunday evening. Though this is the same length of time and on average will cover the same number of trips, it now spans 8 days instead of 7. If the user is labeling at the end of each day, they will be interacting with the app 1 in 7 days. This is worse.
To summarize, the ability to use a user's local time zone minimizes the number of days the user has to interact with the app. Even if they'll be labeling the same number of trips either way, every day we can avoid making the user do something is a hassle avoided.
I suppose if we wanted to simplify we could only use local time and not give the option to specify a time zone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, let's discuss this after tomorrow. Right now, I'm going to merge this and get the actual recommendations to show up.
eaue._test_options = test_options_stash | ||
|
||
def preprocess(self, trip): | ||
# See eacilp.placeholder_predictor_2 for an explanation of the "fingerprint" technique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this explanation here - only a one-liner about Timestamp2index
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can flesh out that explanation in an upcoming commit.
def _get_expectation_for_trip(trip): | ||
raw_expectation = eace.get_expectation(trip) | ||
# For now, expect always labeling unless the config file specifies no labeling at all | ||
processed_expectation = not raw_expectation["type"] == "none" | ||
return {"to_label": processed_expectation} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add in a comment about what the difference between raw
and processed
is and is intended to be?
My understanding is that the raw expectation is of the form
"expect": {
"type": "randomDays",
"value": 2
},
and the processed is a simple boolean true/false on whether trips need to be labeled or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. Will add the documentation in an upcoming commit.
} | ||
for trip in self.expected_trips: | ||
fingerprint = trip["data"]["start_local_dt"]["hour"]*60+trip["data"]["start_local_dt"]["minute"] | ||
if answers[fingerprint] is not None: self.assertEqual(trip["data"]["expectation"]["to_label"], answers[fingerprint]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need this? won't the "randomDays"
expectations also return True
with your current implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want the tests to enforce the actual intended behavior, not the current implementation. The actual intended behavior for the random options is not currently well-defined, so we don't test it — returning True
for the random options is an implementation detail (and a placeholder, at that).
|
||
|
||
def testGetRuleForLabel(self): | ||
# An answers entry is a (labels, index of corresponding rule under intensive mode, index of corresponding rule under relaxed mode) tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: expand on this a bit to explain how you go from the list of all rules to the selected rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please handle this in a subsequent commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what you mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to go ahead and merge this now so we can deploy tonight, demo tomorrow and start getting feedback. I've left the "will handle in subsequent commit" comments unresolved for your reference.
This makes it compatible with https://github.com/e-mission/e-mission-server/releases/tag/v3.0.0 in particular, by merging e-mission/e-mission-server#829 and e-mission/e-mission-server#828 e-mission/e-mission-server#825
What this PR does:
TestLabelInferencePipeline.py
What this PR doesn't do:
expect
types implemented are"all"
and"none"
; the others default to"all"