feat(flags): dynamic cohort matching in rust #25776

dmarticus · 2024-10-23T22:29:26Z

Problem

Implements cohort matching for dynamic cohorts in Rust. Required for the new /flags endpoint. One very exciting thing that this change does is support Cohort not_in matching for the new flags service, something that we didn't support in the original feature flags service. There is more work to be done to support this flow in the feature flags product (need to change the UI to expand cohort matching options and also redo user_blast_radius), but having the platform to support this type of behavior is great.

Changes

NB: I know the diff is huge, but it's also kinda something that makes sense to do in a big chunk. I'm going to do a rigorous self-review and call out the parts that I want extra attention on, but if this is still too beefy to review then please let me know and I can break it into two parts – data modeling, and then matching.

that said, here's all that this change introduces:

cohort_definitions, similar to flag_definitions, this module introduces an interface for mapping Cohorts to Postgres. It also includes some utility methods for working with cohorts, including parse_filters, which turns a cohort property into property filters, and sort_cohorts_topologically, which I rewrote in rust to support doing nested property matching for cohorts that contain cohorts. All of these methods have tests.
I added new operators to the OperatorType, but rather than trying to resize property_matching to handle in/not_in for cohorts, I do that at a higher level. I'll call out my implementation.
I added dynamic cohort matching for cohorts, nested cohorts, and both in and not_in values into the flag_matching module. This has tests too.

A few things I want to call out that I want to revisit.

DB access patterns; flags with cohorts introduce more DB calls since we need to pull cohort definitions to convert to properties to match. Would be extremely cool to have some way of pre-flattening all this beforehand and turning them all into property filters at the top, like I do with the other ones. Haven't gotten my head around that yet and I think it's worth doing. Wanted to show my current approach first, though.
probably could write more tests, I have a separate task to port over all the django ones and see if any of this stuff breaks.
needs more comments in certain scary places, I'll probably note those when I do my self-review.

Does this work well for both Cloud and self-hosted?

No impact yet

How did you test this code?

added the following new tests:

pulling cohorts from postgres
parsing cohort filters into property filters
sorting cohorts topologically
basic cohort matching (IN cohort with matching person properties)
basic cohort matching (NOT IN cohort with matching person properties)
test not_in cohort matching user in cohort
test cohort that depends on another cohort

Could add more cases here, too

dmarticus · 2024-10-25T04:40:03Z

rust/feature-flags/src/property_matching.rs

+        OperatorType::In | OperatorType::NotIn => {
+            // TODO: we handle these in cohort matching, so we can just return false here
+            // because by the time we match properties, we've already decomposed the cohort
+            // filter into multiple property filters
+            Ok(false)
+        }


This seems safe enough to do – I handle these conditions within evaluate_cohort_filters, not with match_property directly.

dmarticus · 2024-10-25T04:43:12Z

rust/feature-flags/src/flag_matching.rs

+            // Separate cohort and non-cohort filters
+            let (cohort_filters, non_cohort_filters): (Vec<PropertyFilter>, Vec<PropertyFilter>) =
+                flag_property_filters
+                    .iter()
+                    .cloned()
+                    .partition(|prop| prop.prop_type == "cohort");


I want to be able to evaluate the cohort properties and non-cohort properties with different methods, since I have to handle the in/not_in case for cohorts that I don't have to deal with using match_properties.

dmarticus · 2024-10-25T04:47:06Z

rust/feature-flags/src/flag_matching.rs

+        // Separate cohort filters from non-cohort filters
+        for filter in filters {
+            if filter.prop_type == "cohort" {
+                cohort_filters.push(filter);
+            } else {
+                non_cohort_filters.push(filter);
+            }
+        }


It's not so much that I'm separating, I'm differentiating. Basically, non_cohort_filters can be evaluated using match_property, but cohort_filters have to be pulled apart until they are property filters. This lets us evaluate cohort properties like this:

{"properties": {"type": "OR", "values": [{"type": "OR", "values": [{"key": "id", "type": "cohort", "value": 8, "negation": false}, {"key": "$browser", "type": "person", "value": ["Safari"], "negation": false, "operator": "exact"}]}]}}

where one property is a regular person property, and the other one needs to look up the cohort to evaluate.

dmarticus · 2024-10-25T04:47:43Z

rust/feature-flags/src/flag_matching.rs

+        if !cohort_filters.is_empty() {
+            let cohort_ids: HashSet<CohortId> = cohort_filters
+                .iter()
+                .filter_map(|f| f.value.as_i64().map(|id| id as CohortId))


value needs to be i64 but then I cast it to match the DB type. It's dumb but safe enough.
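
For comparison, a minimal sketch of a checked conversion that drops out-of-range ids instead of wrapping them. It assumes CohortId is an i32 alias (the thread only says the cast is there "to match the DB type"), and parse_cohort_ids is a hypothetical helper, not something from the PR.

```rust
use std::convert::TryFrom;

/// Assumed alias for illustration; the real type lives in the PR.
type CohortId = i32;

/// Converts raw i64 filter values into cohort ids.
/// Missing values and values that do not fit in CohortId are skipped
/// rather than being truncated by an `as` cast.
fn parse_cohort_ids(values: &[Option<i64>]) -> Vec<CohortId> {
    values
        .iter()
        .copied()
        .flatten()
        .filter_map(|id| CohortId::try_from(id).ok())
        .collect()
}
```

With this shape, a value such as i64::MAX is ignored instead of turning into an unrelated id.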

dmarticus · 2024-10-25T04:48:31Z

rust/feature-flags/src/flag_matching.rs

+                .map(|cohort| (cohort.id, CohortOrEmpty::Cohort(cohort)))
+                .collect();
+
+            let sorted_cohort_ids = sort_cohorts_topologically(cohort_ids, &seen_cohorts_cache);


if this cohort filter depends on multiple cohorts, sort them topologically.
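
To make the ordering concrete, here is a rough sketch of a topological sort over cohort dependencies using Kahn's algorithm and only the standard library. It is not the PR's sort_cohorts_topologically; the function name, the CohortId alias, and the input shape (a map from each cohort id to the ids it depends on) are all assumptions for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Assumed alias for illustration.
type CohortId = i32;

/// `deps` maps each cohort id to the cohort ids it depends on.
/// Returns the ids ordered so that dependencies come before dependents,
/// or None if the dependencies contain a cycle.
fn sort_topologically(deps: &HashMap<CohortId, Vec<CohortId>>) -> Option<Vec<CohortId>> {
    // remaining[c] counts the dependencies of c that are not yet emitted.
    let mut remaining: HashMap<CohortId, usize> = HashMap::new();
    // dependents[d] lists the cohorts that depend on d.
    let mut dependents: HashMap<CohortId, Vec<CohortId>> = HashMap::new();

    for (&cohort, cohort_deps) in deps {
        remaining.entry(cohort).or_insert(0);
        for &dep in cohort_deps {
            *remaining.entry(cohort).or_insert(0) += 1;
            remaining.entry(dep).or_insert(0);
            dependents.entry(dep).or_default().push(cohort);
        }
    }

    // Start with every cohort that has no dependencies.
    let mut ready: VecDeque<CohortId> = remaining
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(id, _)| *id)
        .collect();

    let mut order = Vec::with_capacity(remaining.len());
    while let Some(id) = ready.pop_front() {
        order.push(id);
        for &dependent in dependents.get(&id).into_iter().flatten() {
            let n = remaining
                .get_mut(&dependent)
                .expect("every dependent was registered above");
            *n -= 1;
            if *n == 0 {
                ready.push_back(dependent);
            }
        }
    }

    // If some cohort was never emitted, it sits on a cycle.
    (order.len() == remaining.len()).then_some(order)
}
```

If cohort 1 depends on 2 and 2 depends on 3, the result is 3, 2, 1, so each cohort is evaluated only after the cohorts it references.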

dmarticus · 2024-10-25T04:49:12Z

rust/feature-flags/src/flag_matching.rs

+                    match filter.operator {
+                        Some(OperatorType::In) if !cohort_match => return Ok(false),
+                        Some(OperatorType::NotIn) if cohort_match => return Ok(false),
+                        _ => {}


we handle In and NotIn here, which is a cool feature.
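
The rule in that match can be written as a small predicate. This is only a sketch: Operator is a stand-in for the PR's OperatorType, and passes_membership_check is a hypothetical name. It mirrors the snippet above, including the fact that a filter with no operator is never rejected at this step.

```rust
/// Stand-in for the two OperatorType variants that matter here.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Operator {
    In,
    NotIn,
}

/// Returns false exactly in the two cases the match above rejects:
/// an In filter whose cohort did not match, or a NotIn filter whose cohort did.
fn passes_membership_check(operator: Option<Operator>, cohort_match: bool) -> bool {
    !matches!(
        (operator, cohort_match),
        (Some(Operator::In), false) | (Some(Operator::NotIn), true)
    )
}
```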

dmarticus · 2024-10-25T20:49:47Z

Oh yeah, this also implements this almost 2-year old feature request! #13145

oliverb123 · 2024-10-28T11:43:07Z

Only real concern to me is that cohort recursion thing, that's going to brutalise PG if you do it on every request. Sketched out a couple of mitigations you could make, one order-of-evaluation one and one caching/fetching approach one.

neilkakkar · 2024-10-30T10:50:47Z

flyby on cohorts, there really should only be 1 query to fetch all cohorts for the given team (the entire table is small) - and no additional queries - these should be cached for the lifetime of the query (this is seen_cohorts_cache). This is how old decide does it. (There was a TODO to start caching all cohorts in redis as well, which might reduce the calls even more, but wasn't really the limiting factor earlier). In this new architecture, you can potentially cache cohorts for all teams with a ttl, in memory, and then you don't need it in redis either and also sometimes make no pg queries at all 👌 .
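
A rough sketch of that idea, using only the standard library: one fetch of all cohorts per team, kept in memory with a TTL. The PR itself ends up using moka for this; the types and names below (CohortCache, get_or_fetch, the trimmed-down Cohort) are hypothetical and only illustrate the access pattern.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Assumed alias and a trimmed-down cohort type, for illustration only.
type TeamId = i32;

#[derive(Clone, Debug, PartialEq)]
struct Cohort {
    id: i32,
    team_id: TeamId,
}

/// In-memory cache holding every cohort of a team under a single key.
struct CohortCache {
    ttl: Duration,
    entries: HashMap<TeamId, (Instant, Vec<Cohort>)>,
}

impl CohortCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the team's cohorts, calling `fetch` (for example, one
    /// Postgres query) only when the entry is missing or older than the TTL.
    fn get_or_fetch<F>(&mut self, team_id: TeamId, fetch: F) -> &[Cohort]
    where
        F: FnOnce(TeamId) -> Vec<Cohort>,
    {
        let now = Instant::now();
        let fresh = self
            .entries
            .get(&team_id)
            .map_or(false, |(fetched_at, _)| now.duration_since(*fetched_at) < self.ttl);
        if !fresh {
            self.entries.insert(team_id, (now, fetch(team_id)));
        }
        &self.entries[&team_id].1
    }
}
```

Two lookups for the same team inside the TTL window invoke the fetch closure once; nested cohorts are then resolved from the returned slice without further queries.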

imo it's not worth optimising to fetch only the specific cohorts you need by walking down the dependency graph; it's much easier to always have all of them and use them as you need them, especially since this dataset is not huge. See max no. of cohorts by team id ->

dmarticus · 2024-10-31T21:33:51Z

rust/feature-flags/src/flag_definitions.rs

+// TODO: see if you can combine these two structs, like we do with cohort models
+// this will require not deserializing on read and instead doing it lazily, on-demand
+// (which, tbh, is probably a better idea)


this is a problem for future me, not in scope for this PR

dmarticus · 2024-11-04T19:08:15Z

rust/feature-flags/Cargo.toml

@@ -39,6 +39,8 @@ health = { path = "../common/health" }
 common-metrics = { path = "../common/metrics" }
 tower = { workspace = true }
 derive_builder = "0.20.1"
+petgraph = "0.6.5"
+moka = { version = "0.12.8", features = ["future"] }


caching lib with support for TTL and feature weighting

As a heads up, this is already in the workspace, you can probably pull it in (we're using it in error tracking).

oliverb123

Bunch of comments here, that I think you should write down somewhere separately and then just merge this as-is - it's been around too long and is too big to keep doing review iterations on efficiently for anyone involved, it getting merged is blocking you, and imo it's totally shippable - if this was deployed today I'd still approve.

oliverb123 · 2024-11-15T14:45:34Z

rust/feature-flags/src/cohort_cache.rs

+    ) -> Self {
+        // We use the size of the cohort list (i.e., the number of cohorts for a given team)as the weight of the entry
+        let weigher =
+            |_: &TeamId, value: &Vec<Cohort>| -> u32 { value.len().try_into().unwrap_or(u32::MAX) };


Just a thought about casts generally, I think this is totally fine and you shouldn't pay the CI time to change it:

I'd almost argue for a raw unwrap here (or an expect with a helpful message), under the consideration you probably do want to fail loudly if a team has more than u32::MAX cohorts, but also, you'll never end up in this situation because fetching them would bring down postgres, you'd OOM, etc, so I'd then almost go for an as cast instead, knowing the truncation will never happen.
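
The three options being weighed look like this side by side. A small sketch with hypothetical function names; only the conversions themselves are standard library behaviour.

```rust
use std::convert::TryFrom;

/// What the PR does: saturate at u32::MAX if the length does not fit.
fn weight_saturating(len: usize) -> u32 {
    u32::try_from(len).unwrap_or(u32::MAX)
}

/// Fail loudly: panic with a message if the length does not fit.
fn weight_checked(len: usize) -> u32 {
    u32::try_from(len).expect("a team should never have more than u32::MAX cohorts")
}

/// Plain `as` cast: silently keeps only the low 32 bits on overflow.
fn weight_truncating(len: usize) -> u32 {
    len as u32
}
```

For any realistic cohort count all three return the same value; they differ only in what happens past u32::MAX, which is the case the comment argues is unreachable in practice.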

oliverb123 · 2024-11-15T14:47:17Z

rust/feature-flags/src/server.rs

@@ -54,6 +55,8 @@ where
        }
    };

+    let cohort_cache = Arc::new(CohortCacheManager::new(postgres_reader.clone(), None, None));


If I was you, I'd do the effort now of piping the cache sizes all the way into the Config object - during initial deployment and tuning, it's WAY nicer to be able to change those by editing the deployment's env vars directly, rather than needing a whole build cycle, and it's easy to forget since everything will work.
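
One way this could look, as a sketch only: the env var names, the CohortCacheConfig struct, and parse_or are all made up for illustration, and the service's real Config object may be wired differently. The defaults are the ones hard-coded in the PR (10,000 and 300 seconds).

```rust
use std::env;

/// Hypothetical config holder for the two cache knobs.
struct CohortCacheConfig {
    max_capacity: u64,
    ttl_seconds: u64,
}

/// Parses an optional raw value, falling back to `default` when the
/// variable is unset or not a valid unsigned integer.
fn parse_or(raw: Option<String>, default: u64) -> u64 {
    raw.and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

impl CohortCacheConfig {
    /// Reads both settings from the environment (variable names are assumptions).
    fn from_env() -> Self {
        Self {
            max_capacity: parse_or(env::var("COHORT_CACHE_CAPACITY").ok(), 10_000),
            ttl_seconds: parse_or(env::var("COHORT_CACHE_TTL_SECONDS").ok(), 300),
        }
    }
}
```

The values would then be passed to CohortCacheManager::new in place of the two None arguments, so tuning becomes an env var edit rather than a rebuild.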

oliverb123 · 2024-11-15T14:48:11Z

rust/feature-flags/src/cohort_cache.rs

+        let cache = Cache::builder()
+            .time_to_live(Duration::from_secs(ttl_seconds.unwrap_or(300))) // Default to 5 minutes
+            .weigher(weigher)
+            .max_capacity(max_capacity.unwrap_or(10_000)) // Default to 10,000 cohorts


This default strikes me as quite low, I'd bump it an order of magnitude (or set it an order of magnitude larger) - that's a pure gut feeling though.

oliverb123 · 2024-11-15T14:49:35Z

rust/feature-flags/src/cohort_cache.rs

+
+impl CohortCacheManager {
+    pub fn new(
+        postgres_reader: PostgresReader,


nit, but if the variable name is the same as the type name, I go for stuff like "pr: PostgresReader" - the ide tells me everything I need to know about it anyway. I'd make it reader: PostgresReader in the struct declaration

oliverb123 · 2024-11-15T14:51:02Z

rust/feature-flags/src/cohort_cache.rs

+#[derive(Clone)]
+pub struct CohortCacheManager {
+    postgres_reader: PostgresReader,
+    per_team_cohort_cache: Cache<TeamId, Vec<Cohort>>,


Nit, same as below re: postgres_reader I suppose, but I know from the type that the cache is per-team (it's got TeamId as a key), and I know it's caching cohorts. You can be shorter here, the type shows up everywhere it's used.

This is purely taste though, if you disagree feel free to ignore.

oliverb123 · 2024-11-15T15:04:21Z

rust/feature-flags/src/cohort_operations.rs

+
+    /// Returns all cohorts for a given team
+    #[instrument(skip_all)]
+    pub async fn list_from_pg(


A note for later - this and the cache hit/miss paths are a GREAT place to put a metric btw - a counter of cohort fetches lets you see rates super easily, extremely useful for when you're first tuning the deployment
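
As a stand-in for a real metrics library, the counters could be sketched with atomics. CohortCacheMetrics and its methods are hypothetical; a production service would more likely emit these through whatever metrics crate it already uses.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counters for the cache hit and miss paths.
#[derive(Default)]
struct CohortCacheMetrics {
    hits: AtomicU64,
    misses: AtomicU64,
}

impl CohortCacheMetrics {
    /// Call on the path that serves cohorts from the cache.
    fn record_hit(&self) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }

    /// Call on the path that falls through to a Postgres fetch.
    fn record_miss(&self) {
        self.misses.fetch_add(1, Ordering::Relaxed);
    }

    /// Fraction of lookups served from the cache, or None before any lookup.
    fn hit_rate(&self) -> Option<f64> {
        let hits = self.hits.load(Ordering::Relaxed);
        let total = hits + self.misses.load(Ordering::Relaxed);
        if total == 0 {
            None
        } else {
            Some(hits as f64 / total as f64)
        }
    }
}
```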

oliverb123 · 2024-11-15T15:08:52Z

rust/feature-flags/src/cohort_operations.rs

+        })?;
+
+        let query = "SELECT id, name, description, team_id, deleted, filters, query, version, pending_version, count, is_calculating, is_static, errors_calculating, groups, created_by_id FROM posthog_cohort WHERE team_id = $1";
+        let cohorts = sqlx::query_as::<_, Cohort>(query)


An aside but I'm happy to log on a bit late if you want to sync up for an hour some day next week, I can show you how to get sqlx query macros working nicely with CI / your local dev flow - we're finding them super useful over in error tracking land because they give errors at compile time (and therefore also from rust-analyzer while writing the code) if there's a problem with a query or a struct definition, letting you skip writing a lot of tests asserting simple queries are correct. I wouldn't go re-writing all the existing ones in this codebase, but for new ones you might find it handy. Feel free to throw something in my calendar.

I'd love that; I'm in Austin next week so I'll be even closer to you timezone-wise. Grabbed some time on Wednesday

oliverb123 · 2024-11-15T15:11:11Z

rust/feature-flags/src/cohort_operations.rs

+    ///   ]
+    /// }
+    /// ```
+    pub fn to_property_filters(&self) -> Vec<PropertyFilter> {


Just checked - you can consume the self here and nothing breaks, there's nowhere you want to both get a cloned copy of the inner filters and keep the wrapper around. This function becomes:

pub fn to_property_filters(self) -> Vec<PropertyFilter> {
    self.values
        .into_iter()
        .flat_map(|value| value.values)
        .collect()
}

And I'd call it something like into_inner(). A semantic note: in Rust, into_-prefixed functions conventionally consume self, whereas as_ ones take &self and are cheap, and to_ ones usually borrow and do a more expensive conversion (often allocating).
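
For reference, the Rust API Guidelines describe as_ as a cheap borrowed-to-borrowed view, to_ as a potentially expensive conversion that typically takes &self, and into_ as an owned-to-owned conversion that consumes self. A sketch of the borrowing and consuming variants next to each other, with simplified stand-in types (FilterGroup, InnerGroup and the method names are hypothetical):

```rust
#[derive(Clone, Debug, PartialEq)]
struct PropertyFilter {
    key: String,
}

/// Simplified stand-ins for the nested filter structure in the PR.
struct InnerGroup {
    values: Vec<PropertyFilter>,
}

struct FilterGroup {
    values: Vec<InnerGroup>,
}

impl FilterGroup {
    /// Borrowing version: has to clone every filter to return owned values.
    fn to_property_filters(&self) -> Vec<PropertyFilter> {
        self.values
            .iter()
            .flat_map(|group| group.values.iter().cloned())
            .collect()
    }

    /// Consuming version: moves the filters out, no clones needed.
    fn into_property_filters(self) -> Vec<PropertyFilter> {
        self.values
            .into_iter()
            .flat_map(|group| group.values)
            .collect()
    }
}
```

When no caller needs the wrapper afterwards, the consuming version avoids the per-filter clone.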

oliverb123 · 2024-11-15T15:14:35Z

rust/feature-flags/src/cohort_operations.rs

+            .properties
+            .to_property_filters()
+            .into_iter()
+            .filter(|f| !(f.key == "id" && f.prop_type == "cohort"))


You can use retain here, like:

let mut props = cohort_property.properties.to_property_filters();
props.retain(|f| !(f.key == "id" && f.prop_type == "cohort"));
Ok(props)

Or you can make to_property_filters return an impl Iterator<Item = PropertyFilter> and then do the filter().collect() as you already do - collecting to a vec just to into_iter and then collect again is an antipattern.
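
The iterator-returning variant might look roughly like this. The types are simplified stand-ins and the names into_filters and non_cohort_filters are hypothetical; the point is only that there is a single collect at the end.

```rust
#[derive(Debug, PartialEq)]
struct PropertyFilter {
    key: String,
    prop_type: String,
}

/// Simplified stand-in for the wrapper holding the filters.
struct FilterGroup {
    values: Vec<PropertyFilter>,
}

impl FilterGroup {
    /// Hands the filters back lazily instead of building an intermediate Vec.
    fn into_filters(self) -> impl Iterator<Item = PropertyFilter> {
        self.values.into_iter()
    }
}

/// Drops the cohort-reference filters and collects exactly once.
fn non_cohort_filters(group: FilterGroup) -> Vec<PropertyFilter> {
    group
        .into_filters()
        .filter(|f| !(f.key == "id" && f.prop_type == "cohort"))
        .collect()
}
```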

oliverb123 · 2024-11-15T15:26:52Z

rust/feature-flags/src/flag_matching.rs

+                .get_cohort_id()
+                .ok_or(FlagError::CohortFiltersParsingError)?;
+            let match_result =
+                evaluate_cohort_dependencies(cohort_id, target_properties, cohorts.clone())?;


This /definitely/ doesn't need a clone of the cohorts vec. Cohorts are cheap, but not free, cloning the cohort set once per request is already expensive enough. This diff compiles:

@@ -1108,10 +1108,9 @@ impl FeatureFlagMatcher {
 fn evaluate_cohort_dependencies(
     initial_cohort_id: CohortId,
     target_properties: &HashMap<String, Value>,
-    cohorts: Vec<Cohort>,
+    cohorts: &[Cohort],
 ) -> Result<bool, FlagError> {
-    let cohort_dependency_graph =
-        build_cohort_dependency_graph(initial_cohort_id, cohorts.clone())?;
+    let cohort_dependency_graph = build_cohort_dependency_graph(initial_cohort_id, cohorts)?;

     // We need to sort cohorts topologically to ensure we evaluate dependencies before the cohorts that depend on them.
     // For example, if cohort A depends on cohort B, we need to evaluate B first to know if A matches.
@@ -1216,7 +1215,7 @@ fn apply_cohort_membership_logic(
 /// The graph is acyclic, which is required for valid cohort dependencies.
 fn build_cohort_dependency_graph(
     initial_cohort_id: CohortId,
-    cohorts: Vec<Cohort>,
+    cohorts: &[Cohort],
 ) -> Result<DiGraph<CohortId, ()>, FlagError> {
     let mut graph = DiGraph::new();
     let mut node_map = HashMap::new();

dmarticus added 3 commits October 15, 2024 15:12

unifying some types

ca431b5

in progress but not done yet

e89f169

Merge branch 'master' into feat/static-cohorts-rust

4f20e07

dmarticus marked this pull request as draft October 23, 2024 22:29

dmarticus added 7 commits October 23, 2024 15:33

oh lol right let's actually ship

ed00224

or default

fb8aab8

Merge branch 'master' into feat/dynamic-cohorts-rust

899a99c

Merge branch 'master' into feat/dynamic-cohorts-rust

896c31a

let's goooo

eeea8cc

Merge branch 'feat/dynamic-cohorts-rust' of github.com:PostHog/posthog into feat/dynamic-cohorts-rust

d02baec

Merge branch 'master' into feat/dynamic-cohorts-rust

39dad2d

dmarticus marked this pull request as ready for review October 24, 2024 22:33

modeled the data correctly this time 😓

db8cd8d

dmarticus requested review from neilkakkar, oliverb123, Phanatic and jurajmajerik October 24, 2024 23:00

dmarticus added 3 commits October 24, 2024 16:04

clippy my frickin GUY

43cda76

some light renaming

8d2ab85

yeah

9ccf479

dmarticus commented Oct 25, 2024

remove printlns

797adbe

add note about not handling groups

71def67

oliverb123 reviewed Oct 28, 2024

dmarticus added 3 commits October 29, 2024 14:05

saving a working version that supports caching, since this is the right idea. Next up I will implement a version that stores the dependency graph as well so that we can only cache the relevant cohorts instead of caching and iterating through cohort

27af814

new life

4c49bc4

clippy u dawg

d4af2f0

dmarticus added 4 commits October 30, 2024 23:03

traverse the dependency graph post-cache access

870f719

cleaning up

57d9885

adding more tests

9eb0f18

test for the cohort cache

3cfc590

dmarticus commented Oct 31, 2024

dmarticus added 2 commits October 31, 2024 14:37

a few things

3e8e5d2

Merge branch 'master' into feat/dynamic-cohorts-rust

3528b31

dmarticus mentioned this pull request Oct 31, 2024

feat(flags): add support for matching static cohort membership #25942

Merged

oliverb123 reviewed Nov 1, 2024

dmarticus added 8 commits November 1, 2024 13:56

Merge branch 'master' into feat/dynamic-cohorts-rust

77059f3

use global cohort cache

09317c4

less yapping

43e8692

appeasing the linter

3a65683

that should do it

a5812e6

clean up

fd52b24

rename

59f7c10

bit more

8066aff

dmarticus commented Nov 4, 2024

dmarticus requested a review from oliverb123 November 4, 2024 19:22

dmarticus added 8 commits November 4, 2024 11:23

collapse condition

4d5ecd9

Merge branch 'master' into feat/dynamic-cohorts-rust

4012ebe

resolve conflicts

8ededb1

working on it

fe37b04

Merge branch 'feat/dynamic-cohorts-rust' of github.com:PostHog/posthog into feat/dynamic-cohorts-rust

bc38940

not this either

41d3db3

docs

0a409f4

Merge branch 'master' into feat/dynamic-cohorts-rust

0dd1c0b

oliverb123 approved these changes Nov 15, 2024

dmarticus merged commit 4ce7e9c into master Nov 15, 2024
80 checks passed

dmarticus deleted the feat/dynamic-cohorts-rust branch November 15, 2024 23:44
