Feature Flags Rewrite: List of things to get done #22131

Open
25 of 50 tasks
neilkakkar opened this issue May 7, 2024 · 0 comments
neilkakkar commented May 7, 2024

There are a lot of things we need to do to make flags on Rust production ready. This is a necessary-but-not-sufficient list - will add more things as I (re)discover them.

Local dev

  • Migrate hog-rs into main posthog repo, invest in CI to make this seamless.
  • Write RFC for this if necessary (RFC: Bringing back the monorepo for rust services and main app meta#207)
  • Add developing locally instructions for Rust services
  • Add cross-directory integration tests, so changes to posthog's data model don't bork the flags service
  • Figure out local dev flow: use a prebuilt image when not developing the flags/capture repo, and run from source when developing it.

Business logic

  • Add geoIP maxmind DB, query data for IP addresses
  • Handle token validation + distinct id parsing
  • Add basic matching logic
  • Connect to pg to fetch person properties
  • Handle early access feature properties (super groups)
  • Handle multivariate flags
  • Handle variant overrides
  • Handle evaluation reasons
  • Handle billing - adding info to redis
  • Set up populating and fetching token/team/flags from redis
  • Handle experience continuity + write path for flags
  • Handle group properties
  • Investigate efficient data structures for the in-memory caches used within flag_matching.rs. Consider LRU vs RwLock vs other alternatives
  • Ensure flag responses are consistent irrespective of internal handling of hashes
  • Handle payloads
  • Handle optimisation for not going to db if flags can be evaluated locally.
  • Handle dynamic cohorts property matching (at rust level)
  • Handle static cohorts property matching (at db level)
  • Copy all tests from main repo
  • Handle date operator comparisons
  • Figure out group_key flag matching for flags with groups – how do we use the special $group_key field? Can it be deprecated?
  • Clean up error handling across the board; need to make sure it's consistent everywhere and consistent with how we were handling errors in Django
  • (Not necessary for first production deploy, but necessary to deprecate old functionality): Add additional endpoints for internal calling - bulk evaluation, evaluation reason, user_blast_radius.
  • Replicate holdout groups evaluation: feat(flags): Add holdout groups #25759
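
For the matching, multivariate, and hash-consistency items above, here is a rough sketch of deterministic percentage rollout and variant selection. Everything in it is an assumption made for illustration: the function names are hypothetical, the `"variant"` salt is a placeholder, and FNV-1a is only a stand-in hash that fits in a few lines. The real service has to reproduce whatever hash the existing Django path uses, otherwise responses will differ between the two implementations.

```rust
/// FNV-1a (64-bit). A stand-in hash for this sketch only; it is stable across
/// runs and platforms, unlike std's DefaultHasher, whose algorithm is unspecified.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Map (flag key, distinct id, salt) to a stable value in [0, 1).
fn bucket(flag_key: &str, distinct_id: &str, salt: &str) -> f64 {
    let input = format!("{}.{}{}", flag_key, distinct_id, salt);
    // Keep the top 53 bits so the conversion to f64 is exact.
    let top_53_bits = fnv1a_64(input.as_bytes()) >> 11;
    top_53_bits as f64 / (1u64 << 53) as f64
}

/// True if this user falls inside the rollout percentage (0.0 to 100.0).
fn is_rolled_out(flag_key: &str, distinct_id: &str, rollout_percentage: f64) -> bool {
    bucket(flag_key, distinct_id, "") < rollout_percentage / 100.0
}

/// Pick a variant by walking cumulative percentages.
/// `variants` is a list of (name, percentage) pairs; the "variant" salt is hypothetical.
fn pick_variant<'a>(
    flag_key: &str,
    distinct_id: &str,
    variants: &[(&'a str, f64)],
) -> Option<&'a str> {
    let point = bucket(flag_key, distinct_id, "variant");
    let mut upper_bound = 0.0;
    for &(name, percentage) in variants {
        upper_bound += percentage / 100.0;
        if point < upper_bound {
            return Some(name);
        }
    }
    None
}
```

The property that matters is that the same flag key and distinct id always land in the same bucket, regardless of process, platform, or cache state.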

Production readiness

  • Replicate prometheus metrics
  • Wire up /flags endpoint as a service that can be reached/deployed
  • Make sure /flags endpoint can handle all of the various data payload formats in production (gzip, plain-text)
  • Set up rate limiting for DDoS protection
  • Implement quota limiting for the /flags endpoint so that we can block users who've gone over their flag request limit
  • Handle db healthchecks / timeouts
  • Set up env vars for controlling sampling for billing / skipping write path / adjusting rate limits
  • Integrate with PostHog error reporting
  • Set up profiling the endpoint
  • Add liveness/readiness checks
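
For the rate limiting item above, one common shape is a token bucket per project token. The sketch below is a minimal version under that assumption; the type and method names are hypothetical, and the clock is passed in explicitly so the behaviour is easy to test. It says nothing about how the limits would actually be configured or shared across instances.

```rust
use std::collections::HashMap;

/// A single token bucket: holds up to `capacity` tokens, refilled continuously.
struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64, now_secs: f64) -> Self {
        TokenBucket { capacity, refill_per_sec, tokens: capacity, last_refill_secs: now_secs }
    }

    /// Refill based on elapsed time, then try to spend one token.
    fn try_acquire(&mut self, now_secs: f64) -> bool {
        if now_secs > self.last_refill_secs {
            let elapsed = now_secs - self.last_refill_secs;
            self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
            self.last_refill_secs = now_secs;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// One bucket per project token (hypothetical keying choice).
struct RateLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<String, TokenBucket>,
}

impl RateLimiter {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        RateLimiter { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if the request for `project_token` is allowed at `now_secs`.
    fn allow(&mut self, project_token: &str, now_secs: f64) -> bool {
        let (capacity, refill) = (self.capacity, self.refill_per_sec);
        self.buckets
            .entry(project_token.to_string())
            .or_insert_with(|| TokenBucket::new(capacity, refill, now_secs))
            .try_acquire(now_secs)
    }
}
```

With a capacity of 2 and a refill rate of 1 token per second, a project gets two requests immediately, is rejected on the third, and gets one more a second later.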

SDK changes

  • Create new version of SDK that uses /flags instead of /decide to evaluate flags. Update APIs to make sure calls to /decide still work the same way

Documentation

  • Update documentation to refer to the service endpoint as /flags, not /decide everywhere

Verification Criteria

  • Audit existing feature flag integration tests to make sure we're covering enough
  • Staged rollout plan
    • Rollout percentages
    • Mirror traffic in contour
    • Verification plan for mirrored traffic
    • Decide rollout order
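
One possible way to verify mirrored traffic is to diff the flag values returned by the old and new endpoints for the same request. The sketch below assumes both responses have already been parsed into a map of flag key to value; `FlagValue`, `Mismatch`, and `diff_responses` are hypothetical names, not existing code.

```rust
use std::collections::BTreeMap;

/// A flag evaluates to either a boolean or a variant key.
#[derive(Debug, Clone, PartialEq)]
enum FlagValue {
    Bool(bool),
    Variant(String),
}

/// One discrepancy between the old and the new response.
#[derive(Debug, PartialEq)]
enum Mismatch {
    MissingInNew(String),
    ExtraInNew(String),
    Different { key: String, old: FlagValue, new: FlagValue },
}

/// Compare two parsed responses and list every discrepancy, ordered by key
/// within each group (missing/different first, then extras).
fn diff_responses(
    old: &BTreeMap<String, FlagValue>,
    new: &BTreeMap<String, FlagValue>,
) -> Vec<Mismatch> {
    let mut mismatches = Vec::new();
    for (key, old_value) in old {
        match new.get(key) {
            None => mismatches.push(Mismatch::MissingInNew(key.clone())),
            Some(new_value) if new_value != old_value => mismatches.push(Mismatch::Different {
                key: key.clone(),
                old: old_value.clone(),
                new: new_value.clone(),
            }),
            Some(_) => {}
        }
    }
    for key in new.keys() {
        if !old.contains_key(key) {
            mismatches.push(Mismatch::ExtraInNew(key.clone()));
        }
    }
    mismatches
}
```

An empty result for a mirrored request means the two services agreed on every flag; anything else is a candidate for the verification report.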

Future Performance optimizations

  • Investigate using alternative caching libraries (e.g. moka or quick_cache) instead of going with my hand-rolled RwLocks. I don't think we actually need concurrency support since these caches are created per request, so we might be able to squeeze more perf out of this system by using actual cache libs.
  • In the same vein ^ worth looking into the following – slow start DB issues, cache warming, cache size limits, global caches, etc.
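
To make the trade-off concrete, here is a minimal sketch of a size-bounded, single-threaded LRU cache with no locking, which is roughly what a per-request cache needs if concurrency really isn't required. It is hand-rolled on std only and is not the API of moka or quick_cache; `LruCache` and its methods are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Bounded cache that evicts the least recently used entry when full.
/// No locks: intended for use by a single request/task at a time.
struct LruCache<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    /// Keys ordered from least recently used (front) to most recently used (back).
    order: VecDeque<K>,
}

impl<K: Eq + Hash + Clone, V> LruCache<K, V> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        LruCache { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Mark `key` as most recently used. O(n) in the number of entries,
    /// which is acceptable for a sketch but not for a large cache.
    fn touch(&mut self, key: &K) {
        if let Some(position) = self.order.iter().position(|k| k == key) {
            self.order.remove(position);
        }
        self.order.push_back(key.clone());
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    fn put(&mut self, key: K, value: V) {
        // Evict only when inserting a new key into a full cache.
        if !self.map.contains_key(&key) && self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.touch(&key);
        self.map.insert(key, value);
    }
}
```

A global cache shared across requests would need synchronization on top of this, which is where a dedicated cache library starts to pay off.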

Side quests

  • Write a library for local evaluation in Rust that's used by our SDKs everywhere
neilkakkar added the bug and team/feature-success labels and removed the bug label May 7, 2024
neilkakkar self-assigned this May 7, 2024
neilkakkar added the feature/feature-flags label May 7, 2024
dmarticus assigned dmarticus and unassigned neilkakkar Oct 2, 2024

2 participants