# Future of our person model #39
> As a result, events done between steps 1, 2, and 4 will be tracked as coming from person `p1`, but events between steps 3 and 4 will be under person `p2`. No events are lost, but some will be considered as coming from a different (anonymous) person. In the above image this corresponds to the green period.
>
> This affects use cases where users access their product (logged out) on multiple devices regularly. However, based on our analysis the impact is only on ~0.03% of events; some marketing use-cases are more significantly affected.
Could you elaborate on the marketing use-cases that would be affected? Most marketing websites don't `posthog.identify` users at all and primarily rely on cookies. I guess you're maybe thinking about tracking re-activation of users?
A sample one is this story:
- User signs up
- Some time later the user becomes inactive
- App sends the user an email
- User opens the link in a re-activation/marketing email on a new device, signed out
- User signs in and becomes active

Due to this change, if you wanted to track the users who opened the link you would need some custom code on your end - e.g. something like the sketch below.
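For illustration, a minimal sketch of what that custom code could look like with posthog-js, assuming the email link carries the user's id in a hypothetical `ph_uid` query parameter (the parameter name and overall flow are my assumptions, not an existing PostHog convention):

```typescript
// Hypothetical landing-page snippet: if the re-activation link carries a
// `ph_uid` query parameter, identify the visitor immediately so the
// email-click events land on the known person instead of a fresh anonymous one.
import posthog from 'posthog-js'

posthog.init('<project api key>', { api_host: 'https://app.posthog.com' })

const uid = new URLSearchParams(window.location.search).get('ph_uid')
if (uid) {
  posthog.identify(uid) // ties this device's events to the existing person
  posthog.capture('reactivation email link opened')
}
```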
cc @marcushyett-ph you might have a few others.
Yep, it's primarily around re-activation of existing users on different devices (so the size of the use-case is generally very small). By "marketing" I was referring primarily to scenarios where the logged-out experience (usually the marketing site) is part of the end-to-end funnel.
> 6. We will need to set up some strong consistency for message ordering in plugin-server and Kafka, to make sure messages are processed in the right order per user.
> 7. We will store more data in the events table. See below for more details.
> 8. Deleting user properties becomes more expensive due to being stored in a redundant way.
Do you see the migration as a potential challenge? How long did it take to set up the test table for us, for example?
Good question. This was also mentioned under open questions.
Data-copying-wise: no. As part of PostHog/posthog#5684 I figured out a way to do the copying 20x faster than what we've done so far, which is good enough.
Infrastructure-wise: maybe. Due to the extra storage we might need to add nodes to our cluster, which has budget ramifications etc. we'd need to figure out.
Implementation-wise: definitely! Actually ingesting data in a consistent way so events get up-to-date person properties is going to be a challenge, for reasons similar to why it's a challenge right now. :)
Makes sense! I guess I was specifically worried about self-hosting. I think there's a reasonable way to implement this whilst running both tables at the same time?
Yeah, I think so:
- We can leverage async migrations to do it in a non-intrusive way.
- We could add the `person_id` and `person_properties` columns to the existing table and backfill (rough sketch below), but we would need logic in the app to know when a query needs to join versus when it can read from the table. Some questions around how to handle the difference in person semantics in the product will arise as a result, but we can figure them out if this project kicks off :)
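For concreteness, a rough sketch of that second option using the official ClickHouse JS client. The mapping-table name and the `joinGet` usage are illustrative assumptions, and in practice this would run inside an async migration rather than ad hoc:

```typescript
import { createClient } from '@clickhouse/client'

const client = createClient({ url: 'http://localhost:8123' })

// Add the denormalized columns to the existing events table.
await client.command({
  query: `ALTER TABLE events
            ADD COLUMN IF NOT EXISTS person_id UUID,
            ADD COLUMN IF NOT EXISTS person_properties String`,
})

// Backfill person_id from a distinct_id -> person_id mapping (here an assumed
// Join-engine table). ALTER TABLE ... UPDATE is a heavyweight mutation in
// ClickHouse, which is why copying into a new table and swapping (as async
// migrations do) is the more realistic route at our scale.
await client.command({
  query: `ALTER TABLE events
            UPDATE person_id = joinGet('person_distinct_id_join', 'person_id', distinct_id)
            WHERE 1`,
})
```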
Is deleting user properties a common operation that we care about (could it be an async migration in the future)?
It's an operation we currently provide via the UI, and it comes up relatively often around compliance requirements, hence it's brought up here.
That said, it's not in the critical path of the app and not a huge blocker when implementing this.
I would argue it's pretty critical that we're able to properly delete user properties everywhere. Not only is it a huge compliance red flag, I think brand-wise it could hurt that we're the privacy-friendly analytics tool and yet you can't delete potentially highly sensitive data.
There also seems to be a non-trivial amount of current usage (~1k persons deleted per month), plus we've seen cases from Enterprise customers where this becomes a total blocker.
To make it clear, I'm not proposing dropping it; rather, my comment was to point out that it's not as performance-critical as, say, ingestion is, and we can make this happen even in the new system by accepting the extra load it generates.
> ## Open questions
>
> - How do we handle historical data imports?
> - How do we implement the change on cloud in a cost-effective way?
Why cost-effective? I thought we had quite a lot of spare storage on our clusters, and the increase doesn't sound too crazy. Is it CPU-intensive?
Partially answered in #39 (comment)
The sticking point might be disk. We have some leeway right now (in practice disk is ~50% full, which includes session recordings), but the copying process plus the extra data requirements of the new schema might push us to the point where we want more nodes or to rethink storage.
Similar problem for self-hosted (we'd need to do a migration there too).
> This RFC proposes both a technical and product solution to this problem: store `person_id` and `person_properties` on the main events table.
>
> However, doing so will require breaking changes to the product. These changes are likely to cause some issues for customers, but these have either a negligible impact (affecting around 0.03% of events) or can be mitigated through documentation and education.
Do you have links to the analysis done here? Curious to see which users and what kind of events would be affected.
Yep - the doc is a little messy, but there's a data report section on the impact analysis and a section including customer interviews.
Strongly in favour of doing this; it should be a better and more consistent experience for customers and unlock way higher scale.
> #### Challenges
>
> 6. We will need to set up some strong consistency for message ordering in plugin-server and Kafka, to make sure messages are processed in the right order per user.
Is this a definite necessity? This is a tough one, even if we put implementation aside.
There are many things at play here:
- Users might want to send us old events
- Users may accidentally send us old events (e.g. through retries, network issues, etc.)
- Data imports/exports
- The plugin server processing events out of order makes things easier performance-wise

The fourth point is mostly implementation, but the other three are inherent concepts that clash with this.
Agreed 100% - ingestion server changes are the meat of implementing this.
I intentionally left the implementation details up in the air in this proposal and was planning to follow up with a proper plan if we decide to accept these breaking product changes. Please also take a look at those changes and give your thoughts on whether they're acceptable trade-offs for you.
However, to answer the question: complexity needs to live somewhere. We're constrained by being unable to "update" data in ClickHouse (even via replacing/collapsing), so this complexity naturally falls on the ingestion side of things.
I intentionally haven't defined what it means to do `$set` here - e.g. if you `$set` in an event from March 2021, what effects should this have on future events? I believe the approach we should go for is that there will still be a set of "current" user properties (as we have now) that will apply to new events as they get ingested.
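A toy model of those semantics (names are illustrative, not the real plugin-server code): `$set` only mutates the person's *current* properties, and each newly ingested event is stamped with a snapshot of them, so historical rows are never rewritten.

```typescript
type Properties = Record<string, unknown>

// person_id -> the person's "current" properties
const currentPersonProperties = new Map<string, Properties>()

function ingest(event: { personId: string; name: string; $set?: Properties }) {
  if (event.$set) {
    // $set updates the current properties going forward...
    const existing = currentPersonProperties.get(event.personId) ?? {}
    currentPersonProperties.set(event.personId, { ...existing, ...event.$set })
  }
  // ...and every new event gets a denormalized snapshot; an old event (e.g.
  // from March 2021) would have been stamped with whatever was current then.
  return {
    name: event.name,
    person_id: event.personId,
    person_properties: { ...(currentPersonProperties.get(event.personId) ?? {}) },
  }
}
```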
Regarding ordering messages, the high-level thinking is that it's doable by:
- Partitioning our Kafka topic by `distinct_id`
- Grouping the messages of each batch by `distinct_id` and enforcing ordering per `distinct_id` in the plugin server (sketched below)
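A sketch of that idea with kafkajs (the topic name and the processing stub are placeholders; this is one possible shape, not a spec):

```typescript
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'plugin-server-sketch', brokers: ['localhost:9092'] })

// Producer side: using distinct_id as the message key routes all messages for
// a given distinct_id to the same partition, which gives per-key ordering.
const producer = kafka.producer()
await producer.connect()
await producer.send({
  topic: 'events_ingestion',
  messages: [{ key: 'some-distinct-id', value: JSON.stringify({ event: '$pageview' }) }],
})

// Consumer side: group each batch by distinct_id; groups may be processed
// concurrently, but messages within a group stay in arrival order.
const consumer = kafka.consumer({ groupId: 'ingestion' })
await consumer.connect()
await consumer.subscribe({ topic: 'events_ingestion' })
await consumer.run({
  eachBatch: async ({ batch }) => {
    const byDistinctId = new Map<string, typeof batch.messages>()
    for (const message of batch.messages) {
      const key = message.key?.toString() ?? ''
      byDistinctId.set(key, [...(byDistinctId.get(key) ?? []), message])
    }
    for (const [, messages] of byDistinctId) {
      for (const message of messages) {
        // process(message) strictly in order for this distinct_id
      }
    }
  },
})
```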
That said, I haven't put too much thought into this or into out-of-order delivery. This is an area I was hoping to recruit your and @tiina303's help on if this moves forward, as data consistency is a key worry.
Do the higher-level principles make sense? Everything said here we can iterate on should we accept the breaking changes - I'd like to avoid this turning into a full technical spec. :)
I'm not as worried about a) being able to enforce order on the events we do have and b) historical exports, as much as I'm worried about e.g. two people in a group using the app at the same time while one has internet woes. Or, even with distinct IDs, events from the web and the server side might arrive in a different order, as they're subject to different networking conditions.
So yeah, things like that. Either way, I'm not holding back the proposal on this point; I was just wondering how much this whole system relies on fully ordered ingestion. If the answer is "this system relies extremely heavily on completely in-order ingestion at all costs", then I think we need a deeper discussion; otherwise this doesn't need to be as hard of a blocker.
🙌 Awesome proposal. Just some questions about how we're thinking of handling certain things/situations.
> 1. On initial visit, the user's device gets assigned an (anonymous) distinct id `anonid1`. PostHog internally associates person_id `p1` with `anonid1`, which is saved together with their events.
> 2. When `posthog.identify('userid')` is called after signing up, we associate the same person_id `p1` with the new distinct id `userid`.
> 3. If the user later visits the site with a new device, a new anonymous distinct id `anonid2` is created. PostHog internally associates these events with a new person id `p2`, as no mapping for that id exists.
> 4. If `posthog.identify('userid')` is called after signing in, we associate events going forward with person_id `p1`.
For merging properties I assume we'll want to do that based on timestamps, i.e. we should still ship the any-order ingestion that I've been working on. Did I understand correctly that we're planning to just stick all the properties from user props onto the event during ingestion?
What about merging people (either from the UI or via the alias command)? Is that planned to be treated the same way as the logged-out user case? Could this be a problem for our users (use cases for merging could be different, and we might have a longer running time while the users are separate)?
> For merging properties I assume we'll want to do that based on timestamps

Historical backfills are unscoped in this proposal - I was planning on doing a follow-up "implementation" spec, should we accept this proposal, focusing on the product changes. As mentioned in #39 (comment) I would love your and Yakko's help with that.

> What about merging people (either from UI or via the alias command)? Is that planned to be treated the same way as logged-out user?

Not sure I understood the question here, but yes, the key concept is that we wouldn't update historical data on merges/identifies/aliases; these changes only apply going forward.
> For both changes the benefits are as follows:
> - Significant short-term performance gains
> - Ability to support the largest tiers of customers now and in the future
> - Standard analysis such as "events from paying users over time" will stop yielding misleading results when people start/stop paying (as they do today)
Another benefit (this would also hold once the any-order ingestion work is in, but it isn't yet): migrating events can be done in any order (for events after the switchover). Because all user props are within the event, we can easily trigger retries for just the missing events without worrying about user props being overwritten in a bad way, as they currently would be.
> User properties on events will change over time; some customers may rely on the existing behavior (where the latest property is assumed for all prior events). However, per our customer feedback the new behavior is considered much more accurate.
>
> Our proposed mitigation: for most use-cases (e.g. events for users who have paid at some point), cohorts can be used to achieve the same results; we will document this for affected users.
+1. Let's document this for everyone. Were there any specific properties/examples for which the latest property value was desired? Could we add the most common ones here (maybe from the doc, along with the desired behaviour)?
Another example alternative via cohorts: if one wants the latest emails for some insight query in the past, they could create a cohort from that insight and then get the latest emails for that cohort via e.g. recent pageview events broken down by email.
> Were there any specific properties/examples for which the latest property value was desired?

From speaking to customers: not really. Changing properties like "email" is not a real use case that affects day-to-day usage of the product, and in practice point-in-time properties fit real use cases (like e.g. payment breakdowns) much better.
That said, if you have counter-examples from your usage of PostHog, they're 100% worth posting here.
My specific examples for email:
- Needing emails to know who's still self-hosting PostHog Postgres, so the marketing team can reach out (definitely wanted the latest emails only). Though I think here I was using some event property, I can see others having this use case.
- Imagine a problem in the app caused something weird to happen for certain users, and we identify an event that finds those users (e.g. users running a broken version of PostHog self-hosted that can end up in a crash loop, or potentially set some things in a bad way that requires manual action from the instance operator) - here too I'd want the latest emails.

Agreed that these are likely rare and that point-in-time properties are usually what users want. I just thought it might be helpful to have a short snippet in the docs for everyone on how to get the latest property values for anything (i.e. a cohort, then potentially a random insight on that cohort).
Getting latest values definitely makes sense and is needed to even power the Persons page!
Let's explore (and document) this more during technical planning - as you mentioned, this need will come up in other places as well (e.g. calculating cohorts).
> From a technical standpoint: store `person_id` and `person_properties` on the main events table.
>
> From a product standpoint, storing person_id and person_properties on events requires making the following changes:
> 1. After the first session we will no longer merge events prior to login with the main person's events.
Am I interpreting this right? Is this saying that after making this change we'll no longer be able to merge an anonymous user's timeline into the identified user's?
So for example, if I come to posthog.com as an anonymous user with distinct ID A and sign up as distinct ID kunal@posthog.com - will distinct ID A always look like a separate person from kunal@posthog.com?
It depends.
The section below under customer impact goes into more detail.
To elaborate in different words: during the initial session (when kunal@ hasn't ever been identified) it behaves as you expect. The difference comes in on subsequent sessions, e.g. on a different device, where events pre-login would get associated with a different person.
Let me know if this or the other explanation below makes sense!
Thanks - let's go a little deeper.
In today's world...
I come in through an incognito window on my phone:
- I have distinct ID A assigned.
- I do a bunch of things before logging in.
- I log in; events performed under distinct ID A can now be found in the timeline for kunal@posthog.com.

I then come in through an incognito window on my laptop:
- I have distinct ID B assigned.
- I do a bunch of things prior to login.
- I log in; events performed under distinct ID B can now be found in the timeline for kunal@.

So now, looking at the event timeline for kunal@, I see all the events that were performed under both distinct IDs A and B.
Which pieces of this would break (if any) in the proposed change?
Let's create a table with what the events data would look like in the end, starting with a completely new user (starting with an existing user wouldn't highlight the change properly).

| Time | Event | Device | distinct_id | person_id |
|---|---|---|---|---|
| 0 | $pageview | Desktop | anonid1 | p1 |
| 1 | Signup: `posthog.identify('someuserid')` | Desktop | someuserid | p1 |
| 2 | $someevent | Desktop | someuserid | p1 |
| 10 | $pageview | Mobile | anonid2 | p2 |
| 11 | Sign in: `posthog.identify('someuserid')` | Mobile | someuserid | p1 |
| 12 | $pageview | Mobile | someuserid | p1 |

The difference from what we do currently is at Time=10 - that event wouldn't get associated with p1.
Hope that helps! You can see analysis of the impact under the customer impact section + via marcus' doc here.
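To make the mapping behind that table concrete, here's a toy model of the distinct_id → person_id resolution (illustrative only, not the real ingestion code):

```typescript
const personByDistinctId = new Map<string, string>()
let nextPerson = 1

function resolvePersonId(distinctId: string): string {
  if (!personByDistinctId.has(distinctId)) {
    // An unseen distinct_id gets a brand-new person (Time=10: anonid2 -> p2).
    personByDistinctId.set(distinctId, `p${nextPerson++}`)
  }
  return personByDistinctId.get(distinctId)!
}

function identify(anonDistinctId: string, userDistinctId: string): void {
  const anonPerson = resolvePersonId(anonDistinctId)
  if (!personByDistinctId.has(userDistinctId)) {
    // First-ever identify (Time=1): the anonymous person becomes the user.
    personByDistinctId.set(userDistinctId, anonPerson)
  }
  // Subsequent identifies (Time=11): someuserid already maps to p1, so events
  // going forward use p1, but anonid2's earlier events stay with p2.
}

resolvePersonId('anonid1')         // Time=0  -> p1
identify('anonid1', 'someuserid')  // Time=1  -> someuserid maps to p1
resolvePersonId('anonid2')         // Time=10 -> p2
identify('anonid2', 'someuserid')  // Time=11 -> someuserid still maps to p1
```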
@kpthatsme it's also worth reading this doc from our friends at another YC analytics company - as far as we can tell their merging system has similar behavior, but they might have done a better job of explaining it.
Could we have some way to "backtrack" to the anonymous events/session post-login? For example, by adding the property `anon_distinct_id: "anonid2"` to events 11 and 12 in the table above.
This would allow us to link session recordings with the pre-login part, and potentially allow some flexibility in the future when someone asks for a funnel where one part consists of logged-out events and the other part consists of logged-in events.
TL;DR: the only reasonable way we can accomplish this is a buffering mechanism that is already called out in the RFC; we would likely want to avoid it initially due to complexity, but might implement it later.
Long version:
Problem: pre-login events aren't tied to the logged-in person.
Assumption: pre-login we don't know the logged-in person_id, hence events will be sent with a new uuid.
There are only 3 options (let me know if you can think of any more):
1. Join persons
This is what we do now, and it has all the perf problems for queries. Unless we're querying over a single person_id, unique-user queries result in us joining persons. Adding any properties etc. results in the same need to join persons. I'd bucket any ideas about running queries across ClickHouse/Postgres here too, and I think they all have the same perf problem.
2. Update events in ClickHouse
Updating the person_id from the anonymous one to the logged-in person_id: we can't really mutate events in ClickHouse. What we could in theory do is run an async migration on the shard (essentially copy all data to a new table, modifying the distinct_ids accordingly, then rename tables, like the 0002 async migration). This is probably expensive in terms of resources even when done for a single shard, and that migration would put us temporarily in a state of delayed ingestion (while we copy the last part of the table and rename, we'd be queueing up new events in Kafka, similar to 0002). So pretty much a no-go.
3. Buffer and write with the logged-in person_id
This is already mentioned in the RFC ("As a mitigation, if necessary, we would consider a buffer mechanism ..."); as said there, it's good food for thought for the future, but likely unnecessary complexity for now. There are two options here which might not be all that bad of an idea (the first is sketched below):
- Delay by using some intermediary storage for logged-out users: we could dump all anonymous events somewhere keyed by uuid and then consume them once we see the identify event (I was thinking Kafka, but we can't get messages by key; maybe a separate table in ClickHouse), and then we can write the events to ClickHouse with the logged-in person_id. I was thinking up to a certain delay, e.g. 1 hour or 1 day (it could be longer in theory too), but note that during that delay we wouldn't see those events in queries (which could be acceptable). I wonder if this would actually not be a terrible idea?
- Re-ingest the events with the new person_id and mark the other uuid as a deleted person (do we filter out deleted people somehow in queries? In any case, filtering out is potentially cheaper than joins). We can occasionally run an async migration that clears them out.
Additionally there's a problem tied to person properties and how they would get updated, but we could just decide on global semantics where person properties can only be used on logged-in users (in any case this is probably less important; funnels working etc. is more important).
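A minimal in-memory sketch of the first (delay/buffer) option, purely for illustration - the comment above suggests Kafka or a ClickHouse table as the real intermediary store, and all names here are made up:

```typescript
import { randomUUID } from 'node:crypto'

type BufferedEvent = { name: string; distinctId: string }
type Entry = { events: BufferedEvent[]; timer: ReturnType<typeof setTimeout> }

const BUFFER_TTL_MS = 60 * 60 * 1000 // e.g. the 1 hour delay mentioned above
const buffer = new Map<string, Entry>()

function writeToClickHouse(events: BufferedEvent[], personId: string): void {
  // placeholder: insert the events with this person_id
}

// Anonymous events are held back instead of being written immediately.
function onAnonymousEvent(event: BufferedEvent): void {
  let entry = buffer.get(event.distinctId)
  if (!entry) {
    entry = {
      events: [],
      // If no identify arrives within the TTL, give up and write the events
      // under a fresh anonymous person (they stay invisible until then).
      timer: setTimeout(() => flush(event.distinctId, randomUUID()), BUFFER_TTL_MS),
    }
    buffer.set(event.distinctId, entry)
  }
  entry.events.push(event)
}

// An identify in time lets us write the buffered events with the real person_id.
function onIdentify(anonDistinctId: string, identifiedPersonId: string): void {
  flush(anonDistinctId, identifiedPersonId)
}

function flush(distinctId: string, personId: string): void {
  const entry = buffer.get(distinctId)
  if (!entry) return
  clearTimeout(entry.timer)
  buffer.delete(distinctId)
  writeToClickHouse(entry.events, personId)
}
```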
Assuming at first we don't implement a solution to properly associate green events ("green" from the diagram) with the proper user, is there a way we can mitigate the impact on DAU/WAU/MAU analytics? My biggest concern is that your active-users metric becomes inaccurate because of this.
> mitigate impact to DAUs/WAUs/MAUs analytics

Perhaps filtering on logged-in users would help?
> This affects use cases where users access their product (logged out) on multiple devices regularly. However, based on our analysis the impact is only on ~0.03% of events; some marketing use-cases are more significantly affected.
>
> As a mitigation, if necessary, we would consider a buffer mechanism to reduce affected events to ~0.003%. We can provide further details on this solution; however, we would ideally avoid implementing it, as it increases the complexity of the system quite significantly.
(Reposting as a new thread from an earlier comment.)
Could we have some way to "backtrack" to the anonymous events/session post-login? For example, by adding the property `anon_distinct_id: "anonid2"` to events 11 and 12 in the table above.
This would allow us to link session recordings with the pre-login part, and potentially allow some flexibility in the future when someone asks for a funnel where one part consists of logged-out events and the other part consists of logged-in events.
This is pretty interesting. In a way, this is what the `session_id` is today.
Have read through - this feels like it'll mainly cause issues around reactivation, which is a very late-stage (i.e. Facebook-size) company problem, where you've already saturated the market and aren't hunting for new users. The area it'd harm doesn't reflect what our existing customers focus on - activation, for example. That's demonstrated by the tiny percentage impact it'd have on the data. We should document this behavior very clearly in the docs.

It's also a significant data-parity win that historical graphs won't change when user properties change. Again, this needs documenting very clearly somewhere too, so we don't have users thinking this is a bug when they expected the reverse behavior - i.e. that historic graphs would change.

The top reason to do this from a customer view is that performance issues are frequent and worth solving.
The proposal makes sense to me. In particular, the trade-off of having reliable metrics over time (e.g. for a user that was free and moved to paid at some point) versus having some anonymous secondary usage unidentified seems okay. Added comments in-line.
> 7. We will store more data in the events table. See below for more details.
> 8. Deleting user properties becomes more expensive due to being stored in a redundant way.
>
> <details><summary>Details on cost of storage challenges</summary>
Would it be feasible to do a detailed cost analysis that we can share with customers? My thinking is, we'll need more disk space (more
I think so. Will figure this out as we get to implementation! :)
Just cross-posting some user context/pain from Slack.
For hoglets: this PR is a follow-up to https://github.com/PostHog/product-internal/pull/240, outlining one area of improvement.
I suggest reading the proposal via the rich diff - hit the corresponding button in the Files tab.