
[HOLD for payment 2024-06-28] LOW: [Tracking] Metric - Send a message TTI (Performance audit by Callstack) #35225

Closed
kacper-mikolajczak opened this issue Jan 26, 2024 · 22 comments
Assignees
Labels
Awaiting Payment (Auto-added when associated PR is deployed to production), Weekly KSv2

Comments

@kacper-mikolajczak
Contributor

kacper-mikolajczak commented Jan 26, 2024

Intro

As part of the Callstack performance audit, this issue tracks and summarises the phases of the Send a message metric.

The motivation behind measuring the TTI of this particular behaviour was the number of reported issues of bad UX whenever a user tried to send a message. Thanks to the offline-first approach, we can also exclude network latency from the measurements, while keeping an eye on optimistic/actual updates during the analysis.

Current Issue Owner: @muttmuure
@kacper-mikolajczak
Contributor Author

kacper-mikolajczak commented Jan 26, 2024

Measurements

Details

Metric: Send a message TTI (ms)
Phase: Measurements
Commit hash: 76e5b1c

Workflow

  1. Open app
  2. Go to report screen
  3. Type a message with emoji
  4. Send the message
  5. Wait until message is displayed

Markers

Below is the list of defined markers that were used during testing:

  • messageProcessing: time between the Send the message (4) and Wait until message is displayed (5) steps of the workflow

Their implementation can be found in e8682e8.

The messageProcessing marker needed extra metadata that is stored in the PERF module, which can later be used as part of analytics. If that can be avoided, please let me know so we can improve/replace it 🔧 👍
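For reference, a minimal sketch of how such a marker can be recorded with the User Timing API from react-native-performance; the function names and call sites are illustrative, not the exact implementation from e8682e8:

```ts
// Illustrative sketch only - marker wiring roughly as described above.
import performance from 'react-native-performance';

// Called when the user presses "Send" (step 4 of the workflow).
function markMessageProcessingStart() {
    performance.mark('messageProcessing_start');
}

// Called once the message is rendered in the report (step 5 of the workflow).
function markMessageProcessingEnd() {
    performance.mark('messageProcessing_end');
    // Produces the `messageProcessing` entry whose duration is reported in the tables below.
    performance.measure('messageProcessing', 'messageProcessing_start', 'messageProcessing_end');
}
```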

Measurements

Measurements were gathered by running the aforementioned Workflow 10 times.

messageProcessing [ms]
14588.59
11782.90
20523.72
15050.67
14571.76
17800.07
13568.82
16305.16
10060.16
10869.97

The average time of messageProcessing is: 14512.18 ms

Auxiliary data

Additionally, for a broader perspective, the resource consumption during one of the test runs is included.

CPU usage
JS thread usage
RAM usage

@mountiny
Contributor

@kacper-mikolajczak Thanks! Sorry if I am misunderstanding this but did it take 14 seconds from clicking Send button to show the message in the report in your testing? 😱

I have never experienced delay remotely close to this

@adhorodyski
Contributor

adhorodyski commented Jan 26, 2024

@kacper-mikolajczak Thanks! Sorry if I am misunderstanding this but did it take 14 seconds from clicking Send button to show the message in the report in your testing? 😱

I have never experienced delay remotely close to this

It may be the heavy account setting we're using, eduardo.graciano+newdotperf@callstack.com - I saw this myself, where the app was barely usable at this point (sending a message was hardly doable). The measurements provided by @kacper-mikolajczak unfortunately match my own experience in a chat on this account.

This was all on a physical device, a production build and this exact account. @rinej will post a recording soon so we can see what the experience looked like.

@kacper-mikolajczak
Contributor Author

kacper-mikolajczak commented Jan 26, 2024

Hi @mountiny 👋 No misunderstanding whatsoever - I did not believe it at first either :D

As @adhorodyski already pointed out, the measurements were aligned with the guidelines, but the app with this specific setup is not performing as it should. I will post the analysis soon to unwrap what is going on under the hood 👍

@rinej
Contributor

rinej commented Jan 26, 2024

@mountiny I am sending a video example with around 15s for sending a message.
It was done on a real device and on eduardo.graciano+newdotperf@callstack.com

short-flow.mp4

Additionally, on that account sending a message is sometimes buggy. It sends just part of the message and keeps the whole message in the input.

Example videos:

bug1.mp4
bug2.mp4

@mountiny
Contributor

Holy Moly

@adhorodyski
Contributor

Holy Moly

Give us some time we're already cookin' something!! 😊

@melvin-bot melvin-bot bot added the Monthly KSv2 label Jan 29, 2024
@kacper-mikolajczak
Contributor Author

kacper-mikolajczak commented Feb 2, 2024

Analysis

In order to analyse the underlying issues on the Android platform, the app was run in the release build variant and profiled using both native and JS tools (Android Studio Profiler, Chrome Profiler and React Profiler respectively).

As a brief overview, here is a high-level representation of the analysis flow:

Native profiling
High-level picture of an analysis flow

Finding the dark side

Android Studio Profiler is used to determine whether performance issues stem from the native side, the JS side, or both.

The results of native profiling, when plugged into the Perfetto visualiser, look as follows:

General overview of CPU usage across threads, Perfetto Visualiser

The analysis focuses on the 4 main threads from the React Native standpoint:

  • mqt_js (JS Thread)
  • UI Thread
  • mqt_native_modules
  • Render Thread

Looking at the CPU usage summary, there are two areas of high-density calculations. The biggest concern seems to be the overload of the mqt_js thread in those two timeframes, which the analysis will focus on the most.

JS Profiling

The detailed JS profiling trace is obtained thanks to the react-native-release-profiler library, which makes it possible to profile the Hermes (JS) thread in a release build. The trace is presented via Chrome DevTools, which in turn helps determine the exact root cause(s) of the JavaScript-land issues.

Baseline hermes profiling trace, Chrome Devtools
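For context, capturing such a trace looks roughly like the sketch below, assuming the library's startProfiling/stopProfiling entry points; the wrapper function is illustrative:

```ts
// Rough sketch of capturing a Hermes sampling profile around the send-message flow.
import {startProfiling, stopProfiling} from 'react-native-release-profiler';

async function profileSendMessageFlow(sendMessage: () => Promise<void>) {
    startProfiling();
    await sendMessage(); // type the message and wait until it is displayed
    // Persist the trace on the device so it can be pulled and opened in
    // Chrome DevTools (or converted for Perfetto).
    await stopProfiling(true);
}
```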

From the trace above, the key take-aways are:

  • The process of sending a message requires calculations that block the JS thread for more than 27 seconds.
  • This means the app is unresponsive for that long from the user's perspective.

The main goal of the analysis is to reduce the amount of calculation as much as possible.

Each section below contains information about a specific finding and its impact on overall performance. The results are summed up between sections to eventually present the final outcome in the very last part.

After each section with meaningful changes in timings, a table like the one below is presented. The first (baseline) and last (final) tables consist of 10 samples, while the intermediate ones have 5 samples for brevity.

# Duration [ms]
1 14588.59
2 11782.90
3 20523.72
4 15050.67
5 14571.76
6 17800.07
7 13568.82
8 16305.16
9 10060.16
10 10869.97
Avg 14512.18

Hermes Intl localeCompare

The highest amount of work is performed by the native string method localeCompare which, summed across all calls, takes about 23 seconds to complete. The issue stems from Hermes engine Intl issues, which are described in more detail in this proposal by Hur, who found the original fix.

The result of fixing the issue is a drastic reduction of the time required to complete all the calculations. We were able to reduce the workload from 27 seconds to about 5 seconds. A sketch of the general mitigation pattern is included after the table below.

Stack trace after Hermes Intl changes shows significant improvement
# Duration [ms]
1 3572.07
2 3512.99
3 3542.68
4 3710.00
5 3672.54
Avg 3602.06
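As a rough illustration of the pattern (not necessarily the exact fix from the linked proposal), building a single Intl.Collator up front and reusing its compare function avoids resolving locale data on every localeCompare call:

```ts
// Illustrative only: one pre-built collator shared by all comparisons.
const collator = new Intl.Collator('en', {usage: 'sort', sensitivity: 'base'});

function sortDisplayNames(displayNames: string[]): string[] {
    // Every comparison reuses the collator's resolved locale data instead of
    // paying the Intl setup cost inside each localeCompare call.
    return [...displayNames].sort(collator.compare);
}
```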

The 3 bumps

While the previous step brought enormous gains, we still have quite a lot of work going on under the hood. Even with such a reduction, we can still observe a pattern when looking at the JS profiling trace - there are three significant overheads when sending a message. After investigating, those “bumps” seem to be:

  • the update of the hasDraft property in the report object when starting to type a message
  • the optimistic update when sending the message
  • the actual Onyx update after the message was successfully sent

Even though those bumps are the result of different application flows, they share the same root cause, which is tied to the SidebarLinksData component and its internal calculations.

SidebarLinksData subscribes to the collection of all chatReports from Onyx (about 14k reports in the testing account). The collection is invalidated every time one of the reports changes, which happens in the aforementioned bumps (updating the hasDraft property and the actual message-sending flow).

Whenever the chatReports collection is invalidated, it breaks the memoization of the heavy optionListItems object (see the simplified sketch below). Keep in mind that the memoization might also break because of other dependencies. That was not the case during the analysis of sending a message, but they appear to be as fragile as chatReports.
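A simplified sketch of the memoization shape being described (the hook wrapper, import path and types are approximations, not the actual component code):

```ts
// Simplified sketch; the real SidebarLinksData code differs in detail.
import {useMemo} from 'react';
import SidebarUtils from '../libs/SidebarUtils';

type ReportsCollection = Record<string, unknown>;

function useOptionListItems(currentReportID: string, chatReports: ReportsCollection, betas: string[], policies: ReportsCollection, priorityMode: string, allReportActions: ReportsCollection): string[] {
    return useMemo(
        () => SidebarUtils.getOrderedReportIDs(currentReportID, chatReports, betas, policies, priorityMode, allReportActions),
        // Any change inside the chatReports collection (~14k reports on the test
        // account) invalidates this memo, so a single report update re-runs the
        // entire ordering.
        [currentReportID, chatReports, betas, policies, priorityMode, allReportActions],
    );
}
```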

Brief explanation of SidebarLinksData ordering mechanism and its pitfalls

New Order

The major suspect for optionListItems' bad performance seems to be the getOrderedReportIDs method from SidebarUtils. It is called to calculate a new order of reports for SidebarLinksData whenever its dependencies change.

Looking at the stack trace, the getOrderedReportIDs function call is responsible for the biggest time overhead, about 2.5-3s per call.

Stack trace presenting getOrderedReportIDs overhead inside SidebarLinksData

Each call consists of a number of operations performed on the set of reports (and other arguments) to determine the correct order of reports:

  • caching reports (JSON.stringify)
  • filtering out reports (arrayPrototypeFilter)
  • modifying reports’ display names (arrayPrototypeForEach)
  • sorting reports (arrayPrototypeSort)

Those calculations are done for all of the thousands of reports every time the collection is invalidated, resulting in high overhead. Let’s look at them more closely in the sections below.

The Cache

As the old joke says, the three hardest things in computer science are cache invalidation, naming things and off-by-one errors. Based on the previous findings, the audit will try to address them in the sections below.

The getOrderedReportIDs function tries to avoid calculations by introducing a cache. While that is generally a good practice, in this specific example it might add more overhead than it tries to mitigate.

The app spends about 150ms performing JSON.stringify on a heavy list of data in order to compute the cachedReportsKey.

Then the generated key is checked against the existing cache. If a cache entry is hit, the cached ordered reportIDs are returned. The issue here is that a cache entry is unlikely to be hit, especially given the maximum cache size of 5 entries and the rather large set of input variables (a paraphrased sketch of the pattern follows the list below):

  • currentReportId
  • allReports
  • betas
  • policies
  • priorityMode
  • allReportActions
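Paraphrased, the caching pattern being discussed looks roughly like this (not the exact source; computeOrderedReportIDs stands in for the real filtering/sorting work):

```ts
// Paraphrase of the pattern, not the actual implementation.
declare function computeOrderedReportIDs(currentReportId: string, allReports: unknown, betas: string[], policies: unknown, priorityMode: string): string[];

const reportIDsCache = new Map<string, string[]>();
const MAX_CACHE_SIZE = 5;

function getOrderedReportIDs(currentReportId: string, allReports: unknown, betas: string[], policies: unknown, priorityMode: string, allReportActions: unknown): string[] {
    // ~150ms is spent serialising the heavy inputs just to build the key.
    const cachedReportsKey = JSON.stringify([currentReportId, allReports, betas, policies, priorityMode, allReportActions]);

    // With only 5 slots and this many fast-changing inputs, a hit is very unlikely.
    const cached = reportIDsCache.get(cachedReportsKey);
    if (cached) {
        return cached;
    }

    const orderedReportIDs = computeOrderedReportIDs(currentReportId, allReports, betas, policies, priorityMode);
    if (reportIDsCache.size >= MAX_CACHE_SIZE) {
        // Evict the oldest entry.
        reportIDsCache.delete(reportIDsCache.keys().next().value as string);
    }
    reportIDsCache.set(cachedReportsKey, orderedReportIDs);
    return orderedReportIDs;
}
```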

By removing the caching mechanism, the time required to calculate getOrderedReportIDs decreases by about 0.3s (~15%).

By removing the caching mechanism getOrderedReportIDs gets 15% faster

Additionally, allReportActions is used solely to compute the cache key, so when removing the cache implementation, the entire allReportActions retrieval process can be omitted as well (including the request to Onyx), which is another potential gain for the entire SidebarLinksData. Comparing the stack traces shows a ~40ms gain in performance:

Before removing allReportAction retrieval
After removing allReportAction retrieval
🚧 While conducting the analysis, not a single cache hit was observed. That said, we are not sure about its performance benefits in the wild - please treat this finding as an entry point for discussion rather than a definitive statement.
# Message Duration [ms]
1 testing1 3032.67
2 testing2 3216.57
3 testing3 3232.38
4 testing4 3295.99
5 testing5 3195.70
Avg - 3194.86

The naming

Another heavy lifter inside getOrderedReportIDs is the computation of a report’s displayName property via getReportName.

Overhead of forEach that is mostly related to getReportName calculations

In order to calculate the display name, getReportName internally calls many different helpers, but the ones that stand out the most are getDisplayNameForParticipant and getPolicyExpenseChatName.

getDisplayNameForParticipant and getPolicyExpenseChatName overhead of ~200ms

While one could try to squeeze out gains by looking for a more performant library or doing micro-optimisations, the best way to save time is to calculate static properties once (preferably on the server side), since they are not intended to change, and derive dynamic values from them.

For example, in the getDisplayNameForParticipant implementation we can see that the value returned from the formatPhoneNumber call can be re-used.

The formattedLogin field could be part of personalDetails itself or might live in a separate entity. As an experiment, the formatted login was stored in a makeshift map, which reduced the overhead by about 500ms (a rough sketch of the experiment is shown below):
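The "makeshift map" experiment looked roughly like this; names other than formatPhoneNumber are illustrative:

```ts
// Illustrative sketch of the experiment: memoise formatPhoneNumber per login.
declare function formatPhoneNumber(login: string): string; // existing helper used by getDisplayNameForParticipant

const formattedLoginCache = new Map<string, string>();

function getFormattedLogin(login: string): string {
    let formattedLogin = formattedLoginCache.get(login);
    if (formattedLogin === undefined) {
        formattedLogin = formatPhoneNumber(login);
        formattedLoginCache.set(login, formattedLogin);
    }
    return formattedLogin;
}
```

Longer term, as noted above, such static values would ideally be computed once (e.g. server-side) and stored alongside personalDetails rather than cached client-side.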

Still costly forEach, with runtime reduced by about 500ms

The example brought up here is meant to illustrate what can be done on a broader scale in the implementation phase with the app’s data flow in general. The higher in the tree of data sources we can reduce the number of calculations, the better - and this is something we will do in the future.

# Message Duration [ms]
1 testing1 2482.31
2 testing2 2512.94
3 testing3 2474.32
4 testing4 2461.08
5 testing5 2493.86
Avg - 2484.12

Report’s hasDraft property

An example of such a high-level improvement could be the process of determining whether a report has a draft message. It is triggered whenever the user starts typing a new message and is responsible for the first of the three bumps described above.

Updating the draft state is mainly used to show the correct order and UI in the LHN. While it is an important part of the UX, we don’t necessarily want to perform 1.5s of calculations to update one property.

Unfortunately, the hasDraft property is strictly bound to the report object. Changing (part of) the report results in the expensive calculations of the aforementioned getOrderedReportIDs.

In order to fix this, the hasDraft property could be extracted from the core report structure and served as a separate dependency.

Here are the results of mitigating the hasDraft impact - the first bump of calculation that was present in the first measurements is now almost completely gone. The overhead is reduced from about 1.5s to under 100ms. In comparison to the other bumps, it now looks negligible.

Results of avoiding the unnecessary getOrderedReportIDs updates on draft message

This was achieved by replacing the hasDraft property with a separate Onyx key, REPORT_HAS_DRAFT_, which is then used in the components of interest (e.g. SidebarLinksData, OptionRowLHN). The most significant improvement comes from skipping the costly call to getOrderedReportIDs whenever the draft status changes - instead of calculating the entire list from scratch, the newly created draft is injected into the previous list.

Here is the partial implementation as seen from the SidebarLinksData component:

SidebarLinksData improvements
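As a hedged re-sketch of the idea shown above (key names, props and the way drafts are merged into the order are simplified for illustration, not the actual patch):

```ts
// Simplified sketch of the pattern, not the actual SidebarLinksData change.
import {useMemo} from 'react';
import SidebarUtils from '../libs/SidebarUtils';

type ReportsCollection = Record<string, unknown>;

function useOptionListItems(
    currentReportID: string,
    chatReports: ReportsCollection,
    betas: string[],
    policies: ReportsCollection,
    priorityMode: string,
    // Subscribed via the separate draft key (REPORT_HAS_DRAFT_-style) instead of report.hasDraft.
    draftReportIDs: string[],
): string[] {
    // The expensive ordering no longer depends on draft state, so typing a
    // message does not invalidate it.
    const orderedReportIDs = useMemo(
        () => SidebarUtils.getOrderedReportIDs(currentReportID, chatReports, betas, policies, priorityMode),
        [currentReportID, chatReports, betas, policies, priorityMode],
    );

    // Draft changes are handled cheaply by adjusting the previously computed
    // order (here simply surfacing drafted reports first, purely for illustration).
    return useMemo(() => {
        const drafts = new Set(draftReportIDs);
        return [...orderedReportIDs.filter((id) => drafts.has(id)), ...orderedReportIDs.filter((id) => !drafts.has(id))];
    }, [orderedReportIDs, draftReportIDs]);
}
```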

A similar pattern was applied to other draft properties, like REPORT_DRAFT_COMMENT, so it might be worthwhile to spark a discussion about handling such structures in the system.

# Message Duration [ms]
1 testing5 2485.97
2 testing 😄 2314.00
3 testing6 2237.64
4 testing7 2247.78
5 testing8 2390.47
6 testing9 2271.90
7 testing1 2255.88
8 testing3 2309.42
9 testing5:smile: 2319.85
10 testing6 2280.91
Avg - 2310.77

Summary

While there is still room for improvement - the goal we are striving for is 100ms of execution time - with the proposed changes in place, the time to send a message went down from ~15s to ~2.5s.

These changes are meant to be taken as a general direction or example implementations, which will be continued in the next phase of the audit.

The main goal is to reduce the costly calculations that freeze the JS thread and make the app unresponsive for the user - which is even worse UX than inaccessible inputs! Beyond that, as shown with the smaller improvements, we aim to improve both performance and developer experience by writing more reliable code.

Hope you like the analysis - see you soon in the implementation phase! :)

CC @mountiny @roryabraham

@mountiny
Contributor

mountiny commented Feb 2, 2024

These analyses are really impressive and a great opportunity for everyone to learn from too. It's also great to see that this is in the open-source channel, as new engineers/anyone can get inspired by these.

I am very excited to bring the proposals from this into the App!

@kacper-mikolajczak
Contributor Author

Love to hear that @mountiny!

We are also eager to implement the proposal as soon as possible to get the benefits out of it. Before that, would you like to break down the analysis with me into some Actionable Points we can treat as entries into the Implementation Phase?

For convenience, here is a quick summary of the main points we can try to turn into APs and prioritize:

  • Hermes Intl improvements (order of magnitude improvement, ~12s)
  • getOrderedReportIDs tweaks: (~1.2s)
    • cache removal
    • displayName formatting overhead
    • sorting improvements
  • Draft message optimisations (depends on previous optimisations, ~200ms-~800ms)
    • moving draft-related properties from Report object into separate Metadata structure
    • improving dependency management

Please keep in mind that the actual time gains might differ - their purpose here is to give a general overview of how much each presented change might impact performance 👍

@mountiny
Contributor

mountiny commented Feb 6, 2024

Thank you Kacper, I think my reasoning for ordering would be: make the biggest impact changes first and also prioritize the lowest hanging fruit. Seems like your list achieves that so let's do that

@kacper-mikolajczak
Contributor Author

Great, I find such an ordering best as well. Just to sum up, here are the ordered Action Items:

  1. Hermes Intl improvements
  2. getOrderedReportIDs tweaks
    2.1. cache removal
    2.2. displayName formatting overhead
    2.3. sorting improvements
  3. Draft message optimisations
    3.1. moving draft-related properties from Report object into separate Metadata structure
    3.2. Improving dependency management (implementation of useMemo)

We are going to use those items as an input into the Implementation Phase.

CC: @mountiny

@mountiny
Contributor

mountiny commented Feb 6, 2024

Thanks!

@muttmuure
Contributor

We're working on it

@melvin-bot melvin-bot bot removed the Overdue label Apr 18, 2024
@muttmuure
Contributor

Working through it

@quinthar quinthar moved this from HIGH to LOW in [#whatsnext] #quality May 12, 2024
@quinthar quinthar changed the title [Tracking] Metric - Send a message TTI (Performance audit by Callstack) LOW: [Tracking] Metric - Send a message TTI (Performance audit by Callstack) May 12, 2024
@roryabraham
Contributor

Most recent slack convo: https://expensify.slack.com/archives/C05LX9D6E07/p1713358946087599

TL;DR currently we're talking about standardizing on moize as our go-to solution for memoization outside of React.

@kirillzyusko
Contributor

This issue was mentioned in https://expensify.slack.com/archives/C05LX9D6E07/p1711455442814489?thread_ts=1711010712.155509&cid=C05LX9D6E07

I'm going to write an e2e test for that (or extend a current one) to ensure the performance of such critical flows is not degrading over time 👍

@melvin-bot melvin-bot bot added the Reviewing (Has a PR in review) and Weekly KSv2 labels and removed the Monthly KSv2 label Jun 7, 2024
@muttmuure
Contributor

@kacper-mikolajczak are you still on this one?

@kacper-mikolajczak
Contributor Author

Hi @muttmuure, it was part of the last performance audit and is completed for now.

@muttmuure
Contributor

muttmuure commented Jun 12, 2024

Closing

@github-project-automation github-project-automation bot moved this from LOW to Done in [#whatsnext] #quality Jun 12, 2024
@mountiny mountiny self-assigned this Jun 17, 2024
@melvin-bot melvin-bot bot added the Weekly KSv2 and Awaiting Payment (Auto-added when associated PR is deployed to production) labels and removed the Weekly KSv2 label Jun 17, 2024
@melvin-bot melvin-bot bot changed the title LOW: [Tracking] Metric - Send a message TTI (Performance audit by Callstack) [HOLD for payment 2024-06-28] LOW: [Tracking] Metric - Send a message TTI (Performance audit by Callstack) Jun 21, 2024
@melvin-bot melvin-bot bot removed the Reviewing (Has a PR in review) label Jun 21, 2024

melvin-bot bot commented Jun 21, 2024

Reviewing label has been removed, please complete the "BugZero Checklist".


melvin-bot bot commented Jun 21, 2024

The solution for this issue has been 🚀 deployed to production 🚀 in version 1.4.85-7 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2024-06-28. 🎊

For reference, here are some details about the assignees on this issue:
