Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use more performant core.matrix implementation in math worker #1579

Open
metasoarous opened this issue Dec 20, 2022 · 1 comment
Open

Use more performant core.matrix implementation in math worker #1579

metasoarous opened this issue Dec 20, 2022 · 1 comment
Labels
clojure Clojure (math module) related issues 🔩 p:math 🚀 scale-and-perf Performance and scale issues

Comments

@metasoarous
Copy link
Member

Problem:
Back when writing the initial math implementation years ago, I wasn't able to put null/nill in the the more performant core.matrix implementations, and so decided to use plain Clojure vectors (of vectors) for the preliminary "raw" vote matrix (which needs a "missing value" sigil on participant/comment entries for which there is no data). While Clojure vectors are pretty performant structures for a lot of situations, using vectors of vectors to represent an array/matrix is really bad from a performance perspective.

What I've realized since is that the more performant implementations do allow for storing JVM NaN values in proper array data structures, which should be significantly more efficient. This should have a big impact on the "bootstrapping" time it takes to load up existing conversation data to a fresh math worker instance, and will probably have a big impact on memory consumption as well.

Suggested solution:
Switch to using NaN, together with one of the more performant core.matrix implementations (vectorz, ndarray?... I forget, will search)

Alternative suggestions:

@metasoarous metasoarous added feature-request For new feature suggestions 🔩 p:math clojure Clojure (math module) related issues 🚀 scale-and-perf Performance and scale issues labels Dec 20, 2022
@metasoarous metasoarous removed the feature-request For new feature suggestions label Dec 20, 2022
@jucor
Copy link
Contributor

jucor commented Jan 29, 2025

Thanks @metasoarous for flagging this and #1580 again today as we head into some bigger conversations :) Acknowledged and discussed with @colinmegill , will dive into this.
Here are the notes of our chat:

Worst-case impact on service

  • No data loss. It's mainly suboptimal from a comment routing perspective.
  • For the big conversation, main issue is that groups and comment routing freeze.
  • Secondary issue is that other conversations get blocked up and we start getting reports that the report either isn't showing or is stuck.
  • Would have to look at the server to see how it handles this, but @metasoarous thinks it just means the probabilities are frozen/stuck. Unsure what probability a comment that does [not?] have a routing priority would get (if not effectively zero).

Technical bottleneck

  • It's not even the math itself, just the loading into the data structure at the beginning of the math worker.

Short-term mitigation

  • Can be mitigated significantly by running a larger node. It's also possible to set up a separate instance using MATH_ZID_ALLOWLIST and ... BLOCKLIST variables, and with tuning of the kill timeout in bin/run. We can exclude a big conversation from the main worker instance on heroku, and set up a separate server that only runs that zid. So at least it is isolated.

Deeper mitigation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clojure Clojure (math module) related issues 🔩 p:math 🚀 scale-and-perf Performance and scale issues
Projects
None yet
Development

No branches or pull requests

2 participants