Switch from boutique NamedMatrix impl and core.matrix to tech.ml stack for math #1062

Open
metasoarous opened this issue Jul 11, 2021 · 1 comment
Labels
🔩 p:math 🚀 scale-and-perf Performance and scale issues

Comments

metasoarous commented Jul 11, 2021

Problem:
We're currently using a custom NamedMatrix protocol and implementation to house the raw votes in the math worker. This structure is more or less a dataframe/dataset structure. At the time it was developed, there wasn't really a good ready-made option for this, so we built our own. This has served us decently over the years, but as the scale of the conversations we've run has grown, this setup has proven limited:

Additional context:
Redoing all of the math is a potentially huge task, fraught with peril.

Suggested solution:
It's possible that we can start small by replacing the "raw" vote NamedMatrix (the one that exists prior to zeroing out moderated-out columns and imputing means for missing entries) with tech.ml, and working iteratively from there. We could possibly even reimplement the PNamedMatrix protocol against tech.ml.dataset, but this may be easier said than done given the expected output from some of these routines. Still, that may be easier than going all-in on tech.ml right away.
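
For anyone unfamiliar with the two preprocessing steps mentioned above, here is a minimal sketch of what sits between the "raw" vote matrix and the math routines. It uses only clojure.core on a plain vector of vectors, with `nil` marking a missing vote. All function names here are hypothetical, the order of the two steps is an assumption, and none of this is the actual NamedMatrix or tech.ml code.

```clojure
;; Hypothetical sketch, not the actual math worker code.
;; Rows are participants, columns are comments; nil marks a missing vote.

(defn zero-out-columns
  "Replace every entry in the given column indices with 0."
  [rows mod-out-idxs]
  (let [mod-out? (set mod-out-idxs)]
    (mapv (fn [row]
            (vec (map-indexed (fn [j v] (if (mod-out? j) 0 v)) row)))
          rows)))

(defn column-means
  "Mean of the non-missing entries in each column (0.0 if a column has none)."
  [rows]
  (mapv (fn [j]
          (let [vs (keep #(nth % j) rows)]   ; keep drops nil, retains 0
            (if (seq vs)
              (/ (reduce + vs) (double (count vs)))
              0.0)))
        (range (count (first rows)))))

(defn impute-means
  "Fill each missing entry with the mean of its column."
  [rows]
  (let [means (column-means rows)]
    (mapv (fn [row] (mapv (fn [v m] (if (nil? v) m v)) row means))
          rows)))

;; Assumed toy data: 3 participants, 3 comments, comment index 2 moderated out.
(def raw-votes [[1 nil -1]
                [nil 1 -1]
                [-1 nil nil]])

(def prepared
  (-> raw-votes
      (zero-out-columns [2])
      impute-means))
;; => [[1 1.0 0] [0.0 1 0] [-1 1.0 0]]
```

Replacing only the `raw-votes` side of this with a dataset would leave the downstream functions free to keep their current inputs for a while, which is the appeal of starting small.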

Alternative suggestions:

  • Go all in right away
  • Possibly implement some of the core.matrix protocols against tech.ml data structures, so we don't need a full rewrite? Not sure of the feasibility of this or whether it's worth the time, but it could be useful for other projects looking to transition.

Moar context:
Testing will be important in making sure we get this right as we move towards complete transition.

@metasoarous
Member Author

Update: I've realized that while the core.matrix implementation doesn't support proper missing values, clatrix does support using ##NaN in matrices. I wish I had thought of this way back when, as I think using a proper matrix type over a (Clojure) vector of vectors will probably make a pretty big difference. Nevertheless, a lot of the reasons mentioned above for switching to the tech.ml/dtype-next stack remain valid.
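
To make the `##NaN` idea concrete, here is a rough sketch of treating NaN as the missing-value marker in a primitive double array. It deliberately avoids any clatrix or core.matrix calls and uses only clojure.core plus `java.lang.Double`; the helper names `nan?` and `nan-mean` are made up for illustration.

```clojure
;; Hypothetical sketch: NaN as a missing-value marker in a primitive row.

(defn nan?
  "True when x is the floating-point NaN value."
  [x]
  (Double/isNaN (double x)))

(defn nan-mean
  "Mean of the non-NaN entries of xs, or NaN if every entry is missing."
  [xs]
  (let [present (remove nan? xs)]
    (if (seq present)
      (/ (reduce + present) (count present))
      ##NaN)))

;; One participant's votes as a double array; the middle vote is missing.
(def votes-row (double-array [1.0 ##NaN -1.0]))

(nan-mean votes-row)
;; => 0.0
```

One thing to watch with this approach is that NaN propagates through ordinary arithmetic, so any routine that sums or multiplies across a row needs to filter or impute first.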
