lamastex/SparkDensityTree

SparkDensityTree

This project aims to develop a nonparametric density estimator with universal performance guarantees using distributed sparse binary trees.
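As a rough illustration of the data structure behind such an estimator, the Scala sketch below stores a sparse binary tree as a map from node labels to counts, keeping only occupied leaves, and evaluates the usual histogram density count / (n * volume) on a leaf. Everything here is an assumption for illustration and not the SparkDensityTree API: the object and method names are hypothetical, the root box is taken to be [0,1)^d, and level i of the tree bisects coordinate i mod d (on a cube this coincides with always bisecting the first widest side).

```scala
// Hypothetical sketch, not the SparkDensityTree API. Assumes the root box is
// the unit cube [0,1)^d and that level i bisects coordinate i % d.
object SparseHistogramSketch {
  // Node labels: root = 1, children of k are 2k (left) and 2k + 1 (right).
  def depth(k: Long): Int = 63 - java.lang.Long.numberOfLeadingZeros(k)

  // Each bisection halves a cell, so a depth-j cell has volume 2^-j.
  def volume(k: Long): Double = math.pow(0.5, depth(k))

  // Walk from the root down to the depth-`maxDepth` cell that contains x.
  def leafLabel(x: Array[Double], maxDepth: Int): Long = {
    require(x.nonEmpty, "points need at least one coordinate")
    require(maxDepth >= 0 && maxDepth <= 62, "labels must fit in a Long")
    val lo = Array.fill(x.length)(0.0)
    val hi = Array.fill(x.length)(1.0)
    var k = 1L
    for (level <- 0 until maxDepth) {
      val dim = level % x.length
      val mid = (lo(dim) + hi(dim)) / 2
      if (x(dim) < mid) { hi(dim) = mid; k = 2 * k }
      else { lo(dim) = mid; k = 2 * k + 1 }
    }
    k
  }

  // Sparse representation: only occupied leaves are stored, as label -> count.
  def leafCounts(points: Seq[Array[Double]], maxDepth: Int): Map[Long, Long] =
    points.groupMapReduce(p => leafLabel(p, maxDepth))(_ => 1L)(_ + _)

  // Histogram density on leaf k: count / (n * volume); absent leaves give 0.
  def density(counts: Map[Long, Long], k: Long): Double = {
    require(counts.nonEmpty, "need at least one point")
    val n = counts.values.sum
    counts.getOrElse(k, 0L).toDouble / (n * volume(k))
  }
}
```

With every leaf at the same depth this is an ordinary regular histogram. A data-adaptive histogram would instead decide which leaves to split further, so leaves can end up at different depths; `density` still applies in that case because each leaf's volume is computed from its own label.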

Original ideas from Data Adaptive Histograms for Statistical Regular Pavings have been extended into the distributed, fault-tolerant setting provided by Apache Spark. A preprint of this work in:

is available on arXiv here:

The latest PRs in 2022, by Johannes Graner for his Master's thesis work, build further on this with bottom-up sparse trees to combat the curse of dimensionality:
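One way to picture building a sparse tree bottom-up is to start from the occupied cells at the finest depth and repeatedly merge each node into its parent, so only non-empty ancestors are ever materialized. With n points this creates at most n nodes per level regardless of the dimension d, whereas a complete tree that bisects each of the d coordinates m times has 2^(m * d) leaves. The sketch below assumes the node labelling root = 1, children 2k and 2k + 1 (so the parent of k is k / 2); the names are hypothetical and the code is not taken from those PRs.

```scala
// Hypothetical sketch, not code from the SparkDensityTree PRs.
// Labels: root = 1, children of k are 2k and 2k + 1, so the parent of k is k / 2.
object BottomUpSketch {
  def depth(k: Long): Int = 63 - java.lang.Long.numberOfLeadingZeros(k)

  // Merge one level into the level above by summing the counts of siblings.
  def parentLevel(level: Map[Long, Long]): Map[Long, Long] =
    level.toSeq.groupMapReduce(_._1 / 2)(_._2)(_ + _)

  // From occupied leaves at one common depth, build every occupied level up to
  // the root, returned root level first. Empty nodes are never created.
  def allLevels(leaves: Map[Long, Long]): List[Map[Long, Long]] = {
    require(leaves.nonEmpty, "need at least one occupied leaf")
    require(leaves.keysIterator.map(depth).toSet.size == 1, "leaves must share one depth")
    @annotation.tailrec
    def loop(level: Map[Long, Long], acc: List[Map[Long, Long]]): List[Map[Long, Long]] =
      if (level.contains(1L)) level :: acc
      else loop(parentLevel(level), level :: acc)
    loop(leaves, Nil)
  }
}
```

For example, `allLevels(Map(4L -> 2L, 7L -> 1L))` yields three levels: the root with count 3, nodes 2 and 3 with counts 2 and 1, and the original two leaves. Nodes 5 and 6 are empty and never appear.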

The PR of 15 June 2023, by Axel Sandstedt for his Master's thesis work, adds algorithms for reducing communication between machines in a network, as well as general optimizations:
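A standard way to cut communication in this kind of pipeline, shown here as a generic sketch rather than the algorithm in that PR, is to pre-aggregate leaf counts inside each partition before anything is sent, so a partition ships one (label, count) record per distinct leaf instead of one record per point. Partitions are simulated with plain Scala collections rather than Spark, and all names are hypothetical.

```scala
// Hypothetical sketch: partitions are simulated with plain collections, not Spark.
object CommunicationSketch {
  // Without pre-aggregation, every point's leaf label crosses the network.
  def naiveRecordsSent(partitions: Seq[Seq[Long]]): Int = partitions.map(_.size).sum

  // Each partition first counts its own leaf labels locally (a "combiner").
  def localCounts(partition: Seq[Long]): Map[Long, Long] =
    partition.groupMapReduce(label => label)(_ => 1L)(_ + _)

  // With pre-aggregation, each partition sends one record per distinct label.
  def combinedRecordsSent(partitions: Seq[Seq[Long]]): Int =
    partitions.map(p => localCounts(p).size).sum

  // The receiving side merges the partial maps into global leaf counts.
  def mergeCounts(parts: Seq[Map[Long, Long]]): Map[Long, Long] =
    parts.flatMap(_.toSeq).groupMapReduce(_._1)(_._2)(_ + _)
}
```

With partitions `Seq(Seq(4L, 4L, 4L, 7L), Seq(4L, 7L, 7L))`, the naive scheme sends 7 records and the combined one sends 4, while both lead to the same global counts. In Spark itself, `reduceByKey` performs this kind of map-side combining, whereas `groupByKey` shuffles every record.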

PRs between 13 July and 13 September 2023 were supported by Combient Mix AB through a 2023 summer internship in Data Engineering Sciences held by Axel Sandstedt.

Support

  • This work was initiated with support from project CORCON: Correctness by Construction, Seventh Framework Programme of the European Union, Marie Curie Actions-People, International Research Staff Exchange Scheme (IRSES) with counter-part funding from the Royal Society of New Zealand.
  • Combient Competence Centre for Data Engineering Sciences.
  • This research was partially supported by the Wallenberg AI, Autonomous Systems and Software Program, funded by the Knut and Alice Wallenberg Foundation, and by the Databricks University Alliance with infrastructure credits from AWS.
