Authors: Svetlana Sagadeeva, Matthias Boehm
Paper Name: SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging
Paper Links:
- https://dl.acm.org/doi/10.1145/3448016.3457323
- https://mboehm7.github.io/resources/sigmod2021b_sliceline.pdf (green open access)
Source Code Info:
- Repository: Apache SystemDS [1] (https://github.com/apache/systemds, commit 627825c25d5a5938a772a78ce037c57e68611998)
- Reproducibility Repository: https://github.com/damslab/reproducibility/tree/master/sigmod2021-sliceline-p218
- Programming Languages: Java, R, SystemDS DML
- Packages/Libraries Needed: JDK 8, Git, Maven, R
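For a manual setup, the following sketch installs the listed dependencies on Ubuntu 20.04 and builds SystemDS at the pinned commit; run1SetupDependencies.sh and run2SetupSystemDS.sh automate these steps, and the exact package names are assumptions:
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk git maven r-base # assumed Debian/Ubuntu package names
git clone https://github.com/apache/systemds.git # the reproducibility repository already vendors this source tree
cd systemds
git checkout 627825c25d5a5938a772a78ce037c57e68611998
mvn -DskipTests package # builds the SystemDS jar under target/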
Datasets Used:
- Adult: https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
- Covtype: https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz
- KDD'98: https://archive.ics.uci.edu/ml/machine-learning-databases/kddcup98-mld/epsilon_mirror/cup98lrn.zip
- US Census: https://archive.ics.uci.edu/ml/machine-learning-databases/census1990-mld/USCensus1990.data.txt
- CriteoD21: http://azuremlsampleexperiments.blob.core.windows.net/criteo/day_21.gz
- Salaries: https://forge.scilab.org/index.php/p/rdataset/source/file/master/csv/car/Salaries.csv
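The datasets can also be fetched manually; an illustrative sketch for two of them (run3DownloadData.sh downloads and prepares all of them, and the target directory is an assumption):
mkdir -p data && cd data
wget https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
wget https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz
gunzip covtype.data.gz # Covtype ships gzip-compressed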
Hardware and Software Info: The experiments use both a scale-up two-socket Intel server and a scale-out cluster of 1+12 single-socket AMD servers:
- Scale-up node: two Intel Xeon Gold 6238 CPUs at 2.2-2.5 GHz (56 physical/112 virtual cores), 768 GB DDR4 RAM at 2.933 GHz balanced across 6 memory channels per socket, 2 x 480 GB SATA SSDs (system/home), and 12 x 2 TB SATA SSDs (data).
- Scale-out cluster: 1+12 nodes, each with a single AMD EPYC 7302 CPU at 3.0-3.3 GHz (16 physical/32 virtual cores), 128 GB DDR4 RAM at 2.933 GHz balanced across 8 memory channels, 2 x 480 GB SATA SSDs (system/home), 12 x 2 TB SATA HDDs (data), and 2 x 10 Gb Ethernet.
- We used Ubuntu 20.04.1, OpenJDK Java 1.8.0_265 with -Xmx600g -Xms600g (on scale-up) and -Xmx100g -Xms100g (on scale-out), Apache Hadoop 2.7.7, and Apache Spark 2.4.7.
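These heap settings translate into the following sketch; SYSTEMDS_STANDALONE_OPTS follows the convention of SystemDS's bin/systemds launcher, the spark-submit flags are standard Spark options, and the exact wiring inside the run scripts may differ:
export SYSTEMDS_STANDALONE_OPTS="-Xmx600g -Xms600g" # scale-up node, local operations
spark-submit --driver-memory 100g --executor-memory 100g SystemDS.jar -f experiment.dml # scale-out cluster; experiment.dml is a placeholder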
Setup and Experiments: The repository is pre-populated with the paper's experimental results (./results), individual plots (./plots), and the SystemDS source code. The entire experimental evaluation can be run via ./runAll.sh, which deletes the existing results and plots and then performs setup, dataset download and preparation, local and distributed experiments, and plotting. However, for a more controlled evaluation, we recommend running the individual steps separately:
./run1SetupDependencies.sh;
./run2SetupSystemDS.sh;
./run3DownloadData.sh;
./run4PrepareLocalData.sh; # on scale-up node
./run5LocalExperiments.sh; # on scale-up node
./run6PrepareDistData.sh; # on scale-out cluster
./run7DistExperiments.sh; # on scale-out cluster
./run8PlotResults.sh;
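If no cluster is available, the local part of the evaluation can be reproduced on the scale-up node alone, e.g. (plotting may then warn about missing distributed results):
./run1SetupDependencies.sh && ./run2SetupSystemDS.sh &&
./run3DownloadData.sh && ./run4PrepareLocalData.sh &&
./run5LocalExperiments.sh && ./run8PlotResults.sh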
Last Update: May 10, 2022 (minor fixes)
[1] Matthias Boehm, Iulian Antonov, Sebastian Baunsgaard, Mark Dokter, Robert Ginthoer, Kevin Innerebner, Florijan Klezin, Stefanie N. Lindstaedt, Arnab Phani, Benjamin Rath, Berthold Reinwald, Shafaq Siddiqui, and Sebastian Benjamin Wrede: SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle. CIDR 2020.