fzoepffel/reproducibility_incSliceline

Reproducibility Submission SIGMOD 2021, Paper 218

Authors: Svetlana Sagadeeva, Matthias Boehm

Paper Name: SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging

Paper Links:

Source Code Info:

Datasets Used:

Hardware and Software Info: The experiments use both a scale-up, two-socket Intel server, and a scale-out cluster of 1+12 single-socket AMD servers:

  • Scale-up node: two Intel Xeon Gold 6238 CPUs at 2.2-2.5 GHz (56 physical/112 virtual cores), 768 GB DDR4 RAM at 2.933 GHz balanced across 6 memory channels per socket, 2 x 480 GB SATA SSDs (system/home), and 12 x 2 TB SATA SSDs (data).
  • Scale-out cluster: 1+12 nodes, each a single AMD EPYC 7302 CPU at 3.0-3.3 GHz (16 physical/32 virtual cores), 128 GB DDR4 RAM at 2.933 GHz balanced across 8 memory channels, 2 x 480 GB SATA SSDs (system/home), 12 x 2 TB SATA HDDs (data), and 2 x 10Gb Ethernet.
  • We used Ubuntu 20.04.1, OpenJDK Java 1.8.0_265 with -Xmx600g -Xms600g (on scale-up) and -Xmx100g -Xms100g (on scale-out), Apache Hadoop 2.7.7, and Apache Spark 2.4.7.

Setup and Experiments: The repository is pre-populated with the paper's experimental results (./results), individual plots (./plots), and SystemDS source code. The entire experimental evaluation can be run via ./runAll.sh, which first deletes the existing results and plots and then performs setup, dataset download and preparation, local and distributed experiments, and plotting. However, for a more controlled evaluation, we recommend running the individual steps separately:

./run1SetupDependencies.sh;
./run2SetupSystemDS.sh;
./run3DownloadData.sh;
./run4PrepareLocalData.sh; # on scale-up node
./run5LocalExperiments.sh; # on scale-up node 
./run6PrepareDistData.sh;  # on scale-out cluster
./run7DistExperiments.sh;  # on scale-out cluster
./run8PlotResults.sh;
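
When running the steps individually, it can help to stop at the first failing step and keep one log file per step. The following is a minimal sketch of such a wrapper, not part of the repository: the function name run_steps and the ./logs directory are hypothetical, and the commented usage assumes that steps 1 to 3 are run on the same machine as the local experiments.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository): runs the given step
# scripts in order, stops at the first failure, and keeps one log per step.
set -u

run_steps() {
  local log_dir="$1"; shift
  mkdir -p "$log_dir" || return 1
  local step name
  for step in "$@"; do
    name="$(basename "$step" .sh)"
    echo "== $step"
    # Capture stdout and stderr of the step in its own log file.
    if ! "$step" > "$log_dir/$name.log" 2>&1; then
      echo "FAILED: $step (see $log_dir/$name.log)" >&2
      return 1
    fi
  done
  echo "all steps finished"
}

# Example for the scale-up node (assumed to also run steps 1 to 3):
# run_steps ./logs ./run1SetupDependencies.sh ./run2SetupSystemDS.sh \
#   ./run3DownloadData.sh ./run4PrepareLocalData.sh ./run5LocalExperiments.sh
```

Because the function returns a non-zero status as soon as one step fails, later steps such as the experiments are not started on top of an incomplete setup or missing data.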

Last Update: May 10, 2022 (minor fixes)


[1] Matthias Boehm, Iulian Antonov, Sebastian Baunsgaard, Mark Dokter, Robert Ginthoer, Kevin Innerebner, Florijan Klezin, Stefanie N. Lindstaedt, Arnab Phani, Benjamin Rath, Berthold Reinwald, Shafaq Siddiqui, and Sebastian Benjamin Wrede: SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle, CIDR 2020
