Add first implementation of augmentedNN to predict selectivity #1473
Conversation
Force-pushed from 35692f3 to 8c6ec93
I went through the code before the tests. The high-level structure looks reasonable to me. I think we still need more comments and documentation to get a better understanding of the code.
For the questions I have, please directly add comments to the code if possible. I'll take a second pass along with the tests part after they're addressed.
float learn_rate, int batch_size, int epochs);
/**
 * Train the Tensorflow model
 * @param mat: Contiguous time-series data
time-series data? Looks like you need to update the documentation here.
using TfFloatIn = TfSessionEntityInput<float>;
using TfFloatOut = TfSessionEntityOutput<float>;

class AugmentedNN : public BaseTFModel {
We need higher-level documentation on what this thing is doing. That is, from a DBMS perspective, what is this AugmentedNN modeling? What is it trying to predict? What is the input data, and what is the output data?
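For illustration only, pulling from the PR description at the bottom of this thread (range-predicate selectivity, bounds as input, a selectivity estimate as output), a possible shape for such a header comment could be:

```cpp
/**
 * AugmentedNN: a neural-network model used by the brain to estimate the
 * selectivity of range predicates, i.e. queries of the form
 *   SELECT * FROM table WHERE c1 >= l1 AND c1 <= u1 AND ...
 * Input:  one row per query, holding the predicate bounds [l1, u1, ...].
 * Output: the predicted selectivity (fraction of tuples satisfying the
 *         predicates) for each input row.
 */
class AugmentedNN : public BaseTFModel {
```

The exact wording is up to the author; the point is to state what is being modeled, what goes in, and what comes out.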
 * However instead of applying backprop it obtains predicted values.
 * Then the validation loss is calculated for the relevant sequence
 * - this is a function of segment and horizon.
 */
Same issue with this function. Looks like the comments are outdated.
// Function to generate the args string to feed the python model
std::string ConstructModelArgsString() const;
// Attributes needed for the Seq2Seq LSTM model(set by the user/settings.json)
int ncol_;
Don't use the abbreviation in variable names. Instead, use column_number_, or at least column_num_ if n means number here.
// Attributes needed for the Seq2Seq LSTM model(set by the user/settings.json)
int ncol_;
int order_;
int nneuron_;
Same here. And we need comments on what these member variables are.
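To make the request concrete, here is a hypothetical sketch of commented declarations. The meanings are my guesses (number of input columns, jump-activation order, hidden-layer width) and need to be confirmed or corrected by the author:

```cpp
// Number of columns in the input matrix (guess: the predicate bounds
// per query); renamed from ncol_ per the comment above.
int column_num_;
// Order used by the jump activation in the hidden layers (assumed).
int order_;
// Number of neurons per hidden layer (assumed); renamed from nneuron_.
int neuron_num_;
```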
float ValidateEpoch(const matrix_eig &mat);

void Fit(const matrix_eig &X, const matrix_eig &y, int bsz) override;
matrix_eig Predict(const matrix_eig &X, int bsz) const override;
I think it's better to comment these functions as well. What are they doing? What exactly are the matrices in the function arguments? I think sometimes the matrix is the input, sometimes the output, and sometimes both.
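As a non-authoritative example of what such comments could look like, using only the behaviour stated in the PR description below (Fit() runs backpropagation on labelled data, Predict() only produces outputs):

```cpp
/**
 * Train the model by running backpropagation on labelled data.
 * @param X   input matrix, one training sample per row
 * @param y   target output for each row of X (here, the true selectivities)
 * @param bsz mini-batch size
 */
void Fit(const matrix_eig &X, const matrix_eig &y, int bsz) override;

/**
 * Run inference only; no weights are updated.
 * @param X   input matrix, one query per row
 * @param bsz mini-batch size
 * @return    predicted output for each row of X
 */
matrix_eig Predict(const matrix_eig &X, int bsz) const override;
```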
src/brain/modelgen/AugmentedNN.py (Outdated)
self.optimize

@staticmethod
def jumpActivation(k):
We should use the same naming convention. Something like jump_activation() for functions.
src/brain/modelgen/AugmentedNN.py (Outdated)
def jumpActivation(k):
    def jumpActivationk(x):
        return tf.pow(tf.maximum(0.0, 1-tf.exp(-x)), k)
    return jumpActivationk
After all, what is this jump activation thing with different orders on each layer? Is there any reference? Or can you explain the intuition behind it?
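For discussion, restating the Python above in math form (nothing new, just the same function):

$$f_k(x) = \bigl(\max(0,\ 1 - e^{-x})\bigr)^k$$

Larger k pushes small positive inputs toward 0 while large inputs stay near 1, so presumably the "jump" refers to the increasingly step-like shape, but a reference or a sentence of intuition in the code would still help.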
Thanks for adding the documentation! I have a few more comments.
I think it's better to create another selectivity folder under the brain directory (for both the header and source directories) and put your new stuff there. For the tests, you should put the tests for the new model in a different file, and put TestingAugmentedNNUtil in a file called testing_augmented_nn_util.h(cpp).
Also, I prefer using all lowercase augmented_nn in the file names instead of augmentedNN, but that's minor.
test/brain/testing_forecast_util.cpp (Outdated)
size_t split_point =
    data.rows() - static_cast<size_t>(data.rows() * val_split);

// Split into train/test data
This is splitting into train/validate data, and the test data is separate, right?
Right. I'll update the comment here.
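For reference, a minimal sketch of the split being discussed, assuming matrix_eig is an Eigen float matrix typedef (this is illustrative, not the PR's exact code):

```cpp
#include <Eigen/Dense>

using matrix_eig =
    Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

// Split `data` row-wise into a training part and a validation part;
// `val_split` is the fraction of rows held out for validation.
void TrainValidateSplit(const matrix_eig &data, float val_split,
                        matrix_eig &train, matrix_eig &validate) {
  const Eigen::Index split_point =
      data.rows() - static_cast<Eigen::Index>(data.rows() * val_split);
  train = data.topRows(split_point);
  validate = data.bottomRows(data.rows() - split_point);
}
```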
test/brain/model_test.cpp (Outdated)
}

TEST_F(ModelTests, DISABLED_AugmentedNNSkewedTest) {
I know we probably cannot enable them in the CI, but have you run these tests locally?
Yes, I've tested them.
 * SELECT * FROM table WHERE c1 >= l1 AND c1 <= u1
 * AND c2 >= l2 AND c2 <= u2
 * AND ...
 * Input is [l1, u1, l2, u2, ...]
From the tests, it looks like right now we're doing only one pair of predicates, just [l1, u1], right?
Right.
Force-pushed from f54b380 to a716df1
It looks like you have a duplicated Python script. Other than that it looks good to me.
Force-pushed from 1457242 to 813321d
…b#1473)
* add first implementation of augmentedNN to predict selectivity for range predicates
* add first implementation of augmentedNN to predict selectivity
* add first implementation of augmentedNN to predict selectivity
* add comments and modify variable names
* rename some variables
* create brain/selectivity; create new test file for augmented_nn.
* remove duplicated files
* check if travis is ok
The model is an initial implementation to predict selectivity for range predicates. It can be applied to queries like:
SELECT * FROM table WHERE c >= l AND c <= u.
I implemented the model in augmentedNN.py and the C++ wrapper code in augmentedNN.cpp, taking LSTM.py and LSTM.cpp as a reference.
Hyperparameters, especially the number of training epochs, need to be discussed based on real system experiments.
Test cases for the model are also added. The test cases include a uniform-distribution dataset and a skewed-distribution dataset.
There are two classes defined.

AugmentedNN (in augmentedNN.cpp). This class is just like class TimeSeriesLSTM.
* Fit(): applies backpropagation.
* Predict(): returns the predictions for the input.
* TrainEpoch(): trains for one epoch.
* ValidateEpoch(): uses one epoch for validation.

TestingAugmentedNNUtil (in testing_forecast_util.cpp)
* GetData(): generates data for training and testing. The dataset has either a uniform or a skewed distribution.
* Test(): calls the APIs mentioned above to train and test the model.

Btw, in testing_forecast_util.cpp, the argument of matrix_eig::bottomRows was wrong. It should be the number of rows counted from the bottom of the matrix_eig. I've modified it. Please check if I am right.
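To spell out the bottomRows point (restating Eigen's documented behaviour, with `data` and `k` as placeholders):

```cpp
// Eigen's bottomRows(n) takes a row *count* measured from the bottom,
// not a starting row index.
matrix_eig last_k = data.bottomRows(k);
// Equivalent block() expression, starting at row rows() - k:
matrix_eig same = data.block(data.rows() - k, 0, k, data.cols());
```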