Off learning vs reinforcement learning
Never judge performance from training data
Why square the differences in the sum $\sum_i (y_i - t_i)^2$
Classifier and test set
Precision
Confusion matrix
Ranking classifier
Testing too many times on the test set
Imputation
Standardization vs normalization
Nonlinear regression model
Laplace smoothing
Maximum margin objective
L1, L2 product
Sparsity-enforcing regularizer
Convolutional layer: padding, stride, kernel size, output channels
Mode collapse
Generator network
Flatten out
Latent space
Sampling steps: cannot backpropagate through them
Variational autoencoder
Discriminator
Regression tree
Bagging (reduces variance) vs boosting (mainly reduces bias)
Ensembling
Differential networks
Markov assumption
Gradient descent in reinforcement learning
Pruning
Derivation of the EM algorithm
Never judge a model from training data: withhold some data. You need to split the data set.
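
The closing note (withhold some data, split the data set) can be sketched roughly as follows. This is only a minimal illustration using the Python standard library. The helper names `holdout_split` and `mean_squared_error`, the toy data, and the one-parameter "model" are all assumptions made up for the example and do not come from the notes.

```python
import random


def holdout_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists.

    Hypothetical helper for illustration; the seed only makes the
    shuffle repeatable.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(predictions, targets):
    """Average of (y_i - t_i)^2 over all prediction/target pairs."""
    return sum((y - t) ** 2 for y, t in zip(predictions, targets)) / len(targets)


# Toy data (assumed): t = 2x plus a fixed alternating offset of +/- 0.5.
data = [(x, 2.0 * x + (0.5 if x % 2 else -0.5)) for x in range(20)]

# Withhold a quarter of the data; the model never sees it while fitting.
train, test = holdout_split(data, test_fraction=0.25, seed=0)

# Assumed toy "model": least-squares slope through the origin,
# fitted on the training part only.
slope = sum(x * t for x, t in train) / sum(x * x for x, _ in train)

# Report the error on both parts; only the held-out number says
# anything about performance on unseen data.
train_mse = mean_squared_error([slope * x for x, _ in train], [t for _, t in train])
test_mse = mean_squared_error([slope * x for x, _ in test], [t for _, t in test])
print(f"train MSE = {train_mse:.3f}, held-out MSE = {test_mse:.3f}")
```

Re-running the evaluation on the same held-out part many times while tuning the model gradually leaks information from it, which is the "testing too many times on the test set" problem noted above. A common remedy is to carve out a third, validation part for tuning.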