Generate inferred alphas #234

henrydavidge · 2020-06-22T17:58:17Z

What changes are proposed in this pull request?

If the user does not know which alpha values to provide (e.g., based on heritability estimates), we should support generating them automatically. The generated values do not work well when the phenotypes are not on the scale of 1.
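
For illustration only, here is a minimal sketch of how alphas could be inferred from a grid of assumed heritability values. It uses the textbook ridge relationship `alpha = M * (1 - h2) / h2` for `M` standardized features, which is not necessarily the formula used in this patch. The function name `infer_alphas` and the default heritability grid are hypothetical.

```python
from typing import List, Sequence


def infer_alphas(
    n_features: int,
    heritabilities: Sequence[float] = (0.99, 0.75, 0.50, 0.25, 0.01),
) -> List[float]:
    """Return one ridge alpha per assumed heritability value.

    Hypothetical helper: assumes standardized features and a unit-scale
    phenotype, and uses alpha = M * (1 - h2) / h2.
    """
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    alphas = []
    for h2 in heritabilities:
        # h2 must lie strictly between 0 and 1 for the ratio to be finite
        # and positive.
        if not 0.0 < h2 < 1.0:
            raise ValueError(f"heritability must be in (0, 1), got {h2}")
        alphas.append(n_features * (1.0 - h2) / h2)
    return alphas


if __name__ == "__main__":
    # Lower assumed heritability yields a larger (stronger) penalty.
    for alpha in infer_alphas(100):
        print(f"{alpha:.2f}")
```

With this formula, a lower assumed heritability yields a stronger penalty, so a descending heritability grid produces ascending alphas.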

How is this patch tested?

Unit tests
Integration tests
Manual tests

Add Leland's demo notebook

…or WGR (projectglow#2)

* blocks
  Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>
* test vcf
  Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>
* transformer
  Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>
* remove extra
  Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>
* refactor and conform with ridge namings
  Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>
* test
  Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>
* test files
  Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>
* remove extra file
  Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>
* sort_key
  Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* feat: ridge models for wgr added
  Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
  Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>
* Doc strings added for levels/functions.py
  Some typos fixed in ridge_model.py
  Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
  Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>
* ridge_model and RidgeReducer unit tests added
  Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
  Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>
* RidgeRegression unit tests added
  test data README added
  ridge_udfs.py docstrings added
  Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
  Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>
* Changes made to accessing the sample ID map and more docstrings
  The map_normal_eqn and score_models functions previously expected the sample IDs for a given sample block to be found in the Pandas DataFrame, which meant we had to join them on before the .groupBy().apply(). These functions now expect the mapping from sample block to sample IDs to be provided separately as a dict, so that the join is no longer required.
  RidgeReducer and RidgeRegression APIs remain unchanged.
  Docstrings have been added for RidgeReducer and RidgeRegression classes.
  Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
  Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>
* Refactored object names and comments to reflect new terminology
  Where 'block' was previously used to refer to the set of columns in a block, we now use 'header_block'.
  Where 'group' was previously used to refer to the set of samples in a block, we now use 'sample_block'.
  Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
  Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

…rojectglow#6)

* WIP
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* existing tests pass
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* rename file
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* Add compat test
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* scalafmt
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* collect minimal columns
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* address comments
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* Test fixup
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* Spark 3 needs more recent PyArrow, reduce mem consumption by removing unnecessary caching
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* PyArrow 0.15.1 only with PySpark 3
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* Don't use toPandas()
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* Upgrade pyarrow
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* Only register once
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* Minimize memory usage
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* Select before head
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* set up/tear down
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* Try limiting pyspark memory
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* No teardown
  Signed-off-by: Karen Feng <karen.feng@databricks.com>
* Extend timeout
  Signed-off-by: Karen Feng <karen.feng@databricks.com>