@karenfeng
Collaborator

What changes are proposed in this pull request?

Glue function docs:

  • subset_struct
  • add_struct_fields
  • expand_struct
  • array_to_dense_vector
  • array_to_sparse_vector
  • vector_to_array
  • explode_matrix
  • genotype_states
  • hard_calls

How is this patch tested?

  • Unit tests
  • Integration tests
  • Manual tests
    cd docs
    make html
    open build/html/index.html

Signed-off-by: Karen Feng <karen.feng@databricks.com>
karenfeng added a commit to karenfeng/glow that referenced this pull request Oct 15, 2019
# This is the 1st commit message:

WIP

# This is the commit message projectglow#2:

Get jar working

Don't use Kryo serializer

Don't parallelize un-serializable Hadoop FileStatus

Change descrip

WIP

Whoops

bintray

Not local

Quiet logs

Remove tmp file

Actually rename bintray

Setting version to 0.1.0

WIP

WIP

License fixup

Resolver

WIP

Change version

Setting version to 0.1.1

WIP

Setting version to 0.1.2

Setting version to 0.1.3-SNAPSHOT

WIP

Setting version to 0.1.2

Setting version to 0.1.3-SNAPSHOT

Exclude many GATK deps

Setting version to 0.1.3

Setting version to 0.1.4-SNAPSHOT

Setting version to 0.1.4

Setting version to 0.1.5-SNAPSHOT

Whoops

Setting version to 0.1.3

Setting version to 0.1.4-SNAPSHOT

Setting version to 0.1.4

Setting version to 0.1.5-SNAPSHOT

Setting version to 0.1.6

Setting version to 0.1.7-SNAPSHOT

Yay deps

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

Setting version to 0.1.8

Setting version to 0.1.9-SNAPSHOT

Setting version to 0.1.1

Setting version to 0.1.2-SNAPSHOT

Setting version to 0.1.10

Setting version to 0.1.11-SNAPSHOT

Setting version to 0.1.15

Setting version to 0.1.16-SNAPSHOT

Setting version to 0.1.9

Setting version to 0.1.10-SNAPSHOT

WIP

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

Setting version to 0.1.8

Setting version to 0.1.9-SNAPSHOT

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

Add tests back

Setting version to 0.1.8

Setting version to 0.1.9-SNAPSHOT

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

Setting version to 0.1.13

Setting version to 0.1.14-SNAPSHOT

WIP

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

WIP

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

WIP

Setting version to 0.1.8

Setting version to 0.1.9-SNAPSHOT

Setting version to 0.1.11

Setting version to 0.1.12-SNAPSHOT

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

Exclude findbugs

Setting version to 0.1.8

Setting version to 0.1.9-SNAPSHOT

WIP

Cleanup

# This is the commit message projectglow#3:

Rename org

# This is the commit message projectglow#4:

Rename env

# This is the commit message projectglow#5:

Setting version to 0.1.0

# This is the commit message projectglow#6:

Setting version to 0.1.1-SNAPSHOT

# This is the commit message projectglow#7:

Rename

# This is the commit message projectglow#8:

Work on test.pypi

# This is the commit message projectglow#9:

Fix VCFFileWriterSuite (projectglow#63)


# This is the commit message projectglow#10:

Remove SpecificInternalRow buffer in RowConverter (projectglow#65)

* Remove SpecificInternalRow buffer in RowConverter

* comment

# This is the commit message projectglow#11:

Update CircleCI badge
# This is the commit message projectglow#12:

Move build/test from README to wiki

# This is the commit message projectglow#13:

More cleanup

# This is the commit message projectglow#14:

Newline

# This is the commit message projectglow#15:

address comments

# This is the commit message projectglow#16:

Circleci fixups

# This is the commit message projectglow#17:

Un-exclude netlib from gatk

# This is the commit message projectglow#18:

CircleCI indents

# This is the commit message projectglow#19:

Change bintray org

# This is the commit message projectglow#20:

Setting version to 0.1.0

# This is the commit message projectglow#21:

Bintray repo

# This is the commit message projectglow#22:

Move bintrayrepo

# This is the commit message projectglow#23:

Setting version to 0.1.1-SNAPSHOT
@kianfar77
Collaborator

Looks good. Please merge and I will transfer into my changes and do some minor edits.

@kianfar77 kianfar77 self-requested a review October 15, 2019 14:58
@karenfeng karenfeng merged commit 0f07636 into projectglow:master Oct 15, 2019
@karenfeng karenfeng deleted the hls-353 branch October 15, 2019 15:32
kianfar77 added a commit that referenced this pull request Oct 16, 2019
* Fix Glow tests (#6)

* oops

* fix pipe transformer cleanup

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* Update log4j.properties

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* livehtml

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* fix

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* more docs

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* more docs

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* Add license header (#9)

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* add CONTIBURING.md (#10)

Signed-off-by: Henry D <henrydavidge@gmail.com>
Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* notebook addresses

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* glow to Glow

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* more docs

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* sidebar width

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* utility

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* [HLS-353] Add utility function docs (#12)

* Add glue fns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* comments addressed

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>
henrydavidge pushed a commit to henrydavidge/glow that referenced this pull request Jun 22, 2020
* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* simplify tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* index map compat

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add more tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* pass args as ints

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't roll our own splitter

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename sample_index to sample_blocks

Signed-off-by: Karen Feng <karen.feng@databricks.com>
henrydavidge added a commit to henrydavidge/glow that referenced this pull request Jun 22, 2020
Signed-off-by: Henry Davidge <hhd@databricks.com>
henrydavidge pushed a commit to henrydavidge/glow that referenced this pull request Jun 22, 2020
* Add glue fns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Signed-off-by: Henry Davidge <hhd@databricks.com>
henrydavidge pushed a commit to henrydavidge/glow that referenced this pull request Jun 22, 2020
* Fix Glow tests (projectglow#6)

* oops

* fix pipe transformer cleanup

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* Update log4j.properties

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* livehtml

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* fix

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* more docs

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* more docs

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* Add license header (projectglow#9)

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* add CONTIBURING.md (projectglow#10)

Signed-off-by: Henry D <henrydavidge@gmail.com>
Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* notebook addresses

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* glow to Glow

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* more docs

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* sidebar width

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* utility

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* [HLS-353] Add utility function docs (projectglow#12)

* Add glue fns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* comments addressed

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

Signed-off-by: Henry Davidge <hhd@databricks.com>
henrydavidge pushed a commit to henrydavidge/glow that referenced this pull request Jun 22, 2020
* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* simplify tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* index map compat

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add more tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* pass args as ints

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't roll our own splitter

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename sample_index to sample_blocks

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Henry Davidge <hhd@databricks.com>
henrydavidge added a commit that referenced this pull request Jun 22, 2020
* Add Leland's demo notebook

* block_variants_and_samples Transformer to create genotype DataFrame for WGR (#2)

* blocks

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test vcf

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* transformer

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* refactor and conform with ridge namings

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test files

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra file

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* sort_key

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* feat: ridge models for wgr added (#1)

* feat: ridge models for wgr added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Doc strings added for levels/functions.py
Some typos fixed in ridge_model.py
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* ridge_model and RidgeReducer unit tests added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* RidgeRegression unit tests added
test data README added
ridge_udfs.py docstrings added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Changes made to accessing the sample ID map and more docstrings

The map_normal_eqn and score_models functions previously expected the
sample IDs for a given sample block to be found in the Pandas DataFrame,
which meant we had to join them on before the .groupBy().apply().  These
functions now expect the sample-block-to-sample-IDs mapping to be
provided separately as a dict, so that the join is no longer required.
RidgeReducer and RidgeRegression APIs remain unchanged.
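
As a rough illustration of the idea described above (passing the sample-block-to-sample-IDs mapping as a separate dict instead of joining the IDs onto the data), here is a minimal sketch. The function name `rows_for_block`, the variable names, and the data are hypothetical and are not the actual signatures of map_normal_eqn or score_models.

```python
def rows_for_block(rows_by_sample, sample_blocks, block_id):
    """Return the rows belonging to one sample block.

    rows_by_sample: dict of sample ID -> feature row (hypothetical layout).
    sample_blocks:  dict of sample block ID -> list of sample IDs,
                    supplied separately so no join is needed.
    """
    # Look the IDs up in the mapping and keep the order the mapping lists them in.
    return [rows_by_sample[sample_id] for sample_id in sample_blocks[block_id]]


# Hypothetical inputs: two sample blocks covering three samples.
sample_blocks = {"1": ["s1", "s2"], "2": ["s3"]}
rows_by_sample = {"s1": [0.0, 1.0], "s2": [2.0, 1.0], "s3": [1.0, 1.0]}

block_1 = rows_for_block(rows_by_sample, sample_blocks, "1")
block_2 = rows_for_block(rows_by_sample, sample_blocks, "2")
```

Because the mapping travels with the function call, the per-block data does not need to carry a sample ID column of its own.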

docstrings have been added for RidgeReducer and RidgeRegression classes.

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Refactored object names and comments to reflect new terminology

Where 'block' was previously used to refer to the set of columns in a
block, we now use 'header_block'
Where 'group' was previously used to refer to the set of samples in a
block, we now use 'sample_block'

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* [HLS-539] Fix compatibility between blocked GT transformer and WGR (#6)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Spark 3 needs more recent PyArrow, reduce mem consumption by removing unnecessary caching

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* PyArrow 0.15.1 only with PySpark 3

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't use toPandas()

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Upgrade pyarrow

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Only register once

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Minimize memory usage

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Select before head

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* set up/tear down

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try limiting pyspark memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* No teardown

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Extend timeout

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Simplify ordering logic in levels code (#7)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* start changing for readability

* use input label ordering

* rename create_row_indexer

* undo column sort

* change reduce

Signed-off-by: Henry D <henrydavidge@gmail.com>

* further simplify reduce

* sorted alpha names

* remove ordering

* comments

Signed-off-by: Henry D <henrydavidge@gmail.com>

* Set arrow env var in build

Signed-off-by: Henry D <henrydavidge@gmail.com>

* faster sort

* add test file

* undo test data change

* >=

* formatting

* empty

Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Limit Spark memory conf in tests (#9)

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf transform

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Set driver memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try changing spark mem

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* match java tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* remove driver memory flag

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Improve partitioning in block_variants_and_samples transformer (#11)

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* Remove unnecessary header_block grouping (#10)

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Create sample ID blocking helper functions (#12)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* simplify tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* index map compat

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add more tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* pass args as ints

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't roll our own splitter

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename sample_index to sample_blocks

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add type-checking to WGR APIs (#14)

* Add type-checking to APIs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check valid alphas

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* check 0 sig

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add to install_requires list

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add covariate support (#13)

* Added necessary modifications to accommodate covariates in model fitting.

The initial formulation of the WGR model assumed a form y ~ Xb; however, in general we would like to use a model of the form y ~ Ca + Xb, where C is some matrix of covariates that are separate from the genomic features X.  This PR makes numerous changes to accommodate the covariate matrix C.

Adding covariates required the following breaking changes to the APIs:
 * indexdf is now a required argument for RidgeReducer.transform() and RidgeRegression.transform():
   * RidgeReducer.transform(blockdf, labeldf, modeldf) -> RidgeReducer.transform(blockdf, labeldf, indexdf, modeldf)
   * RidgeRegression.transform(blockdf, labeldf, model, cvdf) -> RidgeRegression.transform(blockdf, labeldf, indexdf, model, cvdf)

Additionally, the function signatures for the fit and transform methods of RidgeReducer and RidgeRegression have all been updated to accommodate an optional covariate DataFrame as the final argument.

Two new tests have been added to test_ridge_regression.py to test run modes with covariates:
 * test_ridge_reducer_transform_with_cov
 * test_two_level_regression_with_cov
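
To make the model form y ~ Ca + Xb concrete, the following is a minimal NumPy sketch of a ridge fit with a covariate matrix. It is not the code from this PR: the function name `ridge_with_covariates` is hypothetical, and leaving the covariate coefficients a unpenalized while penalizing only b is an assumption made for illustration.

```python
import numpy as np


def ridge_with_covariates(C, X, y, alpha):
    """Fit y ~ C a + X b by solving the penalized normal equations.

    Assumption for this sketch: only the genomic coefficients b are
    penalized by alpha; the covariate coefficients a are not.
    """
    n_cov = C.shape[1]
    Z = np.hstack([C, X])  # combined design matrix [C | X]
    # Diagonal penalty: zeros for covariate columns, alpha for X columns.
    penalty = np.diag(np.concatenate([np.zeros(n_cov), np.full(X.shape[1], alpha)]))
    coef = np.linalg.solve(Z.T @ Z + penalty, Z.T @ y)
    return coef[:n_cov], coef[n_cov:]


# Synthetic, noise-free data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n_samples = 50
C = np.column_stack([np.ones(n_samples), rng.normal(size=n_samples)])  # intercept + one covariate
X = rng.normal(size=(n_samples, 3))
a_true = np.array([1.0, -2.0])
b_true = np.array([0.5, 0.0, 1.5])
y = C @ a_true + X @ b_true

a_hat, b_hat = ridge_with_covariates(C, X, y, alpha=0.0)        # no penalty: ordinary least squares
a_ridge, b_ridge = ridge_with_covariates(C, X, y, alpha=10.0)   # penalty shrinks b only
```

With alpha set to 0 the fit reduces to ordinary least squares on [C | X]; with a positive alpha the estimate of b is shrunk toward zero while a absorbs the covariate effects.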

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Cleaned up one unnecessary Pandas import
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Small changes for clarity and consistency with the rest of the code.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Forgot one usage of coalesce
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Added a couple of comments to explain logic and replaced usages of .values with .array
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Fixed one instance of the change .values -> .array where it was made in error.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Typo in test_ridge_regression.py.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Style auto-updates with yapfAll
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>
Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Flatten estimated phenotypes (#15)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Order to match labeldf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check we tie-break

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test var name

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add fit_transform function to models (#17)

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Rename levels (#20)

* Rename levels to wgr

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename test files

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add license headers (#21)

* headers

* executable

* fix template rendering

* yapf

* add header to template

* add header to template

Signed-off-by: Henry D <henrydavidge@gmail.com>

Co-authored-by: Kiavash Kianfar <kiavash.kianfar@databricks.com>
Co-authored-by: Karen Feng <karen.feng@databricks.com>
Co-authored-by: Leland <leland.barnard@gmail.com>
Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>
henrydavidge added a commit that referenced this pull request Jun 22, 2020
* Add Leland's demo notebook

* block_variants_and_samples Transformer to create genotype DataFrame for WGR (#2)

* blocks

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test vcf

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* transformer

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* refactor and conform with ridge namings

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test files

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra file

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* sort_key

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* feat: ridge models for wgr added (#1)

* feat: ridge models for wgr added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Doc strings added for levels/functions.py
Some typos fixed in ridge_model.py
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* ridge_model and RidgeReducer unit tests added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* RidgeRegression unit tests added
test data README added
ridge_udfs.py docstrings added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Changes made to accessing the sample ID map and more docstrings

The map_normal_eqn and score_models functions previously expected the
sample IDs for a given sample block to be found in the Pandas DataFrame,
which meant we had to join them on before the .groupBy().apply().  These
functions now expect the sample-block-to-sample-IDs mapping to be
provided separately as a dict, so that the join is no longer required.
RidgeReducer and RidgeRegression APIs remain unchanged.

docstrings have been added for RidgeReducer and RidgeRegression classes.

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Refactored object names and comments to reflect new terminology

Where 'block' was previously used to refer to the set of columns in a
block, we now use 'header_block'
Where 'group' was previously used to refer to the set of samples in a
block, we now use 'sample_block'

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* [HLS-539] Fix compatibility between blocked GT transformer and WGR (#6)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Spark 3 needs more recent PyArrow, reduce mem consumption by removing unnecessary caching

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* PyArrow 0.15.1 only with PySpark 3

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't use toPandas()

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Upgrade pyarrow

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Only register once

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Minimize memory usage

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Select before head

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* set up/tear down

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try limiting pyspark memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* No teardown

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Extend timeout

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Simplify ordering logic in levels code (#7)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* start changing for readability

* use input label ordering

* rename create_row_indexer

* undo column sort

* change reduce

Signed-off-by: Henry D <henrydavidge@gmail.com>

* further simplify reduce

* sorted alpha names

* remove ordering

* comments

Signed-off-by: Henry D <henrydavidge@gmail.com>

* Set arrow env var in build

Signed-off-by: Henry D <henrydavidge@gmail.com>

* faster sort

* add test file

* undo test data change

* >=

* formatting

* empty

Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Limit Spark memory conf in tests (#9)

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf transform

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Set driver memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try changing spark mem

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* match java tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* remove driver memory flag

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Improve partitioning in block_variants_and_samples transformer (#11)

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* Remove unnecessary header_block grouping (#10)

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Create sample ID blocking helper functions (#12)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* simplify tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* index map compat

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add more tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* pass args as ints

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't roll our own splitter

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename sample_index to sample_blocks

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add type-checking to WGR APIs (#14)

* Add type-checking to APIs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check valid alphas

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* check 0 sig

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add to install_requires list

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add covariate support (#13)

* Added necessary modifications to accommodate covariates in model fitting.

The initial formulation of the WGR model assumed a form y ~ Xb; however, in general we would like to use a model of the form y ~ Ca + Xb, where C is some matrix of covariates that are separate from the genomic features X.  This PR makes numerous changes to accommodate the covariate matrix C.

Adding covariates required the following breaking changes to the APIs:
 * indexdf is now a required argument for RidgeReducer.transform() and RidgeRegression.transform():
   * RidgeReducer.transform(blockdf, labeldf, modeldf) -> RidgeReducer.transform(blockdf, labeldf, indexdf, modeldf)
   * RidgeRegression.transform(blockdf, labeldf, model, cvdf) -> RidgeRegression.transform(blockdf, labeldf, indexdf, model, cvdf)

Additionally, the function signatures for the fit and transform methods of RidgeReducer and RidgeRegression have all been updated to accommodate an optional covariate DataFrame as the final argument.

Two new tests have been added to test_ridge_regression.py to test run modes with covariates:
 * test_ridge_reducer_transform_with_cov
 * test_two_level_regression_with_cov

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Cleaned up one unnecessary Pandas import
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Small changes for clarity and consistency with the rest of the code.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Forgot one usage of coalesce
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Added a couple of comments to explain logic and replaced usages of .values with .array
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Fixed one instance of the change .values -> .array where it was made in error.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Typo in test_ridge_regression.py.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Style auto-updates with yapfAll
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>
Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Flatten estimated phenotypes (#15)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Order to match labeldf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check we tie-break

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test var name

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add fit_transform function to models (#17)

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* support alpha inference

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* more test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test fixups

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* sub-sample

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address comments - only infer alphas during fit

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* exception varies

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Rename levels (#20)

* Rename levels to wgr

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename test files

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Errors vary by Spark version

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add license headers (#21)

* headers

* executable

* fix template rendering

* yapf

Co-authored-by: Kiavash Kianfar <kiavash.kianfar@databricks.com>
Co-authored-by: Karen Feng <karen.feng@databricks.com>
Co-authored-by: Leland <leland.barnard@gmail.com>
Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>
karenfeng added a commit that referenced this pull request Jun 23, 2020
* Add Leland's demo notebook

* block_variants_and_samples Transformer to create genotype DataFrame for WGR (#2)

* blocks

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test vcf

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* transformer

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* refactor and conform with ridge namings

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test files

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra file

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* sort_key

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* feat: ridge models for wgr added (#1)

* feat: ridge models for wgr added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Doc strings added for levels/functions.py
Some typos fixed in ridge_model.py
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* ridge_model and RidgeReducer unit tests added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* RidgeRegression unit tests added
test data README added
ridge_udfs.py docstrings added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Changes made to accessing the sample ID map and more docstrings

The map_normal_eqn and score_models functions previously expected the
sample IDs for a given sample block to be found in the Pandas DataFrame,
which meant we had to join them on before the .groupBy().apply().  These
functions now expect the sample block to sample IDs mapping to be
provided separately as a dict, so that the join is no longer required.
RidgeReducer and RidgeRegression APIs remain unchanged.
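
As a rough sketch of the idea (not the actual map_normal_eqn or score_models code), a per-block function can look up sample IDs in a dict instead of relying on a prior join. The helper name `attach_sample_ids` and the dict layout below are hypothetical.

```python
import pandas as pd


def attach_sample_ids(block_pdf, sample_block, sample_blocks):
    """Label the rows of one block's Pandas DataFrame with its sample IDs.

    Hypothetical helper: sample_blocks is assumed to be a dict mapping a
    sample block ID to the ordered list of sample IDs in that block.
    """
    ids = sample_blocks[sample_block]
    if len(ids) != len(block_pdf):
        raise ValueError("row count does not match the number of sample IDs")
    out = block_pdf.copy()            # leave the caller's DataFrame untouched
    out.index = pd.Index(ids, name="sample_id")
    return out


# Usage: the mapping is passed alongside the data rather than joined onto it
sample_blocks = {"1": ["s1", "s2"], "2": ["s3"]}
block = pd.DataFrame({"x": [0.1, 0.2]})
labeled = attach_sample_ids(block, "1", sample_blocks)
```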

docstrings have been added for RidgeReducer and RidgeRegression classes.

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Refactored object names and comments to reflect new terminology

Where 'block' was previously used to refer to the set of columns in a
block, we now use 'header_block'
Where 'group' was previously used to refer to the set of samples in a
block, we now use 'sample_block'

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* [HLS-539] Fix compatibility between blocked GT transformer and WGR (#6)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Spark 3 needs more recent PyArrow, reduce mem consumption by removing unnecessary caching

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* PyArrow 0.15.1 only with PySpark 3

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't use toPandas()

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Upgrade pyarrow

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Only register once

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Minimize memory usage

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Select before head

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* set up/tear down

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try limiting pyspark memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* No teardown

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Extend timeout

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Simplify ordering logic in levels code (#7)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* start changing for readability

* use input label ordering

* rename create_row_indexer

* undo column sort

* change reduce

Signed-off-by: Henry D <henrydavidge@gmail.com>

* further simplify reduce

* sorted alpha names

* remove ordering

* comments

Signed-off-by: Henry D <henrydavidge@gmail.com>

* Set arrow env var in build

Signed-off-by: Henry D <henrydavidge@gmail.com>

* faster sort

* add test file

* undo test data change

* >=

* formatting

* empty

Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Limit Spark memory conf in tests (#9)

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf transform

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Set driver memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try changing spark mem

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* match java tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* remove driver memory flag

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Improve partitioning in block_variants_and_samples transformer (#11)

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* Remove unnecessary header_block grouping (#10)

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Create sample ID blocking helper functions (#12)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* simplify tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* index map compat

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add more tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* pass args as ints

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't roll our own splitter

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename sample_index to sample_blocks

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add type-checking to WGR APIs (#14)

* Add type-checking to APIs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check valid alphas

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* check 0 sig

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add to install_requires list

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add covariate support (#13)

* Added necessary modifications to accommodate covariates in model fitting.

The initial formulation of the WGR model assumed the form y ~ Xb. In general, however, we would like to use a model of the form y ~ Ca + Xb, where C is a matrix of covariates that are separate from the genomic features X. This PR makes numerous changes to accommodate the covariate matrix C.

Adding covariates required the following breaking changes to the APIs:
 * indexdf is now a required argument for RidgeReducer.transform() and RidgeRegression.transform():
   * RidgeReducer.transform(blockdf, labeldf, modeldf) -> RidgeReducer.transform(blockdf, labeldf, indexdf, modeldf)
   * RidgeRegression.transform(blockdf, labeldf, model, cvdf) -> RidgeRegression.transform(blockdf, labeldf, indexdf, model, cvdf)

Additionally, the function signatures for the fit and transform methods of RidgeReducer and RidgeRegression have all been updated to accommodate an optional covariate DataFrame as the final argument.

Two new tests have been added to test_ridge_regression.py to test run modes with covariates:
 * test_ridge_reducer_transform_with_cov
 * test_two_level_regression_with_cov

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Cleaned up one unnecessary Pandas import
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Small changes for clarity and consistency with the rest of the code.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Forgot one usage of coalesce
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Added a couple of comments to explain logic and replaced usages of .values with .array
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Fixed one instance of the change .values -> .array where it was made in error.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Typo in test_ridge_regression.py.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Style auto-updates with yapfAll
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>
Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Flatten estimated phenotypes (#15)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Order to match labeldf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check we tie-break

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test var name

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* remove accidental files

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add fit_transform function to models (#17)

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Rename levels (#20)

* Rename levels to wgr

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename test files

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add license headers (#21)

* headers

* executable

* fix template rendering

* yapf

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* More work

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* More cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Fix docs tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* fix regression fit description

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* fix capitalization

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address some comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* more cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* More cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* add notebook

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* update notebook

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Co-authored-by: Henry D <henrydavidge@gmail.com>
Co-authored-by: Kiavash Kianfar <kiavash.kianfar@databricks.com>
Co-authored-by: Leland <leland.barnard@gmail.com>
Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>