summary of GSOC calls #2

Open
tdhock opened this issue Jun 10, 2021 · 11 comments

@tdhock

tdhock commented Jun 10, 2021

  • template vs subclass for generic loss functions
  • generation of docs via roxygen2
  • TODOS for next week: investigate if current approach is a good idea with binary segmentation + change in mean loss.
  • set.seed(1) to control random numbers (if you use random numbers for input data)
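
A minimal sketch of the subclass option from the first bullet, assuming a hypothetical `Loss` base class with one virtual method (the names `Loss`, `SquareLoss` and `totalCost` are illustrative and are not taken from the package):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface: every loss exposes the same virtual method.
struct Loss {
    virtual ~Loss() = default;
    // Cost of fitting one segment to data[start, end), end exclusive.
    virtual double cost(const std::vector<double>& data,
                        std::size_t start, std::size_t end) const = 0;
};

// Square loss around the segment mean (change in mean, constant variance).
struct SquareLoss : Loss {
    double cost(const std::vector<double>& data,
                std::size_t start, std::size_t end) const override {
        assert(start < end && end <= data.size());
        double sum = 0.0;
        for (std::size_t i = start; i < end; ++i) sum += data[i];
        const double mean = sum / static_cast<double>(end - start);
        double sse = 0.0;
        for (std::size_t i = start; i < end; ++i) {
            sse += (data[i] - mean) * (data[i] - mean);
        }
        return sse;
    }
};

// Algorithm code depends only on the base class, so the loss can be
// selected at run time (for example from a string passed by the R caller).
double totalCost(const Loss& loss, const std::vector<double>& data,
                 std::size_t split) {
    return loss.cost(data, 0, split) + loss.cost(data, split, data.size());
}
```

With templates the loss would instead be fixed at compile time, which avoids the virtual call but requires instantiating every algorithm/loss combination.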
diego-urgell added a commit that referenced this issue Jun 17, 2021
…asses instead of templates #2. Tested with a dummy algo
@tdhock
Author

tdhock commented Jun 18, 2021

@diego-urgell during the next call please try to take notes and write them as a new comment in this issue.
Yesterday we discussed the new registration in C++ code, and the next steps for implementing binseg for normal change in mean.

@tdhock
Author

tdhock commented Jun 24, 2021

  • implement binary segmentation algorithm, change in mean and constant variance
  • more challenging than anticipated: off-by-one errors at run time.
  • summary statistics parent class for cumsum and cumsumsquared
  • figure out a way to get summary statistics as a static member of distributions? it can be determined at compile time (does not change at run time)
  • compilation warnings: see "remove compilation warnings" #4
  • add 1 to indices in RcppInterface.cpp before returning to R.
  • error handling: number of changepoints too large? invalid distribution or algorithm?
  • use tests/testthat/test-*.R files if you want to use devtools::test()
  • goals for next week: more test cases for binary segmentation with change in mean (check loss and segment mean values); different cost functions; multiple parameter return (change NumericVector before_mean, after_mean to NumericMatrix before_params, after_params)
  • reading: good intro to ML / stats https://web.stanford.edu/~hastie/ElemStatLearn/
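
A minimal sketch of the cumulative-sum idea from the notes above, assuming a hypothetical `CumStats` helper and `bestSplit` function (neither name comes from the package). Storing prefix sums of x and x^2 with a leading zero gives each segment's squared-error cost in constant time, and half-open intervals `[start, end)` help keep off-by-one errors in check:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: prefix sums of x and x^2 with a leading zero, so the
// sum over [start, end) is cum[end] - cum[start].
struct CumStats {
    std::vector<double> sum, sumSq;

    explicit CumStats(const std::vector<double>& x)
        : sum(x.size() + 1, 0.0), sumSq(x.size() + 1, 0.0) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            sum[i + 1] = sum[i] + x[i];
            sumSq[i + 1] = sumSq[i] + x[i] * x[i];
        }
    }

    // Sum of squared residuals around the segment mean on [start, end), O(1).
    double cost(std::size_t start, std::size_t end) const {
        assert(start < end && end < sum.size());
        const double s = sum[end] - sum[start];
        const double ss = sumSq[end] - sumSq[start];
        return ss - s * s / static_cast<double>(end - start);
    }
};

// Best single split of [start, end). Returns the 0-based index of the first
// point of the right-hand segment.
std::size_t bestSplit(const CumStats& stats, std::size_t start,
                      std::size_t end) {
    assert(end - start >= 2);
    std::size_t best = start + 1;
    double bestCost = stats.cost(start, best) + stats.cost(best, end);
    for (std::size_t t = start + 2; t < end; ++t) {
        const double c = stats.cost(start, t) + stats.cost(t, end);
        if (c < bestCost) {
            bestCost = c;
            best = t;
        }
    }
    return best;
}
```

In this sketch the returned 0-based index of the first point after the change is numerically equal to the 1-based index of the last point before it. Whichever convention the C++ code uses, doing the conversion in one place (such as the interface layer) keeps the rest of the code consistently 0-based.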

@diego-urgell
Owner

Review:

  • Implemented cost function for Normal change in variance and Normal change in mean and variance
  • Added minSegLen constraint and a mechanism to avoid zero variance segmentation
  • Implemented a mechanism to dynamically create a numeric matrix that contains the necessary parameters, by creating virtual methods and overloading them on each distribution
  • Tested the current functions using changepoint
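
One way a minimum segment length constraint can be expressed, as a sketch only: `splitRange` is a hypothetical helper, not the package's function, and it assumes segments are half-open intervals `[start, end)`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Returns the half-open range [first, last) of split positions t such that
// both [start, t) and [t, end) contain at least minSegLen points.
// The range is empty (first == last) when the segment is too short to split.
std::pair<std::size_t, std::size_t> splitRange(std::size_t start,
                                               std::size_t end,
                                               std::size_t minSegLen) {
    assert(start <= end && minSegLen >= 1);
    if (end - start < 2 * minSegLen) {
        return {start, start};  // no admissible split
    }
    return {start + minSegLen, end - minSegLen + 1};
}
```

For example, `splitRange(0, 10, 3)` allows splits at positions 3 through 7. A minimum length of 2 or more guarantees that every segment has enough points for a variance estimate, although a segment of identical values still has zero variance and needs separate handling.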

Next Week:

  • Create more tests with the three existing distributions (Normal). Use the changepoint package for now.
  • Decide and implement a way to determine the cost of each segment from the R interface.
  • Set up codecov along with covr for code coverage analysis.
  • Implement more distributions and start testing them.
  • GSOC mid-term evaluations.

@diego-urgell
Owner

diego-urgell commented Jul 22, 2021

TODOS for next week:

  • Correct implementation of Poisson cost function by using MLE for the rate parameter.
  • Implement the cost function for the Exponential distribution
  • Ask @tdhock about the differences in Negative Binomial cost function and rate parameter estimation between BinSeg and gfpop
  • Implement the structure of the S4 class for the user interface
  • Set up the coef method to build segmentation models of a particular number of change points
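
For the Poisson item above, a sketch of a cost based on the MLE of the rate, which for a Poisson segment is the mean of its counts. The function names are hypothetical, and the sketch assumes a cumulative-sum vector with a leading zero. It returns -2 times the log-likelihood at the MLE, omitting the sum of log(x!) terms, which do not depend on the rate and therefore do not affect where the changepoints are placed:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// MLE of the Poisson rate on [start, end): the mean of the counts.
// cumsum has a leading zero, so cumsum[end] - cumsum[start] is the segment sum.
double poissonRateMLE(const std::vector<double>& cumsum,
                      std::size_t start, std::size_t end) {
    assert(start < end && end < cumsum.size());
    return (cumsum[end] - cumsum[start]) / static_cast<double>(end - start);
}

// -2 * log-likelihood at the MLE, without the sum(log(x!)) term.
double poissonCost(const std::vector<double>& cumsum,
                   std::size_t start, std::size_t end) {
    const double rate = poissonRateMLE(cumsum, start, end);
    if (rate <= 0.0) {
        return 0.0;  // all-zero segment: the x * log(rate) terms vanish
    }
    const double total = cumsum[end] - cumsum[start];
    return 2.0 * (total - total * std::log(rate));
}
```

Because only the cumulative sum is needed, each segment cost is computed in constant time. To report the full -2*logLik, the omitted sum of log(x!) terms would have to be added back.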

@diego-urgell
Owner

diego-urgell commented Aug 2, 2021

TODOS for this week:

  • Implement coef method
  • Implement graph method
  • Implement resid method
  • Finish the user-facing methods and the BinSeg class
  • Correct the Poisson and Exponential cost functions (reduce computing cost)
  • Include the missing parts of the log likelihood equation in R code. Achieve -2*logLik for every distribution.
  • Start writing the user documentation.

@rkillick

rkillick commented Aug 3, 2021

Looks like a challenge for this week. Just a quick note that there is no "graph" generic; it is plot.

@diego-urgell
Owner

diego-urgell commented Aug 3, 2021

@rkillick It is definitely a challenge! I worked a lot on the R interface and it is almost complete, in PR #9

After this I will write documentation for the functions :)

@diego-urgell
Owner

diego-urgell commented Aug 6, 2021

TODOS for this week:

  • Finish the documentation
  • Correct the Poisson and Exponential cost functions (reduce computing cost)
  • Include the missing parts of the log likelihood equation in R code. Achieve -2*logLik for every distribution.
  • Make sure the package passes all CRAN checks
  • Use static expected results in the tests
  • Look at Wild Binary Segmentation paper. Start thinking/planning about the implementation

@diego-urgell
Owner

diego-urgell commented Aug 13, 2021

TODOS for this week:

  • Correct the Poisson and Exponential cost functions (reduce computing cost)
  • Include the missing parts of the log likelihood equation in R code. Achieve -2*logLik for every distribution except negbin.
  • Add tests for the R interface methods
  • Check the resid function
  • Write a README for the GitHub repository
  • Set contributions in the DESCRIPTION file
  • Setup release for final submission

@diego-urgell
Owner

Hi @rkillick and @tdhock! I finished everything for the final submission! I created a README that has information about the package, and a report on GSOC 2021 at the end. Does it look good?

@tdhock
Author

tdhock commented Aug 23, 2021

yes good job
