Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

why are mid/invalidates*/cost in param_mat? #20

Open
tdhock opened this issue Aug 24, 2021 · 7 comments
Open

why are mid/invalidates*/cost in param_mat? #20

tdhock opened this issue Aug 24, 2021 · 7 comments

Comments

@tdhock
Copy link

tdhock commented Aug 24, 2021

hi @diego-urgell in Algorithms.cpp I see

             this -> param_mat[i] = i + 1;
             this -> param_mat[sep + i] = optCpt -> mid + 1;
             this -> param_mat[sep * 2 + i] = optCpt -> invalidatesIndex + 1;
             this -> param_mat[sep * 3 + i] = optCpt -> invalidatesAfter;
             this -> param_mat[sep * 4 + i] = param_mat[sep * 4 + i - 1] - optCpt -> bestDecrease;

why are all of these things being stored in the same param_mat data structure? It results in lots of magic numbers (2, 3, 4) which are potentially confusing, especially some of the values are different types (for example mid is int but bestDecrease is double). To clarify this code, can these things be stored in separate members? then we could use names rather than numbers to reference them. (for example the name invalidatesIndex would be much easier to understand than the index 2)

@tdhock
Copy link
Author

tdhock commented Aug 24, 2021

see https://github.com/tdhock/binsegRcpp/blob/master/src/interface.cpp for an alternative that uses names instead of numeric indices, and Rcpp::List for return value of the interface function. (Rcpp::DataFrame could be used as well)

@tdhock
Copy link
Author

tdhock commented Aug 24, 2021

btw for reading about why to avoid the "magic numbers" anti-pattern see https://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants

@diego-urgell
Copy link
Owner

I agree that it may be confusing. I definitely have to add some comments to the code so that it is clear why I did his.

At the beginning I though of using Rcpp::List. However, this requires to define vectors for every slot that will be added to the list, and then pass references of the vector to the C++ code. This is unfeasible here because I don't know how many slots I will return to C++ (since different distributions have a different number of parameters changing). Therefore, I decided to use Rcpp::NumericMatrix because I can store the data for several values (bestDecrease, invalidatesIndex, invalidatesAfter, mean, ...) while only defining a single value in the code (param_mat).

This was basically an approach to make the returned number of parameters dynamic. I am happy to look at other suggestions on how to solve this problem. But I think using a List could no be the most convenient option for this use case. What do you think?

@tdhock
Copy link
Author

tdhock commented Aug 25, 2021

you are right that "different distributions have a different number of parameters changing." so I think the param_mat should contain only those segment-specific real-valued model parameters (mean and/or variance) and everything else should be a separate arguments/variables/members with informative names and potentially different types (for example invalidates* should be int).

@diego-urgell
Copy link
Owner

Right, the other "general" parameters could be separate vectors and then I can add them to the return list, along with the matrix that contains the variable parameter estimations for each distribution.

This is a good option, but it is not a "quick fix" since I have to change not only the C++ code, but also the R interface, mainly BinSegModel and some BinSeg class methods. Would you mind if I do this on a few weeks? Right now I am very busy at university since next week I have a couple of crucial deliveries and presentations. I will get to this as soon as I finish with that, is it okay?

@tdhock
Copy link
Author

tdhock commented Aug 27, 2021

sure, no rush. I'm just trying to help you make your code easier to understand/maintain using software engineering best practices.

@diego-urgell
Copy link
Owner

Thanks @tdhock 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants