-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fixes in adding columns #9091
Bug fixes in adding columns #9091
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a few comments and suggestions:
Regarding item 1 of the issue:
- It is solved, but I suspect your current solution may cause regressions elsewhere. I think the issue relates to:
- Line 685-694 of the
data_object_R6.R
file. I'm not sure when line 685 was added/modified and why the check (length(col_name) != 1)
is relevant. Ideally, the check in line 685 should stop any further execution of the function. - The script produced by the dialog:
Notice thatX <- data_book$get_columns_from_data(data_name="flags_metadata", col_names="X", use_current_filter=FALSE) wind_cols <- stringr::str_split_fixed(string=X, pattern=stringr::coll(pattern=","), n=5) data_book$add_columns_to_data(data_name="flags_metadata", col_name="wind_cols", col_data=wind_cols, before=FALSE, adjacent_column="X") rm(list=c("wind_cols", "X"))
wind_cols
will contain multiple columns, yet the dialog sets thecol_name
parameter. I think the correct solution would be to make the dialog set theuse_col_name_as_prefix
parameter to true and use the passedcol_name
as the prefix of all the newly generated column names. Ifuse_col_name_as_prefix
is set to true, you can make changes at line 685 that will make the length ofcol_name
equal to the value ofnum_cols
, which means you won't need to make inconsistent changes as currently done in line 715. - Line 685-694 of the
Regarding item 2 of the issue:
- I checked with the new Climsoft import dialog in PR Changes Import from Climsoft Functionalities #9070 but still got the error. Note that I explained where I think the bug is in this comment. In the PR, I have fixed it by not using the data book R functions for factor conversion, so if you want to test my original issue, use my branch named
climsoft_updated_dialog_metadata_bug
.
@@ -712,7 +712,7 @@ DataSheet$set("public", "add_columns_to_data", function(col_name = "", col_data, | |||
if(require_correct_length) stop("Length of new column must be divisible by the length of the data frame") | |||
else curr_col <- rep(curr_col, length.out = self$get_data_frame_length()) | |||
} | |||
if(use_col_name_as_prefix) curr_col_name = self$get_next_default_column_name(col_name[i]) | |||
if(use_col_name_as_prefix) curr_col_name = self$get_next_default_column_name(col_name) | |||
else curr_col_name = col_name[[i]] | |||
curr_col_name <- make.names(iconv(curr_col_name, to = "ASCII//TRANSLIT", sub = ".")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@N-thony kindly explain to me why you think this solves the problem. My understanding is, the col_name
will always have the length of num_cols
and also notice line 716 still uses col_name[[i]]
because of this assumption. Shouldn't the change apply to line 716 as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure and I can't remember why this change was made. If the problem has been fixed, we can close this PR. I did a comparison of the code with one of my old branch and realized the change where I did revert it. As you said, if it is fixed then this PR is not needed anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@N-thony kindly explain to me why you think this solves the problem. My understanding is, the
col_name
will always have the length ofnum_cols
and also notice line 716 still usescol_name[[i]]
because of this assumption. Shouldn't the change apply to line 716 as well?
This is done now
@rdstern I will close this PR since the problem has been fixed already. |
Note that closing this PR means the first item is not yet fixed. Regarding the second item in the issue, it is not specific to the Climsoft import but applies generally to any similar operations. I hope you intend to fix that in another PR as well? I'm accepting to close this PR and will reopen the issue issue it attempts to solve. On a separate note, I made a few comments above. They are quite critical and generally apply to other dialogs with similar operations. It's essential that you consider the items I have outlined when attempting to fix the issue. Thanks. |
@N-thony and @Patowhiz can anything be merged yet? It would be good to include this in the Friday version, ready for @jkmusyoka checks in the Malawi workshop next week. |
@rdstern Considering my review comments above, which explain in detail where the problem might be and also point out the potential regression that may arise due to the solution provided here, from my side, this is not ready to merge. I'm happy for @lilyclements to take over, review, and merge this one. The R function has gone through several modifications, and I may be missing something that @N-thony is already familiar with, which might render my review irrelevant. |
@lloyddewit can you review this when possible? |
@N-thony Thanks, but I don't think I have anything to add. I suggest that one of the normal approvers does the review/approval. |
@lloyddewit sorry, I mean @lilyclements. |
@lilyclements I believe the issue has been resolved. The problem occurred because we were using the default prefix when attempting to add multiple columns, which was incorrect in this context. We should use the default prefix only when adding multiple columns based on a single provided column name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@N-thony this seems fine now on the few checks I made. I checked an example on splu=it and also ranks on the row statistics. Both produce multiple variables.
I am also really happy that:
a) There seem fewer regression bugs
b) It is other than me finding them. This was an important @Patowhiz discovery and @rachelkg has also been making suggestions recently. It will be good if (at least the) rest of the development team become more proficient users/testers too.
@Patowhiz can you approve and merge please
@rdstern @Patowhiz I'm aiming for the test release at the end of today. Can we get this merged? |
@rdstern my comments and questions above are yet to be responded to by @N-thony . I suspect we may regress other parts if I merge this without knowing the answers to my questions above. In my above comments, I've also given an alternative solution that involves solving the problem at the dialog level (this has far less risk). I'm happy for you and @lilyclements or someone else to review and merge this. |
@N-thony are you reasonably confident that your solution is ok and will not cause the regressions @Patowhiz is concerned about? If so, then I suggest we ask @ChrisMarsh82 to check and merge? It worked nicely for the examples I looked at - which is seems were a problem before. So I am keen to progress. |
@rdstern @Patowhiz if the code looks okay, another way to make sure this works well is through tests. It will be good to test this as much as we want to be sure before merging. |
@N-thony I am unhappy delaying the merging of this bug-fix. I have tested the Prepare > Text > Split with data from tidyr (world populations) and it works fine with your fix. I also checked the only other examples I could think of, where multiple variables are produced and added to an existing data frame. They are Prepare > Numeric > Random samples. That doesn't work on the merged version and seems fine with your fix. And the Prepare > Numeric > Row Statistics (with ranks) is the only other example I can think of for now. That also works fine with your fix. So I find it curious that the 2 best programmers won't take responsibility in this case. I would like this to be merged in the forthcoming version, because it is the only known (obvious) bugs. Assuming @Patowhiz is still not prepared to merge, then please @ChrisMarsh82 could you do so. Whether you check the code or just merge I don't care any more - it is time this fix was included. |
Dismissing this review to facilitate quick merging of this PR so that the dialogs that depend on it can be functional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging this to facilitate further testing and new release that needs the bug fixed.
Fixes #9086
@rdstern @Patowhiz I don't remember the reason that code has been changed but this seems now to be fixed. It will be good to test this PR seriously and see if there is no regression. I did some test and all seems to work fine. Over to you.