Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fixes in adding columns #9091

Merged
merged 3 commits into from
Sep 21, 2024

Conversation

N-thony
Copy link
Collaborator

@N-thony N-thony commented Jul 31, 2024

Fixes #9086
@rdstern @Patowhiz I don't remember the reason that code has been changed but this seems now to be fixed. It will be good to test this PR seriously and see if there is no regression. I did some test and all seems to work fine. Over to you.

@N-thony
Copy link
Collaborator Author

N-thony commented Aug 4, 2024

Fixes #9086 @rdstern @Patowhiz I don't remember the reason that code has been changed but this seems now to be fixed. It will be good to test this PR seriously and see if there is no regression. I did some test and all seems to work fine. Over to you.

@rdstern is everything okay after testing this?

@rdstern
Copy link
Collaborator

rdstern commented Aug 4, 2024

Thanks, @N-thony I'll check soon, but am assuming @Patowhiz will check first, as it resolves a problem he found. So easy for him to check that it now, at least, solved the original problem he found.

Copy link
Contributor

@Patowhiz Patowhiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-thony

Here are a few comments and suggestions:

Regarding item 1 of the issue:

  • It is solved, but I suspect your current solution may cause regressions elsewhere. I think the issue relates to:
    1. Line 685-694 of the data_object_R6.R file. I'm not sure when line 685 was added/modified and why the check (length(col_name) != 1) is relevant. Ideally, the check in line 685 should stop any further execution of the function.
    2. The script produced by the dialog:
    X <- data_book$get_columns_from_data(data_name="flags_metadata", col_names="X", use_current_filter=FALSE)
    wind_cols <- stringr::str_split_fixed(string=X, pattern=stringr::coll(pattern=","), n=5)
    data_book$add_columns_to_data(data_name="flags_metadata", col_name="wind_cols", col_data=wind_cols, before=FALSE, adjacent_column="X")
    rm(list=c("wind_cols", "X")) 
    
    Notice that wind_cols will contain multiple columns, yet the dialog sets the col_name parameter. I think the correct solution would be to make the dialog set the use_col_name_as_prefix parameter to true and use the passed col_name as the prefix of all the newly generated column names. If use_col_name_as_prefix is set to true, you can make changes at line 685 that will make the length of col_name equal to the value of num_cols, which means you won't need to make inconsistent changes as currently done in line 715.

Regarding item 2 of the issue:

  • I checked with the new Climsoft import dialog in PR Changes Import from Climsoft Functionalities #9070 but still got the error. Note that I explained where I think the bug is in this comment. In the PR, I have fixed it by not using the data book R functions for factor conversion, so if you want to test my original issue, use my branch named climsoft_updated_dialog_metadata_bug.

@@ -712,7 +712,7 @@ DataSheet$set("public", "add_columns_to_data", function(col_name = "", col_data,
if(require_correct_length) stop("Length of new column must be divisible by the length of the data frame")
else curr_col <- rep(curr_col, length.out = self$get_data_frame_length())
}
if(use_col_name_as_prefix) curr_col_name = self$get_next_default_column_name(col_name[i])
if(use_col_name_as_prefix) curr_col_name = self$get_next_default_column_name(col_name)
else curr_col_name = col_name[[i]]
curr_col_name <- make.names(iconv(curr_col_name, to = "ASCII//TRANSLIT", sub = "."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-thony kindly explain to me why you think this solves the problem. My understanding is, the col_name will always have the length of num_cols and also notice line 716 still uses col_name[[i]] because of this assumption. Shouldn't the change apply to line 716 as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure and I can't remember why this change was made. If the problem has been fixed, we can close this PR. I did a comparison of the code with one of my old branch and realized the change where I did revert it. As you said, if it is fixed then this PR is not needed anymore

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-thony kindly explain to me why you think this solves the problem. My understanding is, the col_name will always have the length of num_cols and also notice line 716 still uses col_name[[i]] because of this assumption. Shouldn't the change apply to line 716 as well?

This is done now

@N-thony
Copy link
Collaborator Author

N-thony commented Aug 6, 2024

@rdstern I will close this PR since the problem has been fixed already.

@N-thony N-thony closed this Aug 6, 2024
@Patowhiz Patowhiz reopened this Aug 6, 2024
@Patowhiz
Copy link
Contributor

Patowhiz commented Aug 6, 2024

@N-thony

Note that closing this PR means the first item is not yet fixed.

Regarding the second item in the issue, it is not specific to the Climsoft import but applies generally to any similar operations. I hope you intend to fix that in another PR as well?

I'm accepting to close this PR and will reopen the issue issue it attempts to solve.

On a separate note, I made a few comments above. They are quite critical and generally apply to other dialogs with similar operations. It's essential that you consider the items I have outlined when attempting to fix the issue.

Thanks.

@Patowhiz Patowhiz closed this Aug 6, 2024
@Patowhiz Patowhiz mentioned this pull request Aug 6, 2024
@N-thony N-thony reopened this Aug 6, 2024
@rdstern
Copy link
Collaborator

rdstern commented Aug 7, 2024

@N-thony and @Patowhiz can anything be merged yet? It would be good to include this in the Friday version, ready for @jkmusyoka checks in the Malawi workshop next week.

@Patowhiz
Copy link
Contributor

Patowhiz commented Aug 7, 2024

@rdstern Considering my review comments above, which explain in detail where the problem might be and also point out the potential regression that may arise due to the solution provided here, from my side, this is not ready to merge.

I'm happy for @lilyclements to take over, review, and merge this one. The R function has gone through several modifications, and I may be missing something that @N-thony is already familiar with, which might render my review irrelevant.

@N-thony
Copy link
Collaborator Author

N-thony commented Aug 14, 2024

@rdstern Considering my review comments above, which explain in detail where the problem might be and also point out the potential regression that may arise due to the solution provided here, from my side, this is not ready to merge.

I'm happy for @lilyclements to take over, review, and merge this one. The R function has gone through several modifications, and I may be missing something that @N-thony is already familiar with, which might render my review irrelevant.

@lloyddewit can you review this when possible?

@lloyddewit
Copy link
Contributor

@lloyddewit can you review this when possible?

@N-thony Thanks, but I don't think I have anything to add. I suggest that one of the normal approvers does the review/approval.

@N-thony
Copy link
Collaborator Author

N-thony commented Aug 15, 2024

@lloyddewit can you review this when possible?

@N-thony Thanks, but I don't think I have anything to add. I suggest that one of the normal approvers does the review/approval.

@lloyddewit sorry, I mean @lilyclements.

@lilyclements
Copy link
Contributor

Looking at the code history, this [[i]] was added in by "Bug fix by pasting multiple columns", which relates to PR #8954 on issue #8815

If we want to merge this, I suggest we reopen #8954 with an explanation as to why it's reopen and a link to this PR.

@N-thony
Copy link
Collaborator Author

N-thony commented Aug 17, 2024

Looking at the code history, this [[i]] was added in by "Bug fix by pasting multiple columns", which relates to PR #8954 on issue #8815

If we want to merge this, I suggest we reopen #8954 with an explanation as to why it's reopen and a link to this PR.

@lilyclements I believe the issue has been resolved. The problem occurred because we were using the default prefix when attempting to add multiple columns, which was incorrect in this context. We should use the default prefix only when adding multiple columns based on a single provided column name.

@N-thony
Copy link
Collaborator Author

N-thony commented Aug 17, 2024

@rdstern @Patowhiz you can have a look now, I believe now we will not have any problem from climsoft too even tho it has been fixed other way.

Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-thony this seems fine now on the few checks I made. I checked an example on splu=it and also ranks on the row statistics. Both produce multiple variables.

I am also really happy that:
a) There seem fewer regression bugs
b) It is other than me finding them. This was an important @Patowhiz discovery and @rachelkg has also been making suggestions recently. It will be good if (at least the) rest of the development team become more proficient users/testers too.
@Patowhiz can you approve and merge please

@N-thony
Copy link
Collaborator Author

N-thony commented Sep 20, 2024

@N-thony this seems fine now on the few checks I made. I checked an example on splu=it and also ranks on the row statistics. Both produce multiple variables.

I am also really happy that: a) There seem fewer regression bugs b) It is other than me finding them. This was an important @Patowhiz discovery and @rachelkg has also been making suggestions recently. It will be good if (at least the) rest of the development team become more proficient users/testers too. @Patowhiz can you approve and merge please

@rdstern @Patowhiz I'm aiming for the test release at the end of today. Can we get this merged?

@rdstern
Copy link
Collaborator

rdstern commented Sep 21, 2024

@N-thony I have approved. @Patowhiz back to you. I agree it would be great to merge this one in the forthcoming release

@Patowhiz
Copy link
Contributor

@rdstern my comments and questions above are yet to be responded to by @N-thony . I suspect we may regress other parts if I merge this without knowing the answers to my questions above. In my above comments, I've also given an alternative solution that involves solving the problem at the dialog level (this has far less risk).

I'm happy for you and @lilyclements or someone else to review and merge this.

@rdstern
Copy link
Collaborator

rdstern commented Sep 21, 2024

@N-thony are you reasonably confident that your solution is ok and will not cause the regressions @Patowhiz is concerned about? If so, then I suggest we ask @ChrisMarsh82 to check and merge? It worked nicely for the examples I looked at - which is seems were a problem before. So I am keen to progress.

@N-thony
Copy link
Collaborator Author

N-thony commented Sep 21, 2024

@rdstern my comments and questions above are yet to be responded to by @N-thony . I suspect we may regress other parts if I merge this without knowing the answers to my questions above. In my above comments, I've also given an alternative solution that involves solving the problem at the dialog level (this has far less risk).

I'm happy for you and @lilyclements or someone else to review and merge this.

@rdstern @Patowhiz if the code looks okay, another way to make sure this works well is through tests. It will be good to test this as much as we want to be sure before merging.

@rdstern
Copy link
Collaborator

rdstern commented Sep 21, 2024

@N-thony I am unhappy delaying the merging of this bug-fix. I have tested the Prepare > Text > Split with data from tidyr (world populations) and it works fine with your fix.
I have now looked at the same example in the latest version and it doesn't work. That's what started this off with Patrick's issue on July 30. So, unless we merge, we are waiting 2 months for the fix of an obvious problem.

I also checked the only other examples I could think of, where multiple variables are produced and added to an existing data frame. They are Prepare > Numeric > Random samples. That doesn't work on the merged version and seems fine with your fix.

And the Prepare > Numeric > Row Statistics (with ranks) is the only other example I can think of for now. That also works fine with your fix.

So I find it curious that the 2 best programmers won't take responsibility in this case. I would like this to be merged in the forthcoming version, because it is the only known (obvious) bugs. Assuming @Patowhiz is still not prepared to merge, then please @ChrisMarsh82 could you do so. Whether you check the code or just merge I don't care any more - it is time this fix was included.

@Patowhiz Patowhiz dismissed their stale review September 21, 2024 17:46

Dismissing this review to facilitate quick merging of this PR so that the dialogs that depend on it can be functional

Copy link
Contributor

@Patowhiz Patowhiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging this to facilitate further testing and new release that needs the bug fixed.

@Patowhiz Patowhiz merged commit fb93fb5 into IDEMSInternational:master Sep 21, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Addition of columns
5 participants