A new option to make renaming columns easier #8439

Closed
lilyclements opened this issue Jul 13, 2023 · 7 comments · Fixed by #8485
@lilyclements
Contributor

lilyclements commented Jul 13, 2023

@serifatf is looking at analysing the app data in R-Instat. One of the first questions that came up was whether there is a way to remove the same string from all column names. This is a problem in our app and chatbot data, since the column names all start with a long string (e.g., "rp.contacts.field." in the app data).
There are hundreds of columns, so doing this manually would be very time consuming. The prefix also takes up a lot of space in the column titles, making them less clear.

Fortunately there's some very easy code to fix this! Using starts_with in tidyselect, we can just find all the cases that start with a certain string, and replace that string with anything we like (including an empty string, as we want to here).

data_book$rename_column_in_data(data_name="data_RDS", type="rename_with",
                                .fn=stringr::str_replace,
                                .cols=tidyselect::starts_with("rp.contact.field."),
                                pattern = "rp.contact.field.",
                                replacement = "")
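For anyone who wants to try the idea outside R-Instat, here is a minimal sketch using dplyr directly. It assumes that rename_column_in_data with type="rename_with" passes its arguments on to dplyr::rename_with, which is not confirmed in this thread. The data frame and its column names are made up for illustration.

```r
library(dplyr)

# Toy data frame standing in for the app data (hypothetical columns)
app_data <- data.frame(
  rp.contact.field.age = c(31, 45),
  rp.contact.field.region = c("north", "south"),
  id = 1:2
)

# Select the columns that start with the prefix, then strip the prefix
# from their names by replacing it with an empty string
renamed <- dplyr::rename_with(
  app_data,
  .fn = stringr::str_replace,
  .cols = tidyselect::starts_with("rp.contact.field."),
  pattern = "rp.contact.field.",
  replacement = ""
)

names(renamed)
```

The id column is left alone because it is not matched by starts_with.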

I suggest we add this to our "Rename With" tab on the Rename Columns dialog. We currently have three radio buttons, but I suggest we add one more:

  • Replace Name (or something like this!)

When that is selected, we show a ucrInputDropDown (non-editable) where you can select one of "Starts with", "Ends with", "Contains", "Matches".
There are also two ucr inputs: one for the string to rename, and one for the value to rename it to.
Only the first needs to be filled. If you want to remove that part of the column name completely, you rename it to an empty string, "".

The corresponding functions

  • If the rdo button is selected, we run .fn = stringr::str_replace
  • If starts_with is selected, we run .cols = starts_with as the function
  • If ends_with is selected, we run .cols = ends_with as the function
  • If matches is selected, we run .cols = matches as the function
  • If contains is selected, we run .cols = contains as the function

Then

  • The ucr input with the string to rename gives the value of the parameter pattern, and is also the value passed to our starts_with (or ends_with, matches, contains) function. This needs to be filled if the rdo is checked.
  • The ucr input with the value to rename it to gives the parameter replacement. Its default is "".
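The mapping above can be sketched in plain R as follows. This is only an illustration: replace_in_names and its position argument are hypothetical names, not part of R-Instat, and the survey data frame is invented.

```r
library(dplyr)

# Hypothetical helper: `position` mirrors the proposed drop-down,
# `pattern` and `replacement` mirror the two inputs.
replace_in_names <- function(data, position, pattern, replacement = "") {
  helper <- switch(
    position,
    "Starts with" = tidyselect::starts_with,
    "Ends with"   = tidyselect::ends_with,
    "Contains"    = tidyselect::contains,
    "Matches"     = tidyselect::matches,
    stop("Unknown option: ", position)
  )
  # Resolve the chosen helper against the column names up front
  idx <- helper(pattern, vars = names(data))
  selected <- names(data)[idx]
  dplyr::rename_with(
    data,
    .fn = stringr::str_replace,
    .cols = tidyselect::all_of(selected),
    pattern = pattern,
    replacement = replacement
  )
}

survey <- data.frame(OLD_age = 1:2, OLD_sex = c("f", "m"), village = c("a", "b"))
with_new <- names(replace_in_names(survey, "Starts with", "OLD_", "NEW_"))
stripped <- names(replace_in_names(survey, "Contains", "OLD_"))
```

One thing to keep in mind: stringr::str_replace replaces the first match of pattern wherever it occurs in the name, so with "Ends with" a name that contains the string twice would have its first occurrence replaced, not the trailing one.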

@rdstern what do you think? This is the start of using R-Instat for analysing the App and Chatbot data, which would be great. Do you think this should be four options - one for "starts with", "ends with", "contains", and "matches"? Or just one option about replacing a part of a string?

Side note: Matches vs Contains -
matches will match a regular expression, contains is for a string.
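A small sketch of the difference, using an invented data frame. It also shows that stringr::str_replace treats its pattern as a regular expression unless it is wrapped in stringr::fixed, which matters for strings with dots in them.

```r
library(dplyr)

df <- data.frame(a.b = 1, axb = 2, ab = 3)

# contains() looks for the literal substring "a.b"
contains_cols <- names(dplyr::select(df, tidyselect::contains("a.b")))

# matches() treats "a.b" as a regular expression, where "." is any character
matches_cols <- names(dplyr::select(df, tidyselect::matches("a.b")))

# str_replace() is regex-based by default; fixed() makes the pattern literal
regex_result <- stringr::str_replace("axb_a.b", "a.b", "")
fixed_result <- stringr::str_replace("axb_a.b", stringr::fixed("a.b"), "")
```

Here contains_cols holds only a.b, while matches_cols holds both a.b and axb.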

@serifatf

@lilyclements , good to connect with you here.

@rdstern
Collaborator

rdstern commented Jul 14, 2023

@lilyclements that sounds excellent. @N-thony wrote most of the new rename dialog so is well placed to add - or to allocate - this feature. @N-thony you might also check on the details of the location of the extra control. You are good at that!

There is the similar drop-down control in the Select columns sub-dialog:

image

Maybe call it Edit Names: as the checkbox.

Then, as you say, if checked it shows the drop down, followed by Replace: <input string> By: <replacement string>

a) If the second is left blank, then the replace string is removed.
b) If this results in an illegal name then it is made legal. That includes the situation where the whole name may be deleted.
c) Could the output window report the number of names edited?
d) If a Select is "operational" in the dialog, then it only renames those that are within the select.
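Points b) and c) could be sketched like this on a plain vector of names. How R-Instat itself makes names legal is not stated in this thread; using base R's make.names here is an assumption, and the names are invented. Point d) is not covered.

```r
old_names <- c("SS", "SSn33", "SSn34", "site")

# Remove a leading "SS"; the first name becomes an empty string
new_names <- stringr::str_replace(old_names, "^SS", "")

# b) Make any illegal result legal (assumption: base R's make.names)
legal_names <- make.names(new_names, unique = TRUE)

# c) Count how many names were actually edited
n_edited <- sum(legal_names != old_names)
```

With these names, the fully deleted name is turned into "X" by make.names, and n_edited is 3.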

@N-thony
Collaborator

N-thony commented Aug 3, 2023

@derekagorhom as we discussed and agreed yesterday evening, this is going to be taken by @MeSophie, and you will work on a very interesting task on the toolbar improvement for the output comments that @rdstern will assign to you soon.

N-thony assigned MeSophie and unassigned derekagorhom Aug 3, 2023
@N-thony
Collaborator

N-thony commented Aug 3, 2023

> @derekagorhom as we discussed and agreed yesterday evening, this is going to be taken by @MeSophie, and you will work on a very interesting task on the toolbar improvement for the output comments that @rdstern will assign to you soon.

@derekagorhom It was already assigned to you. #8444

@MeSophie
Contributor

MeSophie commented Aug 7, 2023

@rdstern and @lilyclements, please, I need some clarifications.
This is what I have done up to now.

image

So @lilyclements, please, will the parameter type always have the value rename_with? I assume that rp.contact.field. in your code above represents the string to be replaced.
Then, does this code work on all the columns? I mean, because we don't have any control where we can specify the column, does it mean that the code will replace the string everywhere? Don't we need another control where we can specify the column(s)?
Also, I don't know if the code that I obtained by following your code is correct. I have every parameter, but when I run it nothing changes.
data_book$rename_column_in_data(data_name="survey", .fn=stringr::str_replace, .cols=tidyselect::starts_with("OLD"), pattern="OLD", replacement="NONE", type="starts_with")

@rdstern Please can you give more explanation about part b) (what do you mean by an illegal name?) and part d)?

@rdstern
Collaborator

rdstern commented Aug 7, 2023

@MeSophie it is good you asked. I'll give you my answer, and @lilyclements can add if I have it wrong.
I have an example dataset. I have used the DAAG package, and it has a dataset called rockart. See here:

image

You see there are a lot of names that start with SS for example SSn33, SSn34 and so on. Suppose I want to change the SS to Sophie in all these names. So I really need a find and replace that works on column names.

Lily suggests adding a 4th option on this button shown, rather than a new button. So it is all within the Rename with. It goes under Abbreviate and I suggest you just call it Replace (It doesn't need to be Replace Name, because the whole dialog is about Rename.)

Then you can, as Lily says, take advantage of the "language" to have a pull-down to choose where in the name you are replacing. And as I mentioned above, most of the options Lily mentions are already given in the Select feature that we already have. In the example above you would use Starts With. Then you need a field into which we can type SS, then a By: label and another field, into which you can type Sophie. And if you type nothing into the second field, then it will simply delete the SS given in the first field.

If you need help on the details of the Find/Replace then @N-thony may also be able to help, as he wrote most of the ordinary Find/Replace in the Prepare > Column: Text menu.

And keep asking questions on anything you are unsure about.

@lilyclements
Contributor Author

lilyclements commented Aug 10, 2023

@MeSophie Hopefully the points made by @rdstern answer your questions. Do you have any other questions?

> So @lilyclements, please, will the parameter type always have the value rename_with? I assume that rp.contact.field. in your code above represents the string to be replaced.

Yes, exactly. We want to both select every column that starts with the string rp.contact.field, and replace the rp.contact.field part of those column names.

> Then, does this code work on all the columns? I mean, because we don't have any control where we can specify the column, does it mean that the code will replace the string everywhere? Don't we need another control where we can specify the column(s)?

Apologies for being unclear. The rp.contact.field string is saying two things:

  1. Get every column that starts with rp.contact.field
  2. Replace the string rp.contact.field in the column names.
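A minimal sketch of the two roles, on an invented data frame (the column names are hypothetical, loosely modelled on the rockArt example). The first call restricts the renaming to columns that start with SS; the second applies the same replacement to every column.

```r
library(dplyr)

rock <- data.frame(SSn33 = 1, SSn34 = 2, GlossSS = 3)

# Role 1 (.cols): only columns starting with "SS" are touched
# Role 2 (pattern): within those names, "SS" is replaced
only_prefixed <- dplyr::rename_with(
  rock,
  .fn = stringr::str_replace,
  .cols = tidyselect::starts_with("SS"),
  pattern = "SS",
  replacement = "Sophie"
)

# Same pattern, but with every column selected
all_columns <- dplyr::rename_with(
  rock,
  .fn = stringr::str_replace,
  .cols = tidyselect::everything(),
  pattern = "SS",
  replacement = "Sophie"
)

names(only_prefixed)
names(all_columns)
```

In the first result GlossSS keeps its name, because it was never selected; in the second it is renamed too.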

> Also, I don't know if the code that I obtained by following your code is correct. I have every parameter, but when I run it nothing changes.

Can you send a copy of the code you are using and I'll take a look? With the example @rdstern has put above, the code would be:

data_book$rename_column_in_data(data_name="rockArt", type="rename_with",
                                .fn=stringr::str_replace,
                                .cols=tidyselect::starts_with("SS"),  # the columns to change are the ones that start with SS
                                pattern = "SS",                       # the string to replace is SS
                                replacement = "Sophie")
