Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add strip_bbcode() method to String #65839

Closed
wants to merge 1 commit into from

Conversation

ztc0611
Copy link
Contributor

@ztc0611 ztc0611 commented Sep 15, 2022

Replaces #65384.

Adds a function to strip bbcode tags out of strings.

Example Code:

var s = "[color=#ffffff]Hello![/color]
print(s.strip_bbcode())

Result: Hello

Closes godotengine/godot-proposals#5056

@ztc0611 ztc0611 requested review from a team as code owners September 15, 2022 18:27
@akien-mga akien-mga added enhancement topic:core cherrypick:3.x Considered for cherry-picking into a future 3.x release labels Sep 15, 2022
@akien-mga akien-mga added this to the 4.x milestone Sep 15, 2022
@markdibarry
Copy link
Contributor

markdibarry commented Sep 15, 2022

This is a great idea. Haven't looked in awhile, but if the implementation is the same, would it make more sense to add this to the TextServer and have both RTL and string call the same method? Or do they need to be separate methods?
@bruvzg Maybe you could weigh in?

@ztc0611
Copy link
Contributor Author

ztc0611 commented Sep 15, 2022

would it make more sense to add this to the TextServer and have both RTL and string call the same method?

I'm still not a big expert on Godot's code. I wasn't able to find the function that does that to RTLs already (to convert them to the text property), but if that is considered to be a better solution it's fine by me.

@ztc0611 ztc0611 force-pushed the strip-bbcode branch 2 times, most recently from 1d9e467 to 100a0c5 Compare September 15, 2022 21:54
@Spartan322
Copy link
Contributor

would it make more sense to add this to the TextServer and have both RTL and string call the same method? Or do they need to be separate methods?

I don't see why you would call it in the RTL given that if you're using BBCode it has to process the text and if not then you want it to be a raw string including the BBCode tags.

@ztc0611
Copy link
Contributor Author

ztc0611 commented Sep 16, 2022

I don't see why you would call it in the RTL given that if you're using BBCode it has to process the text and if not then you want it to be a raw string including the BBCode tags.

I think they meant make the RTL and String use the same function for striping it, because the RTL converts bbcode_text to raw text in the editor. I didn't tap into that exact functionality though.

@markdibarry
Copy link
Contributor

After some digging, it looks like RTL is in fact not caching but stripping it every time you call get_parsed_text(), but it looks like it's doing something specific, so I'm going to say we should keep them separate unless otherwise needed.
image

@dalexeev
Copy link
Member

Actually, there is a way to escape BBCode. Maybe we should add escape_bbcode instead of / together with strip_bbcode?

@ztc0611
Copy link
Contributor Author

ztc0611 commented Sep 23, 2022

So it needs a check to make it convert [lb] and [rb] to brackets? I can start on that soon.

@markdibarry
Copy link
Contributor

The other pretty important issue with this is that it doesn't check anywhere if it's valid BBCode or not. This is important, since RichTextLabel doesn't strip BBCode that doesn't match a tag.
image

@ztc0611
Copy link
Contributor Author

ztc0611 commented Oct 12, 2022

@markdibarry any idea where the script that checks for valid bbcode is?

@dalexeev
Copy link
Member

RichTextLabel doesn't strip BBCode that doesn't match a tag

Given that custom BBCodes can be added, I think it's correct to only consider the syntax (strip all tags, including non-existent and non-matching ones) and point this out in the documentation.

@Spartan322
Copy link
Contributor

Spartan322 commented Oct 13, 2022

The other pretty important issue with this is that it doesn't check anywhere if it's valid BBCode or not. This is important, since RichTextLabel doesn't strip BBCode that doesn't match a tag.

The only simple independent way I can think of this being implemented is by having a list of valid tag names in an array and "skipping" past anything outside that array, honestly doesn't look like that be difficult, (perhaps that can be considered, if so however it should be under a different name so it'll handle the default tags by distinctly and by default like strip_bbcode_tags(handle_defaults : bool = true, extra_tags : Array[string] = [])) but if the function gets that complex then doesn't trying to better comply with BBCode enter the fray adding undue complexity?

@ztc0611
Copy link
Contributor Author

ztc0611 commented Oct 13, 2022

If custom BBCode tags can be added, and RichTextLabel can access this, then surely this function could just automatically access whatever that store is, right? That seems like it would be the most elegant solution.

@Spartan322
Copy link
Contributor

If custom BBCode tags can be added, and RichTextLabel can access this, then surely this function could just automatically access whatever that store is, right? That seems like it would be the most elegant solution.

That's the clunkiest solution actually, as you would be required to pass the RTL every time you made a call and makes the function heavily dependent upon the RTL. (meaning as well if you don't use the RTL now you're polluting the dependency of the class just for that function, which iirc is actually prohibited as new additions anyway as core can't be dependent upon scene files) And that aside its unnecessary since all the information this function needs to remove known tags is the name of the tags, as in all you need is strings, you don't need anything else in the RTL and don't need to keep a reference to an RTL in order to use it.

@ztc0611
Copy link
Contributor Author

ztc0611 commented Oct 13, 2022

That's the clunkiest solution actually, as you would be required to pass the RTL every time you made a call and makes the function heavily dependent upon the RTL.

Considering the primary use case here is for RTLs, maybe this should be a function attached to RTLs? Like $RichTextLabel.strip_bbcode(string) which just returns the stripped text as the RTL would display it, and doesn't affect the RTL's actual bbcode_text or text? It not returning an identical string to what the RTL will end up displaying feels like it could make this function have limited use for many.

@dalexeev
Copy link
Member

I guess the main purpose of strip_bbcode is to safely add text without markup (although there is add_text and [lb], [rb] tags). In order to get the current visible text there is get_parsed_text. strip_bbcode respecting the current set of tags is useful if you need to know the future visible text, but I think this is a rare use case.

@ztc0611 ztc0611 force-pushed the strip-bbcode branch 2 times, most recently from 3865426 to 8f8b213 Compare October 14, 2022 02:04
@markdibarry
Copy link
Contributor

useful if you need to know the future visible text, but I think this is a rare use case

Funny enough, that's the exact use case I have. When working with dialog, knowing the indices of rendered characters can be crucial. With any custom text syntax, your two options are to manually keep a list of valid bbcode tags up to date, or:

  1. Set the text, so the text can be parsed
  2. call get_parsed_text()
  3. Use the full text and parsed text together to determine the rendered text.
  4. Parse the text for custom syntax.
  5. set the text again.

Can be a little over-the-top.

@ztc0611
Copy link
Contributor Author

ztc0611 commented Oct 14, 2022

When working with dialog, knowing the indices of rendered characters can be crucial.

This is my exact use case as well, but I don't use custom bbcode. Wasn't sure if it was an odd edge case or not. I use it to make the text make sounds only for precise durations, and pause when the text reaches a pause.

@markdibarry
Copy link
Contributor

Sounds rad. Keep me posted with your project's progress after this!

@ztc0611 ztc0611 force-pushed the strip-bbcode branch 2 times, most recently from 6748318 to 4d08028 Compare October 14, 2022 03:06
@dalexeev
Copy link
Member

dalexeev commented Oct 14, 2022

When working with dialog, knowing the indices of rendered characters can be crucial.

I prefer not to rely on indexes, but to use invisible control characters.

According to your description, it should not be String.strip_bbcode, but RichTextLabel.get_parsed_text_from_string or something like that. strip_* means that some character sequences will be removed (like strip_escapes or lstrip) and not replaced (like c_escape, c_unescape, etc.).

We also need to remember that rtl.text += s and rtl.append_text(s) give different results for unclosed tags. And the proposed behavior may not be suitable in one of the cases.

if (operator[](i) == '[') {
int skip_char = 0;

//Check for [rb] and [lb].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess strip_bbcode should just remove all BBCodes without parsing them (even [code]). That is:

"[lb]text[rb]".strip_bbcode() # "text", not "[text]"

Otherwise, after strip_bbcode, we can get a string containing the BBCode, which even sounds wrong:

"[lb]b[rb]text[lb]/b[rb]".strip_bbcode() # "[b]text[/b]"

As I wrote above, if the functionality is needed, then it should not be the String.strip_bbcode method, but RichTextLabel.get_parsed_text_from_string or something similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if they want brackets in the text they're typing for some reason? I don't see why example 2 would ever come up outside of somebody trying to intentionally "break" it.

@ztc0611 ztc0611 closed this Jul 4, 2023
@akien-mga akien-mga added archived and removed cherrypick:3.x Considered for cherry-picking into a future 3.x release labels Jul 4, 2023
@YuriSizov YuriSizov removed this from the 4.x milestone Dec 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a function to erase BBCode from a String, without depending on RichTextLabel
6 participants