Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RichTextLabel Refactoring #39248

Closed
wants to merge 9 commits into from
Closed

Conversation

HaSa1002
Copy link
Contributor

@HaSa1002 HaSa1002 commented Jun 2, 2020

Important: See #39248 (comment) on current status

As we talked about it on IRC I started refractoring the RichTextLabel (RTL). I am nowhere finished, I just wanted to show that I am working on a refractoring, to prevent possible dublication of work.

The current goal of this refractor is to split the process_line method into different methods, as it is responsible for too many things at the same time (it operates in at least three states). I believe a good portion of the RTL bugs have their root in this function and only a few people are willing to deal with this beast.

We could additionally see, if we want to implement godotengine/godot-proposals#826. If we want to go into that direction, we could add support for markdown as well. This is especially interesting for this refractoring as I would change the the input to be the raw text and move all the implementation logic into the bbcode, which is currently stacked into RTL.

Update 2020-06-06

I fixed all compilation issues and removed _process_line from the RTL.cpp. I reimplemented it in the dirtiest possible way (Constructing the BbCodeProcessor inline and calling the correct process function). I removed the ProcessMode from the visible API and hide it inside the BbCodeProcessor. I'd like to get rid of it in the the functions called from within the process methods too, but have no clue if that is the best way, since the current change already added some code duplication. I rebased and reapplied the changes of #39113 into the new _parse_img function.

Update 2020-06-03

Abstracted big chunks in the _process_line function into functions. (parse_image, parse_text, parse_detect_click, and parse_table). Note to self: copy #39113 into new bbcode when rebasing

Original

As for now I managed to transfer the process_line method into a new class and replaced the inlined defines with proper functions. I copied every dependency from RTL into the new Class, to get as fast as possible to a compiling build.

I will merge the commits together, once it is ready.
If merged it supersedes #39148

@bruvzg
Copy link
Member

bruvzg commented Jun 3, 2020

Might be irrelevant, depending on the amount of changes to the rlt/bbcode text processing you are willing to do, but since we still want to eventually implement BiDi/shaping support, it's probably worth considering some of its specifics during the refactoring, to avoid rewriting it again in the future.

  1. Whole paragraphs of the text (along with its font markdown, and metrics of the inline images/tables) should be passed to shaping engine in one piece, for reordering and shaping (mapping characters to font visual elements, in most cases mapping is not one-to-one), all subsequent processing (formatting / text effects) should be done on the results of this operation.

  2. Breaking the lines can be done locally, but at least set of the valid break points should be received from the shaping engine (in some cases it involves word search in the dictionary or/and other complex rules), or it might be better to fully offload it to the shaping engine.

  3. Lines should be reordered again after breaking.

If I recollect correctly, process_line actually processes paragraphs instead of lines, but doing all the above probably will need multiple passes.

@HaSa1002
Copy link
Contributor Author

HaSa1002 commented Jun 3, 2020

@bruvzg here some questions I have in order to get what you want to tell me.

Whole paragraphs of the text (along with its font markdown, and metrics of the inline images/tables) should be passed to shaping engine in one piece, for reordering and shaping (mapping characters to font visual elements, in most cases mapping is not one-to-one),

How to define a paragraphs then? When we stumble up on an empty line (like in Markdown)? What is meant with reordering?

all subsequent processing (formatting / text effects) should be done on the results of this operation

Currently this is done in the processing of ItemText. Problem is, that the drawing occurs, when we draw, directly after the processing. If I understood you correctly, the preprocessing should be split apart from the actual drawing. If that is the case, that is one goal I want to achieve, because only then I can get rid of the three function operating modes.

Breaking the lines can be done locally, but at least set of the valid break points should be received from the shaping engine (in some cases it involves word search in the dictionary or/and other complex rules), or it might be better to fully offload it to the shaping engine.

That has to be implemented once the shaping support is implemented, right? I could implement a method get_valid_line_breaks(), which can be later replaced.

Lines should be reordered again after breaking.

What do you mean by reordering?

@bruvzg
Copy link
Member

bruvzg commented Jun 3, 2020

How to define a paragraphs then?

Probably start new paragraph at each explicitly set newline ("\n" or bbcode tag that forces newline).

What is meant with reordering?
Strings are always stored in the reading order (from logical start to end), but some languages are written left-to-right and some right-to-left, reordering means placing characters in the order they are displayed, then characters are mapped to graphemes/glyphs (that's called shaping - same character can look differently depending on its position and neighbors, or multiple chars can be combined into ligature, this part depends on specific font).

e.g. if you have input string "ĀBČdess" where capital letters represent L-t-R language and small letters R-t-L one. It should be displayed as ßedĀBČ, internal rules of reordering are complex and irrelevant for the case (Unicode Standard Annex #9).

There will be some function (exact API is not determined) that will take the "ĀBČdess" string + fonts to use with it and return array of graphemes, each containing set of glyphs (images in the font + their relative offset) and a range in string it corresponds to: [ ß ] 5..7, [ e ] 4..5, [ d ] 3..4, [A, -] 0..1, [ B ] 1..2, [ C, ˇ ] 2..3.

If I understood you correctly, the preprocessing should be split apart from the actual drawing.

Yep. I guess, at this point preprocessing should do something like:

  • Prepare paragraph text and fonts
  • [... First round of reordering/shaping will plug-in here.... ]
  • Break lines.
  • [... Second reordering will be added here... ]
  • Store lines to the cache.

And separate drawing / hit testing that use preprocessed cache.

That has to be implemented once the shaping support is implemented, right? I could implement a method get_valid_line_breaks(), which can be later replaced.

Nothing of this is implemented yet, but probably it'll be some function taking string, and returning Vector of each position it's safe to break (position of each space for example, but not limited to it).

I'm not sure how much of this is possible to take into account without actually having BiDI/shaping implementation.

@Eoin-ONeill-Yokai
Copy link
Contributor

@HaSa1002 Yes yes yes, this is absolutely a good first step and I'd be willing to help in any way I can. Breaking that _process_line beast into proper functions will really streamline the whole process of working with the class. As an outsider jumping into the code for the Effects patch, it's a very daunting and hard to parse method will all of the use of defines in place of proper functions.

Let's also keep in mind some potential improvements that can be made for RichTextEffects as we're doing it. Specifically, it might be nice to try to get some proper character rotation support. There were attempts to add the feature in another PR / MR somewhere, but the interface could be made nicer by improving the font drawer interface.

Lastly, any way we can streamline the internal text tree? One feature I was attempting to address was the ability dynamically change text within the tree so that you could hotswap specific sections of the rich text with new bbcode text.

Also, I have a utility method inside RichTextLabel for custom properties parsing that could streamline the parsing for every bbcode type. Currently, it's only used RichTextEffects.

@HaSa1002
Copy link
Contributor Author

HaSa1002 commented Jun 4, 2020

Let's also keep in mind some potential improvements that can be made for RichTextEffects as we're doing it. Specifically, it might be nice to try to get some proper character rotation support. There were attempts to add the feature in another PR / MR somewhere, but the interface could be made nicer by improving the font drawer interface.

AFAIK the FontDrawer is seperate from the RTL. I'd like to stick only to the RTL in this pr.

Lastly, any way we can streamline the internal text tree? One feature I was attempting to address was the ability dynamically change text within the tree so that you could hotswap specific sections of the rich text with new bbcode text.

You mean without reparsing all invalidated lines? I haven't decided yet, what I do with the Items, since I'd like to use the text String as input and parse it then, but my first priority is to finalize the clean cutting of the functions and then see, how to improve it. I didn't really think about what I want to improve in detail, since I still learn, what the detailed use cases and intentions of _process_line is. (The saying, "I read the letters, but don't understand the words" fits very well, here)

Also, I have a utility method inside RichTextLabel for custom properties parsing that could streamline the parsing for every bbcode type. Currently, it's only used RichTextEffects.

Show it, please.

Answering your question on the tracker here, (I don't want to pollute it with discussions really):

#38169 RichTextLabel's append_bbcode stops animated rich text effects MRP

When I find some time, I will solve this. I can't imagine it being too hard.
Since you are currently refactoring the code, should I participate in that branch? Or as a separate MR?

I would wait with it, until the new structure is finalized, and work on top of it. We might integrate some bugfixes into the refractoring, depending on how the final design will be. The point I see in favor of adding bugfixes in this state is, that sometimes those will lead to a different and perhaps better design. I'd like to get opinions on that one though, especially from Coredevs 😄.

@Eoin-ONeill-Yokai
Copy link
Contributor

Eoin-ONeill-Yokai commented Jun 5, 2020

Show it, please.

@HaSa1002 The method is parse_expression_for_values which contains the following code.

Dictionary RichTextLabel::parse_expressions_for_values(Vector<String> p_expressions) {
	Dictionary d = Dictionary();
	for (int i = 0; i < p_expressions.size(); i++) {
		String expression = p_expressions[i];

		Array a = Array();
		Vector<String> parts = expression.split("=", true);
		String key = parts[0];
		if (parts.size() != 2) {
			return d;
		}

		Vector<String> values = parts[1].split(",", false);

#ifdef MODULE_REGEX_ENABLED
		RegEx color = RegEx();
		color.compile("^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$");
		RegEx nodepath = RegEx();
		nodepath.compile("^\\$");
		RegEx boolean = RegEx();
		boolean.compile("^(true|false)$");
		RegEx decimal = RegEx();
		decimal.compile("^-?^.?\\d+(\\.\\d+?)?$");
		RegEx numerical = RegEx();
		numerical.compile("^\\d+$");

		for (int j = 0; j < values.size(); j++) {
			if (!color.search(values[j]).is_null()) {
				a.append(Color::html(values[j]));
			} else if (!nodepath.search(values[j]).is_null()) {
				if (values[j].begins_with("$")) {
					String v = values[j].substr(1, values[j].length());
					a.append(NodePath(v));
				}
			} else if (!boolean.search(values[j]).is_null()) {
				if (values[j] == "true") {
					a.append(true);
				} else if (values[j] == "false") {
					a.append(false);
				}
			} else if (!decimal.search(values[j]).is_null()) {
				a.append(values[j].to_double());
			} else if (!numerical.search(values[j]).is_null()) {
				a.append(values[j].to_int());
			} else {
				a.append(values[j]);
			}
		}
#endif

		if (values.size() > 1) {
			d[key] = a;
		} else if (values.size() == 1) {
			d[key] = a[0];
		}
	}
	return d;
}

There are some flaws with the current implementation (specifically, there's no alternative parsing for builds lacking the regex module and it would be nice to support spaces within strings) that need to be addressed. The benefit of doing it this way though is that we can support more, optional parameters and less manual labor needs to happen inside the append_bbcode function. At any rate, I think it would be useful to consider during a refactor. The usage looks like the following in append_bbcode, but is only used for custom RichTextEffects:

                        Vector<String> expr = tag.split(" ", false);
                        if (expr.size() < 1) {
				add_text("[");
				pos = brk_pos + 1;
			} else {
				String identifier = expr[0];
				expr.remove(0);
				Dictionary properties = parse_expressions_for_values(expr);
				Ref<RichTextEffect> effect = _get_custom_effect_by_code(identifier);

				if (!effect.is_null()) {
					push_customfx(effect, properties);
					pos = brk_end + 1;
					tag_stack.push_front(identifier);
					set_process_internal(true);
				} else {
					add_text("["); //ignore
					pos = brk_pos + 1;
				}
			}

One last request, would it be possible to add a signal to the rich text label that emits when the 'visibleCharacters' parameter is set just beyond a bbcode? My thought is that it would be cool to allow for an extension of the RTL that can support some bbcode text like this:

There will be a sound playing right [playsound asset="path/to/asset"]now!

I'm not entirely sure the best way to support this, but my basic idea was that when the visibleCharacter value changes, we can inquire what bbcode text exists between the last visible character and the newly visible character, use the parse_expression_for_values function to populate a dictionary, and pass the bbcode identifier (in the above example, playsound) and a parameter dictionary into a signal. So the user could define a function like the following (pseudo-code):

func play_sound( code, parameters ):
    if code == "playsound":
        $SomeSamplePlayer.set_sample(parameters['asset'])
        $SomeSamplePlayer.play()

And simply bind that method to a signal. It could also, alternatively, be a virtual function of some sort for scripts to implement themselves.

Just a thought at least...

I'd like to get opinions on that one though, especially from Coredevs smile.

I don't think I'm considered core, but as someone who has contributed quite a bit to this specific code, I personally think a full refactor should happen first. Most of the existing problems probably stem from the complexity of the process_line code on it's own. The append_bbcode issue in particular is relatively simple (basically stems from disabling accepting update events) and a hotfix could be possible if a version before 4.0 needs to be released.

If you have any questions on the code, or RichTextEffect related implementations, don't be afraid to @ me though. I wish I could help out more, but the PR workflow doesn't always align itself super well with multiple hands-on-deck.

@HaSa1002
Copy link
Contributor Author

HaSa1002 commented Jun 5, 2020

I split the three states of _process_line into three functions, but I still have the p_mode ifs in the methods I abstracted away. I am not so sure, how to continue. I'd like to get rid of the ProcessModes completly but I am not sure if that is an intelligent move. Any suggestions?

@HaSa1002 HaSa1002 force-pushed the rtl-refractoring branch 2 times, most recently from 319f0eb to e8f262f Compare June 6, 2020 13:35
HaSa1002 added 7 commits June 6, 2020 16:33
This simplyfies the text setting and resolves some bugs
With this commit the engine is compiling, but the class itself doesn't
draw. There are changes which need to be made undone, but the priority
in this commit was compilability.

The inlined defines were refractored into their own functions.
Should compile again. Some bugs are existent and some more refractoring
needs to be done.
@HaSa1002 HaSa1002 force-pushed the rtl-refractoring branch from b7f0784 to 3a23f7f Compare June 6, 2020 14:36
@KoBeWi
Copy link
Member

KoBeWi commented Jun 6, 2020

I'd like to get rid of the ProcessModes completly but I am not sure if that is an intelligent move. Any suggestions?

If getting rid of them means making the code simpler without losing performance or introducing regressions, it's fine to do it.

@Eoin-ONeill-Yokai
Copy link
Contributor

I split the three states of _process_line into three functions, but I still have the p_mode ifs in the methods I abstracted away. I am not so sure, how to continue. I'd like to get rid of the ProcessModes completly but I am not sure if that is an intelligent move. Any suggestions?

I think that depends on some benchmarking. One thing I don't know is how that affects performance, so I think test that first and let us know the results.

@akien-mga akien-mga changed the title RichTextLabel Refractoring RichTextLabel Refactoring Jun 15, 2020
@HaSa1002
Copy link
Contributor Author

HaSa1002 commented Jun 22, 2020

As I have to do some work unrelated to code, it is very likely that I won't work on this PR until mid July. If somebody wants to continue the refactoring, I might be able to share some knowledge I gained from the already done parts.

The current "refactoring" (bbcode.h and bbcode.cpp) is just a new class which is constructed in place after which ether process_draw, process_pointer, or process_cache is called. This allown reduces the footprint in the RTL itself.

The second approach (bbcode_parser.h) I started would be a deeper refactoring where I move the complete parsing into a new class that is a member of the RTL or even it's own child of RTL. It would take over the actual parsing of the text (with the set and add_text methods) and generate the the tag tree itself. That would move all the tags into the new class and clearly seperate the parsing from the rest.

Lastly, as there is the is the work package of the right to left and font shaping support, which would require a refactoring ether way, I don't know whether or not it is worth trying to refactor the rtl before that. I tested with some dummy methods which would later be replaced with the shaping engine, but it is hard to guess if your dummy method really fullfills its job and if the shaping engine would work with actually contributing usable data. I have no idea how font shaping works so it is always guessing which hinders progress. That beeing said I really want to emphasize that I still want a refactoring of the rtl and will continue once I found time for it again. Sadly, currently that's not the case. I will leave this as a learning oppurtunity for now, as the first approach actually enhances readability.

@willnationsdev
Copy link
Contributor

willnationsdev commented Jul 22, 2020

Just wanted to say 1) thanks for working on this, and 2) I think my preferred approach for this class would be for a generic low-level tag tree to be supported in a singular Text node, and then it could optionally have a resource assigned to a property where the resource specifies the configuration and algorithm for parsing the text to generate the tag tree. So, a Text node could have a RichTextParser resource assigned to it or perhaps a MarkdownTextParser resource, etc. And if users wanted to create a script that implements the same interface (perhaps as a TextParser script), then they could do that as well to add their own preferred language support. A Text node that has no parser assigned to it would then behave like the current Label does. Then, all rendering code could function off of the tag tree to implement alignment, text wrapping, tables, alignments within table cells, etc.

It sounds to me like your bbcode_parser.h would be going more in that direction, if I'm not mistaken. So, that direction is my vote.

@akien-mga akien-mga requested review from reduz and Paulb23 October 2, 2020 13:38
@HaSa1002
Copy link
Contributor Author

superseded by #42595. Might look into extensibility again in the future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants