-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parser: Revert to PEG parser by default #947
Conversation
Fixes several small issues created with the TinyMCE parser. Focuses efforts on building a declarative formal grammar to handle our parsing needs.
cfaa2b6
to
664ef45
Compare
Which issues does this fix? Can we get test cases for them to ensure they don't regress? If you're talking about #844, I'm actually only seeing that issue with the PEG parser and not the TinyMCE parser (see latest comment for details). More generally, my thoughts on this are tied up with #914. I am disappointed in the lack of interest in solving this critical set of problems, and I don't think we should make the change in this PR until the PEG parser returns a tree of block delimiters + HTML tags. |
well #890 and yeah #844 are examples. with the grammar the invalid block is just that: invalid; but we don't mangle the content. with the TinyMCE parser it breaks the entire page.
sounds like a good idea. would be nice to do in a follow-up PR.
please try not to assume anyone else's intent on this. we have disagreements on what the problem is and how it should be resolved. I trust you that you are working to produce the best design for the project; that is what I am also doing.
when I test it out it appears to match the behavior of the TinyMCE-based parser at the moment. if that's not the case can you help me identity what needs to change? I don't want to add new behavior in this PR. I'm on board with your goals. this PR is about wanting to focus on the formal grammar and the corresponding parser that it generates. there's nothing holding us back from changing it however we want, but I'd rather get this started now so we can iteratively improve it in a well-defined central spot than let the implicit spec continue to develop implicitly outside of the grammar. |
I have no doubt that we are all trying to produce the best design for the project, and that we'll continue to improve the grammar. However, merging this PR now means that #914 is effectively stalled until that happens, because our default parser would not be providing a parse tree that includes HTML tags. I have to say I'd be pretty disappointed with that outcome. I would prefer instead that we enhance the PEG grammar to return a tree of HTML tags first (aka bring it more into line with Also, if this PR is intended to fix bugs, then I think the test cases for those bugs should be added at the same time. |
And then remove the TinyMCE-based parser. The latest comments on #844 provide a clear example of how limited this approach is. |
yes, though switching now means swapping out at a point where the editor's behavior doesn't change and #914 isn't merged yet. I've been trying to give #914 the weight and consideration it deserves (which is why I've been slow to respond to it) and I'm very uncomfortable merging it in the way it's designed. Even if TinyMCE were the only parser I don't like the performance characteristics of #914 or the nondeterminism it introduces to converting raw HTML into blocks. If we're going to add an HTML tree to the parser output then let's do that, but let's do it one step at a time instead of squashing together changes. Also, if you aren't going to approve of the grammar parser anyway (which seems to be where your comments point) why should we waste time building it up the way we want the parser to work? |
Thanks for all the discussions around these important issues. I'm going to merge this as it fixes the immediate problem, but I trust we'll iterate on this crucial foundation until we are satisfied. I'd like us to remain open to layering solutions in order to strengthen the system without adding unnecessary burdens for the optimal case (a person using just the new editor for new content) which is the one that will get measured against other platforms. I don't feel comfortable removing the TinyMCE parser until we can run different tests against the two, more faithfully determine shortcomings around performance and reliability, and iterate on the grammar offering. The changes in #914 to |
Fixes several small issues created with the TinyMCE parser.
Focuses efforts on building a declarative formal grammar to handle our
parsing needs.
@nylen how would you feel here about switching back to the PEG parser and then making our needed updates to the grammar and focusing on it?