Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: keep tail spaces in heading #577

Merged
merged 1 commit into from
Oct 2, 2023

Conversation

s-alves10
Copy link
Contributor

Fixed Issues

$ Expensify/App#27445

Tests

    Input: <h1>heading1</h1> in the beginning of the line
    Output: # heading1\n in the beginning of the line
  1. Open any chat and send message
# heading 1
    in the beginning of the line
  1. Switch the message just sent to edit mode
  2. Verify that spaces in the second line not disappeared

QA

  1. Open any chat and send message
# heading 1
    in the beginning of the line
  1. Switch the message just sent to edit mode
  2. Verify that spaces in the second line not disappeared

@s-alves10 s-alves10 marked this pull request as ready for review September 19, 2023 06:03
@s-alves10 s-alves10 requested a review from a team as a code owner September 19, 2023 06:03
@melvin-bot melvin-bot bot requested review from jasperhuangg and removed request for a team September 19, 2023 06:04
@s-alves10
Copy link
Contributor Author

@jasperhuangg

Please take a look

@parasharrajat
Copy link
Member

Looks like all tests passed. Great.

@s-alves10 can you please analyze all tests related to the heading rule? Let me know if you find something strange. I will do that same but you can help me speed up the process. Thank you.

@s-alves10
Copy link
Contributor Author

s-alves10 commented Sep 20, 2023

@parasharrajat

These are htmlToMarkdown tests for heading

test('Test html string to heading1 markdown', () => {
const testString = '<h1>This is a heading1</h1>';
const resultString = '# This is a heading1';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});
test('Test html to heading1 markdown when there are other tags inside h1 tag', () => {
const testString = '<h1>This is a <strong>heading1</strong></h1>';
const resultString = '# This is a *heading1*';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});
test('Test html to heading1 markdown when h1 tag is in the beginning of the line', () => {
const testString = '<h1>heading1</h1> in the beginning of the line';
const resultString = '# heading1\n'
+ 'in the beginning of the line';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});
test('Test html to heading1 markdown when h1 tags are in the middle of the line', () => {
const testString = 'this line has a <h1>heading1</h1> in the middle of the line';
const resultString = 'this line has a\n'
+ '# heading1\n'
+ 'in the middle of the line';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});
test('Test html to heading1 markdown when h1 tag is in the end of the line', () => {
const testString = 'this line has an h1 in the end of the line<h1>heading1</h1>';
const resultString = 'this line has an h1 in the end of the line'
+ '\n# heading1';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});
test('Test html to heading1 markdown when there are 2 h1 tags in the line', () => {
const testString = 'this is the first heading1<h1>heading1</h1>this is the second heading1<h1>heading1</h1>';
const resultString = 'this is the first heading1'
+ '\n# heading1'
+ '\nthis is the second heading1'
+ '\n# heading1';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});
test('Test html to heading1 markdown when there are adjacent h1 tags in the line', () => {
const testString = '<h1>heading1</h1><h1>heading2</h1>';
const resultString = '# heading1'
+ '\n# heading2';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});

Nothing strange is there.
L554-567 are related to the current issue and I changed them according to the expected behavior

@parasharrajat
Copy link
Member

parasharrajat commented Sep 20, 2023

@s-alves10 I was checking the regex earlier and saw that the old regex does not match the newlines and spaces before the next line so in short the solution here is not justified.

Screenshot 2023-09-21 at 12 52 34 AM

can you please explain how this regex change works?

@s-alves10
Copy link
Contributor Author

@parasharrajat

[^\S\r\n]* means that this would match all space characters except \r and \n
But when converting markdown to html, we remove one newline and convert all other newlines to <br /> and so there is no \r or \n in the stored html string.

# heading
  next line

=> <h1>heading</h1> next line

# heading

  next line

=> <h1>heading</h1><br /> next line

The original regex works fine for the latter case but doesn't work for the first case(this is just our issue) because there is no <br /> or \r or \n

I hope this makes sense

@marcaaron marcaaron requested review from marcaaron and removed request for jasperhuangg September 25, 2023 05:06
@parasharrajat
Copy link
Member

parasharrajat commented Sep 25, 2023

I got it. Thanks for explaining. So it seems that the root cause is that we remove all the newlines from the output HTML. If the user types newlines on MD then they should be there. At least one should be there.

@s-alves10

The real issue is that we match the spaces in the next line after the heading tag. Given that regex was correct and it does not match the spaces on the next line, I don't think we should change it. The root issue is somewhere else which looks like it be

But when converting markdown to HTML, we remove one newline and convert all other newlines to
and so there is no \r or \n in the stored HTML string.

My question would be why do we remove one newline after the heading tag?

@s-alves10
Copy link
Contributor Author

s-alves10 commented Sep 25, 2023

@parasharrajat

My question would be why do we remove one newline after the heading tag?

I think removing one newline is reasonable because the newline in heading in markdown expresses the heading is ended. It doesn't mean newline.
In addition, if we add <br /> to the h1 tag, it would add additional newline, which leads to an unexpected result.
This was added in this PR

@parasharrajat
Copy link
Member

I see so you are saying that the heading is a block element and it itself should add newlines after being parsed into MD.

Then wouldn't it make sense to remove BR and newline after the heading tag while converting to MD?


Let me test this PR a bit to see there are no edge cases.

@s-alves10
Copy link
Contributor Author

@parasharrajat

Then wouldn't it make sense to remove BR and newline after the heading tag while converting to MD?

I don't think so. heading in markdown uses newline as its end mark.

@parasharrajat
Copy link
Member

parasharrajat commented Sep 25, 2023

I agree. I was saying the same. remove Br newlines while parsing HTML to MD and add just one. I guess this is what happening on main. But it also consumes spaces after the end tag which is the issue.

@s-alves10
Copy link
Contributor Author

@parasharrajat

What should I do next? This PR is 8 days old without any change. I'm not sure why

@parasharrajat
Copy link
Member

parasharrajat commented Sep 26, 2023

My bad, I forgot to update here. I asked for the original change to clarify the regex because there was no mention of those changes on the issue or proposal. Seems like I didn't receive any response. #531 (comment).

Meanwhile, I have tested a few possible patterns and all seem to work fine so I will approve it. Thanks for waiting.

Copy link
Member

@parasharrajat parasharrajat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

🎀 👀 🎀 C+ reviewed

@s-alves10
Copy link
Contributor Author

@marcaaron

Please take a look

@s-alves10
Copy link
Contributor Author

@marcaaron

What's the trouble with you? This PR was opened 2 weeks ago. Not sure what I can do here

@marcaaron
Copy link
Contributor

Couple things.

  1. I have been waiting for @parasharrajat to review. I see he has completed his review on the 26th. That is exactly 4 business days ago.
  2. We have been traveling for our company offsite in Indonesia. Not sure if we made any announcement this time around, but during this week we apply maximum focus to some internal goals. Which unfortunately can cause delays.

Last thing...

What's the trouble with you?

Just some feedback here - but that question commonly reads as having a critical tone. I am going to assume best intentions and believe that you didn't mean it that way. It would be better to ask: "How are things looking?" or "Can I please have a status update?", "Can I help unblock you in anyway?" etc. Those are all more polite ways to check in with a reviewer.

Copy link
Contributor

@marcaaron marcaaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for your patience.

@marcaaron marcaaron merged commit f76ff4b into Expensify:main Oct 2, 2023
3 checks passed
@s-alves10
Copy link
Contributor Author

@marcaaron

I'm sorry if I was not polite. That's not my intention

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants