Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🔧 ExpensiMark: Markdown image shorthand syntax #676

Merged
merged 8 commits into from
Apr 15, 2024

Conversation

kidroca
Copy link
Contributor

@kidroca kidroca commented Apr 2, 2024

This PR introduces refinements in the ExpensiMark library to enhance the handling of markdown syntax for images, including the introduction of shorthand syntax for images without alt text, and refines the cleanup of HTML attributes

Fixed Issues

$ Expensify/App#38952

Tests

  1. Unit tests have been updated in __tests__/ExpensiMark-HTML-test.js and __tests__/ExpensiMark-Markdown-test.js to cover the new markdown shorthand for images without alt text, the handling of images with empty and special characters in alt text, and the refinement in HTML attribute cleanup. These tests validate the parsing of markdown to HTML and vice versa, ensuring that the changes behave as expected.
  2. Manual tests were conducted to verify that the new markdown shorthand for images is correctly converted to HTML, that images with and without alt text are handled properly, and that no XSS vulnerabilities are introduced through the alt text or any other HTML attributes. The tests also confirmed that existing functionality for other markdown elements remains unaffected.

QA

  1. QA needs to validate that images using the new shorthand markdown syntax are correctly rendered in the app. This includes testing with various image URLs, with and without alt text, and ensuring that images embedded within text blocks are displayed correctly.
  2. It's crucial to test for regressions in the rendering of markdown content, particularly other markdown elements like links, bold/italic text, and code blocks. Ensuring that the sanitization of HTML attributes does not inadvertently affect the rendering of other elements or introduce any rendering issues is also important.

@kidroca kidroca requested a review from a team as a code owner April 2, 2024 18:30
@melvin-bot melvin-bot bot requested review from chiragsalian and removed request for a team April 2, 2024 18:31
Comment on lines 962 to 992

/**
* Replace MD characters with their HTML entity equivalent
* @param {String} text
* @return {String}
*/
escapeMarkdownEntities(text) {
// A regex pattern matching special MD characters we'd like to escape
const pattern = /([*_{}[\]~])/g;

// A map of MD characters to their HTML entity equivalent
const entities = {
'*': '*',
_: '_',
'{': '{',
'}': '}',
'[': '[',
']': ']',
'~': '~',
};

return text.replace(pattern, char => entities[char] || char);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the introduction of attributeCleanup rule - this logic is no longer needed

Comment on lines 6 to 7
const MARKDOWN_LINK = `\\[([^\\][]*(?:\\[[^\\][]*][^\\][]*)*)]\\(${MARKDOWN_URL_REGEX}\\)(?![^<]*(<\\/pre>|<\\/code>))`;
const MARKDOWN_LINK_REGEX = new RegExp(MARKDOWN_LINK, 'gi');
const MARKDOWN_IMAGE_REGEX = new RegExp(`\\!${MARKDOWN_LINK}`, 'gi');
const MARKDOWN_LINK_REGEX = new RegExp(`\\[([^\\][]*(?:\\[[^\\][]*][^\\][]*)*)]\\(${MARKDOWN_URL_REGEX}\\)(?![^<]*(<\\/pre>|<\\/code>))`, 'gi');
const MARKDOWN_IMAGE_REGEX = new RegExp(`\\!(?:\\[([^\\][]*(?:\\[[^\\][]*][^\\][]*)*)])?\\(${MARKDOWN_URL_REGEX}\\)(?![^<]*(<\\/pre>|<\\/code>))`, 'gi');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MARKDOWN_LINK's string was directly moved into the MARDOWN_LINK_REGEX constructor as it's no longer reusable

MARKDOWN_IMAGE_REGEX was updated and although is similar to MARKDOWN_LINK the [alt text] part was made optional

chiragsalian
chiragsalian previously approved these changes Apr 3, 2024
Copy link
Contributor

@chiragsalian chiragsalian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM, i just have 2 questions for you. If nothing needs to be modified I'll merge once answered.

lib/ExpensiMark.js Outdated Show resolved Hide resolved
__tests__/ExpensiMark-HTML-test.js Show resolved Hide resolved
@robertKozik
Copy link
Contributor

Hi! @kidroca
Can you check the case when label is a code block and shouldKeepRawInput is set to true? Seems wrong to me

    test.only('No html inside the alt attribute - pre tag', () => {
        const testString = '![```code```](https://example.com/image.png)';
        const resultString = '<img src="https://example.com/image.png" alt="&#x60;&#x60;&#x60;code&#x60;&#x60;&#x60;" data-raw-href="https://example.com/image.png" data-link-variant="labeled" />';
        expect(parser.replace(testString, {shouldKeepRawInput: true})).toBe(resultString);
    });

robertKozik added a commit to Expensify/react-native-live-markdown that referenced this pull request Apr 5, 2024
kidroca added 5 commits April 5, 2024 17:11
Support the `!(url)` to create images without alt text

🧪 Add test for image md shorthand conversion to html tag without alt text

ℹ️ Added tests for single image with alt text, single image with empty alt text, and single image short syntax without alt text in ExpensiMark-HTML-test.js.
…matching

The update aims to avoid matching and converting these rules if the match happens to be within an HTML attribute - in that case the content should not be converted to HTML
@kidroca kidroca force-pushed the kidroca/md-image-short-syntax branch from c3ac1a6 to 7a7c075 Compare April 5, 2024 14:12
@kidroca
Copy link
Contributor Author

kidroca commented Apr 5, 2024

@robertKozik, @chiragsalian, I have updated the pull request with a revised approach to handling HTML attributes. The original method aimed to clean up HTML within attributes but proved unreliable because distinguishing embedded HTML from the attribute content was challenging. Consider the example:

<img src="https://example.com/image.png" alt="<pre data-code-raw="code">code</pre>" data-raw-href="https://example.com/image.png" data-link-variant="labeled" /> />

In this case, the alt attribute includes additional quotes that complicate detecting the attribute's end due to the embedded HTML. To address this, I've modified the Regex patterns for the italic, bold, and strike rules to prevent matches within HTML attributes. Additionally, I have added tests to cover these scenarios (thanks to @robertKozik for the suggestion), and all existing tests for these rules continue to pass.

As the scope and focus of the PR have shifted, I kindly request a re-review. I have made an effort to limit the extent of changes for easier review.

@kidroca
Copy link
Contributor Author

kidroca commented Apr 5, 2024

@robertKozik, regarding:

include #676 PR as patch package

While using a patch package might seem convenient, I've updated the PR, which necessitates creating the patch again.

Initially, my suggestion to use the commit hash from my fork was intended as a temporary measure for testing purposes in your PR. This approach would enable us to validate the integration, merge the changes into expensify-common, and then revert to the original package source.
(Here's the hash if you've reconsidered: "expensify-common": "git+ssh://git@github.com/kidroca/expensify-common.git#7a7c075bff9a07eb280fc720bec65b6d2acab161")

Alternatively, if it’s more feasible, I can open a new PR in NewDot using a previous version of live-markdown. Let me know if you prefer this method, and I can set it up later.

replacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}" alt="${this.escapeMarkdownEntities(g1)}" />`,
rawInputReplacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}" alt="${this.escapeMarkdownEntities(g1)}" data-raw-href="${g2}" data-link-variant="labeled" />`
replacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}"${g1 ? ` alt="${this.escapeAttributeContent(g1)}"` : ''} />`,
rawInputReplacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}"${g1 ? ` alt="${this.escapeAttributeContent(g1)}"` : ''} data-raw-href="${g2}" data-link-variant="labeled" />`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor thing, sorry for bother @kidroca .
If user types !(link) data-link-variant should be set to "auto" rather than "labeled" indicating that it's typed using shorthand syntax

This is needed to distinguish between ![](link) and !(link)

@robertKozik
Copy link
Contributor

robertKozik commented Apr 5, 2024

Actually I'm still prefering patching the repo with your changes, rather than pointing to the forked repository.
I'll merge it and open the Expensify PR which will unblock you. We will get rid of the patch simply in the next library version.

I've added one more comment - do you think it's solvable? @kidroca

@kidroca
Copy link
Contributor Author

kidroca commented Apr 5, 2024

I've added one more comment - do you think it's solvable? @kidroca

Thanks, I'll post an update in a minute

@robertKozik
Copy link
Contributor

I've came up with something like that:
data-link-variant="${typeof(g1) === 'string' ? 'labeled': 'auto'}"

@kidroca kidroca force-pushed the kidroca/md-image-short-syntax branch from 22e8a4d to 046b6ec Compare April 5, 2024 18:05
@kidroca
Copy link
Contributor Author

kidroca commented Apr 5, 2024

Updated

@robertKozik
Copy link
Contributor

@kidroca This PR will unblock you

@robertKozik
Copy link
Contributor

@kidroca react-native-live-markdown bump is merged to main. You should be unblocked now ⛓️ 💥

@robertKozik
Copy link
Contributor

robertKozik commented Apr 15, 2024

Pls merge main, bold -> <strong/> seems to go under the radar in HTML attributes

kidroca added 2 commits April 15, 2024 20:10
# Conflicts:
#	__tests__/ExpensiMark-HTML-test.js
#	__tests__/ExpensiMark-Markdown-test.js
#	lib/ExpensiMark.js
ℹ️ Skip replacing bold text within code blocks during parsing for ExpensiMark-HTML-test
Comment on lines +25 to +29
test('Test bold within code blocks is skipped', () => {
const testString = 'bold\n```*not a bold*```\nThis is *bold*';
const replacedString = 'bold<br /><pre>*not&#32;a&#32;bold*</pre>This is <strong>bold</strong>';
expect(parser.replace(testString)).toBe(replacedString);
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this test to ensure the merge conflict was resolved correctly
The test is related to code added in: #681

@kidroca
Copy link
Contributor Author

kidroca commented Apr 15, 2024

Merged main there was a conflict related to bold handling, but it's resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants