v8 development #559

Open, 1 commit to merge into base: main

Conversation

quantizor
Owner

@quantizor quantizor commented Mar 22, 2024

Objectives

  • allow for multiple rendering targets (React, Solid, etc.) (punted to a future major so this can be released faster)
  • simplify text-handling regexes to optimize speed and complexity
  • general codebase refactoring / profiling to remove bottlenecks
  • potentially update the library name if we support rendering targets other than JSX (punted to a future major so this can be released faster)
  • expose the parser so it can be used without directly compiling to a target
  • custom rule support
  • ability to disable/enable particular rules
  • documentation


changeset-bot bot commented Mar 22, 2024

🦋 Changeset detected

Latest commit: 4085815

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| markdown-to-jsx | Major |


@quantizor quantizor changed the title from "allow for standalone use of AST parser, start isolating React rendering to allow for other renderers potentially" to "v8 development" Apr 5, 2024
@quantizor quantizor mentioned this pull request Apr 5, 2024
const TEXT_STRIKETHROUGHED_R = new RegExp(`^~~${INLINE_SKIP_R}~~`)
// https://regexr.com/7u91c
const INLINE_FORMATTING_R =
/^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '*' and containing many repetitions of '<>'.

Copilot Autofix AI 21 days ago

To fix the problem, we need to refactor the regular expression to remove the ambiguity and nested quantifiers that cause exponential backtracking. This can be achieved by breaking down the regular expression into simpler, non-ambiguous parts and ensuring that each part matches a specific pattern without overlap.

In this case, we can refactor the regular expression to handle each part separately and avoid nested quantifiers. We will replace the problematic part with a more efficient pattern that achieves the same functionality.

Suggested changeset 1
index.tsx

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/index.tsx b/index.tsx
--- a/index.tsx
+++ b/index.tsx
@@ -229,3 +229,3 @@
 const INLINE_FORMATTING_R =
-  /^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/
+  /^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<[^<>]*>(?:<[^<>]*>)*|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/
 
EOF

Copilot is powered by AI and may make mistakes. Always verify output.
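
One way to act on that reminder is to compare what the original and the autofixed HTML branch each consume. The sketch below lifts the two sub-patterns out of INLINE_FORMATTING_R and anchors them so they can be run in isolation. The `consumed` helper, the constant names, and the sample strings are assumptions made for illustration; none of them exist in index.tsx.

```typescript
// HTML branch of INLINE_FORMATTING_R before and after the suggested patch.
// Anchoring with ^ is an assumption of this sketch; in the real regex the
// branch sits inside a lazily repeated group.
const ORIGINAL_HTML_BRANCH = /^<.*?>(?:.*?<.*?>)?/
const AUTOFIX_HTML_BRANCH = /^<[^<>]*>(?:<[^<>]*>)*/

// Hypothetical helper: the text a pattern consumes from the start of the
// input, or null when it does not match.
function consumed(pattern: RegExp, input: string): string | null {
  const match = pattern.exec(input)
  return match ? match[0] : null
}

// Short inputs only, so neither pattern has room to backtrack heavily.
const htmlSamples = ['<b>', '<a><b>', '<b>text</b>']

for (const sample of htmlSamples) {
  const before = consumed(ORIGINAL_HTML_BRANCH, sample)
  const after = consumed(AUTOFIX_HTML_BRANCH, sample)
  console.log(sample, { before, after, same: before === after })
}
```

On `<b>text</b>` the original branch consumes the whole element while the autofixed branch stops after `<b>`, because the suggestion drops the `.*?` between the two tags. Inside the full regex the `[\s\S]` fallback may still pick up the remaining characters, so whether the overall match changes is something to confirm against the test suite rather than assume.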
const TEXT_STRIKETHROUGHED_R = new RegExp(`^~~${INLINE_SKIP_R}~~`)
// https://regexr.com/7u91c
const INLINE_FORMATTING_R =
/^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '*' and containing many repetitions of '['.

Copilot Autofix AI 21 days ago

To fix the problem, we need to modify the regular expression to remove the ambiguity that causes exponential backtracking. Specifically, we should replace the .*? pattern with a more specific pattern that avoids ambiguity. In this case, we can use a negated character class to match any character except the ones that would cause backtracking issues.

  • Modify the regular expression on line 230 to replace .*? with a more specific pattern.
  • Ensure that the new pattern maintains the existing functionality of the regular expression.
Suggested changeset 1
index.tsx

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/index.tsx b/index.tsx
--- a/index.tsx
+++ b/index.tsx
@@ -229,3 +229,3 @@
 const INLINE_FORMATTING_R =
-  /^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/
+  /^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([][^)\]]*?[)\]]|<[^>]*>(?:[^<]*?<[^>]*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/
 
EOF
const TEXT_STRIKETHROUGHED_R = new RegExp(`^~~${INLINE_SKIP_R}~~`)
// https://regexr.com/7u91c
const INLINE_FORMATTING_R =
/^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '*[](' and containing many repetitions of ')[]('.

Copilot Autofix AI 21 days ago

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

const TEXT_STRIKETHROUGHED_R = new RegExp(`^~~${INLINE_SKIP_R}~~`)
// https://regexr.com/7u91c
const INLINE_FORMATTING_R =
/^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '*<' and containing many repetitions of '><'.

Copilot Autofix AI 21 days ago

To fix the problem, we need to modify the regular expression to remove the ambiguity that causes exponential backtracking. Specifically, we should replace the .*? pattern with a more specific pattern that avoids the inefficiency.

  • We will replace .*? with a pattern that matches any character except for the ones that could cause backtracking issues.
  • The new pattern will be [^*<]*? which matches any character except * and <, thus preventing the problematic backtracking scenario.
Suggested changeset 1
index.tsx

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/index.tsx b/index.tsx
--- a/index.tsx
+++ b/index.tsx
@@ -229,3 +229,3 @@
 const INLINE_FORMATTING_R =
-  /^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/
+  /^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<[^*<]*?>(?:[^*<]*?<[^*<]*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/
 
EOF
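
The `[^*<]*?` replacement also excludes `*` from tag contents, which the original `.*?` allowed. A minimal sketch of that difference, using only the opening-tag part of the branch; the constant names, the `consumedTag` helper, and the sample strings are hypothetical and not taken from index.tsx.

```typescript
// Opening-tag part of the HTML branch before and after this suggestion,
// anchored with ^ so it can be tested on its own (an assumption of the sketch).
const ORIGINAL_TAG = /^<.*?>/
const AUTOFIX_TAG = /^<[^*<]*?>/

// Hypothetical helper: the text a pattern consumes, or null on no match.
function consumedTag(pattern: RegExp, input: string): string | null {
  const match = pattern.exec(input)
  return match ? match[0] : null
}

// A plain tag, and a tag whose attribute value contains an asterisk.
const tagSamples = ['<em>', '<a title="*">']

for (const sample of tagSamples) {
  console.log(sample, {
    before: consumedTag(ORIGINAL_TAG, sample),
    after: consumedTag(AUTOFIX_TAG, sample),
  })
}
```

The autofixed pattern returns null for `<a title="*">`, so tags with an asterisk inside them would no longer be recognised by this branch.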
const TEXT_STRIKETHROUGHED_R = new RegExp(`^~~${INLINE_SKIP_R}~~`)
// https://regexr.com/7u91c
const INLINE_FORMATTING_R =
/^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '*<>' and containing many repetitions of '<><>'.

Copilot Autofix AI 21 days ago

To fix the problem, we need to modify the regular expression to remove the ambiguity that causes exponential backtracking. Specifically, we should replace the .*? pattern with a more precise pattern that avoids ambiguity. In this case, we can use a negated character class to match any character except the ones that would cause backtracking.

  • Identify the problematic regular expression on line 230.
  • Replace the .*? pattern with a negated character class that matches any character except the ones that would cause backtracking.
  • Ensure that the new pattern maintains the original functionality of the regular expression.
Suggested changeset 1
index.tsx

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/index.tsx b/index.tsx
--- a/index.tsx
+++ b/index.tsx
@@ -229,3 +229,3 @@
 const INLINE_FORMATTING_R =
-  /^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/
+  /^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([][^)\]]*|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/
 
EOF
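
Note that this patch edits the link branch (`\[.*?\][([].*?[)\]]`) and leaves the `<.*?>` branch named in the alert unchanged. It also drops the closing `[)\]]`. The sketch below isolates the link branch to show the effect; the constant names, the `consumedLink` helper, and the sample strings are assumptions for illustration only.

```typescript
// Link branch of INLINE_FORMATTING_R before and after this suggestion,
// anchored with ^ so it can be run in isolation (an assumption of the sketch).
const ORIGINAL_LINK_BRANCH = /^\[.*?\][([].*?[)\]]/
const AUTOFIX_LINK_BRANCH = /^\[.*?\][([][^)\]]*/

// Hypothetical helper: the text a pattern consumes, or null on no match.
function consumedLink(pattern: RegExp, input: string): string | null {
  const match = pattern.exec(input)
  return match ? match[0] : null
}

// A complete link, and one that is missing its closing parenthesis.
const linkSamples = ['[text](url)', '[text](url']

for (const sample of linkSamples) {
  console.log(sample, {
    before: consumedLink(ORIGINAL_LINK_BRANCH, sample),
    after: consumedLink(AUTOFIX_LINK_BRANCH, sample),
  })
}
```

With the closing bracket no longer required, the autofixed branch accepts the unterminated `[text](url` and stops one character short on the complete link.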
const TEXT_STRIKETHROUGHED_R = new RegExp(`^~~${INLINE_SKIP_R}~~`)
// https://regexr.com/7u91c
const INLINE_FORMATTING_R =
/^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/

Check failure

Code scanning / CodeQL

Inefficient regular expression High

This part of the regular expression may cause exponential backtracking on strings starting with '*<><' and containing many repetitions of '><><'.

Copilot Autofix AI 21 days ago

To fix the problem, we need to modify the regular expression to remove the ambiguity that causes exponential backtracking. Specifically, we can replace .*? with a more specific pattern that avoids the inefficiency. In this case, we can use a negated character class to match any character except the ones that would cause backtracking issues.

  • Replace .*? with a more specific pattern that avoids ambiguity and backtracking.
  • Ensure the new pattern maintains the original functionality of the regular expression.
Suggested changeset 1
index.tsx

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/index.tsx b/index.tsx
--- a/index.tsx
+++ b/index.tsx
@@ -229,3 +229,3 @@
 const INLINE_FORMATTING_R =
-  /^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<.*?>(?:.*?<.*?>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/
+  /^(([*_])\2|[*_]|~~|==)((?:\[.*?\][([].*?[)\]]|<[^>]*>(?:.*?<[^>]*>)?|([*_]+|`|~~|==)[\s\S]+?\4|[\s\S])+?)\1/
 
EOF

This comment was marked as off-topic.

index.tsx Fixed
@quantizor
Owner Author

quantizor commented Aug 18, 2024

Need to incorporate the changes from #594, #579

@quantizor quantizor force-pushed the split-parser branch 2 times, most recently from f070ddf to 1a8a393 December 15, 2024 03:19

This allows for direct use of the markdown-to-jsx AST if it's preferable
for your use case to retain full control over output.

refactor: isolate React rendering rules

chore: upgrade dependencies

refactor: consolidate formatted text rules

chore: update dependencies

refactor: react 16

refactor: remove top-level compiler export

refactor: moving things around

chore: update benchmark

refactor: improve typings

refactor: refactor rules into tuple array, add disable/enableRules

feat: add custom rule support

forward-port fixes from 7.x