/bin/bash: gsed: command not found #99
Comments
You're exactly right, I'm not sure how or why it worked on my end. I would appreciate your pull request a lot!
Merged
In case you haven't seen it, pull request #100 resolves this issue.
adamziel added a commit that referenced this issue on Oct 28, 2024
Prototypes a `wp_rewrite_urls()` URL rewriter for block markup to migrate the content from, say, `<a href="https://adamadam.blog">` to `<a href="https://adamziel.com/blog">`.

* URL rewriting works to perhaps the greatest extent it ever did in WordPress migrations.
* The URL parser requires PHP 8.1. This is fine for some Playground applications, but we'll need PHP 7.2+ compatibility to get it into WordPress core.
* This PR features `WP_HTML_Tag_Processor` and `WP_HTML_Processor` to enable usage outside of WordPress core.

### Details

This PR consists of code ported from https://github.com/adamziel/site-transfer-protocol. It uses a cascade of parsers to pierce through the structured data in a WordPress post and replace the URLs matching the requested domain.

The data flow is as follows:

Parse HTML -> Parse block comments -> Parse attributes JSON -> Parse URLs

On a high level, this parsing cascade is handled by the `WP_Block_Markup_Url_Processor` class:

```php
$p = new WP_Block_Markup_Url_Processor( $block_markup, $base_url );
while ( $p->next_url() ) {
	$parsed_matched_url = $p->get_parsed_url();
	// .. do processing
	$p->set_raw_url($new_raw_url);
}
```

Getting more into details, the `WP_Block_Markup_Url_Processor` extends the `WP_HTML_Tag_Processor` class and walks the block markup token by token. It then drills down into:

* Text nodes, where it matches URLs using regexps. This part can be improved to avoid regular expressions.
* Block comments, where it parses the block attributes and iterates through them, looking for ones that contain valid URLs.
* HTML tag attributes, where it looks for ones that are reserved for URLs (such as `<a href="">`) and contain valid URLs.

The `next_url()` method moves through the stream of tokens, looking for the next match in one of the above contexts, and the `set_raw_url()` method knows how to update each node type, e.g. block attribute updates are `json_encode()`-d.

### Processing tricky inputs

When this code is fed into the migrator:

```html
<!-- wp:paragraph -->
<p>
	<!-- Inline URLs are migrated -->
	🚀-science.com/science has the best scientific articles on the internet! We're also available via the punycode URL:

	<!-- No problem handling HTML-encoded punycode URLs with urlencoded characters in the path -->
	https://xn---science-7f85g.com/%73%63ience/.

	<!-- Correctly ignores similar–but–different URLs -->
	This isn't migrated: https://🚀-science.comcast/science <br>
	Or this: super-🚀-science.com/science
</p>
<!-- /wp:paragraph -->

<!-- Block attributes are migrated without any issue -->
<!-- wp:image {"src": "https:\/\/\ud83d\ude80-\u0073\u0063ience.com/%73%63ience/wp-content/image.png"} -->
<!-- As are URI HTML attributes -->
<img src="https://xn---science-7f85g.com/science/wp-content/image.png">
<!-- /wp:image -->

<!-- Classes are not migrated. -->
<span class="https://🚀-science.com/science"></span>
```

This actual output is produced:

```html
<!-- wp:paragraph -->
<p>
	<!-- Inline URLs are migrated -->
	science.wordpress.com has the best scientific articles on the internet! We're also available via the punycode URL:

	<!-- No problem handling HTML-encoded punycode URLs with urlencoded characters in the path -->
	https://science.wordpress.com/.

	<!-- Correctly ignores similar–but–different URLs -->
	This isn't migrated: https://🚀-science.comcast/science <br>
	Or this: super-🚀-science.com/science
</p>
<!-- /wp:paragraph -->

<!-- Block attributes are migrated without any issue -->
<!-- wp:image {"src":"https:\/\/science.wordpress.com\/wp-content\/image.png"} -->
<!-- As are URI HTML attributes -->
<img src="https://science.wordpress.com/wp-content/image.png">
<!-- /wp:image -->

<!-- Classes are not migrated. -->
<span class="https://🚀-science.com/science"></span>
```

## Remaining work

- [x] Add PHPCBF
- [x] Get to zero CBF errors
- [x] Get the unit tests to run in CI (e.g. run `composer install`)
- [x] Add relevant unit tests coverage

## Follow-up work

- [x] Patch `WP_HTML_Tag_Processor` in WordPress core, see WordPress/wordpress-develop#7007 (comment)
- [ ] Package our copy of `WP_HTML_Tag_Processor` as a "WordPress polyfill" for standalone usage.
- [ ] Make it compatible with PHP 7.2+

## Testing Instructions (or ideally a Blueprint)

CI runs the PHP unit tests. To run this on your local machine, do this:

```sh
cd packages/playground/data-liberation
composer install
cd ../../../
nx test:watch playground-data-liberation
```
When running the task `build:wp`, I'm seeing the following error. It's coming from `src/wordpress-playground/wordpress/Dockerfile`.

From a quick search, it seems `gsed` is GNU `sed` renamed by Homebrew on macOS. Inside the Docker container, I believe the above lines should be calling `sed` instead. If so, I'd be happy to make a little pull request.
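For scripts that have to run both on a macOS host and inside a Linux container, one possible workaround is to pick the binary at runtime. The sketch below is only an illustration of that idea, not necessarily what pull request #100 does; the `pick_sed` function and the `SED` variable are hypothetical names, and it assumes the fallback `sed` is acceptable for the expressions being run.

```sh
#!/bin/sh
# Hypothetical helper: prefer gsed when it is installed (Homebrew on macOS
# installs GNU sed under that name), otherwise fall back to plain sed.
pick_sed() {
    if command -v gsed >/dev/null 2>&1; then
        echo gsed
    else
        echo sed
    fi
}

SED="$(pick_sed)"

# A simple substitution through a pipe works the same way with either binary.
printf 'gsed is GNU sed\n' | "$SED" 's/gsed/sed/'
```

Inside the Docker image itself, calling `sed` directly, as suggested above, is the simpler option, since no `gsed` binary is expected there.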