Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retain forward slashes in HTML img src paths #2811

Merged
merged 4 commits into from
Nov 8, 2024

Conversation

Andrew0Hill
Copy link
Contributor

@Andrew0Hill Andrew0Hill commented Nov 7, 2024

Currently, there is an issue in tweak_rmarkdown_html which causes forward slash characters in generated <img> tags to be escaped during the conversion from absolute to relative paths, causing the images to not load correctly.

i.e.
<img src="mysubdir/myimg.png">
becomes
<img src="mysubdir%2Fmyimg.png">
after the call to tweak_rmarkdown_html.

The proposed change preserves what I think is the intention of the original call to xml2::url_esacpe (to properly encode space characters) but adds the forward slash character as a reserved character to prevent it from being encoded.

After this change, all existing tests continue to pass.

I'm happy to try and write a test case for this issue as well if necessary.

Thanks,
Andrew

@Andrew0Hill Andrew0Hill changed the title Fixes issue where forward slashes are encoded in HTML img src paths (#2807) Fix issue where forward slashes are encoded in HTML img src paths (#2807) Nov 7, 2024
@Andrew0Hill
Copy link
Contributor Author

I believe this fixes #2807 as well.

@hadley
Copy link
Member

hadley commented Nov 7, 2024

A test case would be great please!

@@ -135,3 +135,30 @@ test_that("h1 section headings adjusted to h2 (and so on)", {
c("page-header", "section level2", "section level3", "section level2")
)
})

test_that("slashes not URL encoded during relative path conversion", {
# Create a site
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you've changed only tweak_rmarkdown_html(), it would be better if the test could more explicitly check just that function — i.e. you don't need to set up an entire placeholder site, you can just pass html to tweak_rmarkdown_html().

Copy link
Contributor Author

@Andrew0Hill Andrew0Hill Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing.

I originally tried something similar to what you suggest, but I found that the path conversion portion of tweak_rmarkdown_html() makes two calls to fs::path_real() (lines 51 and 52), which will fail if the path referenced does not exist on the filesystem.

This means that whatever path is provided in the test HTML also needs to physically exist on the filesystem for the path conversion portion of the call to tweak_rmarkdown_html() to work properly. For this reason, I added the placeholder site with image, so that the paths referenced in the test HTML will exist, and the calls to fs::path_real() will succeed.

In the other existing test for tweak_rmarkdown_html() (line 112 in test_tweak_page.R) the test HTML doesn't contain any <img/> tags, so the path resolution code never runs (and therefore that test does not run into the same issue with the calls to fs::path_real()).

The only other way I could think of avoiding the placeholder package would be to set the img src in the test HTML to an image path that we know exists in the repository (like test_path("assets/kitten.jpg"), and then having it resolve the path relative to test_path() (untested idea below):

# Get the full path to the image.
img_path <- path_real(test_path("assets/kitten.jpg"))

# Get the path we want to resolve the relative path with respect to
par_img_path <- path_real(test_path())

# Simulate an output HTML file referencing the absolute path.
html <- xml2::read_html(
  sprintf('
  <body>
    <img src="%s" />
  </body>
  ', sim_path)
)

# Update absolute path to a relative path
tweak_rmarkdown_html(html, par_img_path)

# Output path should be a relative path with respect to `test_path()`
expect_equal(xpath_attr(html, ".//img", "src"), "assets/kitten.jpg")

I went with the method in my current commit though since the above version would fail if the assets/kitten.jpg file ever moved. (The version with the placeholder package still refers to this resource, but indirectly through pkg_add_kitten())

I'm happy to take another shot at this if you have any other suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation! In that case it sounds like you have written the minimal test, and I'd just suggest briefly summarising this discussion as a comment.

@jayhesselberth jayhesselberth changed the title Fix issue where forward slashes are encoded in HTML img src paths (#2807) Retain forward slashes in HTML img src paths Nov 8, 2024
@jayhesselberth jayhesselberth merged commit 3bee460 into r-lib:main Nov 8, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants