fix(content-blog): links in feed should be absolute #9151

Merged
merged 20 commits into from
Aug 3, 2023

Conversation

VinceCYLiao
Copy link
Contributor

@VinceCYLiao VinceCYLiao commented Jul 17, 2023

… absolute

Pre-flight checklist

  • I have read the Contributing Guidelines on pull requests.
  • If this is a code change: I have written unit tests and/or added dogfooding pages to fully verify the new behavior.
  • If this is a new API or substantial change: the PR has an accompanying issue (closes #0000) and the maintainers have approved on my working plan.

Motivation

Fix #9136

Test Plan

Added two MDX files to the dogfooding site:
website/_dogfooding/_blog tests/2023-07-19-a.mdx
website/_dogfooding/_blog tests/2023-07-19-b.mdx

2023-07-19-a.mdx contains three links:

[absolute full url](https://github.com/facebook/docusaurus)

[absolute url with implicit domain name](/tests/blog/2023/07/19/b)

[relative url](2023-07-19-b.mdx)

Visit /tests/blog/feed.json:

  • The 1st link stays untouched.
  • The 2nd link resolves to "https://docusaurus.io/tests/blog/2023/07/19/b".
  • The 3rd link also resolves to "https://docusaurus.io/tests/blog/2023/07/19/b".

All three results are correct.

Test links

Deploy preview: https://deploy-preview-9151--docusaurus-2.netlify.app/blog/feed.json

Related issues/PRs

issue 9136

@facebook-github-bot
Copy link
Contributor

Hi @VinceCYLiao!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@netlify
Copy link

netlify bot commented Jul 17, 2023

[V2]

Name Link
🔨 Latest commit da0fb44
🔍 Latest deploy log https://app.netlify.com/sites/docusaurus-2/deploys/64cbd22d4923da00086e8c9f
😎 Deploy Preview https://deploy-preview-9151--docusaurus-2.netlify.app

@github-actions
Copy link

github-actions bot commented Jul 17, 2023

⚡️ Lighthouse report for the deploy preview of this PR

URL Performance Accessibility Best Practices SEO PWA Report
/ 🟠 83 🟢 97 🟠 83 🟢 100 🟠 89 Report
/docs/installation 🟠 76 🟢 100 🟠 83 🟢 100 🟠 89 Report

@Josh-Cena Josh-Cena changed the title from "fix: #9136 Links in blog posts rendered in a feed (rss/atom/json) should be absolute" to "fix(content-blog): links in feed should be absolute" Jul 17, 2023
Copy link
Collaborator

@Josh-Cena Josh-Cena left a comment

The current solution is too error-prone because it includes too much custom logic. Plus it does not work with all relative links. Consider the following:

<a href="another-post">link</a>

If the page is at /blog/2020/09/13/current-post, then the href should point to /blog/2020/09/13/another-post, not /another-post. I.e. the resolver needs to be aware of the current URL, not just the site's URL.

Also, I would prefer not manually joining URLs in any case. Why not elm.attribs.href = String(new URL(elm.attribs.href, currentPageURL))?
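To illustrate the suggestion above, here is a minimal sketch of how the standard `URL` constructor resolves different kinds of `href` against the current page URL. The `example.com` domain and the `resolveAgainstPage` helper name are assumptions made for illustration; they are not part of the PR.

```typescript
// Hypothetical page URL, matching the path used in the comment above.
const currentPageURL = 'https://example.com/blog/2020/09/13/current-post';

// Hypothetical helper: resolve an href the way the comment proposes.
function resolveAgainstPage(href: string): string {
  return String(new URL(href, currentPageURL));
}

// A bare relative link replaces the last path segment of the page URL.
console.log(resolveAgainstPage('another-post'));
// -> https://example.com/blog/2020/09/13/another-post

// A root-relative link only inherits the origin.
console.log(resolveAgainstPage('/another-post'));
// -> https://example.com/another-post

// A full URL ignores the base entirely.
console.log(resolveAgainstPage('https://github.com/facebook/docusaurus'));
// -> https://github.com/facebook/docusaurus
```

Note that the result depends on whether the base URL ends with a slash: with a base of `.../current-post/`, the bare relative link would resolve to `.../current-post/another-post` instead.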

@VinceCYLiao
Copy link
Contributor Author

Thanks for your comments. Sorry, I misclicked the request review button. I'll try to provide a better solution.

@facebook-github-bot facebook-github-bot added the CLA Signed Signed Facebook CLA label Jul 17, 2023
@VinceCYLiao
Copy link
Contributor Author

After reading the Node.js documentation for the URL class, I found that the URL class can handle this issue by itself.
I have revised my code accordingly, built the website locally, and checked that the hrefs in the generated feed files are resolved correctly.

@Josh-Cena Please review at your convenience. Thank you!
feed.json of deploy preview

@VinceCYLiao
Copy link
Contributor Author

Sorry, I forgot to run yarn test and didn't notice that so many tests would fail because of the changes. I'll look into how to fix the tests if you find my solution fine.

@Josh-Cena
Copy link
Collaborator

I think it looks good! In fact the test changes look expected to me. I reckon you don't need to resolve paths that are just anchor links, only ones that actually point to another page. We would also need test cases (either adding a new test post file, or adding a link to the existing post file)
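The suggestion about anchor-only links could be sketched as follows. This is only an illustration under stated assumptions: the helper name `resolveFeedHref`, the `example.com` URL, and the `startsWith('#')` check are hypothetical, and the code that was eventually merged may handle this case differently.

```typescript
// Hypothetical helper: leave pure anchor links alone, absolutize the rest.
function resolveFeedHref(href: string, postUrl: string): string {
  if (href.startsWith('#')) {
    // Anchor-only link: it points inside the same post, so keep it as-is.
    return href;
  }
  // Anything else points to another page; resolve it against the post URL.
  return String(new URL(href, postUrl));
}

const postUrl = 'https://example.com/blog/post'; // assumed URL for illustration

console.log(resolveFeedHref('#section', postUrl));
// -> #section
console.log(resolveFeedHref('other#section', postUrl));
// -> https://example.com/blog/other#section
```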

@VinceCYLiao
Copy link
Contributor Author

I have added a test case and updated the test plan. Please let me know if it's ok for you. Thanks!

Copy link
Collaborator

@Josh-Cena Josh-Cena left a comment

Could you add the tests to https://github.com/facebook/docusaurus/tree/main/packages/docusaurus-plugin-content-blog/src/__tests__/__fixtures__/website instead? No one is going to look at the feed of the dogfooding blog.

…quashed commits)

Squashed commits:
[2db488373] chore: add a new file to test href resolving
[6c18cea] docs: added to test if href resolved correctly in feed
@VinceCYLiao
Copy link
Contributor Author

I was thinking of parsing the links from the feeds to check that they are correctly resolved, but found that hard to do and perhaps somewhat meaningless, since the links in the feeds come from the object returned by the defaultCreateFeedItems function.
So my idea is to just write a test case for the defaultCreateFeedItems function, checking that full absolute links stay untouched while other links are correctly prefixed.
I don't know if the test makes sense to you, or if it is bad to export the defaultCreateFeedItems function just for unit testing.

Copy link
Collaborator

@slorber slorber left a comment

Thanks

The implementation looks good. 👍

But the way it is tested looks surprisingly complex to me, and unit tests are not passing.

We also need to absolutize image URLs.

Let me know if you need help to figure out how to test that.

Comment on lines 110 to 115
$(`div#${blogPostContainerID} a`).each((_, elm) => {
  const {href} = elm.attribs;
  if (href) {
    elm.attribs.href = String(new URL(href, link));
  }
});
Copy link
Collaborator

LGTM 👍

We will also need to convert image links to absolute, see
#9136 (comment)

https://validator.w3.org/feed/docs/warning/ContainsRelRef.html

Copy link
Contributor Author

Done. Image links now are absolutized.
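As a rough sketch of what absolutizing both anchors and images can look like, the snippet below applies the same `new URL(value, base)` resolution to `href` and `src` attributes. It works on a plain attribute record rather than a real cheerio element, and the names `absolutizeAttribs` and `feedItemUrl` as well as the `example.com` URL are assumptions for illustration, not the code from this PR.

```typescript
// Hypothetical stand-in for an element's attribute map.
type Attribs = Record<string, string>;

// Hypothetical helper: rewrite URL-bearing attributes of one element in place.
function absolutizeAttribs(attribs: Attribs, feedItemUrl: string): Attribs {
  for (const name of ['href', 'src']) {
    const value = attribs[name];
    if (value) {
      // Relative values are resolved against the post URL;
      // values that are already absolute come back unchanged.
      attribs[name] = String(new URL(value, feedItemUrl));
    }
  }
  return attribs;
}

const feedItemUrl = 'https://example.com/blog/2023/07/19/a'; // assumed URL

console.log(absolutizeAttribs({src: './img/dino.png', alt: 'dino'}, feedItemUrl));
console.log(absolutizeAttribs({src: '/img/docusaurus.png'}, feedItemUrl));
```

Attributes other than `href` and `src` (such as `alt`) are left untouched. `srcset` needs separate handling because it holds a list of candidates rather than a single URL.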

@@ -196,3 +228,95 @@ describe.each(['atom', 'rss', 'json'])('%s', (feedType) => {
fsMock.mockClear();
});
});

describe('Test defaultCreateFeedItems', () => {
Copy link
Collaborator

That test looks super complex to me and I don't understand why.

Just call createBlogFeedFiles and take a snapshot: we'll review the snapshot and validate it contains what we expect

Copy link
Contributor Author

I think I don't need to add a new test case; instead, I just need to update the snapshot. Is that correct?

Comment on lines 71 to 95
function isFullAbsolutePath(str: string) {
  const domain = 'https://domain.com';
  const {origin} = new URL(str, domain);
  return origin !== domain;
}

async function generateLinksOfBlogPosts(outDir: string, blogPosts: BlogPost[]) {
  const linksOfBlogPosts: {[postId: string]: string[]} = {};
  const pathOfFile = path.join(outDir, 'blog');
  const promises = blogPosts.map(async (post) => {
    try {
      const content = await readOutputHTMLFile(post.id, pathOfFile, true);
      const $ = cheerioLoad(content);
      const anchorElements = $(`div#${blogPostContainerID} a`);
      if (anchorElements.length > 0) {
        const href = anchorElements.map((_, elm) => elm.attribs.href).toArray();
        linksOfBlogPosts[post.id] = href;
      }
    } catch {
      // post is a draft
    }
  });
  await Promise.all(promises);
  return linksOfBlogPosts;
}
Copy link
Collaborator

I don't think we need that complexity in our tests

Copy link
Contributor Author

The test case has been removed.

@VinceCYLiao
Copy link
Contributor Author

Thanks! I'll absolutize the image URLs as well.

Regarding the test, I'm thinking of creating a new mock file that contains anchor elements with absolute, relative, and anchor links, and image elements with absolute and relative source URLs. The test case would then just call createBlogFeedFiles and take a snapshot.

@VinceCYLiao
Copy link
Contributor Author

Tested locally, and all unit tests pass.

@VinceCYLiao VinceCYLiao requested a review from slorber July 23, 2023 08:41
@slorber slorber added the pr: bug fix This PR fixes a bug in a past release. label Jul 27, 2023
Copy link
Collaborator

@slorber slorber left a comment

I don't understand how it works anymore 😅

@Josh-Cena do you remember what updates the build-snap folder exactly? Is this updated manually?

@VinceCYLiao how did this PR generate that new src/__tests__/__fixtures__/website/build-snap/blog/blog-with-links/index.html file?

The CI is failing and snapshots are not easy to review 😓
Surprisingly unit tests are passing locally, but not on GitHub action 🤷‍♂️

@@ -0,0 +1,31 @@
<!doctype html>
Copy link
Collaborator

How was this file generated?

Copy link
Contributor Author

I created blog-with-links.mdx in the website/blog folder and ran yarn build:website:blogOnly. If this is not how the fixture files are meant to be created, please let me know the correct way to do it.

Comment on lines 1 to 3
import dino from "../static/img/docusaurus.png";
import useBaseUrl from '@docusaurus/useBaseUrl';

Copy link
Collaborator

ES imports are supposed to come after front matter

Copy link
Contributor Author

The order of imports is now correct. I also moved the front matter to the beginning of the file. The misplaced front matter seems to be the reason the tests failed.

Comment on lines 123 to 130
elm.attribs.srcset = srcset
  .split(',')
  .map((s) => {
    const [imageURL, ...descriptors] = s.trim().split(/\s+/);
    const newImageURL = new URL(imageURL ?? '', link).href;
    return [newImageURL, ...descriptors].join(' ');
  })
  .join(', ');
Copy link
Collaborator

Looks a bit unsafe/risky, maybe introduce a dedicated lib to manipulate srcset reliably instead? see https://www.npmjs.com/package/srcset
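One way to see the kind of risk raised here: the HTML `srcset` syntax allows a comma inside a candidate URL (only a leading or trailing comma is disallowed), so splitting on every comma can cut a URL in two. The sketch below wraps the reviewed logic in a hypothetical function `naiveAbsolutizeSrcset` and feeds it an assumed `example.com` page URL. It is an illustration of the failure mode, not a claim about which inputs the PR author had in mind.

```typescript
// Naive approach, mirroring the snippet under review: split candidates on commas.
function naiveAbsolutizeSrcset(srcset: string, srcsetPageUrl: string): string {
  return srcset
    .split(',')
    .map((candidate) => {
      const [imageUrl, ...descriptors] = candidate.trim().split(/\s+/);
      const absoluteUrl = new URL(imageUrl ?? '', srcsetPageUrl).href;
      return [absoluteUrl, ...descriptors].join(' ');
    })
    .join(', ');
}

const srcsetPageUrl = 'https://example.com/blog/post'; // assumed URL

// Works for the common case: two candidates in, two candidates out.
const simpleResult = naiveAbsolutizeSrcset('a.png 1x, b.png 2x', srcsetPageUrl);
console.log(simpleResult);
// -> https://example.com/blog/a.png 1x, https://example.com/blog/b.png 2x

// Breaks when a URL contains a comma: two candidates in, three out.
const trickyResult = naiveAbsolutizeSrcset('/img/a,b.png 1x, /img/c.png 2x', srcsetPageUrl);
console.log(trickyResult);
// -> https://example.com/img/a, https://example.com/blog/b.png 1x, https://example.com/img/c.png 2x
```

A dedicated parser, such as the `srcset` package linked above, is one way to avoid reimplementing these tokenization rules by hand.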

Copy link
Contributor Author

Thanks. I'll look into it and revise my code

Copy link
Contributor Author

@VinceCYLiao VinceCYLiao Aug 3, 2023

Done. Since the latest version of srcset is pure ESM, I had to use the previous version.

@Josh-Cena
Copy link
Collaborator

do you remember what updates the build-snap folder exactly? Is this updated manually?

I think I added this part of test but I don't remember how it works either. My guess is it's manual.

@VinceCYLiao
Copy link
Contributor Author

VinceCYLiao commented Jul 29, 2023

I just ran the tests again and they all passed. Sorry for the confusion; I'll look into why the tests are failing on GitHub.

Copy link
Collaborator

@Josh-Cena Josh-Cena left a comment

The implementation looks great to me! Just one stylistic suggestion.

packages/docusaurus-plugin-content-blog/src/feed.ts (outdated, resolved)
packages/docusaurus-plugin-content-blog/src/feed.ts (outdated, resolved)
slorber and others added 2 commits August 3, 2023 16:12
Co-authored-by: Joshua Chen <sidachen2003@gmail.com>
Co-authored-by: Joshua Chen <sidachen2003@gmail.com>
@slorber
Copy link
Collaborator

slorber commented Aug 3, 2023

Hold on, I'm fixing the build-snap generation thing in another PR before merging this PR

@Josh-Cena
Copy link
Collaborator

Note that you would also want to rebase to get rid of the extra commits

@slorber slorber merged commit 109ab0c into facebook:main Aug 3, 2023
28 of 29 checks passed
This was referenced Oct 19, 2023
Labels
CLA Signed Signed Facebook CLA pr: bug fix This PR fixes a bug in a past release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Blog Atom/RSS feed urls should be absolute instead of relative
4 participants