Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gatsby-transformer-remark, gatsby-plugin-mdx) Make timeToRead configurable #19763

Merged
merged 6 commits into from
Apr 28, 2020

Conversation

johnrichter
Copy link
Contributor

@johnrichter johnrichter commented Nov 24, 2019

I've been wanting this for a while (since not everyone reads at 265 wpm) and had some time to work on it this evening.

Description

This PR makes timeToRead calculation user configurable. I went with the simple solution to start.

{ 
  // Mimics a reading rate of 42 words per minute
  timeToRead: (wordCount, html, md) => wordCount / 42
}

More advanced potential solution

It would make sense to give the user more access to the markdown node for more flexible and precise calculation (e.g. code samples and complex graphs). I was thinking this could look like

{
  timeToRead: (wordCount, { markdownNode, getAST, getMarkdownAST, getHeadings, getTableOfContents, getHTMLAst, getHTML, getExcerptAst, getExcerpt }) => wordCount / 42
}

It essentially gives access to all of the markdownNode field resolvers. wordCount would now be based off of markdownNode.rawMarkdownBody instead of the sanitized HTML. It's more complicated, but gives the user everything they'd need to calculate timeToRead as precisely as they'd want.

Related Issues

Related to #16760. It would allow users to calculate the value in the they want to vs Gatsby building and maintaining a custom one-size-fits-all solution.

gatsby-plugin-mdx

This plugin also has a timeToRead that can be updated with the same simple solution. It doesn't look on first glance that it could be updated to support the more advanced solution above though.

@johnrichter johnrichter requested review from a team as code owners November 24, 2019 22:09
@johnrichter
Copy link
Contributor Author

I went ahead and added support to gatsby-plugin-mdx as well. I could use some help writing tests for it though. I have no idea what I'm doing when it comes up to setting up brand new tests for a gatsby plugin (or a file within that plugin) and I'd like to make sure a test exists before it gets shipped.

@johnrichter johnrichter changed the title feat(gatsby-transformer-remark) Make timeToRead configurable feat(gatsby-transformer-remark, gatsby-plugin-mdx) Make timeToRead configurable Nov 24, 2019
pvdz
pvdz previously approved these changes Nov 26, 2019
Copy link
Contributor

@pvdz pvdz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me.

I wonder how frequent this feature is requested to warrant it being an option / api.

@johnrichter
Copy link
Contributor Author

That is a good question. My guess would be that it isn't something that a lot of people care about. Searching timeToRead in the Gatsby repo only returns a few issues that mention it specifically. The ones that do exist are about the accuracy of the count when considering things like code blocks.

What I do think could be improved more generally with the default calculation is switching away from using the HTML string to something without syntax of any kind. It should be possible to parse Markdown AST to do this in a fairly straightforward manner. If you look at this line of the PR, the expected count should be 7, but it is 9 because the paragraph tags are parsed as words. This is also happening in the default case and I suspect part of the reason why the default WPM is quite high (imo).

@pieh
Copy link
Contributor

pieh commented Nov 26, 2019

We are doing some pure text extraction / stripping tags for excerpt (for sure in remark, not sure if we do that in mdx) so maybe something that could be reused?

@pieh
Copy link
Contributor

pieh commented Nov 26, 2019

Also, if current const avgWPM = 265 is unusually high, maybe we should adjust that as well? I'm not sure where that value came from honestly (I can trace it back to 509175e#diff-db344a7cdc7479489764106f94b249a0R113 which seems like initial commit for what now is gatsby-transformer-remark 3 years ago and it wasn't touched since then)

@pvdz
Copy link
Contributor

pvdz commented Nov 28, 2019

@johnrichter @pieh what do we want to do with this?

John, the changes you suggested sound good in any case, do you have space to try and improve that system?

@johnrichter
Copy link
Contributor Author

I have a bit of time next week to try to get a more accurate solution in place.

@pieh's idea of adapting the plain excerpt implementation for this is a great starting point. I'd like to keep an escape hatch for those that would want to customize the calculation with the goal of making it useful for edge cases like customizing how long inline React components would take to read.

I'm thinking that I could try to do a bit of research to figure out good defaults for the the time it takes to process images and code blocks. The latter of which will be tricky since some languages are much more readable than others. Paul Stamatiou recently mentioned that he

read somewhere that Medium.com adds 12 seconds to the estimated reading time per image used in the article.

That makes me hopeful that I could find some sort of useful default values.

@LekoArts LekoArts added status: awaiting author response Additional information has been requested from the author topic: MDX labels Jan 8, 2020
@LekoArts
Copy link
Contributor

@johnrichter Hi, do you have time to wrap up this PR? :)

@johnrichter
Copy link
Contributor Author

Hi @LekoArts, at the moment I don't have the time to go into the level of detail I mentioned above. If the changes in this PR are good enough for now I have enough spare cycles to rebase and complete whatever else is needed to get this shipped. My apologies for not getting back to y'all on this PR sooner!

I still want to improve the calculation in a more meaningful way in the future, but if you're looking at officially overhauling it by all means go for it 🙂

Copy link
Contributor

@muescha muescha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the function signatures not match

{
resolve: `gatsby-plugin-mdx`,
options: {
timeToRead: (wordCount, mdxNode) => wordCount / 42,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not match timeToRead(wordCount, html, md)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 👍

1,
Math.round(
_.isFunction(timeToRead)
? timeToRead(words, mdxNode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not match timeToRead(wordCount, html, md)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 I had to add some functionality, but the timeToRead signature now matches for both MDX and Remark.

@johnrichter
Copy link
Contributor Author

I've rebased on master as well

@@ -604,7 +605,9 @@ module.exports = (
timeToRead: {
type: `Int`,
resolve(markdownNode) {
return getHTML(markdownNode).then(timeToRead)
return getHTML(markdownNode).then(html =>
timeToRead(markdownNode, html, calcTimeToRead)
Copy link
Contributor

@muescha muescha Mar 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you rename the utils function, it is more easy to understand (i asked myself why timeToRead(markdownNode, html, calcTimeToRead) as a different signature compared to the other timeToRead(words, html, rawMD*))

Suggested change
timeToRead(markdownNode, html, calcTimeToRead)
timeToReadCalculator(markdownNode, html, timeToRead)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great point. I thought it was confusing too to be honest and didn't think about renaming the util function. Let me know if you like the new naming 👍

@muescha
Copy link
Contributor

muescha commented Mar 4, 2020

just now the function gets different TTR? in -mdx you don't count the non latin chars

@johnrichter
Copy link
Contributor Author

johnrichter commented Mar 4, 2020

For the original code in the PR, the utility functions hadn't been created yet and all of that was mostly inline. The utility function still resides in the Remark transformer so I don't want to create a dependency on the Remark transformer in the MDX plugin. I could extract the utils to a shared package somewhere (optimal) or just copy and paste them to mdx (have to maintain in both places). Which one depends on what y'all would like to do really since I'm indifferent.

Historically, MDX and Remark have had different TTRs (once the character handling code was added to Remark) so I left it as is. For a first pass at creating better time to read calculations the main focus of this PR is passing the responsibility to the user if they want to change it.

@muescha
Copy link
Contributor

muescha commented Mar 5, 2020

yes you are right, that is not the scope of this PR

@vladar vladar removed the status: awaiting author response Additional information has been requested from the author label Apr 16, 2020
@pvdz
Copy link
Contributor

pvdz commented Apr 20, 2020

@johnrichter tests are failing even after rebase. If we can get the tests pass I'm inclined to merge it. Please let me know whether you want to fix the tests.

@johnrichter johnrichter force-pushed the Configurable-TimeToRead branch from c6f32ab to 0284d48 Compare April 24, 2020 17:39
@johnrichter
Copy link
Contributor Author

Hi @pvdz, I'm working on getting those tests working today. Did a quick rebase on master again before I start working on getting the tests to pass.

@johnrichter
Copy link
Contributor Author

@pvdz Looks like rebasing was all that was needed 👍

@pvdz pvdz merged commit 8586ca3 into gatsbyjs:master Apr 28, 2020
@gatsbot
Copy link

gatsbot bot commented Apr 28, 2020

Holy buckets, @johnrichter — we just merged your PR to Gatsby! 💪💜

Gatsby is built by awesome people like you. Let us say “thanks” in two ways:

  1. We’d like to send you some Gatsby swag. As a token of our appreciation, you can go to the Gatsby Swag Store and log in with your GitHub account to get a coupon code good for one free piece of swag. We’ve got Gatsby t-shirts, stickers, hats, scrunchies, and much more. (You can also unlock even more free swag with 5 contributions — wink wink nudge nudge.) See gatsby.dev/swag for details.
  2. We just invited you to join the Gatsby organization on GitHub. This will add you to our team of maintainers. Accept the invite by visiting https://github.com/orgs/gatsbyjs/invitation. By joining the team, you’ll be able to label issues, review pull requests, and merge approved pull requests.

If there’s anything we can do to help, please don’t hesitate to reach out to us: tweet at @gatsbyjs and we’ll come a-runnin’.

Thanks again!

@pvdz
Copy link
Contributor

pvdz commented Apr 28, 2020

@johnrichter Thank you for the additional effort 💜 Sorry it took this long, but happy we can merge it now :D

@johnrichter
Copy link
Contributor Author

You're welcome! Thanks for merging it in. I can't wait to use it when it gets released 😄

Also, I had no idea ya'll hand out swag for PRs 😍 😍 😍

wardpeet added a commit that referenced this pull request May 4, 2020
wardpeet added a commit that referenced this pull request May 4, 2020
#23726)

Revert "feat(gatsby-transformer-remark, gatsby-plugin-mdx) Make timeToRead configurable (#19763)" 

This reverts commit 8586ca3.

Fixes #23614
Fixes #23714
@wardpeet
Copy link
Contributor

wardpeet commented May 4, 2020

@johnrichter we had to revert this change because of Out of memory issues and slow query builds.

see #23614 & #23714

@johnrichter
Copy link
Contributor Author

☹️ Interesting. In practice all this did was add a plugin option and leave the original code intact. Perhaps the changes that were pulled in by rebasing changed something more fundamental. I'll take a look and see what happened.

@muescha
Copy link
Contributor

muescha commented May 7, 2020

i have two ideas:

  1. maybe use genHtml only if there is a custom timeToRead function defined? other wise there no calculation needed for html
  2. more general: maybe it is possible to cache some calculations for the resolvers (i see the some functions called many times for example: genHtml, processMdx)
    i see getAST has a cache - maybe apply the same to the others....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants