HTML5 feature: lazy loading images #6197

Open
gwern opened this issue Mar 19, 2020 · 11 comments

@gwern
Contributor

gwern commented Mar 19, 2020

There is a newly-standardized HTML5 feature, image lazy loading, which allows images in a web page to be loaded only shortly before they would come on screen, potentially greatly reducing bandwidth consumption and page load time (and replacing the many nasty JS hacks used before*). One simply adds the attribute loading="lazy" to tags such as <img>.

I think Pandoc should generate HTML5 which does lazy loading by default. There are few downsides, and a huge upside, and users should receive the benefit of it automatically**. It is

  1. fully backwards-compatible; it does not break any existing browsers or degrade any functionality in browsers that do not recognize it

  2. fully standardized & accepted (discussion)

  3. already widely available: Chrome shipped it by default in Chrome 76 ~August 2019, and Firefox 75 ~February 2020 (current global availability: >63%)

  4. saves a potentially enormous amount of bandwidth, particularly for mobile users, who pay the most for bandwidth while benefiting the most from lazy-loading due to small screens

    For example, on one gwern.net page it saves something like 20MB. In my experience, it also makes browsing much more pleasant. I've become much more willing to include images in my pages now that I know lazy loading works.

  5. reliable and bug-free

    I am not aware of any use-cases that lazy loading breaks other than printing in Chrome (see below).

    I've had it enabled on all gwern.net pages/images since 19 Sep 2019 (see staticImg in hakyll.hs), hundreds of thousands of page loads ago, and no problems have been reported to me, nor have I noticed any in my own browsing, aside from sometimes (with a 4k fullscreened browser in portrait orientation) images flashing into view when I page down too fast for it. (But it's rare and at first I wasn't sure I had enabled lazy loading correctly until I set up the network monitor to check when the image requests were made.)

  6. easy to add: it's just adding an attribute to all generated figures, with no IO or any logic involved. Just add an attribute in src/Text/Pandoc/Writers/HTML.hs's image-handling.

    I think it might go something like this (I actually implemented my rewrite in Tagsoup as a post-processing step, because I didn't want to reinstall my entire Pandoc stack just to check this would work):

    img <- inlineToHtml opts (Image attr [Str ""] (s,tit))
    

    to

    let (ident, classes, kvs) = attr
    img <- inlineToHtml opts (Image (ident, classes, kvs ++ [("loading", "lazy")]) [Str ""] (s,tit))
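
As a rough illustration of the attribute change proposed above: pandoc's Attr is a triple of identifier, classes, and key-value pairs, so the new pair belongs in the third component. The sketch below works on the JSON form of that triple (a 3-element list) rather than on the Haskell writer itself. The function name `add_lazy_loading` is hypothetical, and leaving an explicit `loading` value untouched is an assumption, not something the proposal specifies.

```python
def add_lazy_loading(attr):
    """Return a copy of a pandoc Attr (JSON form) with loading="lazy" added.

    attr is [identifier, classes, key_value_pairs].
    Assumption: an explicit loading=... set by the author is left untouched.
    """
    identifier, classes, pairs = attr
    new_pairs = [list(kv) for kv in pairs]
    if not any(key == "loading" for key, _ in new_pairs):
        new_pairs.append(["loading", "lazy"])
    return [identifier, list(classes), new_pairs]


attr = ["fig1", ["wide"], [["width", "300"]]]
print(add_lazy_loading(attr))
# ['fig1', ['wide'], [['width', '300'], ['loading', 'lazy']]]
```

The input is copied rather than mutated, so the same helper could be applied to every image without side effects.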
    

The main drawback I've found so far is that

  1. printing: in Chrome, printing may omit images which haven't been seen yet, which is an open bug.

    They acknowledge it's a bug, so I hope they'll fix it since Wikimedia is complaining about it, and printing partially-loaded pages in Chrome is a pretty narrow and specific edge case with an easy workaround for the user of just scrolling to the bottom before printing. (There aren't any mentions of a bug like this for Firefox or Edge; I guess no one ever tries to print image-heavy gwern.net pages from Chrome, because it hasn't come up.)

* nasty because most of them required breaking image loading entirely for people without JS enabled. And then didn't even work very well - I assume because they generally didn't use IntersectionObservers to properly start fetching the image well before the user scrolled it into visibility.
** It can be stripped out of the HTML, if necessary somehow, with as simple a rewrite as a call to sed.

@jgm
Owner

jgm commented Mar 19, 2020

This should probably be discussed on pandoc-discuss.

@jgm
Owner

jgm commented Mar 19, 2020

Note, currently you can't put this on manually, which seems a bug:

% pandoc
![myimg](url){loading=lazy}
<figure>
<img src="url" data-loading="lazy" alt="" /><figcaption>myimg</figcaption>
</figure>

The data- prefix is added to attributes that aren't in our list of valid HTML5 attributes. I don't know where I got the list, but perhaps the addition of loading is newish and needs to be added?
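
The prefixing behavior described above can be sketched roughly as follows. This is not pandoc's actual code: the attribute set is a tiny illustrative subset, and the names `KNOWN_HTML5_ATTRIBUTES` and `html_attribute_name` are hypothetical.

```python
# Illustrative subset only; the real list of valid HTML5 attributes is much longer.
KNOWN_HTML5_ATTRIBUTES = frozenset(
    {"alt", "class", "height", "id", "src", "title", "width"}
)


def html_attribute_name(name, known=KNOWN_HTML5_ATTRIBUTES):
    """Return the attribute name to emit in HTML5 output.

    Names missing from the known set get a data- prefix, unless they
    already carry one.
    """
    if name in known or name.startswith("data-"):
        return name
    return "data-" + name


# With "loading" missing from the list, the attribute is rewritten:
print(html_attribute_name("loading"))  # data-loading
# Once "loading" is added to the list, it passes through unchanged:
print(html_attribute_name("loading", KNOWN_HTML5_ATTRIBUTES | {"loading"}))  # loading
```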

Once that change is made, you could add the lazy loading with an explicit attribute, or do it globally using a simple filter. That might reduce any pressure to make a change in pandoc's default behavior.

@gwern
Contributor Author

gwern commented Mar 19, 2020

Alright, I will copy it over.

Sounds like a bug. It's part of the standard now, after all, within the past few months. If you don't update the whitelist, I guess that'll happen.

Doing explicit attributes isn't very useful, though. Lazy loading is something you'd want for all your images, or none. I think almost all users will want it enabled, and shouldn't have to opt into a filter; if there are users who are convinced it doesn't work or are very concerned about the Chrome printing bug, they could just as easily write a filter to remove it (or, as I said, since it's an attribute, they can filter it out as easily as sed -e 's/ loading="lazy"//' or so). It doesn't make sense to force 99% of users to enable a filter to enable a feature rather than 1% enable a filter to disable a feature, IMO.
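
For anyone who would rather opt out in a script than with sed, a minimal Python sketch of the same idea follows. One detail worth noting: the sed expression as written has no `g` flag, so it removes only the first match on each line, whereas the version below removes every occurrence. The function name `strip_lazy_loading` is hypothetical, and it assumes the attribute is written exactly as ` loading="lazy"`.

```python
def strip_lazy_loading(html):
    """Remove every ' loading="lazy"' attribute from an HTML string."""
    return html.replace(' loading="lazy"', "")


page = '<img src="a.png" loading="lazy" /><img src="b.png" loading="lazy" />'
print(strip_lazy_loading(page))
# <img src="a.png" /><img src="b.png" />
```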

@jgm
Owner

jgm commented Mar 19, 2020

I'm adding loading and some others to the whitelist.

I don't think I want to make this the default yet. Too new, not yet the standard way of doing things. Pandoc tries to produce fairly vanilla HTML. It's not hard to use a filter to change images in bulk.

@szhorvat

szhorvat commented Apr 2, 2020

To make this work well, shouldn't one also specify the image size explicitly in the img tag? Pandoc will not (cannot?) include that.

@gwern
Contributor Author

gwern commented Mar 10, 2021

szhorvat: I don't think that makes any difference. Regardless of whether you use lazy, a dimension-less img tag still works. The image loads, and the page reflows as necessary. The only difference might be when that happens, but the same amount of work is going to happen if the browser is not given the dimension data it needs. Nothing new there.

(Pandoc could add img height/width but John has chosen not to, IIRC, because it leads to problems with inconsistency in compile-time vs deploy-time: if you look up 'foo.jpg' at compile-time to inline its height/width, maybe foo.jpg doesn't exist, but will exist once it's copied to the server. Or maybe they'll be different images with different dimensions which have the same path. Stuff like that. The user needs to ensure that it's OK, Pandoc can't on its own.)
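
To make the compile-time lookup problem concrete, here is a small sketch of what reading dimensions at build time could look like for PNG files, including the case where the file does not exist yet. It is not something Pandoc does. The helper names `png_dimensions` and `size_attributes` are hypothetical, and only the PNG header layout (8-byte signature, then the IHDR chunk holding width and height) is relied on.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) from the first 24 bytes of a PNG, else None."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    # Width and height are big-endian 32-bit integers right after "IHDR".
    return struct.unpack(">II", data[16:24])


def size_attributes(path):
    """Return width/height pairs for a PNG at path, or [] if unavailable.

    An empty result covers the "foo.jpg doesn't exist at compile time" case:
    the image is then emitted without dimensions.
    """
    try:
        with open(path, "rb") as handle:
            dims = png_dimensions(handle.read(24))
    except OSError:
        return []
    if dims is None:
        return []
    width, height = dims
    return [["width", str(width)], ["height", str(height)]]


header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
print(png_dimensions(header))  # (640, 480)
```

Even with such a helper, the numbers are only as good as the file found at build time, which is the inconsistency described above.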


To update my other comments while I'm here: lazy penetration is now >73% globally. (This is not going to increase much because the remnant is now just Safari users; ~100% of FF/Chrome/Opera/Edge users support lazy images now.) The Chrome print bug is still open and inactive. We have run into 1 bug in the year since on gwern.net, which was due to an interaction between our custom 'fullwidth image' layout JS & lazy-loading (depending on where the user loaded the page, the image might or might not be loaded in time for the fullwidth rewrite to maximize the image size to be edge-to-edge, so it might be shifted left a few hundred pixels incorrectly); we fixed the fullwidth code to avoid it. Otherwise, my experience and that of my readers remains positive: it's a big bandwidth & page load win.

@gwern
Contributor Author

gwern commented May 28, 2022

Updates:

  • The Chrome bug has been fixed.
  • Global support is still at ~70%; CanIUse reports 81% because Safari has implemented it behind an optional flag, but that doesn't really count - however once Safari makes it default soon, support will gradually increase to around 90%.
  • No further bug reports from my use in Pandoc over another million pageviews.
  • No new problems with lazy-loading images have surfaced.

@gwern
Contributor Author

gwern commented Nov 4, 2022

Update: Safari has shipped it. CanIUse reports 92.32% global penetration.

@jgm
Owner

jgm commented Nov 4, 2022

Good. I'm still not sure about making this the default, but note that it's easy to achieve using a 3-line Lua filter:
lazy.lua

function Image(el)
  el.attributes.loading = "lazy"
  return el
end
pandoc --lua-filter lazy.lua
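
For readers who prefer a JSON filter to Lua, the same transformation can be sketched in Python without any third-party library. This assumes pandoc's JSON AST layout, where an Image node is `{"t": "Image", "c": [attr, inlines, target]}` and `attr` is `[identifier, classes, key_value_pairs]`. The names `set_lazy` and `run_filter` are hypothetical.

```python
import json
import sys


def set_lazy(node):
    """Recursively set loading="lazy" on every Image node, in place."""
    if isinstance(node, list):
        for child in node:
            set_lazy(child)
    elif isinstance(node, dict):
        if node.get("t") == "Image":
            pairs = node["c"][0][2]
            # Overwrite any existing value, as the Lua assignment does.
            pairs[:] = [kv for kv in pairs if kv[0] != "loading"]
            pairs.append(["loading", "lazy"])
        for child in node.values():
            set_lazy(child)
    return node


def run_filter(in_stream, out_stream):
    """Read a pandoc JSON document, rewrite its images, and write it back."""
    json.dump(set_lazy(json.load(in_stream)), out_stream)


# Small in-memory demonstration on a one-image document.
doc = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Image", "c": [["", [], []], [{"t": "Str", "c": "myimg"}], ["url", ""]]}
        ]}
    ],
}
set_lazy(doc)
print(doc["blocks"][0]["c"][0]["c"][0])  # ['', [], [['loading', 'lazy']]]
```

To use it as a real filter, one would drop the demonstration, end the script with `run_filter(sys.stdin, sys.stdout)`, and pass the file to pandoc's `--filter` option.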

@gwern
Contributor Author

gwern commented Jun 1, 2024

Another 2-year update: now 96% global penetration. Still continues to work perfectly for us over million+ pageviews, still no issues I am aware of anywhere which make defaulting to it a bad idea. (Even Wordpress now defaults to it, which is why it's approaching a third of websites.)

I continue to suggest enabling this by default in Pandoc HTML output due to the large performance benefits and the ever-increasing use of images in general (particularly with generative AI now making it trivial to dump a bunch of illustrative images into one's Pandoc-generated blog, and people doing so, despite the bad taste).

@jgm
Owner

jgm commented Jun 1, 2024
