-
-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serve static HTML #634
Comments
As a strong Jamstack advocate, I'm in. I recommend using 11ty which is a pretty straightforward node SSR. 11ty support nunjucks templates, and nunjucks is heavily inspired by jinja2 so I guess migrating the templates won't be long. And there's an easy way to deal with internationalization through conventions and a content key. Regarding the contributors list, we can always request the data through XRH and insert it in the DOM after a shuffle. |
Thanks @borisschapira. Question about 11ty: does that require us to change our development workflow as well? Or could we simply take the static resources from our current local setup and push that out to the 11ty server? |
This is effectively the same as shuffling in JS with the drawback of having to block rendering on the XHR. If we did shuffle in JS we could always bake the JSON contributors into the HTML document. 😕 |
I'm not convinced server side rendering is the problem. The TOC chapter is the only one of the one you've tested which has no hero image. That's why it's faster IMHO. The delay is in downloading the images - and particularly the hero image which will be delivered as the large image (though from last release it will use smaller images if screen doesn't support larger images and Does the CrUX data show TTFB which would be a better indicator of whether the server side rendering was holding us up? I would also be interested in seeing the CrUX data in a months time as we implemented a number of fixes this weekend in the first release since the freeze:
I would hope these would improve things, though I admit it doesn't appear to have from above graphs despite apparently including this week when those changes were already live 😔 However, I do like the idea of a static site. Can we make more use of GitHub Actions? I don't know much about them (though would be keen to learn) but could flow be:
Now I don't know much about GitHub actions and whether above is possible, but if it is it would massively simplify making changes (and would make it much easier to accept small, typo type edits from new, random contributors who are not as familiar with how to generate the site), would give us the static site we potentially want and wouldn't require a big move off the current tech stack which has served us reasonably well. In future we could even take it to the next stage and auto-deploy if we wanted. Or is the use of using a development server to generate the HTML for us a bit hacky and is there a better way to generate the HTML from Jinja2 templates from the command line? HTTP Chunked Encoding is no longer a thing in HTTP/2 (which Google App Engine uses) - it's inbuilt in the protocol and happens by default. Even if this was not the case, I'm not sure we need it with the size of our HTML files. The HTTP/2 chapter is the CDN one and at 25.4KB gzipped that's not much at all really. For contributors page I'd have the contributors listed alphabetical and an inline |
@ethomson might help on the GitHub Actions topic? |
Oh that would be good @DirtyF ! The other option (whether we do GitHub actions - which would have other benefits) is just to stick Cloudflare or similar in front of it, Benefit from it's caching of the HTML pages and get HTTP/3, improved HTTP/2 prioritisation implementation as Google's CDN is sub-optimal in this respect and even further improvements prioritiations on top of that better HTTP/2 and 3 prioritisation. But not sure if we have budget for that, |
👋 What you’ve described is not impossible with GitHub Actions but I fear that it will be a bit challenging at present: reason being that pull requests from forks have downgraded security rights. So - doing something like spinning up a Linux server sounds like it would require some sort of secret or token that had permission to start a service. (eg you might need an AWS token to start the VM.) You can add that key as a secret in your repository and GitHub Actions workflows will be able to access it... except workflows that are running as pull requests from forks. Those won’t get secrets (to prevent a malicious person from trying to steal your secrets). I don’t have a very good workaround for you at present, I’m afraid. I think a GitHub App may make more sense at the moment. |
Thanks for getting back so quickly @ethomson!
Is there such a thing as a post-commit hook or GitHub action to apply changes to master after a PR has been merged?
For development this is run as a Flask development server started from the command line and running on port 8080 so don't need any particular service permissions. Though does require some dependencies as detailed in steps 1-3 of the README.md in our src directory. As I said above, using the whole flask development server is probably overkill and there may be better ways to convert Jinja2 templates to actual HTML but still, I'm curious to know what the options are anyway since I'm less familiar with GitHub and GitHub actions in particular. |
Yes! You can use this trigger: on:
push:
branches: [master] which will trigger on any push to master, or merge of a pull request into master.
Aha - thanks for the explanation. If that's the case then this makes a lot more sense for a GitHub Actions workflow. name: Build
on:
push:
branches:
- master
jobs:
build:
runs-on: ubuntu-latest
steps:
- run: |
sudo pip install virtualenv
virtualenv --python python3 env
source env/bin/activate
pip install -r requirements.txt
python main.py
... |
Oh this looks promising (and interesting to learn about!!). And that would allow us to checkout master, make changes, and then commit to master again? |
Yeah, you'll need to set up a personal access token that has write access to the repository. Then add two secrets to your repository, one with the username of the user that created that PAT, and another with the PAT itself. For example, if you set username to PUSH_USERNAME and the PAT itself to PUSH_TOKEN, then you can do this:
|
@ethompson I've set this up in my fork, and it seems to be working well. The one thing I'm stuck with is getting a list of files changed in that commit. This post seems to suggest using this:
But that is returning blank - see the "Update timestamp for modified files" section of an example run
No files are returned by that command, but clicking on the dc0374f link at top of the screen shows the file did change, and the same command run locally shows that it changed. Any ideas why that would not work? It's almost like the commit hasn't fully committed by this point, but since this is a post-commit action it should be there in master by this point shouldn't it? Or is there a better way of getting the list of files included in this commit? |
By default, the
If I understand the depth and reachability, I think that you could get away with
(You only need to diff |
But you can also set |
Thanks. |
Thanks - I'm a little surprised, but not completely surprised. I'll 👀 at this a bit to better understand it, but I'm glad that |
OK so at the moment I have the following working on my fork, after a Pull Request is merged:
This takes care of #317 (with the added bonus of I can't get it to start the flask server in the background (also doesn't work locally on my command line, though I can background it once it's started). But then honestly think having markdown->jinja2 template->web server->html is a bit of a hacky workflow to generate a static site. Separately, I'm not sure we really need a static site as don't think that's the cause for what @rviscomi has raised in #634 (comment) as discussed in #638 Anyway still been very useful to learn about Github Actions (they're very cool!), spotted and fixed a couple of bugs, and think we can implement what I have in to solve #317. So will move this discussion over there. Thanks @ethomson for all your help! |
Actually @ethomson I've another problem. That works when I commit to master directly. When I do it via a pull request, it again returns no files when I run |
It would appear from the last release that since we added caching headers in #608 Google Cloud is caching our site for the 3 hours that we set. That means this line from @rviscomi 's opening comment is incorrect:
Only the first user from each cloud server (set by region I presume?) takes this hit. This can be proven with the contributors page: This is supposed to give a random ordering with each refresh. However it does not - it gives the same "random" ordering on each refresh and you have to wait 3 hours since it was cached to get a random ordering. However if you add a query string you get a new random ordering (which is also now locked for 3 hours no matter how many times you refresh) as detailed in Google Cloud Caching documentation: Different query string, different random ordering:
Therefore we effectively have a static site already, thanks to the Google Cloud CDN in front of us. Yeah we possibly could improve that first user's experience with a truly static site, but I don't think it's worth changing tech stat just for that to be honest. Plus we've yet to prove that was ever the reason for the slowness and, as per #638 , there are other things I think are bigger impact. So I vote to close this issue. Barry P.S. @ethomson would still like to get GitHub actions working to build the site, if you get a chance to figure out the answer to my last comment. |
Our backend is currently hosted on Google App Engine running a Python server and rendering Jinja templates in Flask based on JSON config data.
For every URL, there is effectively one static HTML page. However, every time a user visits one of these pages, the Python server needs to render the templates using (in-memory) JSON config data. Because each type of page requires a different amount of rendering, performance varies across the site:
This chart tracks the 30-day trend of CrUX data for four of our pages: Home, CSS, JS, and Table of Contents. The ToC page has the highest % of fast FCP on mobile, at ~80%, although it's been dropping over time. The CSS page, which is quite large in terms of content and assets, has the fewest fast FCP experiences, at about 40%.
I'd argue that these pages should take more or less the same amount of time to download and render, and they should be closer to 80% fast than 40%. Ideally 100% fast!
The key to this, I think, is to stop server-side rendering pages. When we deploy the site, none of the HTML will change. So we should take a snapshot of the markup and serve those files statically as we would a JPEG. Sure, the caching headers can be different but the key is that the server doesn't have to think before it returns the contents. For larger documents like the CSS or HTTP/2 chapters, I'd love to explore HTTP chunking as a way to give clients a head start to load dependencies and render above-fold content early.
I think this can simplify our tech stack as well. For example, we can host our static assets (HTML, JS, CSS, images) on a host like Netlify or even directly from this repo using GitHub Pages. Without server-side code in production, we can more quickly deliver contents to users, which should improve our bottom-line metrics like engagement rates.
One very small exception to this is the Contributors page which shuffles the list of contributors on each request. The HTML is different each time and we can either shuffle in JS (bad because of a delay in rendering) or remove that feature entirely (bad because it unfairly features some contributors at the top more prominently). Open to discussion about how to resolve that.
@HTTPArchive/developers any thoughts on this proposal?
The text was updated successfully, but these errors were encountered: