Move Wiki as Polykey-Docs repository acting as a CMS for polykey.io/docs #1
Comments
I've followed https://www.milkmoonstudio.com/post/using-webflow-with-cloudflare-to-cache-and-speed-up-your-webflow-project to set up Cloudflare as a proxy to the Webflow site, so we now have both Cloudflare and Webflow in the chain. It was necessary to first disable SSL on Webflow so that it shows the configuration needed for non-SSL routing to Cloudflare. After configuring the DNS, SSL could be re-enabled on Webflow. Cloudflare is then configured to use "Full" TLS: between the client and the Cloudflare proxy it uses Cloudflare's own universal SSL, and between Cloudflare and the website it uses the origin certificate, which is Webflow's certificate. Webflow still claims that something is incorrectly configured; this is not relevant and we can just ignore it. It is important to switch on "Always use HTTPS", so that any plain HTTP request gets redirected to HTTPS. Then it is possible to create a Cloudflare worker/pages deployment routed from a path on the domain (such as polykey.io/docs). The configuration for this should be set up for Matrix AI infrastructure later.
So the plan is now:
It's important for us to configure the Polykey-Docs repo to prevent force pushes, and to not just pull in diverged changes. This is because there is no branch protection on the wiki for Polykey, and we wouldn't want to allow accidental force pushes. Note that this feature only seems to be available on the premium plan.
The wiki repo will now need a
I've used https://gitpod.io to test out Docusaurus; it appears to work well. We basically have to configure its docs plugin to use There is one point of concern: it appears some aspects of the docs plugin use markdown "front matter" that is processed by the docs plugin. When adding front matter to the markdown files, it appears to be processed as normal markdown, so it ends up showing up. I'm not sure to what extent we will need front matter, or if there is a way to configure it differently. I did try https://gitpod.io/#https://github.com/MatrixAI/Polykey.wiki but it didn't work... it claims it is a private repo. Submitted feedback.
While playing around with Docusaurus, I had this idea that a similar concept could be applied to the project docs that go to the GitHub pages. It seems we don't need
Cloudflare workers/pages is a really powerful concept; it seems to be the next iteration of microservices. It's much more limited in terms of its runtime capabilities, but at the same time it's so much faster and easier to develop for. By reducing the "surface complexity", the system becomes a lot more streamlined. It becomes little HTTP-driven functions that are then deployed anywhere, with state and logic becoming "cloud-native".
With the number of OS packages, it would make sense to eventually lift our nixpkgs overlay into a public overlay. The private overlay would then layer on top of the public overlay. This way we can have a consistent upgrade path. But we would need appropriate CI/CD to work with the nixpkgs overlay itself.
Ok, an experiment with a subdirectory shows that subpages in subdirectories in the GitHub wiki (using gollum) show up as root-level pages. So subdirectories aren't really mapped accordingly.
Gollum must iterate across all markdown pages. I wonder if this means it will also read in This means all of our docs pages in
Ok, because gollum iterates across all markdown pages, ANY markdown file in the repo will end up being rendered. There doesn't seem to be an exclusion method atm.
Then the only special pages are:
In the future, if we also want to open PRs to the wiki repo: https://sparkbox.com/foundry/github_wiki_tutorial_for_technical_wiki_documentation. That just means a proper GitHub repo for hosting the wiki. It would need to be a 2-way mirror, as that would then provide the ability to push changes to the GitHub wiki repo.
Atm wrangler2 isn't yet on nixpkgs: NixOS/nixpkgs#176233. There is some minor difference with respect to us, in that we only need to publish: https://developers.cloudflare.com/workers/wrangler/compare-v1-v2/. Likely it will end up as It can be done as a
The preset classic comes with some additional dependencies:
I took them out for now, because it appears to work fine without them. But it will be necessary for us to develop our own theme.
Ok, so the first build is working, along with images. It appears that Images were solved by using symlinks in the static directory.
Setting
The usage of The warning issue can be ignored since it's only
Ok, the wiki has gone into a new repository, https://github.com/MatrixAI/Polykey-Docs, which now has a license... and so on. This enables pull requests, and it means we don't have to abide by the GitHub wiki's limited functionality. We can now use Docusaurus to its full potential. Plus it's still easy to edit the wiki by just clicking and editing the pages in the UI.
And it also renders the frontmatter properly: https://github.com/MatrixAI/Polykey-Docs/blob/master/Home.md.
In the future we can also move the blog into it, or create it as a separate repository.
It appears that
Ok, I've got wrangler2 installed because wrangler1 doesn't actually support Cloudflare Pages. Afterwards, authentication uses 2 environment variables. The relevant command publishes to polykey-docs.pages.dev. It also automatically publishes to the same branch that you are currently on in the git repo; so if you are on the staging branch, it ends up creating a staging branch on Cloudflare's Pages project. This can be overridden with a command-line flag. All pushes result in a "commit", which gives you a unique reference URL to that specific deployment, like https://618c7d4f.polykey-docs.pages.dev/.

The project name has to be lower case, and a new project is auto-created if it doesn't already exist. Interestingly enough, this means project names are global across the entire Cloudflare system. It's not exactly as nice as Cloudflare workers, which gives you a unique subdomain first; every Pages project basically has its own subdomain.

Unlike Cloudflare workers, you cannot just add a route from a website to a Cloudflare page. Pages only gain access to custom domains or subdomains, whereas Cloudflare workers can be routed from any site. It appears that the recommended approach is to use a worker to act as a proxy to the Cloudflare Pages deployment; that would be a way to get it routed under our own domain.

However there was an intermediate product: https://developers.cloudflare.com/workers/platform/sites/start-from-existing, called workers sites. This seems to work like a normal CF worker, which would make it routable without having to create a worker just for doing routing to polykey-docs.pages.dev. So we will need to investigate workers sites to see if it fits instead of Pages, and if not, then we could use polykey-docs.matrixai.workers.dev as a transparent proxy. As per https://community.cloudflare.com/t/cloudflare-pages-http-route/389515, we might just investigate how to write a worker that routes to the page.
https://blog.cloudflare.com/subdomains-vs-subdirectories-improved-seo-part-2/ - here is an example of using workers to do proxying (see the sketch below). Now, one issue talked about there is important: if the asset paths like We would need the images to be instead loaded from
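For illustration, here is a minimal sketch of what such a proxy worker could look like, assuming the /docs prefix and the polykey-docs.pages.dev hostname mentioned in this thread; it is not the exact worker used here, just the shape of the idea.

```js
// A worker routed on polykey.io/docs/* that proxies to the Pages deployment.
// The path handling below is a simplified assumption for illustration.
addEventListener('fetch', (event) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  const url = new URL(request.url);
  url.hostname = 'polykey-docs.pages.dev';
  // Strip the /docs prefix so the Pages deployment sees root-relative paths
  url.pathname = url.pathname.replace(/^\/docs/, '') || '/';
  return fetch(new Request(url.toString(), request));
}
```

Note that, as the blog post explains, this only rewrites the request path; any asset URLs baked into the HTML still need to point at the right prefix, which is the issue raised above.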
In working with Cloudflare workers, Pages has its own workers system integration, called Cloudflare Pages Functions: https://developers.cloudflare.com/pages/platform/functions/. There are 2 ways of deploying these functions, using a However I discovered another bug, cloudflare/workers-sdk#1570: it doesn't work unless I'm running the This is still very much a beta product, so there are lots of little polishing issues here.

In any case this doesn't actually solve my problem. These worker functions still sit within the Cloudflare Pages system, and unlike workers they are not routable as subpaths of existing domains. So I'll need to instead deploy a proper worker, and not just Cloudflare Pages. Perhaps... it's possible to use worker sites instead; it seems to work the same as a worker. Let's see what happens if I were to deploy a worker-site instead of a normal worker. A template is provided here: https://developers.cloudflare.com/workers/platform/sites/start-from-scratch
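For reference, a Pages Function is just a file-based handler placed under a functions/ directory of the Pages project; a minimal sketch, where the route and payload are made-up examples:

```js
// functions/api/hello.js - served at /api/hello on the Pages deployment.
// The route and response body here are hypothetical, for illustration only.
export async function onRequest(context) {
  return new Response(JSON.stringify({ hello: 'world' }), {
    headers: { 'content-type': 'application/json' },
  });
}
```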
Ok, moving to Cloudflare workers sites worked. This is basically the same as a normal Cloudflare worker, but instead it has It basically uploads every file in the public directory to the Cloudflare KV system, which is a key-value database globally distributed like a cache. Every file has a unique file path, and the entire file path is the key. The KV is given a "namespace" and an environment variable binding: As you can see, the values in the KV are just the full content of the files verbatim. In a way it just looks like an S3 bucket, but it does have slightly different consistency behaviour compared to durable objects.
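The worker entry point for a workers site is roughly the stock template built on @cloudflare/kv-asset-handler; a simplified sketch (the exact template that wrangler generates has more error handling than this):

```js
import { getAssetFromKV } from '@cloudflare/kv-asset-handler';

addEventListener('fetch', (event) => {
  event.respondWith(handleEvent(event));
});

async function handleEvent(event) {
  try {
    // Looks up the request path as a key in the static-content KV namespace
    // that wrangler populated from the public directory
    return await getAssetFromKV(event);
  } catch (e) {
    return new Response('Not found', { status: 404 });
  }
}
```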
It's actually really easy to migrate between workers sites and Cloudflare Pages, and Cloudflare Pages is really missing this critical feature. So instead of the A lot of the features and behaviour of wrangler are hard to find clear explanations of. So now Note that there's no such thing as different environments for Cloudflare workers; they are just deployed as separate Cloudflare workers. This is unlike Cloudflare Pages, which does support environments. Finally, Cloudflare worker sites are basically the same as Cloudflare workers; they just do some extra automation regarding the KV upload of static assets. Note that there is no authentication on any of these things; any auth has to be done inside the worker code. Otherwise, they have to be routed via a custom domain or a custom route, which can then have Cloudflare's firewall/access/zero-trust system stuck on top.
Cloudflare Pages does appear to be a more sophisticated product, with a lot of convenience features for any kind of site deployment (the integration of functions allows a fully dynamic, PHP-CGI-esque kind of site). I suspect that once the routing feature becomes available to Cloudflare Pages, it should be simple enough to transition back to Pages, but for now we stick to using worker sites.
Ok, after some wrangling, some further notes:
Ok so basically it is correct to set the Additionally we need to also process the
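Assuming this is referring to the Docusaurus url/baseUrl settings, here is a sketch of the relevant part of docusaurus.config.js for serving the site under polykey.io/docs; the exact values are assumptions, not the repo's actual config.

```js
// docusaurus.config.js (excerpt) - hypothetical values for serving under /docs
module.exports = {
  url: 'https://polykey.io',
  baseUrl: '/docs/',
  trailingSlash: false, // assumption; relates to the pretty-URL discussion below
  // ...
};
```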
Relevant:
I'm thinking... these are 2 solutions to the same problem: either apply some base tag that affects how my a href links are loaded, or do some post-processing on src tags and more... Alternatively we do definitely say Furthermore, we need to try to use asset paths without any spaces in the names, although this should likely be fixed upstream by Cloudflare.
Nope, the So that leaves either using a base URL or post-processing. Actually the base URL also won't work; it breaks other links. So lastly, just post-process the img tags, and any file link. Quite possibly any absolute href in an img or anchor tag must be processed appropriately.
Note that cloudflare has some protections for the workers:
The rate limiting is actually not worth it because it's more expensive than just executing the workers themselves. You may as well just pay the cost of worker execution as part of the bundled plan. It can be useful if you want to rate limit specific IPs to allow fair usage across different client IPs. This leaves the firewall, where you can set rules to block certain traffic. By default Cloudflare does not have any kind of layer 7 protection; this has to be done manually via the WAF.
Posted new bug: cloudflare/workers-sdk#4970.
Ok what's left to do is:
So Docusaurus uses 2 types of plugins to process markdown: remark and rehype. Remark works on the markdown AST, while rehype works on the hypertext (HTML) AST. It seems pretty simple to create these plugins: https://docusaurus.io/docs/markdown-features/plugins#creating-new-rehyperemark-plugins. So I'm still not entirely sure which one is the right one for finding the
The image links are all sorted now. I got 2 versions of the plugin:

```js
// `visit` comes from unist-util-visit; `baseUrl` is assumed to be in scope,
// e.g. read from the site's docusaurus.config.js
const visit = require('unist-util-visit');

/**
 * Docusaurus does not process JSX `<img src="...">` URLs.
 * This plugin rewrites the URLs to be relative to `baseUrl`.
 * Markdown links `[]()`, images `![](/image)` and anchors `<a href="...">`
 * are already automatically processed.
 */
const remarkImageSrcWithBase = (options) => {
  return (ast) => {
    visit(ast, 'jsx', (node) => {
      const matches = node.value.match(/^(<img\s.*?src="\s*)\/(.*)$/);
      if (matches != null) {
        node.value = `${matches[1]}${baseUrl}${matches[2]}`;
      }
    });
  };
};

/**
 * Docusaurus does not process JSX `<img src="...">` URLs.
 * This plugin rewrites the src attribute to `src={require("...").default}`.
 * Markdown links `[]()`, images `![](/image)` and anchors `<a href="...">`
 * are already automatically processed.
 */
const remarkImageSrcWithRequire = (options) => {
  return (ast) => {
    visit(ast, 'jsx', (node) => {
      const matches = node.value.match(/^(<img\s.*?src=)"(\s*\/.*?)"(.*)$/);
      if (matches != null) {
        node.value = `${matches[1]}{require("${matches[2]}").default}${matches[3]}`;
      }
    });
  };
};
```

I think the second one is better, as that always uses the `require()` mechanism.
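For reference, a sketch of how such a plugin could be wired into the docs plugin options in docusaurus.config.js; the preset layout, option name, and file path here are assumptions rather than the exact config used in this repo (beforeDefaultRemarkPlugins is an alternative if it must run before Docusaurus' own transforms):

```js
// docusaurus.config.js - minimal sketch; the require path is hypothetical
const remarkImageSrcWithRequire = require('./src/plugins/remarkImageSrcWithRequire');

module.exports = {
  presets: [
    [
      '@docusaurus/preset-classic',
      {
        docs: {
          // Run the plugin on every markdown doc
          remarkPlugins: [remarkImageSrcWithRequire],
        },
      },
    ],
  ],
};
```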
Along the way, I also fixed the links to
Is needed, the using Normal
But on the topic of SPAs: I've discovered a new bug. If you go to However the JS takes over and then routes properly to I think the Cloudflare worker is finding that it doesn't find This means I need to make the KV handler route as if it's an SPA. It needs to understand that
Ok, so we don't want to use SPA routing, since we do want it to show the right file on the first load for SEO purposes. In this case, we need to ensure that either So In Cloudflare, the KV asset loader will resolve things that look like directories to However if
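One way to handle this with @cloudflare/kv-asset-handler is to pass a custom mapRequestToAsset that falls back to the .html variant of extension-less paths, while letting directory-like paths keep the default index.html mapping; a sketch under those assumptions (the exact matching logic here is illustrative, not the final worker code):

```js
import { getAssetFromKV, mapRequestToAsset } from '@cloudflare/kv-asset-handler';

addEventListener('fetch', (event) => {
  event.respondWith(handleEvent(event));
});

async function handleEvent(event) {
  try {
    return await getAssetFromKV(event, {
      mapRequestToAsset: (request) => {
        const url = new URL(request.url);
        // Directory-like paths ("/docs/") keep the default index.html mapping;
        // extension-less paths ("/docs/Home") get mapped to "Home.html"
        if (!url.pathname.endsWith('/') && !/\.[A-Za-z0-9]+$/.test(url.pathname)) {
          url.pathname = `${url.pathname}.html`;
          request = new Request(url.toString(), request);
        }
        return mapRequestToAsset(request);
      },
    });
  } catch (e) {
    return new Response('Not found', { status: 404 });
  }
}
```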
Ok, so the full solution was even more complicated. You need this too: https://docusaurus.io/docs/markdown-features/links. Basically this means we must use They are considered file links, and file links get "processed" by Docusaurus. This removes the On GitHub these links work, while So now with
Ok, this is all done. Cool, so we have an SPA that is SEO-capable (it is pretty much hydrated statically), but with SPA routing in JS. Future custom functionality can be extended with the workers. Lastly:
Will be done in a new issue. I'll set up the
Webflow now feels slow compared to the docs because the docs are an SPA. I have a plan to switch the blog over to Docusaurus too if we make use of it more... Then Webflow becomes purely a page designer for the front page.
Should write a blog post about this.
It's actually not necessary to do this. #1 (comment) You can use
The wiki is scheduled to go into polykey.io/docs instead, due to community/community#4992: GH wikis are not SEO friendly.
However, the wiki is still a useful way of editing information compared to editing pages on our website. So a hybrid solution is to keep the wiki available but then replicate/mirror it to the website.
How to do this? Well, the wiki information is public, and we can use a backend server to mirror it. Because search engines cannot see the wiki anyway, it's fine to have the duplicated information.
However, polykey.io is hosted by Webflow, which has its own custom CMS. So how do we connect to an external API (possibly serverless) backend to load custom data? Most importantly, it must do the rendering of the HTML itself rather than just mirroring the HTML, and present it via SSR, not fetched from JS.

It looks like Webflow supports https://university.webflow.com/lesson/href-prefix, which allows part of an existing website hosted by your own webserver to reverse proxy a particular URL directory to Webflow.
However, I'm looking to do the opposite: use Webflow to reverse proxy a portion of the site to an alternative location. They only allow this for enterprise: https://webflow.com/feature/reverse-proxy-service-for-enterprise
So it looks like it will be time to front Webflow with our own reverse proxy.

There are 2 options here. One is that Cloudflare can front polykey.io, and we can use its page rules and maybe other features to route a subdirectory to a different backend server. Finally, the underlying backend could run on a serverless platform like Cloudflare workers, and then it just has to fetch the wiki markdown and render the correct page.
This does limit the wiki's formatting to whatever HTML GitHub allows to be shown. See: https://stackoverflow.com/questions/38480701/can-a-github-wiki-embed-html and https://github.com/bryanbraun/github-wiki-html-test/wiki. Note that even if the GitHub wiki itself strips those tags, the code is still in the wiki git repo, and when we mirror it we can render it with custom CSS as well if needed. We will need to incorporate this: https://webmasters.stackexchange.com/a/76927
Seems like something to be automated with pulumi when we get to it.
Tasks
[ ] 1. Create server backend for fetching wiki source from https://github.com/MatrixAI/Polykey.wiki.git - unnecessary as we use SSG triggered from CI/CD
[ ] 3. Server backend should cache the git repository so that fetching changes is fast; this means filesystem state may need to be cached, either via a mounted filesystem or ephemeral storage - this is not necessary, as we changed to using SSG triggered in the CI/CD for polykey.io/docs
[ ] 6. Need to integrate the theme appropriately; it's not possible to partially load data from a third party inside a Webflow template on the backend, only custom code on the frontend, so the docs server will need to satisfy the same HTML header and footer as the main website - Modify defaultKeyModifier to allow pretty urls cloudflare/kv-asset-handler#6
[ ] 7. Reuse the polykey.io theme, could be done through a git submodule pointing to the website theme code - Modify defaultKeyModifier to allow pretty urls cloudflare/kv-asset-handler#6