ZIP extraction helper for Joomla Update #35388
Question regarding the security part of your post. Would it not be beneficial to check the hash of the zip against the published hashes at the beginning of the process? (note I have yet to read the code so maybe it already does - don't shoot me)
administrator/components/com_joomlaupdate/tmpl/update/default.php
 * @package     Joomla.Administrator
 * @subpackage  com_joomlaupdate
 *
 * @copyright   (C) 2016 Open Source Matters, Inc. <https://www.joomla.org>
As this is a completely new file, shouldn't it be 2021?
I copied the copyright from the other files of Joomla Update. I have not found a consistent rule for the copyright of various files. I would appreciate a pointer to the right direction.
See #31504
In this case I'll have to add a double copyright since this is derivative work. Okay, thanks for the clarification!
@nikosdion could you update your instructions please to say that you cannot run build.php on a Windows system?
This comment was marked as abuse.
@brianteeman Checking the hash is already done before running the extraction. It's meant to ensure that whatever we downloaded is what the update server is telling us we should have downloaded to perform the update. Checking that again before extraction, right after we already checked it, would be a waste of time.

Regarding build.php, dunno, I found it this way 😛 I thought that everyone who'd try this would know that build.php requires a *NIX system. FWIW you can run it on Windows, under WSL. Whether this is a sane thing to do is a different story. An alternative to that is take an existing Joomla 4 update package and replace the

@PhilETaylor We've had the experience of writing unit tests for this in the Akeeba Backup repository and one thing we quickly realised is that it's far more complicated than your typical tests and completely nonsensical in context. The code doesn't lend itself to unit testing as it goes and works directly with files. Having it work against data in a memory buffer or stream makes it substantially slower and uses much more memory, which makes it fail on cheap shared hosts, i.e. exactly where we need it to work (fast hosts could just as well use PHP's ZipArchive and be done with it).

One way around that is faking a filesystem in memory, e.g. with vfsStream. That's exactly what we had done. Then you need to create doctored ZIP files to simulate error conditions. This requires a hex editor, a copy of the ZIP specification (APPNOTE.TXT), a lot of experience and plenty of time. I actually did some of that and it was... doubleplusunfun.

Then you realise that the one thing you cannot test is the timer code. You can mock it and you can have it return a fake out of time message conditionally... but so what? Have you actually really tested that this will prevent a timeout on most servers? No. That requires something that cannot be tested: experience debugging on these servers and a good sense of how they work both as a complete system and as individual software components. Unfortunately, there's only one of me in Joomla and I've found it impossible to train someone else in this dark art. Davide has been working with me for eight years and there are still a few cases each week where I have to provide input based on my experience with the dark arts of hosting environments.

So, yeah, I could spend the next month writing unit tests but they wouldn't really be testing anything useful. If someone wants to write integration tests, that would be FAR more useful. If you have ideas how to do that you're more than welcome to contribute them! I can tell you how I did that for Akeeba Solo, which has an updater. I create a new installation of the previous version, I update the updater files (since that's what I'm trying to test) and apply an update created out of the repository's files. At the end of the integration test you can also test that all files have been extracted with the correct name, size and checksum.

Regarding the “Invalid login”, it's INTENTIONAL. Before 2014 you would get a different message depending on the actual problem the code ran into. This made a Padding Oracle attack much easier. So, now that we're not using encryption, why don't I just change these messages? Glad you asked! I am preventing information disclosure which would help an attacker. If you get a different message depending on whether

For code comments I'll reply inline.
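The end-of-test check mentioned above (every file extracted with the correct name, size and checksum) could be sketched roughly as follows. This is a hypothetical Python helper, not code from the PR or from Akeeba Solo's test suite: `verify_extraction` and its arguments are made-up names, and it assumes the size and CRC-32 recorded in the update ZIP itself are an acceptable reference.

```python
import zipfile
import zlib
from pathlib import Path
from typing import List


def verify_extraction(zip_path: str, target_dir: str) -> List[str]:
    """Compare every file entry of the ZIP against the extracted tree.

    Returns a list of human-readable problems; an empty list means every
    file exists with the size and CRC-32 recorded in the archive.
    """
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no data to compare
            extracted = Path(target_dir) / info.filename
            if not extracted.is_file():
                problems.append(f"missing: {info.filename}")
                continue
            data = extracted.read_bytes()
            if len(data) != info.file_size:
                problems.append(f"size mismatch: {info.filename}")
            elif zlib.crc32(data) != info.CRC:
                problems.append(f"checksum mismatch: {info.filename}")
    return problems
```

An integration test would run the updater against a throwaway site and then assert that `verify_extraction(package, site_root)` returns an empty list.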
First of all: thanks a lot @nikosdion for that PR! I've worked with the old restore.php because of my involvement in a Joomla management SAAS and the new file is clearly an update in terms of readability and maintainability! Just one remark:
I want to highlight that this issue will affect a considerable number of sites. In our own SAAS-context, roughly 15% of the sites are affected and therefore will require a manual adjustment by the site owner.
@SniperSister Thank you for confirming my suspicion. I don't have hard numbers, I can only extrapolate from the number of unique Joomla sites that ping our stats server for any of our software and the number of unique sites that ping our site for Admin Tools. My number was about 25%, about half would be using the .htaccess Maker based on experience, so we seem to agree. That's good.

My problem is that I am making a PR to Joomla, not the third party code that added the .htaccess / web.config etc change, even though the most likely third party code owner is me. I don't want to put a message that's a self-advertisement inside Joomla. Moreover, I cannot introduce a new language string because this PR would only make it into 4.1 which might be a long way away yet.

The best thing I can do is say that we need to update the Joomla documentation. If the file is blocked you will get a dialog reading “ERROR: AJAX Error”. This means that the browser received an HTTP error response. The first thing to check would be whether you can access /administrator/components/com_joomlaupdate/extract.php. If you can and get a JSON-encoded Invalid Login error, the problem with the extraction was something else. If you get a 403 you need to check your .htaccess file (if you're on Apache or Lighttpd), or your web.config file (if you are on Microsoft IIS) or your NginX configuration. If there's nothing blocking that file there, you need to check any CDN configuration or talk to your host. I'm pretty sure a native English speaker can take that and create something easy for people to understand.

I could also make it so that the action after an AJAX Error message is directing the user to the documentation page for Joomla Update. No questions asked, here are the docs, read them. Better than having a message with a link “Click here to read the troubleshooting documentation” which invariably leads to half of the people taking a screenshot of it and asking what to do (click the bloody link is what you should do, dammit!). Sorry, this happens so often in support that it is borderline triggering. What do you think?
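The troubleshooting steps above boil down to a small decision table, which could be sketched like this. It is only an illustration in Python: `diagnose` and `probe` are hypothetical names, and matching on the text "invalid login" in the body is an assumption, since the exact shape of the JSON error is not shown in this thread.

```python
import urllib.error
import urllib.request


def diagnose(status: int, body: str) -> str:
    """Map an HTTP response from extract.php to a troubleshooting hint."""
    if status == 403:
        return ("blocked: check .htaccess, web.config or the NginX "
                "configuration, then any CDN settings or your host")
    if "invalid login" in body.lower():
        return "reachable: the extraction problem was something else"
    return "inconclusive: HTTP %d" % status


def probe(url: str, timeout: float = 10.0) -> str:
    """Request the given URL and classify the response with diagnose()."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return diagnose(response.status, body)
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions but still carry a body.
        return diagnose(error.code, error.read().decode("utf-8", "replace"))
```

For a real site you would call `probe` with the site's base URL followed by `/administrator/components/com_joomlaupdate/extract.php`.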
Not aware of that rule
@brianteeman I have been told in the past that there's a language freeze before the beta of the x.y.0 version and no language strings are allowed until the next minor version. If I am allowed to add language strings (and I need official confirmation for that) I can definitely make a MUCH MORE USER FRIENDLY error reporting. Like, having a proper modal dialog with actual troubleshooting information instead of a JS popup. I really wanted to do that but without new lang strings this can't fly :)
@nikosdion The language freeze was between the last RC until 4.0.1, not until 4.1, see #34685 ... so now after these releases all is normal as usual, no limits for language changes.
@richard67 Thanks for the tips regarding lang strings and the git issue. I will work on better error handling since I now know I can use new lang strings :)
@nikosdion Regarding the testing scenario "update a J4 with the changes of this PR applied to a later version with the changes still applied": you don't really need to create your own packages. You can apply the patch of this PR, e.g. with git patch or with the patchtester, on a clean, current 4.0-dev and then update to the patched package built by drone for this PR, using the custom update URL of the update package. That URL can be found by expanding the CI checks at the bottom of the PR ("show all checks") and then using the "Details" link at the right hand side of the "Download" line. Since the version of the patched update package has the PR number appended, that update will always be found even if you are already on the latest "*-dev" version. After such an update, the database checker will and should show only one problem for the CMS about not matching update versions. That is expected but should always be checked after such a test because if there is an error, more problems will be shown.
@richard67 Thank you! @wilsonge I agree. I will make the necessary changes to Admin Tools but won't make a release just yet. I truly appreciate the extra time built into this co-ordination. We're at the second week into our daughter going to pre-school. We've already had the first weekend of all of us being varying degrees of sick with a mild respiratory tract virus she brought back from school 😅
Ouch! Hope you feel better soon and she's enjoying school :) 4.0.4 is scheduled for October 26th just so we have a timescale to work towards! Hopefully will start docs + marketing efforts on this next week once 4.0.3 has bedded in a bit.
I'm recovering very well, thanks! I am working today on my side of things. I hope the documentation I provided helps. If you need clarification on anything feel free to ask. It's a shame we never thought of writing a troubleshooting guide in the past. Better late than never, right? 😅
Thanks! I'll start on the docs upgrades over the next few days so it's all there for release day @softforge we need to get marketing kicked off on this I guess from next week.
Woo-hoo! Is it OK if I made an Admin Tools release this week, though? I need to provide fixes for other bugs we discovered after I submitted this PR. I will make a note in the release notes that this will be available starting with Joomla 4.0.4 to prevent any misunderstandings. I can even link to this issue so the few people reading the release notes have a clue 😄
OK let me try and super simplify things.
@davidascherG You make a very large number of assumptions, none of which are factually correct. 99.9% of people using Joomla, and definitely everybody who is not technical, need to do ABSOLUTELY NOTHING WHATSOEVER to prepare their sites for this change. Nothing. At. All. The only people who have to do something are those who have customised their .htaccess beyond the sample

Even then, the change is ONLY required if you have applied the advanced server protection rules for the backend which prevent access to any .php file not explicitly allowed in the .htaccess. This is literally something only very advanced users and my clients will have done. Everyone else (especially people who are not developers, systems administrators or power users) is unaffected and needs to do ABSOLUTELY NOTHING AT ALL.

Nothing special is required on Windows. I don't know where you got that idea? If you are talking about IIS or NginX, the only people affected are those using my software, Admin Tools Professional, to create a custom web.config or NginX configuration respectively.

Nothing is made easier for any third party developer (3PD). In fact, it does NOT affect third party developers at all, either positively or negatively. This only has to do with Joomla Update, i.e. how Joomla updates itself. Updates to third party extensions are not affected at all. The only 3PD affected is me, who wrote this PR, and I am negatively affected. Not only did I carry the responsibility of updating Joomla Update itself but I also had to bear the responsibility of updating Admin Tools Professional to address the change I made in the Joomla core. I created more work for myself!

To make it clear, if you are using Admin Tools Professional and its .htaccess Maker / Web.config Maker / NginX Conf Maker, all you need to do is update to the latest version and click the button on your screen to regenerate the .htaccess / web.config / nginx.conf file.

This means that the only people practically affected by this change are the roughly 0.1% of expert Joomla users who maintain their own security-strengthened .htaccess file. This is the target audience of the post made by Joomla: the 0.1% of expert users. In very simple terms, if you do not understand what to do then there is a 99.9% chance that you need to do ABSOLUTELY NOTHING.

As to why this change was made:

I spent more than 100 hours of my own time, having my business suffer for it, to help the Joomla community. Half of that time was spent making sure nothing would break on update. A big thank you to everyone who tested and reported issues. We all made sure that the migration to the new updater backend would be seamless and would cover all sorts of upgrade cases without any work required by the overwhelming majority of Joomla users and definitely by those Joomla users who are not very technical. You're welcome, I guess...?

Unfortunately, this experience makes me wary of trying to fix anything else in Joomla Update. I was planning on helping with Joomla Update showing misleading information about third party extensions when updating to a new major release or new version family. I have already identified the problem and reported it. I was asked to fix it. Seeing how people complain about fixing what was broken for years, I am not going to spend my time only to be met with hostility for fixing things. It's far easier for me to complain that something is still broken after I have reported exactly what is broken and how to fix it, let someone else spend their time to fix it and be met with hostility for their trouble. This kind of community reaction is why nobody wants to contribute to Joomla.
Hi @brianteeman Can we put that in docs and add it to SM? It's a great infographic.
@softforge What is “SM” (in this context)?
Social Media
No need to ask
Thank you kindly.
To those who have no idea why nick responded to a post from me that you can't see - I deleted the post in question @nikosdion within a few minutes of having written it. I have nothing but the highest respect for nick's efforts in fixing these complicated issues with Joomla Update (and with his excellent Akeeba Backup and Admin Tools extensions). I thought I had put enough disclaimers in my post to make it clear that I had not fully absorbed the issues involved and was probably misunderstanding what it seemed to me would be required of Joomla Admins. To try to avoid muddying the waters, I deleted that post but apparently not fast enough for nick to not see it. I suppose a copy of it was sent to him automatically as soon as I hit the "Comment" button. I will be much more cautious about hitting that button in the future.
Correct, I get an email notification on any Pull Request I have submitted. I do not unsubscribe from notifications after they are merged in case someone finds a critical issue we all missed during testing.

Your message was not really the reason I wrote my reply. The Joomla FB group had a long thread about this which was... let's say neither based on fact nor particularly pleasant. Seeing this spill here made me reply.

Beyond what I already commented, I'd like to add two more non-obvious points and call it a day, a week, a month and a trimester.

The leadership wrote the announcement. They ran it by me to ensure technical accuracy. Here is the entirety of my feedback, sent by email, verbatim:

You can draw your own conclusions 🤷🏽♂️

The other point is the other, much less reported benefit of the new Joomla Update backend: error reporting and error resolution. For the past nine years, if something didn't work during the update extraction you'd get a JavaScript alert reading “AJAX Error”. That was it. Supremely unhelpful. With the new backend I am telling you exactly what went wrong. Even better, I am showing you a practical list of what you need to do to fix the problem. If you know how to use FTP to upload stuff to your server you have the technical competence required for that. For anything more complicated I tell you what to ask your host to do. I also wrote very detailed troubleshooting documentation for Joomla Update. Hopefully someone will put it in the docs site; I don't think I have access anymore, or at least I do not seem to have saved a login for it. It's at the top of this PR right now.

I sincerely hope that my contribution does help the community, even those people who complain without having actually used the software yet (it will only be used in the next update, 4.0.4 to 4.0.5).

On a personal note, I came to Mambo in 2004 for the code, I stayed because of the community. I consider the Joomla community to be my second extended family. I care deeply about each and every one of you, regardless of what you think of me. Like all families it's a bit dysfunctional and there's some screaming at each other at times but we still help and respect each other. That's what drives me to write software for Joomla; you're all family to me and I care about you.

That said, the recent bout of shouting did take a mental toll on me, much more than I thought it would. I need to take a short break for mental health reasons. I suppose I'll recuperate and regain my resolve before the Joomla 4.1 merge window so I can fix more Joomla bugs. Just not right now. Right now I need to spend some time with my daughter and my wife. ☮️
Pinging @wilsonge @PhilETaylor @zero-24 Here's something fun for the upcoming weekend!
Summary
This PR modernises and simplifies the server- and client-side code for Joomla Update when applying the update (extracting the update Joomla ZIP file and running the update finalisation code). It also makes the code far more manageable so you can avoid problems like what you had in Joomla 4.0.1.
The following changes have been made with regard to Joomla Update:

`restore.php` (Akeeba Restore) with a custom `extract.php` which works similarly but is easier to maintain.

`extract.php`.

I explain each item individually below.
Test instructions
I have taken care so that this update works when updating from a version of Joomla that contains Akeeba Restore (`restore.php`) to one that doesn't, as well as updating between versions of Joomla which only use `extract.php`.

First, build an update package. Assuming the branch is called `feature/jupdate-new-restore` you need to do the following:

    npm ci
    npm run build:js
    cd build
    php build.php --remote=feature/jupdate-new-restore --exclude-gzip --exclude-bzip2

IMPORTANT: Joomla's `build.php` script only works on Linux, macOS and other UNIX systems since it goes through the shell to use standard system tools such as find, git etc. If you are on Windows this has to be run under WSL, MSysGit32 or a similar environment which provides all the *NIX tools used by `build.php`. I didn't make it this way. I didn't even touch it. That's how I found it!

You will need the generated file `build/tmp/packages/Joomla_4.0.3-dev-Development-Update_Package.zip`

You will also need a Joomla 4.0.2 site.
Test 1: Old to New
In this test you will confirm that the ‘old’ Joomla Update extraction method with Akeeba Restore still works when updating to a newer version of Joomla which no longer contains it.
`Joomla_4.0.3-dev-Development-Update_Package.zip` file.

Make sure that the files `administrator/components/com_joomlaupdate/restore.php` and `administrator/components/com_joomlaupdate/restore_finalisation.php` are removed.

Test 2: New to New
In this test we will confirm that the new JavaScript and server-side extraction helper (`extract.php`) work, i.e. we didn't break Joomla Update (that would suck, considering I wrote its first implementation and all!).

Follow the EXACT same steps as the previous test.

Since you have already updated, the code that kicks in during this update is the new one, using `extract.php`.

Make sure that the update installs without any errors.
PLEASE TRY THIS ON A TEST SITE ON A COMMERCIAL HOST, IDEALLY ON A SITE THAT IS A CLONE OF A REAL WORLD SITE. DO NOT ONLY TEST ON A BLANK JOOMLA 4.0.2 SITE ON LOCALHOST. This is important! Everything works on localhost. The push comes to shove when we are dealing with real world sites with 3PD extensions of varying degrees of QA and Joomla compatibility on hosts with greatly varying relative performance using Internet connections which may drop packets harder than an overworked Amazon delivery driver tosses packages to your porch.
Good news: you do NOT need to issue an update to Joomla Update
The original Joomla Update uses the files `restore.php` (Akeeba Restore, does the extraction), `restoration.php` (transient configuration file) and `restore_finalisation.php` (post-update finalisation, deletes the files which no longer exist in the new version).

With this PR the respective files are `extract.php`, `update.php` and `finalisation.php`. The change in name is intentional.

For starters, we are updating the site, we are not restoring a backup. The file naming in the original Joomla Update came from the fact that we were using Akeeba Restore, a script used to restore backup archives. Using the new names makes it easier for developers new to Joomla, who were not around Joomla 2.5.1 when Joomla Update was rushed through the door, to understand what is going on.
Moreover, the lack of overlap means that these files will NOT overwrite the files of the previous Joomla Update while the update to the new version takes place. These files will only be removed at the finalisation step. Therefore you can have a clean update from an old to a new version without updating Joomla Update itself first. Neat!
There is a catch, though. The users who have followed the instructions of the Joomla Security Wiki page on .htaccess files, have used my Master .htaccess (which is used in the Joomla wiki) or are using Admin Tools Professional's .htaccess Maker (or something similar) will need to update their .htaccess files before running Joomla Update AFTER installing whichever Joomla version includes this patch. Same goes for NginX configuration and IIS `web.config` files.

We COULD avoid that by keeping the same names as previously used but a. you still get confusing naming and b. you would need to update Joomla Update before updating Joomla (try saying that three times, fast).
Why things needed to be changed
Let's take things in a bit more detail. It's a long read. Sorry.
Custom ZIP extraction handler instead of Akeeba Restore
Joomla Update was contributed to Joomla 2.5.1 on little more than a moment's notice by yours truly, having forked it off a feature by the same name I had in Admin Tools. When I implemented this feature in Admin Tools it made sense for me to reuse the code I had already written for Akeeba Backup to extract backup archives. Extracting Joomla's update ZIP package was simply a much narrower use case of the more generic use case of extracting a ZIP backup archive.
The problem is that Akeeba Restore does much more than just extract a ZIP archive. It needs to handle multipart archives of different formats which contain large files and need several minutes to hours to extract, it needs to handle .htaccess files, it needs to handle the removal of the `installation` directory, stealth .htaccess files and much more. All these are irrelevant for Joomla Update. In other words, Joomla Update never needed Akeeba Restore and using the two together is overkill. It also seems to confuse some people as to why Joomla is using an Akeeba product in the core.

This wouldn't be that bad but for the fact that Akeeba Restore is also very tricky to maintain, especially when you only have it as one big file (in my repository it's several small files which are concatenated when the file is being “built”). This has historically led to small, well-meaning changes causing the Joomla Update to fail miserably. Like what happened most recently with Joomla 4.0.1.
I've been meaning to solve these problems by creating a special version just for the Joomla project, only including a subset of the features of the full-fat version. This is what I did here. Better late than never!
The whole file is one big class and a small “controller” tacked at the end. It's a tiny fraction of Akeeba Restore's code, it's much more maintainable and I can contribute it per the terms of the Joomla Contributor Agreement I signed all those years ago i.e. the Joomla project gets non-exclusive copyright rights under the GPLv2 and the right to change the license to a newer version of the GPL.
Furthermore, since this is a bespoke script for Joomla 4 I have made sure that the code makes use of static typing (compatible with PHP 7.2 or later) instead of the dynamic / implicit typing Akeeba Restore is doomed to use as long as there are servers with a default PHP version in the 5.x range (don't get me started!).
Improved security
Any script which allows extraction of ZIP archives onto an application directory poses an inherent security risk: if an attacker is able to extract an archive of their choosing they can compromise the site. This can be solved by having the path to the archive to be extracted stored in a server-side file. However, this would still allow an attacker to perform a Denial of Service attack by hitting the archive extraction URL repeatedly. The only way to solve this is to “authenticate” requests.
For the authentication part, a randomly generated secret key is written to a server-side file and communicated to the client-side JavaScript that goes through the archive extraction.
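As a rough illustration of authenticating with a shared secret rather than with encryption, here is a minimal Python sketch. It is not the code of `extract.php`: the function names and the JSON response shape are assumptions made for this example, and only the uniform “Invalid login” message comes from the discussion in this PR.

```python
import hmac
import json
import secrets


def new_secret() -> str:
    """Generate a random secret to store server-side and hand to the client."""
    return secrets.token_hex(32)


def handle_request(stored_secret: str, supplied_secret: str, action: str) -> str:
    """Return a JSON reply; every authentication failure looks identical."""
    authenticated = bool(stored_secret) and hmac.compare_digest(
        stored_secret.encode("utf-8"), supplied_secret.encode("utf-8")
    )
    if not authenticated:
        # One message for all failures, so an attacker learns nothing
        # about which part of the check went wrong.
        return json.dumps({"status": False, "message": "Invalid login"})
    return json.dumps({"status": True, "action": action})
```

`hmac.compare_digest` compares in constant time, so the response time does not reveal how many leading characters of a guess were correct.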
The old version of Akeeba Restore which is still used in Joomla Update uses the secret key to derive an AES-128 key and uses AES-128 in CTR (Counter) mode to encrypt a JSON string which is sent to the server-side `restore.php` file. That file reads the secret key from the server-side file (`restoration.php`), derives the same AES-128 key and tries to decrypt the information ostensibly sent to it by the client-side application. If the decryption fails or the result is not a valid JSON document an error is returned.

This has two inherent problems.

First, the key derivation function is naive and insecure. The generated AES-128 key is approximately 56 bits strong instead of 128 bits. It also suffers greatly from key collisions.

Second, the very fact that encryption is used for authentication creates an opportunity for a Padding Oracle attack. On a typical server it would take anywhere between a few dozen to several hundred minutes to derive the key used to authenticate requests to restore.php. When that happens the attacker can exploit `restore.php` to extract an archive of their choosing, even if the archive is stored remotely. A naive mitigation (fail the authentication if the `restoration.php` file was created more than 90 minutes ago) is in place but it's not enough anymore. PHP 7 and 8 are much faster and hosting services no longer cram thousands of sites on a single server. This makes each request faster, which helps perform the Padding Oracle attack more efficiently.

This new file implements more robust mitigations I have already implemented in my own software since late 2017:
`://` substring we immediately fail the request. This raises the bar of the minimum viable attack opportunity to BOTH MITM AND arbitrary file uploads with a known location and file name AND a window of opportunity in the range of a few seconds it takes for the Joomla update to complete. This gets to the territory of ‘if you can pull this off you deserve to hack me’.

`update.php` file is not removed AND the update ZIP file is also not deleted just yet from the temporary directory. This is more of a failsafe and less of a security feature.

Overall, these changes not only make the code simpler but far more secure as well.
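The `://` check mentioned in the list above can be illustrated with a short Python sketch. The text only states that a `://` substring fails the request; applying it to the archive path is my reading of the earlier remark about remotely stored archives, and the extra "must live inside the temporary directory" test is purely an assumption added for illustration. `is_acceptable_archive_path` is a hypothetical name.

```python
import os


def is_acceptable_archive_path(path: str, temp_dir: str) -> bool:
    """Reject stream-wrapper style paths and paths outside temp_dir."""
    if "://" in path:
        return False  # looks like a URL or stream wrapper, fail immediately
    base = os.path.realpath(temp_dir)
    target = os.path.realpath(path)
    try:
        # Assumption for this sketch: the archive must sit under temp_dir.
        return os.path.commonpath([base, target]) == base
    except ValueError:
        return False  # e.g. paths on different drives on Windows
```

A request handler would run this check before opening the archive and return the same generic error as for any other failed request.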
JavaScript simplification
The only reason we needed the convoluted JavaScript in update.js and encryption.js was the old authentication method. Now that this is no longer a concern we can instead move to plain vanilla JSON responses from our ZIP extraction helper and use Joomla's built-in `Joomla.Request` to communicate with it and parse the responses. This greatly simplifies the client-side of the update, making it maintainable by more developers instead of only those who could understand how encryption worked.

Since I was at it I also removed the dependency on jQuery, rewrote the JavaScript as ECMAScript 6 and fixed a small visual bug which resulted in the progress bar not turning green at the end of the ZIP file extraction.
OPcache reset for .php files
One of the biggest problems with updating Joomla is that the OPcache is not reset per .php file being overwritten or deleted but globally, at the server level. This is problematic for two reasons.
First, there is a delay between resetting OPcache globally and the cache being deleted. More specifically, the cache is not reset until PHP is tearing down the script after it finishes executing. Therefore the restoration finalisation cannot use any core code as there's no guarantee the correct code will even load!
Second, resetting the OPcache globally is a problem on shared servers where this built-in function may not be available or, if it is, causes performance degradation across the entire server. On a commercial host with hundreds of sites this can be detrimental, especially if the various Joomla sites do not update all at the same time.
Since we are now using a bespoke file for Joomla Update we can do some simple post-processing per extracted or deleted file. If the file extracted or deleted has a .php extension and opcache_invalidate is available and the other conditions are met (see the code in the CMS' File class) we'll ask PHP to invalidate this file in the OPcache. Therefore we are resetting OPcache only for the files we are touching during the update, causing a temporary performance degradation against core files at the first few page loads after the upgrade instead of across the entire server. Moreover, opcache_invalidate is applied immediately, meaning that the finalisation file can now use core code if desired.
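The per-file post-processing described above can be sketched in a few lines. The real helper is PHP and calls `opcache_invalidate`; this Python version only models the control flow, with the hypothetical `invalidate` callback standing in for that PHP function and `None` standing in for "the function is not available on this server".

```python
from typing import Callable, Iterable, List, Optional


def invalidate_touched_php(
    paths: Iterable[str],
    invalidate: Optional[Callable[[str], bool]],
) -> List[str]:
    """Invalidate only the .php files touched by the update.

    Returns the paths for which the callback reported success.
    """
    if invalidate is None:
        return []  # nothing we can do on this server
    invalidated = []
    for path in paths:
        # Only .php files can be in the opcode cache; skip everything else.
        if path.endswith(".php") and invalidate(path):
            invalidated.append(path)
    return invalidated
```

The point of the design is visible in the loop: cache entries are dropped one file at a time as the update touches them, so no server-wide reset is needed.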
Further thoughts
I pondered whether we could support tar.gz or tar.bz2 update files as well. The answer is no, we can't.
ZIP files are, to put it simply, a concatenation of file headers containing information about each file and the respective file's data. The data can be compressed, the headers are not. If you are given an offset in the file where a file's header begins you can extract that file and all files after it. This is what allows us to pause the extraction if it's taking too long and restart it in a new page load. This is what allows us to perform the update on a slow server.
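To make the "resume from an offset" idea concrete, here is a minimal Python sketch that walks ZIP local file headers and stops after a fixed number of entries. It is not how `extract.php` is written: the function names are hypothetical, the budget is a file count rather than elapsed time so the example stays deterministic, and data descriptors, ZIP64 and encryption are deliberately not handled.

```python
import io
import struct
import zipfile
import zlib

# 30-byte local file header as laid out in the ZIP specification.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50  # the bytes "PK\x03\x04"


def extract_step(archive: bytes, offset: int, max_files: int):
    """Extract up to max_files entries starting at a header offset.

    Returns (files, next_offset); next_offset is None once no further
    local file header follows (the central directory was reached).
    """
    files = {}
    count = 0
    while count < max_files:
        if offset + LOCAL_HEADER.size > len(archive):
            return files, None
        (sig, _ver, flags, method, _time, _date, _crc,
         csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(archive, offset)
        if sig != LOCAL_SIGNATURE:
            return files, None
        if flags & 0x08:
            raise ValueError("entries with data descriptors are not supported")
        name_start = offset + LOCAL_HEADER.size
        name = archive[name_start:name_start + name_len].decode("utf-8")
        data_start = name_start + name_len + extra_len
        raw = archive[data_start:data_start + csize]
        if method == 0:        # Stored
            files[name] = raw
        elif method == 8:      # Deflate (raw stream, no zlib header)
            files[name] = zlib.decompress(raw, -15)
        else:
            raise ValueError("unsupported compression method %d" % method)
        offset = data_start + csize
        count += 1
    return files, offset


def make_demo_archive(entries: dict) -> bytes:
    """Build a small Deflate-compressed ZIP in memory for experimenting."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buffer.getvalue()
```

A web-based updater would persist `next_offset` between requests and call the step again in a fresh page load, which is what keeps each request short on a slow server.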
Plain tar archives are similar BUT the file contents are never compressed. They were meant as primitive disk images five or so decades ago. tar.gz and tar.bz2 archives solve the problem of files taking up too much space by compressing the tar archive itself, with gzip or bzip2 respectively, instead of each individual file's contents. We would need to decompress the entire archive, write the resulting tar file to disk and then extract that in a way that allows resumption.
The problem is that the decompression is memory and CPU intensive. You need as much free PHP memory as the combined size of the compressed and uncompressed archive, plus the overhead of the gzip or bzip2 decompression algorithm. With modern versions of Joomla this is on the order of 64MB. In practical terms, even a site with a 128MB PHP memory limit may run out of memory if it has enough plugins wasting memory and/or debug enabled (remember that DatabaseDriver logs queries and their information in this case, exploding the memory usage). The decompression would also take a long time, long enough that you might hit a PHP or server timeout.
This kills the idea of using any kind of compressed TAR archive.
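The memory argument above is simple arithmetic, sketched here. The helper names iniSizeToBytes and fitsInMemory are hypothetical, the decompressor's own overhead is ignored, and any numbers you feed in are yours to measure; this is not a check that Joomla performs.

```php
<?php
/** Convert a php.ini shorthand size such as "128M" to bytes. */
function iniSizeToBytes(string $value): int
{
    $value  = trim($value);
    $number = (int) $value; // the cast keeps only the leading digits

    switch (strtoupper(substr($value, -1))) {
        case 'G':
            return $number * 1024 ** 3;
        case 'M':
            return $number * 1024 ** 2;
        case 'K':
            return $number * 1024;
        default:
            return $number;
    }
}

/**
 * Would holding both the compressed and the uncompressed archive in memory
 * still fit under the limit, given what the request already uses?
 */
function fitsInMemory(int $compressedBytes, int $uncompressedBytes, int $limitBytes, int $usedBytes): bool
{
    // A memory_limit of -1 means "no limit".
    if ($limitBytes <= 0) {
        return true;
    }

    return $usedBytes + $compressedBytes + $uncompressedBytes <= $limitBytes;
}
```

In a real request the last two arguments would come from iniSizeToBytes(ini_get('memory_limit')) and memory_get_usage().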
The other idea I pondered is whether we can use bzip2 compression in ZIP files. It is supported by the ZIP standard, alright! However, unlike zlib (implementing gzip), it's not a requirement for running Joomla and there is no guarantee it will be enabled on the server. This means that if we were to use it the update ZIP files would be unusable on a large enough proportion of servers to make it an unrealistic option.
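As an illustration of that availability problem, a check along these lines tells you whether a given server could handle an entry at all. The function name canDecompressZipMethod is hypothetical; the method IDs (0, 8, 12) are the ones the ZIP specification assigns to Stored, Deflate and bzip2.

```php
<?php
/**
 * Tell whether this PHP build can decompress a ZIP entry that uses the
 * given compression method ID from its local file header.
 */
function canDecompressZipMethod(int $method): bool
{
    switch ($method) {
        case 0:  // Stored: no compression at all
            return true;
        case 8:  // Deflate: needs the zlib extension
            return function_exists('gzinflate');
        case 12: // bzip2: needs the optional bz2 extension
            return function_exists('bzdecompress');
        default:
            return false;
    }
}
```

On a server without the bz2 extension the third case returns false, and an update package compressed that way could not be extracted there.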
So, ZIP files with gzip (called ‘Deflate’ in the ZIP standard) compression it is.
Finally, this PR does not touch the CLI updater. That runs under the CLI, so it is not subject to the same time, memory and CPU usage constraints we need to take into account for the web version of the updater. It works fine. If it works fine, I don't touch it. Fair enough? :)
Documentation changes needed
As mentioned above, the change of the extraction helper's name from restore.php to extract.php necessitates some changes in Joomla's documentation. Moreover, the documentation for Joomla Update currently has no useful troubleshooting information. So please let me rectify that.
Update the .htaccess examples page
At the very least the .htaccess example page needs to be updated with the following.
Find the following lines:
and replace them with
You should also update that file with more recent code from https://github.com/nikosdion/kyrion-htaccess/blob/kyrion/.htaccess and the changes made to the core .htaccess, e.g. for the core gzipped files. These changes are well outside the scope of this PR and I will not comment on them any further.
Further documentation changes
Joomla Update will tell you to read the documentation if something goes wrong. Enhance the Troubleshooting section in https://docs.joomla.org/J4.x:Updating_from_an_existing_version#Troubleshooting by adding the following information at the top (it's a LONG read but totally worth it if you are desperately stuck).
Joomla Update is a core component which is responsible for determining whether a newer version of Joomla is available for installation on your site, downloading it (or letting you upload it) and installing it. It has been available in Joomla since Joomla 2.5.1 and as a third party extension for two years prior to that. You can access it at System, Update, Joomla.
The update process consists of several different steps. While every care has been taken to make this process as trouble-free as possible there's always a minuscule chance that something may go wrong, typically due to a very restrictive server configuration or network conditions on a very small minority of sites.
The following troubleshooting instructions are organised by update step to make it easier to find the information you are looking for. Furthermore, it is an exhaustive resource, based on more than a decade of experience troubleshooting all possible (and some borderline impossible) problems with Joomla and extension updates. It lists problems which are extremely unlikely to occur. Don't let its length scare you; you are very unlikely to ever see any of these problems occur.
Determining if updates exist. Joomla will make a request to its update server over HTTPS and download an XML file provided by the Joomla project listing the latest available versions. The update server in use can be determined by going to System, Update, Joomla, clicking on Options and examining which update server is in use. You are recommended to use the Default update server to receive updates to your current major version of Joomla. Use Joomla Next when you want to upgrade your site to the next major version of Joomla — this is best done on a copy of your site to avoid any nasty surprises; not all third party extensions and templates will be compatible across major Joomla versions. The major version of Joomla is the first digit in the Joomla version, before the first dot. For example, the major version of Joomla 4.0.1 is 4.
If Joomla cannot determine that an update is available please check the following:
https://update.joomla.org. This is a CDN, meaning that the exact IP address will be different depending on where in the world you are trying to access this URL from. Do tell your host; they will know what to do with this information.
Determining if third party software is compatible with the new version you are about to install. Joomla does not have a magical way of evaluating third party code for compatibility. Its report is based solely on the extension information kept in Joomla's #__extensions table, the update sites provided by the installed extensions and the update information provided by the developers of third party extensions including but not limited to which version of their software is compatible with which version of Joomla.
If the information displayed is incorrect please check the following:
Downloading the update. Joomla will need to download its update package, a ZIP file which is very similar to the Joomla installation ZIP file but without the web installer (the installation directory). This could fail for a few reasons:
ulimit -t), the PHP-FPM timeout or the web server's timeout the download will fail and you will see an error page. You will have to ask your host for help with that.
https://github.com. This is a CDN, meaning that the exact IP address will be different depending on where in the world you are trying to access this URL from. Do tell your host; they will know what to do with this information.
Extracting the update. After the update ZIP file has downloaded Joomla needs to extract it on your site. Since Joomla is effectively replacing itself and because this process does take some time to complete it cannot happen within Joomla itself. Instead, a separate file (administrator/components/com_joomlaupdate/extract.php) is used to perform the update. This file is inert except when an update is in progress.
You may get an error during the extraction for one of the following reasons:
extract.php file because of a server protection e.g. a customised .htaccess file on Apache and Litespeed servers. Try accessing https://www.example.com/administrator/components/com_joomlaupdate/extract.php from a web browser, where https://www.example.com/ is to be replaced with the URL to your site. You should see the message {"status":false,"message":"Invalid login"}. If you see anything else you are being forbidden from accessing this file.
extract.php from being handled by that file. Please contact your host about this.
*/administrator/components/com_joomlaupdate/extract.php and Then The Settings Are to “Disable Security” and on a new line “Cache Level”, ‘Bypass’. Set the Order to First. Click on Save and Deploy. This ensures that CloudFlare will not try to block the update extraction.
extract.php file. Each request is set up to take between 3 and 4 seconds. The process repeats until the entirety of the update file has been extracted. On some servers this cadence of requests to the same file from the same IP address may trigger the server's security. On other servers it may trigger a different server protection, e.g. a maximum PHP time limit, a maximum CPU usage limit or another server timeout. On even fewer servers running on CloudLinux it could trigger a server memory outage situation if your server was already running low on memory. You need to contact your host about that; there is nothing you can do yourself to work around these server limitations.
Finalising the update. This is a two step process.
Right after the update ZIP file has been extracted a final step will run which removes old files. When upgrading to a new major version of Joomla the list of files to remove is pretty big and the process may time out. Moreover, the point made in the previous section about ownership and permissions of files is important here too; Joomla needs write permissions to the old files and folders it has to remove. If this step fails you can resume it from the command line. Go into the cli folder and run php joomla.php update:joomla:remove-old-files. If you cannot do it yourself ask your host to do it for you. You will also need to follow the workaround for the next step.
Finally, Joomla reloads and you are logged back into the administrator interface. At this point Joomla updates its database tables and performs any database administration tasks. If this fails you can resume the process by going to System, Maintenance, Database. Select Joomla CMS from the list and click on Update Structure.
Postscriptum
The entire PR is one commit. It almost makes it sound trivial. Deriving extract.php from Akeeba Restore was anything but. It took 44 rounds of refactoring, about 24 hours of work crammed into 2 ½ days. I think it was well worth it.
I just hope it has a chance of getting merged. I don't think I have it in me to redo any of that work ever again. That was intense.