Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TypeError on large folder downloads #27562

Closed

Conversation

j-be
Copy link

@j-be j-be commented Jun 18, 2021

As requested in #15117, here a PR for discussion.

Additionally, I propose to add Linux to the preferTarFor user agent matches.

Please note the TODO: I would not consider this PR ready to be merged until that is fixed.

@szaimen
Copy link
Contributor

szaimen commented Jun 18, 2021

cc @ChristophWurst @juliushaertl @nickvergessen for feedback on this

@szaimen szaimen added this to the Nextcloud 23 milestone Jun 18, 2021
@ChristophWurst
Copy link
Member

cc @ChristophWurst @juliushaertl @nickvergessen for feedback on this

FYI I do mostly auth, APIs and dependencies in the server repo :)

Copy link
Member

@skjnldsv skjnldsv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me
A bit of testing would be nice :)

@j-be
Copy link
Author

j-be commented Jun 24, 2021

@skjnldsv I'm more than happy to help, but I have not written PHP since high school (must have been PHP4 or early PHP5 - can't really remember). So I'm not sure if I am more help than hazard here 😄

I am currently running the "smaller" fix using TarStreamer for anything that doesn't fit Zip32 on my Nextcloud, as I only need to support Linux (personal use only). Finally, my server is an Odroid HC-2 (i.e. ARMv7 32Bit), so I may not even be able to manually test Zip64, assuming McNetic/PHPZipStreamer#38 is still accurate.

@skjnldsv skjnldsv marked this pull request as ready for review August 18, 2021 12:43
@skjnldsv
Copy link
Member

/rebase

@szaimen
Copy link
Contributor

szaimen commented Aug 31, 2021

What is missing here?

@j-be
Copy link
Author

j-be commented Aug 31, 2021

@szaimen I wrote this fix off the top of my head - and I have not written anything substantial in PHP for about a decade. Also, the code is untested.

I created the PR as a base for discussion, as requested in #15117. If you guys think the code is good enough to be merged, I will obviously not object. But I advice extra caution 😄

@skjnldsv skjnldsv mentioned this pull request Oct 13, 2021
@skjnldsv skjnldsv mentioned this pull request Oct 25, 2021
25 tasks
@skjnldsv

This comment has been minimized.

@skjnldsv
Copy link
Member

@j-be can you rebase please?

There is an issue when downloading large folders (i.e. bigger than 2GB)
as archives. This commit tries to improve the situation.

Signed-off-by: Juri Berlanda <juriberlanda@hotmail.com>
Signed-off-by: Juri Berlanda <juriberlanda@hotmail.com>
@j-be j-be force-pushed the 15117-fix-type-error-on-large-archives branch from 3bf788a to 2b351aa Compare October 25, 2021 16:45
@j-be
Copy link
Author

j-be commented Oct 25, 2021

@skjnldsv here you go 😃

@skjnldsv
Copy link
Member

Test failure unrelated

@skjnldsv skjnldsv mentioned this pull request Nov 1, 2021
19 tasks
@blizzz blizzz mentioned this pull request Nov 3, 2021
18 tasks
@skjnldsv skjnldsv mentioned this pull request Nov 8, 2021
23 tasks
@j-be
Copy link
Author

j-be commented Mar 28, 2022

@blizzz @skjnldsv Is there anything I can contribute to get this merged?

@blizzz blizzz mentioned this pull request Mar 31, 2022
@szaimen
Copy link
Contributor

szaimen commented Apr 5, 2022

sorry for the delay. We discussed this internally and decided that it should not be done like this. There should rather be a config option which allows the admin to specify the archive format. cc @skjnldsv @PVince81 @nickvergessen

@j-be
Copy link
Author

j-be commented Apr 5, 2022

@szaimen no worries 😃

That sounds like a good idea. Feel free to close this PR if it is not needed anymore.

@nickvergessen nickvergessen removed their request for review April 5, 2022 16:06
@nfp0
Copy link

nfp0 commented Apr 5, 2022

@szaimen Sounds good.
Is there a place where I can follow that feature to know when it lands?

This was referenced Apr 7, 2022
@blizzz blizzz modified the milestones: Nextcloud 24, Nextcloud 25 Apr 21, 2022
@MichaIng
Copy link
Member

Probably not relevant anymore due to: nextcloud/documentation#9071

But I like the idea to have an option for specifying the archive format. zip is really the worst possible option, as long as you don't run one of these Windows systems without any proper archiver installed.

@skjnldsv
Copy link
Member

#27562 (comment)

Closing then

@skjnldsv skjnldsv closed this Feb 24, 2024
@skjnldsv skjnldsv removed the 3. to review Waiting for reviews label Feb 24, 2024
@j-be
Copy link
Author

j-be commented Feb 24, 2024

@skjnldsv For the record: the issue is still present on 32 Bit machines. But taken this is now over 2 and a half years old and I see there is no interest in tackling this, I am beyond caring.

To close some honest feedback: Trying to contribute to the project sucks. My working PR was declined for an alternative solution, which never materialized leaving the actual issue present to this day.

@MichaIng
Copy link
Member

MichaIng commented Feb 24, 2024

@j-be are you sure it is still an issue? I just successfully downloaded a 2.4 GiB directory from my NC 28.0.1, hosted on a Raspberry Pi 2 ARMv7 with 32-bit Linux OS. No error message was triggered anywhere and the downloaded archive was complete.

@j-be
Copy link
Author

j-be commented Feb 25, 2024

@MichaIng Not entirely sure as I am applying my patch (i.e. this PR) on every update of Nextcloud.

Did you try to download a folder, that will lead to a Zip bigger than 4GB from non Mac (it will correctly fall back to Tar only on MacOSX, not on Windows and Linux)? I feel like you should end up with either an error, or an invalid Zip file.

@MichaIng MichaIng added the feature: 32bits Bug specific to 32bits architectures label Feb 25, 2024
@MichaIng
Copy link
Member

I just downloaded a 4.3 GiB archive, packaged as 4.1 GiB sized zip archive. It is all complete.

There have been a number of 32-bit bit related fixes, after it broke completely with NC25 (IIRC), after which 32-bit support was aimed to be removed, and then the decision was reverted, with fixed 32-bit support in NC26. This included fixes for old 32-bit issues, like downloads and trashbin.

@j-be
Copy link
Author

j-be commented Feb 26, 2024

@MichaIng That is very weird. If you look at the comment in the constructor of https://github.com/nextcloud/server/blob/master/lib/private/Streamer.php it would indicate, that a Zip32 file with more than 4GB should not be valid. But assuming PHP_INT_SIZE is 4 on 32 bit would mean, that a 32 Bit instance of Nextcloud would always create a Zip32 archive for Windows and Linux, regardless of size or content.

I seem to remember I saw somewhere, that the used library cannot create Zip64 files on 32Bit machines (not sure why, never cared), which is why the Tar thingy is there to begin with.

Am I missing something, or how is it possible this code produces a valid Zip archive with more than 4GB content on a 32Bit machine?

@MichaIng
Copy link
Member

MichaIng commented Feb 26, 2024

Hmm, strange indeed. This interface is definitely used? The condition to set zip64 to false should definitely apply and the overall size was above 4 GB:

root@micha:~# php -r 'echo PHP_INT_SIZE . PHP_EOL;'
4

I could add some debug code to just verify that this particular condition was met/code line reached, just to be sure. And is there a way to check the format of the downloaded zip? I'll just re-download it and see.

I find this "32-bit workaround" strange anyway. I mean if the size is above 4 GB (first condition is false), and 32-bit zips are really not able to contain >4 GiB data, then NC shouldn't try to generate that zip in the first place. But it just generates a 32-bit zip then. So either 32-bit zips are able to contain data >4 GiB size, or this is complete nonsense. The comments above the code indicate that it is complete nonsense, but on the other hand, for me it just works.

Actually I do not understand either why a 32-bit zip is used when the size is below 4 GB, as even the comments state that it can fail when the "central directory" is larger (whatever this is). Is there any benefit of 32-bit zip over 64-bit zip? Otherwise, on non-mac (what about Linux clients?), why not use 64-bit zip always and disable it for 32-bit servers only?

EDIT: Ah okay, I checked the zip falsely last time. Indeed the data ending within the zip archive is at point 4 GiB. I did not recognise at is was some DVD VOB files, and I accidentally checked the 2nd last one last time, which was complete, as every file has exactly 1 GiB. But the 5th is an additional 110 MiB and is broken. The archive itself still has the whole 4.11 GiB size.

Checking back at the PR: Does the TAR streamer do any compression or just uncompressed TAR? Because if it does no compression, then I indeed would prefer zip whenever possible. Actually tar.gz would be better, better compression ratio, faster compression/decompression, native support on all *nix systems (compared to zip). To only purpose for zip is to serve users who use Windows and have no idea how to install a proper archiver aside of the native Windows zip support. But any compression is better than no compression, for downloads.

But the actual issue is the broken >4 GiB archives with 32-bit servers. And it does not help much to use TAR in some more cases (Linux), but ist must be simply ruled out that a >4 GiB zip archive is even tried to be generated on a 32-bit server, but instead a proper error message must be presented that this is (currently) not possible. The 32-bit vs 64-bit decision should/can then be done based on the architecture, ignoring the anyway not failsafe size check, at least as long as there are no other downsides of 64-bit zips. As it is the default in this ZipStreamer, I assume there are none.

A quick other approach, if >4 GiB tarballs are possible, would be of course to stream TAR on all 32-bit servers. Like any download is better than no download?

@MichaIng
Copy link
Member

MichaIng commented Feb 26, 2024

I just tested a tarball download, and that one was complete. So we can use that as fallback for larger directories on 32-bit servers. Not sure about the max number of files there, but that should be a rare case compared to the max size.

So taking everything together, unless someone finds time for proper GUI adjustments (GUI error message, archive type selector etc) NOW, I suggest this quick "fix":

		// Use TAR if preferred, it does support > 4 GiB on 32-bit servers
		if ($request->isUserAgent($this->preferTarFor)) {
			$this->streamerInstance = new TarStreamer();
		// 32-bit server
		} elseif (PHP_INT_SIZE === 4) {
			// Use 32-bit zip if size and numberOfFiles allow it
			if ($size > 0 && $size < 4 * 1000 ** 3 && $numberOfFiles < 65536) {
				$this->streamerInstance = new ZipStreamer(['zip64' => false]);
			// Use TAR as fallback, it does support > 4 GiB on 32-bit servers
			} else {
				"log info that zip archives with > 4 GiB data or > 65535 files are not supported on 32-bit systems, and an uncompressed TAR archive is downloaded instead"
				$this->streamerInstance = new TarStreamer();
			}
		// 64-bit server: Use 64-bit zip, respectively whatever the streamer does by default in this case
		} else {
			$this->streamerInstance = new ZipStreamer();
		}

Ah, I see that this PR is doing effectively the same. However, I would avoid the math on 64-bit servers and just serve 64-bit zips right away, respectively just what the streamer serves by default, regardless which size or number of files the archive has.

Since the tarball is completely uncompressed, I would not add further client to prefer it. Not sure whether zip is really so badly supported on Apple systems, but we might even want to prefer zip in every case, and use the raw tarball really only as fallback for large directories on 32-bit systems? For smaller directories it is already the case now, so this would mean a change only for large directories, where the compression is even more reasonable.

Not sure how to implement the log right now. Would need to check some other files to find the right interface for this. Not sure whether log level info or warning is appropriate. We do not want to spam higher severity entries into the log on a regular basis, if e.g. larger downloads are more common on this server, but we want admins to at least recognise it once, to avoid related questions popping up all the time. Since there is an admin panel warning and link to docs already on 32-bit servers, we might add the information here and keep the log level low: https://docs.nextcloud.com/server/latest/admin_manual/installation/system_requirements.html#cpu-architecture-and-os
Thinking about it, the log entry might not be required at all then.

And just to make that point clear: I totally agree with @j-be that we should fix this now (it is a bug, not an enhancement) instead of waiting for someone to implement a proper GUI choice, GUI error or such. The above code should assure that a valid archive is downloaded in every case, while currently on 32-bit servers, a (silently) broken 32-bit archive is downloaded to non-mac clients, without any trace in logs or such, which is pretty bad behaviour.

EDIT: Ah, the broken 32-bit zips are not generated silently. There is an overflow exception generated:

{
  "reqId": "0nOwimsvOqohDi8aelBr",
  "level": 3,
  "time": "2024-02-26T14:50:18+01:00",
  "remoteAddr": "192.168.1.1",
  "user": "Micha",
  "app": "no app in context",
  "method": "GET",
  "url": "/nextcloud/apps/files/ajax/download.php?dir=%2FVideos%2FDVD%20Hochzeit%20Zeremonie&files=%5B%22VIDEO_TS_1%22%5D&downloadStartSecret=3ococfijiya",
  "message": "Exception thrown: OverflowException",
  "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 OPR/109.0.0.0 (Edition developer)",
  "version": "28.0.1.1",
  "exception": {
    "Exception": "OverflowException",
    "Message": "Count64 object limited to 32 bit (overflow)",
    "Code": 0,
    "Trace": [
      {
        "file": "/var/www/nextcloud/3rdparty/deepdiver/zipstreamer/src/ZipStreamer.php",
        "line": 222,
        "function": "add",
        "class": "ZipStreamer\\Lib\\Count64_32",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/lib/private/Streamer.php",
        "line": 172,
        "function": "addFileFromStream",
        "class": "ZipStreamer\\ZipStreamer",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/lib/private/Streamer.php",
        "line": 139,
        "function": "addFileFromStream",
        "class": "OC\\Streamer",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/lib/private/legacy/OC_Files.php",
        "line": 216,
        "function": "addDirRecursive",
        "class": "OC\\Streamer",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/apps/files/ajax/download.php",
        "line": 77,
        "function": "get",
        "class": "OC_Files",
        "type": "::"
      },
      {
        "file": "/var/www/nextcloud/lib/private/Route/Route.php",
        "line": 155,
        "args": [
          "/var/www/nextcloud/apps/files/ajax/download.php"
        ],
        "function": "require_once"
      },
      {
        "function": "OC\\Route\\{closure}",
        "class": "OC\\Route\\Route",
        "type": "->",
        "args": [
          "*** sensitive parameters replaced ***"
        ]
      },
      {
        "file": "/var/www/nextcloud/lib/private/Route/Router.php",
        "line": 324,
        "function": "call_user_func"
      },
      {
        "file": "/var/www/nextcloud/lib/base.php",
        "line": 1069,
        "function": "match",
        "class": "OC\\Route\\Router",
        "type": "->"
      },
      {
        "file": "/var/www/nextcloud/index.php",
        "line": 39,
        "function": "handleRequest",
        "class": "OC",
        "type": "::"
      }
    ],
    "File": "/var/www/nextcloud/3rdparty/deepdiver/zipstreamer/src/Lib/Count64_32.php",
    "Line": 82,
    "CustomMessage": "Exception thrown: OverflowException"
  }
}

But an automatic fallback to TAR is still better.

@MichaIng MichaIng added bug and removed enhancement labels Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug feature: 32bits Bug specific to 32bits architectures
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants