TruncatedOutput: only 1495314753 of 2121389892 bytes read #554

Closed
Desto7 opened this issue Feb 25, 2019 · 90 comments · Fixed by Backblaze/b2-sdk-python#329 or Backblaze/b2-sdk-python#506
Comments

@Desto7 commented Feb 25, 2019

I'm able to use "b2 download-file-by-name" to download small files, but when I target a 2.1 GB file, it crashes out randomly midway through. (I have run the command at least 10 times over the course of two days. Each time it crashed after having downloaded in the range of 1.4 - 2.0 GB out of 2.1 GB).
Reading through the issues page, it seemed that "b2 sync" is recommended. Same issue remains though, crashing out at about 1.7 GB.
Since no one else appears to have this rather fundamental problem, I suspect it's related to my region/isp/home network. Still.. any help would be appreciated. I have attached a --debugLog, and pasted a typical command line response here.
Thanks in advance

b2_cli.log

CMD:

C:\Users\Desto>b2 download-file-by-name PeterBackup1 PB1/PB.000.000.000.004.pb8 "C:\Users\Desto\Desktop\Matlab\Projects 2018\Peter Backup\PB1_BackupCheck\temp\PB.000.000.000.004.pb8"

Output:

-snip-
C:\Users\Desto\Desktop\Matlab\Projects 2018\Peter Backup\PB1_BackupCheck\temp\PB.000.000.000.004.pb8: 70%|7| 1.50G/2.12G [05:37<03:15, 3.20MB/s]
ERROR:b2.console_tool:ConsoleTool command error
Traceback (most recent call last):
File "c:\python27\lib\site-packages\b2\console_tool.py", line 1399, in run_command
return command.run(args)
File "c:\python27\lib\site-packages\b2\console_tool.py", line 532, in run
bucket.download_file_by_name(args.b2FileName, download_dest, progress_listener)
File "c:\python27\lib\site-packages\logfury\v0_1\trace_call.py", line 84, in wrapper
return function(*wrapee_args, **wrapee_kwargs)
File "c:\python27\lib\site-packages\b2\bucket.py", line 168, in download_file_by_name
url, download_dest, progress_listener, range_
File "c:\python27\lib\site-packages\logfury\v0_1\trace_call.py", line 84, in wrapper
return function(*wrapee_args, **wrapee_kwargs)
File "c:\python27\lib\site-packages\b2\transferer\transferer.py", line 115, in download_file_from_url
range_, bytes_read, actual_sha1, metadata
File "c:\python27\lib\site-packages\b2\transferer\transferer.py", line 122, in _validate_download
raise TruncatedOutput(bytes_read, metadata.content_length)
TruncatedOutput: only 1495314753 of 2121389892 bytes read
ERROR: only 1495314753 of 2121389892 bytes read

@Desto7 (Author) commented Mar 4, 2019

Anyone any idea?

@ppolewicz (Collaborator)

It looks, as you say, like a network problem. The connection between your machine and the B2 cloud server deteriorated to the point of failure, and the CLI reported it (as it should).

A future version of the B2 CLI will do more to automatically recover from this type of issue.

@Desto7 (Author) commented Mar 5, 2019

Thanks for having a look and confirming.

Currently, this issue is completely preventing me from using the command line tool, since I can't download most of my files. (Uploads work fine, so I'm pretty baffled.)

Is there a workaround to keep the connection alive, or to download parts of a file instead?

@ppolewicz (Collaborator)

I can see from the stacktrace that you are using a relatively new version of the CLI, which has the parallel transferer (which I implemented) enabled by default for large files, and your file is large.

An obvious workaround would be to split the file into several smaller files (using 7zip with a very low compression level?) and reassemble it upon restore. It's not ideal, but maybe it will work for you?
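
For illustration only, a minimal Python sketch of such a split/reassemble step (standard library only, no 7zip; the 100 MB chunk size and the .partNNN naming are arbitrary choices, not anything the CLI does):

    CHUNK_SIZE = 100 * 1024 * 1024  # 100 MB per part; small enough to download reliably

    def split_file(path):
        """Split `path` into numbered .partNNN files next to the original."""
        parts = []
        with open(path, "rb") as src:
            index = 0
            while True:
                chunk = src.read(CHUNK_SIZE)
                if not chunk:
                    break
                part_path = "{}.part{:03d}".format(path, index)
                with open(part_path, "wb") as dst:
                    dst.write(chunk)
                parts.append(part_path)
                index += 1
        return parts

    def join_files(part_paths, out_path):
        """Concatenate the downloaded parts back into a single file, in order."""
        with open(out_path, "wb") as dst:
            for part_path in sorted(part_paths):
                with open(part_path, "rb") as src:
                    dst.write(src.read())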

Actually, I have observed an issue like the one you report here on my own workstation during testing of the parallel transferer. In my case it was caused by the VirtualBox "NAT" network driver, which is known to cause massive issues when performance gets reasonably high. If you are using the VirtualBox "NAT" driver, please try switching to "Bridged" - it resolved the problem instantly in my case (and it improved performance significantly). Alternatively, since there is no configurability of the parallel transferer's parameters in the current version, you can try reverting to b2 CLI version 1.3.6, which always used just one thread to download files, regardless of their size. It may be slower, but it should be more reliable in your case.

@Desto7 (Author) commented Mar 5, 2019

you can try to revert to b2 CLI version 1.3.6

I would love to try this. Unfortunately, I'm not at all familiar with GitHub (sacrilegious, I know), so I'm just looking up what I need to do exactly. I don't expect you to tutor me on how to use this site, but if there happens to be a command you could give me that would install CLI 1.3.6 off the bat, I'd love to hear it.

Otherwise, I'll continue my github crash course. Here's where I'm up to:
I have downloaded the verified CLI commit of 22 Aug, and I can reach >python, >git, and >java from my command line.
Now to compile it... hmm.

@ppolewicz (Collaborator) commented Mar 5, 2019

@Desto7 pip install b2==1.3.6

and if you'd like to install a version that you have checked out locally, then:

pip install -r requirements.txt
python setup.py install

@Desto7 (Author) commented Mar 5, 2019

Going to 1.3.6 has fixed my downloading issue. Brilliant!! Thank you so much!

For those curious, 1.3.6 is slower, taking 14 minutes for a 2.1GB file, at a very fluctuating bitrate, as opposed to 8 minutes at my max bitrate when using the latest build.

For my purpose, half speed is fine, so thank you again!
Let me know if I should close/mark as solved/ or anything.

Desto7 closed this as completed Mar 5, 2019
Desto7 reopened this Mar 5, 2019
@ppolewicz (Collaborator)

@Desto7 could you tell me a little bit more about your environment? Is it a VM, an IoT device, what network is it on, etc.?

@Desto7 (Author) commented Mar 6, 2019

It is a W10 computer on a home network, nothing too special.
Since you helped me out greatly, I thought I'd do a little test to provide some clarity. I installed the CLI on an old W7 machine I had sitting around and connected it to the same network as the W10. And guess what? The latest version of the CLI works fine.
So I must conclude the problem is limited to my W10 machine. D'oh! It has had a few network drivers installed over the years (such as remote LANs, e.g. EvolveHQ); perhaps one of these is the cause...
But it looks like you can rest easy! It was probably an odd mixture of drivers that caused my issue. Though CLI options for keeping connections alive, or for resuming failed downloads, would always be handy, of course.

@jimkutter

I'm having the same issue occasionally on large (> 1GB) files.

Also on W10; however, I run this stuff through WSL.

Not sure how helpful this will be, but it's one more datapoint.

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/b2/console_tool.py", line 1399, in run_command
    return command.run(args)
  File "/usr/local/lib/python2.7/dist-packages/b2/console_tool.py", line 532, in run
    bucket.download_file_by_name(args.b2FileName, download_dest, progress_listener)
  File "/usr/local/lib/python2.7/dist-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/b2/bucket.py", line 168, in download_file_by_name
    url, download_dest, progress_listener, range_
  File "/usr/local/lib/python2.7/dist-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/b2/transferer/transferer.py", line 115, in download_file_from_url
    range_, bytes_read, actual_sha1, metadata
  File "/usr/local/lib/python2.7/dist-packages/b2/transferer/transferer.py", line 122, in _validate_download
    raise TruncatedOutput(bytes_read, metadata.content_length)
TruncatedOutput: only 1156913275 of 1285087970 bytes read

ppolewicz self-assigned this Apr 12, 2019
@ppolewicz (Collaborator)

I'm looking into this

@DerekChia

I'm getting this error as well, and it stops consistently at around the 40 GB download mark for me.

ERROR:b2.console_tool:ConsoleTool command error
Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.7/site-packages/b2/console_tool.py", line 1399, in run_command
    return command.run(args)
  File "/opt/anaconda/lib/python3.7/site-packages/b2/console_tool.py", line 507, in run
    self.api.download_file_by_id(args.fileId, download_dest, progress_listener)
  File "/opt/anaconda/lib/python3.7/site-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/opt/anaconda/lib/python3.7/site-packages/b2/api.py", line 175, in download_file_by_id
    return self.transferer.download_file_from_url(url, download_dest, progress_listener, range_)
  File "/opt/anaconda/lib/python3.7/site-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/opt/anaconda/lib/python3.7/site-packages/b2/transferer/transferer.py", line 115, in download_file_from_url
    range_, bytes_read, actual_sha1, metadata
  File "/opt/anaconda/lib/python3.7/site-packages/b2/transferer/transferer.py", line 122, in _validate_download
    raise TruncatedOutput(bytes_read, metadata.content_length)
b2.exception.TruncatedOutput: only 40719702112 of 79957501184 bytes read
ERROR: only 40719702112 of 79957501184 bytes read

@ppolewicz (Collaborator)

I fixed it in Backblaze/b2-sdk-python#32

@rtrainer

@ppolewicz I know this has been closed for over a year, but downloading a 104G file kept failing for me and using 1.3.6 fixed the issue.

@ppolewicz (Collaborator)

@rtrainer can you please try with B2 CLI v2.0? Quite a few things have been rewritten there; it should be correct and faster than 1.3.6.

@rtrainer

I used the latest release which is v2.0.2.

@ppolewicz (Collaborator)

@rtrainer just to be clear: downloading a 104G file kept failing for you with CLI v2.0.2, then you switched to 1.3.6 and it worked fine?

@rtrainer

That is correct. During the download there were a couple of timeouts but it kept going until between 75GB and 100GB when it would fail. 1.3.6 took longer but worked perfectly with no timeouts. I tried running it on Windows and Ubuntu 18.04.

@ppolewicz (Collaborator)

@rtrainer this did not show up in my tests, so clearly there is a difference in the environment. Could you please say a bit more about your environment, specifically everything you can tell me about your network connection (and any usage of it during the download process), what type of storage device you are writing to (and whether anything else is writing to it), the age of that device and the amount of remaining free space, and the filesystem type - anything you can think of will help me narrow down the cause. (I know of one potential cause, but it seems that in your case it may be something different.)

Also I'd like to ask what behavior would you like to see in a perfect world - should the download process retry for a really long time (say, a day) if that's necessary because of a horrible connection? Currently the number of attempts is limited (to 5 per fragment I believe, which is subject to change) and we might want to change that. Finding a solution that you'd be happy with would be a nice starting point.
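
To make that question concrete, here is a rough, hypothetical sketch of what deadline-based retrying of a single fragment could look like (download_fragment is a placeholder callable, not an existing b2sdk function; the backoff values are arbitrary):

    import time

    def download_with_deadline(download_fragment, deadline_seconds, base_delay=1.0, max_delay=64.0):
        # Retry one fragment until an overall time budget runs out,
        # instead of giving up after a fixed number of attempts.
        deadline = time.monotonic() + deadline_seconds
        delay = base_delay
        while True:
            try:
                return download_fragment()
            except Exception:
                if time.monotonic() + delay > deadline:
                    raise  # time budget exhausted; surface the last error
                time.sleep(delay)
                delay = min(delay * 2, max_delay)  # capped exponential backoff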

@rtrainer

My internet connection is 1 Gb FIOS fiber to my router. I run enterprise-grade equipment in my home network. All of my switches are connected with 1 Gb fiber interconnects and I have a 1 Gb connection to my laptop. My laptop has 64GB RAM, an i7-7700K processor and multiple 1TB Samsung SSD storage devices. I watched my network after the first couple of failures and saw nothing to make me believe there was a problem with it.

I would like to see an option for the number of threads for a download and the number of retries, and maybe also the timeout value. This would give me some tools to work around the problem.

I am happy to do whatever testing you would like me to do to help you understand what is going on and hopefully solve this.

@ppolewicz (Collaborator)

Rather than giving you the tools to manually configure the program so that it doesn't crash on you, I'd like to come up with something that will automatically configure itself for you (so if running it on 8 threads causes problems, the number of threads should be decreased until a single thread remains or the problem disappears).

The program needs to know what your exit criterion is, though (because otherwise we could just set infinite retries and it would eventually complete - but that's infeasible for many use cases). Can it be a timeout for the entire (sync) operation?

@ppolewicz (Collaborator)

Backblaze/b2-sdk-python#32 improved this a little bit, but the fix is not complete - a sync operation can create N*10 threads for downloads, which can cause thread starvation and eventually a timeout. A proper thread pool must be introduced.
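
For context, the general shape of the fix being described - one shared, bounded pool instead of ad-hoc threads per file - looks roughly like this (a sketch with placeholder names, not the actual b2sdk code):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    MAX_DOWNLOAD_THREADS = 10  # single shared limit instead of N*10 ad-hoc threads

    def download_all(files_to_download, download_one):
        # `download_one` is a placeholder callable taking a file name.
        # A single shared pool caps total concurrency, so download threads
        # cannot starve each other (or the rest of the sync operation).
        failures = []
        with ThreadPoolExecutor(max_workers=MAX_DOWNLOAD_THREADS) as pool:
            futures = {pool.submit(download_one, name): name for name in files_to_download}
            for future in as_completed(futures):
                try:
                    future.result()
                except Exception as exc:
                    failures.append((futures[future], exc))
        return failures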

ppolewicz reopened this Sep 24, 2021
@ppolewicz (Collaborator)

@tteggelit this might have been caused by network conditions, where something that supervises your connection (anti-botnet, anti-DDoS or something like that) might really not like you overloading the network with a lot of connections. In your case, however, you only have a 1 GbE connection, so assuming no overhead it should be capable of transferring 125 MB/s or so. The speed of the B2 CLI is 610 MB/s (on Python 3.11 on Linux), so you will easily saturate the network with just a single thread (on low latency - on higher latency you can use more). Running more threads than your network can handle can lead to broken transfers.

The documentation of sync says:

Users with low-performance networks may benefit from reducing the number of threads. Using just one thread will minimize the impact on other users of the network.

@ppolewicz (Collaborator)

@Lusitaniae TruncatedOutput is usually a consequence of an irreparable error that happened before it; it just indicates that the download has failed, the retries have also failed, and we have ended up with a file that is not fully downloaded, so we are going to stop the sync operation completely. I think we are going to change this behavior in B2 CLI 4.0 to allow users to keep recovery attempts going for a set period of time rather than 5 attempts per fragment, but if a fragment fails to download 5 times in a row, it's a problem and we should find out what it is. Are there other exceptions in the logs that you haven't shown?

The performance counters show that it's mainly the network you are waiting for. This is expected behavior - it just means that your storage device is not too slow (23min waiting for network, 38s waiting for the drive).

@tteggelit

@ppolewicz Blaming the network doesn't quite explain why the upload - which I assume uses the same default thread concurrency - worked fine, and why the vast majority of the downloaded files (like 97%) worked the first time, but 5 subsequent attempts at these specific 150 files kept failing. I have Speedtest results being gathered from my network, and while I can certainly see an impact on the bandwidth available for the Speedtest results during these transactions (you can see the initial upload impact in the green and then the subsequent download impacts in the yellow), it's by no means saturating or causing starvation. [Screenshot: Speedtest results, 2023-07-11] In fact, the single-thread sync wasn't more or less impactful than the other syncs using higher thread concurrency. And you can see I'm regularly hitting almost symmetrical upload and download speeds with my 1 Gb connection. I routinely have other network operations (many Python-based, as b2 is) that come closer to saturating my network than these did, and they complete successfully and without retries.

@ppolewicz (Collaborator)

I don't have any access to the server infrastructure, but as far as I understand it, the storage unit on which a group of your files is kept might have been undergoing critical fragment reconstruction, during which the performance of that storage unit would be degraded. This is not a problem specific to Backblaze B2 but to any storage based on erasure coding - B2 uses 17+3, so if a couple of drives of the same "stripe" die, it becomes pretty important to try to recover the missing fragments, and maybe (this is pure speculation, as I've never seen the server code) the restore operation is prioritized over user traffic, which in your case might have shown up as broken downloads.

These things come to an end though - the data is reconstructed and the cluster performance recovers.

In order to confirm my suspicion you can try to download it now and see if it works. My guess is that when you run it today, it'll work fine.

@Lusitaniae commented Jul 14, 2023

Given the volume of my downloads, each sync run takes about 1 h.

I made multiple attempts per day, across different days, and the result is that the b2 CLI was "reliably failing" to complete a full sync.

I've moved the script to use s5cmd and it has been working so far.

Also, from discussing with the Backblaze team, it looks like my account is assigned to the smallest cluster they have in the EU region (and I'm the heaviest user there), and previously they asked me to slow down traffic as it was causing too much load on their systems.

@tteggelit

@ppolewicz That wouldn't explain why, when I deleted the offending files and then re-uploaded them (successfully, on the first try) with b2 sync, they continued to fail on an immediate subsequent b2 sync download with the default thread concurrency, but then immediately succeeded on my very next attempt using --threads 1. If the offending files were on devices that were rebuilding, it's highly unlikely the re-uploaded files would be placed on rebuilding devices, and that those devices just happened to complete rebuilding between my failed attempt and the immediately successful next attempt. Again, I don't know the storage device layout or how many storage devices are available for my use, but I'm highly skeptical it's just one. In any case, I seem to have a workaround in using --threads 1. I would suggest making the default --threads 1, erring on the conservative side, because failed downloads cost money. A user can then try to increase that value to achieve more performance, at the risk of causing failed downloads (and added cost), if they so choose.

@ppolewicz (Collaborator)

@Lusitaniae a small cluster has a good chance of being impacted by a rebuild (which takes some time). It might be that it finished rebuilding just as you were switching from the CLI to s5cmd, and that is why it works now.

@tteggelit the default of 10 might not be the best setting; it's just something that was decided years ago. We'll be releasing CLI v4 soon, which will allow us to change the defaults, and this might be one of the changes we need to make (we'll also change the retry policy to be even more bulletproof). Thank you for your suggestion.

@yeonsy commented Feb 22, 2024

I had the same issue as others noted. sync downloaded the vast majority of files but kept hitting incomplete reads for a couple of larger files. I tried --downloadThreads 1 twice and that did NOT work; --threads 1, however, worked. I wasn't monitoring the network, so it's possible I hit momentary network instability, but I figured I'd add some more data to the thread. B2 CLI v3.15 on Ubuntu 22.04 LTS, gigabit fiber connection.

It's not clear why single-threading seems to work, but perhaps the CLI should retry failed parallel downloads in single-threaded mode?
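
That suggestion, sketched in hypothetical Python (both download callables are placeholders; this is not how the CLI is actually structured):

    def download_with_fallback(download_parallel, download_single_stream, destination):
        # Try the multi-stream download first; if it fails (e.g. with a
        # truncated read), retry the same file once with a single stream.
        try:
            return download_parallel(destination)
        except Exception:
            return download_single_stream(destination)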

@ppolewicz (Collaborator)

@yeonsy can you please try to reproduce it again with logs enabled?

@yeonsy commented Feb 22, 2024

@ppolewicz I have a log output from one of the failed runs but I don't have the time to properly redact it at the moment. If you email me at [email removed] I can mail you the log.

@ppolewicz (Collaborator)

Ok, sent you an email

@fluffypony

Just wanted to add a data point - I'm moving a Windows Server box from a Hetzner box in Estonia to a new box at a different DC in Prague, and trying to transfer 45TB between the two over SMB is unusable. I've specifically created a new Backblaze account using the EU region for latency and speed, and both boxes are like 1 or 2 hops away from the Backblaze DC.

b2 sync on the Hetzner box is flying, uploading at about 2Gbps sustained. As folders have finished I've kicked off a sync on the new box with 40 threads, which saturates the line at 5Gbps as it downloads. Whilst the initial download gets about 70% - 80% of the files, it seems to fail on a ton of large (4 GB+) files. It's impossible to predict and SUPER inconsistent - a 5.73TB folder got to 4.43TB on the first pass, and then up to 4.65TB on a subsequent pass.

Both of these machines are high performance servers, in reliable data centers, on 10Gbps unmetered ports. They are directly connected to the Internet with live ipv4 addresses. There is no NAT or firewall or anything interfering here - and the box in Prague is a brand new installation, so there's zero chance something could be awry there. I'm running the latest version of b2 as installed by pip (4.0.1).

I then kicked off a new pass with --threads 1, which is slow-ish - downloading at about 450Mbps, and the few files it downloaded before I ctrl+c'd that seemed to be fine. I'm not mad if it'll take ~5 hours to do this final pass for the last 1TB in this folder, but what happens when I do the last folder (34.3TB)? I don't really want this to take 3 weeks, when this seems like it should be fixable by running many threads but no longer chunking the downloads.

As an extra experiment, I installed 1.3.6 in a Python venv and ran it with 40 threads. Using a version this old is honestly a little nuts, but it works at roughly the same speed as (even a little slower than) the --threads 1 version.

Finally, I ran the current version with 40 threads, but I included --max-download-streams-per-file 1, and I'm back at 5Gbps with NO TruncatedOutput errors.

There is DEFINITELY something wrong with the multiple streams per file downloading, unfortunately. I'm not sure if it's just on large files or what the deal is, but given that 1.3.6 works perfectly, and restricting download streams to 1 works perfectly, it's safe to say that this functionality isn't working as expected.

Don't worry @ppolewicz - as you pointed out, even stable open-source software isn't inherently bug free. I'm confident you'll figure this out, and I for one appreciate all the effort you put into this piece of software!

@ppolewicz (Collaborator)

@fluffypony this can, unfortunately, be caused by a number of reasons and I cannot address it without figuring out what the reason is. Your story gives me important insight, thank you very, very much for taking the time to write it down.

I think the code of ParallelDownloader is correct. At one time I was able to pull ~40Gbps using a single instance of the b2 CLI running under a special interpreter, but that was on a low-latency network. Ideally the code would figure out what the situation is and automatically adjust the thread count and the streams per file; ideally we'd also be able to use a (non-existent as of yet) resource on the server to continue a broken download.

Now, what happened in your case is hard to say - I would really love to take a look at the logs of the failing operation to see why it stopped and what the problem was. Is it possible for you to re-trigger the error with logs enabled?

@fluffypony commented Jun 4, 2024

Now, what happened in your case, is hard to say - I would really love to take a look at the logs of the failing operation to see why it stopped and what the problem was. Is it possible for you to re-trigger the error with logs enabled?

Sure thing - I have re-run it with --debug-logs, how can I get this to you privately? Should I use the gmail address attached to your git commits?

@mjurbanski-reef (Collaborator)

@fluffypony you can send it to me; and yes, the email from the commit author field will be fine.

@fluffypony

Sent! I used 100 threads to try to induce it into failing as much as possible 😆

@mjurbanski-reef (Collaborator)

Received 38MB of logs. Thank you. I will review them tomorrow.

@yury-tokpanov

I am using the latest b2 client, and I consistently get problems downloading large files (100GB+) using b2 sync on multiple systems located in different regions.

I tried using --threads 1, but it runs really slowly at 20MB/s. Imagine trying to download a 3TB file with that. And even with a single thread I occasionally get errors.

Any recommendations for alternatives? I'm currently exploring using rclone sync or simply mounting with s3fs and then rsyncing from there.

@mjurbanski-reef (Collaborator)

Please share the logs so we can confirm whether it is the same bug. Hopefully the extra data point will make it easier to debug.

@yury-tokpanov commented Jul 12, 2024

I removed locations from the logs.

This happened when trying to sync a folder containing two files, 970 GB and 3.4 GB (using the default 10 threads). Even the smaller file failed.

As I said, I see this all the time across many clusters. b2 is the only service that I'm consistently encountering issues with when downloading files. There are no issues with uploads.

Setting threads to 1 works most of the time, but it is super slow.

ERROR:b2sdk._internal.sync.action:an exception occurred in a sync action
Traceback (most recent call last):
  File "<...>/b2sdk/_internal/sync/action.py", line 55, in run
    self.do_action(bucket, reporter)
  File "<...>/b2sdk/_internal/sync/action.py", line 329, in do_action
    downloaded_file.save_to(download_path)
  File "<...>/b2sdk/_internal/transfer/inbound/downloaded_file.py", line 282, in save_to
    return self.save(file, allow_seeking=allow_seeking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<...>/b2sdk/_internal/transfer/inbound/downloaded_file.py", line 236, in save
    self._validate_download(bytes_read, actual_sha1)
  File "<...>/b2sdk/_internal/transfer/inbound/downloaded_file.py", line 177, in _validate_download
    raise TruncatedOutput(bytes_read, self.download_version.content_length)
b2sdk._internal.exception.TruncatedOutput: only 3255188397 of 3616794702 bytes read
b2_download(<...>, 4_z5c2d97bf607f9a4e88c40e17_f225ab09c30dd54d6_d20240711_m205528_c004_v0402008_t0019_u01720731328109, <...>, 1720587476958): TruncatedOutput() only 3255188397 of 3616794702 bytes read  
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: f004.backblazeb2.com. Connection pool size: 10
dnload <...>
ERROR: Incomplete sync: sync is incomplete

@ppolewicz (Collaborator)

This only happens if an individual part is retried 20 times without success. What you are seeing at the end is a sync file failure, which just states that the file couldn't be fully downloaded, but there are at least 20 exceptions prior to that (probably many more) stating the real failure reason.

@yury-tokpanov commented Jul 13, 2024

That is the only thing I see in the console. Where can I see the full log?

Also, is it possible to manually control the number of retries, backoff time, etc?

Also, there are no issues with uploading large files. Why would downloading them be so different?

Our clusters are located in different places across North America, and usually we don't have any issues downloading from GCS or AWS S3. But the failure rate downloading with b2 sync is close to 100%, unless we're using --threads 1, which then becomes very slow at 20MB/s and still can fail (though rarely).

Do you have a recommendation on alternative ways we can download from b2 buckets? I've just started exploring rclone or mounting a bucket using s3fs.

@ppolewicz (Collaborator)

@yury-tokpanov see https://github.com/Backblaze/B2_Command_Line_Tool?tab=readme-ov-file#detailed-logs

B2 buckets can have S3 interface compatibility, so you can try that. Let us know if that works, though at this point I'm pretty sure it's a storage device or a network issue, because you are failing with 1 thread and there is a hardcoded limit of 20 retries per file. Or maybe it's a bug - if you can find out from the logs what is causing it, we'll fix it; it's just not possible to fix that one without being able to reproduce it or see the logs. I'm sure you understand.

As far as I know, the fastest s3 downloader out there is s5cmd.
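
If you want to try the S3-compatible route from Python, a minimal boto3 sketch could look like this (the endpoint, bucket name, key and credentials below are placeholders; the region-specific S3 endpoint for your bucket is shown in the Backblaze console):

    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.us-west-004.backblazeb2.com",  # placeholder region endpoint
        aws_access_key_id="<application key id>",
        aws_secret_access_key="<application key>",
    )

    # download_file streams the object in chunks and applies boto3's own retry policy
    s3.download_file("my-bucket", "path/to/large-file.bin", "/tmp/large-file.bin")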

@mjurbanski-reef (Collaborator)

@fluffypony @yury-tokpanov I'm happy to say, the new B2 CLI 4.1.0 release fixes the reported problem by properly retrying in case of middle-of-the-data-stream errors; hence it will work correctly even when the network is congested due to multiple concurrent connections.

@fluffypony

@fluffypony @yury-tokpanov I'm happy to say, the new B2 CLI 4.1.0 release fixes the reported problem by properly retrying in case of middle-of-the-data-stream errors; hence it will work correctly even when the network is congested due to multiple concurrent connections.

Amazing - thank you so much for all your effort with this!

@yury-tokpanov

@fluffypony @yury-tokpanov I'm happy to say, the new B2 CLI 4.1.0 release fixes the reported problem by properly retrying in case of middle-of-the-data-stream errors; hence it will work correctly even when the network is congested due to multiple concurrent connections.

Thank you! We are going to test it!
