createWriteStream has problems in a CIFS mount path with node 20.x.x and node 18.18.0 #50061

Open
simatec opened this issue Oct 6, 2023 · 26 comments
Labels
duplicate, fs

Comments

@simatec

simatec commented Oct 6, 2023

Version

18.18.0 and 20.x.x

Platform

Linux iob-node20 6.2.16-5-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-6 (2023-07-25T15:33Z) x86_64 GNU/Linux

Subsystem

iobroker

What steps will reproduce the bug?

Write a file to a CIFS share using fs.createWriteStream.

How often does it reproduce? Is there a required condition?

Error always occurs

What is the expected behavior? Why is that the expected behavior?

The file created with createWriteStream should be written completely to the remote file system (CIFS share), but it is only 16 KB in size and corrupted.

What do you see instead?

A corrupted 16 KB file instead of one of about 5-10 MB.

Additional information

Hello all,

I hope you can help here. We use the package, among other things, for backups in the ioBroker project. With current Node 20 versions and with Node 18.18.0, many of our users who write their backups directly to a CIFS mount on a Fritzbox NAS are running into problems.
So far, I am only aware of problems in connection with Fritzbox and CIFS.

It seems that, since Node 18.18.0, every attempt runs into problems with "fs.createWriteStream" or with .pipe.
Locally on the system there are no problems. The error only occurs when the backup is written to the CIFS mount point.

Here is an excerpt showing how the backup is created.

return new Promise((resolve, reject) => {
    const f = fs.createWriteStream(name);
    f.on('finish', () => {
        this.removeTempBackupDir();
        resolve(path.normalize(name));
    });

    f.on('error', e => {
        console.error(`host.${this.hostname} Cannot pack directory ${this.tmpDir}/backup: ${e.message}`);
        reject(new IoBrokerError({ message: e.message, code: EXIT_CODES.CANNOT_GZIP_DIRECTORY }));
    });

    try {
        tar.create({ gzip: true, cwd: `${this.tmpDir}/` }, ['backup']).pipe(f);
    } catch (e) {
        console.error(`host.${this.hostname} Cannot pack directory ${this.tmpDir}/backup: ${e.message}`);
        reject(new IoBrokerError({ message: e.message, code: EXIT_CODES.CANNOT_GZIP_DIRECTORY }));
    }
});
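
One thing worth noting about the excerpt above: the try/catch around .pipe only catches synchronous exceptions, and .pipe does not forward errors emitted by the source stream to the destination. A minimal sketch of an alternative using pipeline from node:stream/promises follows. The helper name writeStreamToFile is hypothetical and not part of ioBroker, and this would not necessarily surface the silent truncation reported here; it only makes sure that errors from either side reject the promise.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const { pipeline } = require('node:stream/promises');

// Hypothetical helper (not part of ioBroker): pipes any readable `source`,
// for example the stream returned by tar.create(), into the file `name`.
async function writeStreamToFile(source, name) {
    const target = fs.createWriteStream(name);
    // pipeline() rejects if either stream errors and destroys both streams,
    // which a bare source.pipe(target) does not do.
    await pipeline(source, target);
    return path.normalize(name);
}
```

In the excerpt's terms, the call would look roughly like `await writeStreamToFile(tar.create({ gzip: true, cwd: `${this.tmpDir}/` }, ['backup']), name)`, wrapped in a single try/catch.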
@simatec simatec changed the title from "createWriteStream has problems in a CIFS mount path with node 20 and node 18.18.0" to "createWriteStream has problems in a CIFS mount path with node 20.x.x and node 18.18.0" Oct 6, 2023
@bnoordhuis
Member

Sounds like a duplicate of #49911. Bug fix is pending release. If you agree it's a dup, then go ahead and close this.

@bnoordhuis bnoordhuis added the fs label Oct 6, 2023
@simatec
Author

simatec commented Oct 6, 2023

Thank you for your answer.
We will test version 18.18.1 after release and report.
Basically, it does sound like a similar problem.
If version 18.18.1 really fixes the bug, it would need to be fixed in version 20.x.x as well.

@simatec
Author

simatec commented Oct 6, 2023

Further testing under Node 20 has shown that the error does not occur up to version 20.2.0.
Newer versions have this bug.
It seems to be related to libuv.

the libuv updates have been in eight Node.js 20 releases (starting with 20.3.0 in June)

@bnoordhuis
Member

Right, then it's almost certainly the same bug; CIFS in particular is a file system where partial writes are more likely to pop up than most other file systems.

I'll take the liberty of closing this. It's fixed in the next (upcoming) v18.x release.
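
For readers unfamiliar with the term: a partial write means a single write call stores fewer bytes than were requested, and the caller is expected to retry with the remainder. The sketch below only illustrates that pattern in JavaScript with fs.writeSync, which returns the number of bytes actually written. It is not the libuv fix, and the helper name writeAllSync is hypothetical.

```javascript
const fs = require('node:fs');

// Hypothetical helper for illustration: keeps writing until the whole buffer
// has been handed to the file descriptor, since one call may be partial.
function writeAllSync(fd, buffer) {
    let offset = 0;
    while (offset < buffer.length) {
        // writeSync returns the number of bytes written, which may be
        // smaller than the number of bytes requested.
        const written = fs.writeSync(fd, buffer, offset, buffer.length - offset);
        if (written === 0) {
            throw new Error(`write made no progress at offset ${offset}`);
        }
        offset += written;
    }
    return offset;
}
```

If a layer below stops after the first short write instead of looping like this, the result is a truncated file without any error, which matches the symptom described in this issue.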

@bnoordhuis bnoordhuis closed this as not planned Oct 7, 2023
@bnoordhuis bnoordhuis added the duplicate label Oct 7, 2023
@simatec
Author

simatec commented Oct 10, 2023

v18.18.1 fixes the problem.
In Node 20 the problem is still present.

@winnyschuster

> Right, then it's almost certainly the same bug; CIFS in particular is a file system where partial writes are more likely to pop up than most other file systems.
>
> I'll take the liberty of closing this. It's fixed in the next (upcoming) v18.x release.

While release 18.18.1 resolved the problem, it definitely still exists in 20.7.0 and up. In #49911 I read in one of the comments that the applied fix (88ba79b and a4928b0) is already in 20.7.0 and up. I am wondering whether this issue needs to be reopened.

@winnyschuster

@bnoordhuis

> Right, then it's almost certainly the same bug; CIFS in particular is a file system where partial writes are more likely to pop up than most other file systems.

As stated above, #49911 seemed to be a similar bug, but its fix only solved the issue described here in release 18.18.1, not in 20.7.0 and later. So, can this issue be reopened, please?

@bnoordhuis
Member

Only if there's a reproducer or something like strace logs that show the issue. Right now it isn't an actionable bug report.
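
A standalone reproducer for this kind of report could look roughly like the sketch below: it writes a known number of bytes through fs.createWriteStream and returns the size the file system reports afterwards, so a silent truncation shows up as a mismatch. The function name writeAndMeasure and its parameters are assumptions made for illustration; nothing like it was posted in the thread.

```javascript
const fs = require('node:fs');
const { once } = require('node:events');

// Hypothetical reproducer: writes `totalBytes` of filler data to `target`
// through fs.createWriteStream and returns the resulting file size.
async function writeAndMeasure(target, totalBytes, chunkSize = 64 * 1024) {
    const stream = fs.createWriteStream(target);
    const chunk = Buffer.alloc(chunkSize, 0x61);
    let remaining = totalBytes;
    while (remaining > 0) {
        const piece = remaining < chunkSize ? chunk.subarray(0, remaining) : chunk;
        remaining -= piece.length;
        // Respect backpressure: wait for 'drain' when write() returns false.
        if (!stream.write(piece)) {
            await once(stream, 'drain');
        }
    }
    stream.end();
    // 'close' fires after the file descriptor has been closed;
    // once() rejects if the stream emits 'error' first.
    await once(stream, 'close');
    return fs.statSync(target).size;
}
```

Pointing `target` at a file on the CIFS mount and comparing the returned size with `totalBytes` (for example 8 * 1024 * 1024) gives a small test case that can also be run under strace.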

@simatec
Author

simatec commented Oct 11, 2023

We carried out tests under Node 20. The error occurs from Node v20.3.0 onward. To reproduce, try writing a file via createWriteStream to a Fritzbox NAS over a CIFS mount.

The file cannot be written, and the write stream exits without errors.
If I write the same file locally with a write stream, the file is created cleanly.
As with Node 18.18.0, this behavior is probably due to the libuv update.

You can reproduce it by attempting a write stream on a CIFS share of the Fritzbox.

The fact that v18.18.1, in which the libuv update was reverted, runs smoothly again suggests that we are talking about the same error here, right?

@bnoordhuis
Member

Okay, easy way to test: what happens when you set UV_USE_IO_URING=0 in the environment? Make sure to test against the latest v20.
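
On a shell this is simply `UV_USE_IO_URING=0 node script.js`. When the test is launched from another Node.js process, a sketch like the following passes the variable to a child process. The helper name runWithoutIoUring is hypothetical, and setting the variable before the process starts (rather than assigning process.env inside the running script) is an assumption made here to be on the safe side.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: runs `node <scriptArgs...>` in a child process whose
// environment has UV_USE_IO_URING=0 set from the very start.
function runWithoutIoUring(scriptArgs) {
    return spawnSync(process.execPath, scriptArgs, {
        env: { ...process.env, UV_USE_IO_URING: '0' },
        encoding: 'utf8',
    });
}
```

For example, `runWithoutIoUring(['backup-test.js'])` would run a (hypothetical) test script with io_uring disabled; the returned object carries `status`, `stdout` and `stderr`.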

@simatec
Author

simatec commented Oct 12, 2023

In my test with Node 20.8.0 and the environment variable "UV_USE_IO_URING=0", the file is created with a write stream on a CIFS mount without error.

Without this variable the error occurs and no file can be created on the CIFS mount using a write stream.

@Grothesk242

Can't confirm. My system still freaks out with nodejs 20.8.0 and the ENV set.

@bnoordhuis
Member

Conflicting signals... I'll reopen but @simatec you'll have to investigate yourself, you're quite possibly looking at a kernel bug, and @Grothesk242 you should probably file a separate issue.

@bnoordhuis bnoordhuis reopened this Oct 12, 2023
@winnyschuster

> Conflicting signals... I'll reopen but @simatec you'll have to investigate yourself, you're quite possibly looking at a kernel bug, and @Grothesk242 you should probably file a separate issue.

If that is a kernel bug, why did LTS 18.18.1 solve this issue while it still exists in 20.7.0 and up? Also, both systems (Linux server and SMB server) are exactly the same, except for the Node versions, which have been upgraded and downgraded for the various tests. We have tried strace, but the output was overwhelming. The SMB server is a closed system with no access to the OS at all, but it has the newest firmware installed. Any further advice is highly appreciated.

@bnoordhuis
Member

@winnyschuster did you try the UV_USE_IO_URING=0 thing?

As an aside: multiple people reporting sorta-similar-but-maybe-not bugs in the same issue doesn't really help move things along, it just muddies the discussion. Unless you're really sure it's the exact same issue, you're better off opening a new bug report.

@Grothesk242

Grothesk242 commented Oct 13, 2023 via email

@bnoordhuis
Member

Right. It would definitely have helped if you'd stated that upfront.

And now knowing that, how come @simatec says it works but you say it doesn't?

Also, you should be aware I'm not inclined to sink too much time in this. If this befuddled bug reporting keeps on going, I'll just bow out and leave you to figure it out for yourself.

@Grothesk242

Grothesk242 commented Oct 13, 2023

> And now knowing that, how come @simatec says it works but you say it doesn't?

This is due to the strangeness of the bug. He was testing just a part of the issue on a similar system, and this part of the code now works on both systems. But on my systems I use more extensive parts of the code (it is a backup suite for backing up several modules of the smart home system 'ioBroker'), and these backup files are still corrupt. I know how 'befuddled' this bug reporting looks, but the issue is very hard to track down for the three of us (simatec, winnyschuster and myself).

@simatec
Author

simatec commented Oct 13, 2023

@bnoordhuis I must apologize here. I had only done a simple test and tested only a part of the backups.
Indeed, not all backups to a CIFS mount work even with the ENV.
@Grothesk242 and @winnyschuster are absolutely right at this point.

I can't yet understand what changed from Node 20.2.0 to Node 20.3.0 to cause this error.

The error exists from Node 20.3.0 up to the current version, Node 20.8.0.

Our guess is libuv, because the error also appeared with the libuv update in v18.18.0, and after that update was undone in v18.18.1 everything is fine again.

As @Grothesk242 wrote, it is extremely difficult to isolate the error, and it currently affects only some NAS systems with CIFS. The Fritzbox with its NAS in particular causes problems here.

@bnoordhuis
Member

Okay, duly noted. I'll take your word for it that UV_USE_IO_URING=0 didn't make a difference.

The only suggestion I have at this point is to run git bisect and see what commit it blames.

If it's the libuv upgrade, splice in the libuv commits from v1.44.2 to v1.45.0 into deps/uv and bisect again. Some linux-only files got merged into a single file so there are a few commits with broken builds where you have to patch up deps/uv/uv.gyp, see the changes to that file in 9e68f94.

@simatec
Author

simatec commented Oct 15, 2023

As far as I can determine, it can only be this commit.
It is also included in v18.18.0 and was undone in 18.18.1.

#48078

@bnoordhuis
Member

The v1.45.0 release was a big one so knowing it was the libuv upgrade doesn't tell us much in itself. You've established it's not io_uring. There aren't otherwise many file system-related changes:

$ git log --oneline v1.44.2..v1.45.0 src/unix/fs.c | grep -v macos
3990fcad docs: fix some typos (#3984)
dfae365f linux: add IORING_OP_CLOSE support (#3964)
5ca5e475 linux: add IORING_OP_OPENAT support (#3963)
d2c31f42 linux: introduce io_uring support (#3952)
2f33980a src: switch to use C11 atomics where available (#3950)
dfb206c8 linux: fix ceph copy error truncating readonly files (#3920)
5102b2c0 unix: drop kfreebsd support (#3835)
acfe668e build: add MemorySanitizer (MSAN) support (#3788)
9a5a5140 linux: remove unused or obsolete syscall wrappers (#3777)

Of those, the last one may be the cause but that means you're either running a really old kernel or your libc has a bug.

@simatec
Author

simatec commented Oct 16, 2023

@bnoordhuis thank you so much for your reply and effort.
I'll try to explain the whole thing in more detail.
This problem does not only occur for me or one or two other people.
It is a general problem in connection with Node > 20.2.0 and Node 18.18.0.

It concerns the IoT platform ioBroker with about 81,000 users.
We have a backup plugin there which, among other things, offers the possibility to back up not only locally but also to a remote NAS via an NFS or CIFS mount.

Now users with Node 20 are increasingly reporting this problem and we as developers are looking for the cause.

Currently we only know of cases where the user has a Fritzbox running as a NAS and uses the CIFS mount there.

The whole thing must have something to do with .pipe and/or fs.createWriteStream.
Manually, all files can be stored on the mount.

In our tests we have always used up-to-date kernels, and I myself have seen it under Debian 12 as well as under Ubuntu 22.04.

It is a very difficult issue because there are no error messages or the like.

The ENV UV_USE_IO_URING=0 brings only partial success, because it does not work with all backup variants.

I am currently at a loss as to what else we can test.
What exactly do you mean by libc?

@bnoordhuis
Member

> What exactly do you mean by libc?

Libuv made some system calls directly (bypassing libc) but now it goes through their libc wrappers.

@santigimeno
Member

santigimeno commented Oct 17, 2023

@simatec do you see any CIFS related logs in syslog/dmesg?

If not, can you enable CIFS debug logs as described here and report back if there's any relevant information?

@simatec
Author

simatec commented Oct 17, 2023

Attached are the logs of the backup process, including the mount command before the backup and the umount command after it.

In this example we try to write 2 backup files.
Both files are corrupted on the CIFS target drive and are only 16 KB in size.

I have created the logs for both SMB2 and SMB3.1.1 to allow a comparison.

The mount to the test Fritzbox looks like this:

sudo mount -t cifs -o username=iob-backup,password=****,noserverino,rw,uid=iobroker,gid=iobroker,file_mode=0777,dir_mode=0777,vers=3.1.1 //10.1.1.253/fritz.nas/iob/iob-test /opt/iobroker/backups

cifs-smb2.log
cifs-smb3-1-1.log
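
To check programmatically whether a gzip-compressed backup on the mount is truncated, a sketch along the following lines may help. It only validates the gzip layer (not the tar contents) and reads the whole file into memory, which should be acceptable for backups of the 5-10 MB size mentioned in this issue. The function name isCompleteGzip is hypothetical.

```javascript
const fs = require('node:fs');
const zlib = require('node:zlib');

// Hypothetical check: returns true if `file` decompresses as a complete
// gzip stream, false if it is truncated or otherwise not valid gzip.
function isCompleteGzip(file) {
    try {
        // gunzipSync throws when the input ends before the gzip stream does.
        zlib.gunzipSync(fs.readFileSync(file));
        return true;
    } catch (err) {
        return false;
    }
}
```

Running this against the same backup written locally and written to the CIFS mount gives a quick yes/no comparison alongside the file sizes.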
