Avoid really closing stdin/stdout in hclose()/hts_close()/et al #1665
Enables repeated hopen("-") / hclose() / hopen("-") where previously the underlying STDIN/OUT_FILENO would have been prematurely closed. This means that stdout is never really closed and hclose()'s return value does not reflect closing the underlying fd. Hence particularly paranoid programs that have written significant data to stdout will want to close and check it themselves just before the end of main(): if (fclose(stdout) != 0 && errno != EBADF) perror("closing stdout") (Ignore EBADF as stdout may already have been closed and checked, e.g. if the program has been linked to an earlier HTSlib where hclose() still closes STDOUT_FILENO.)
We don't need to dup(STDOUT_FILENO) now that hclose()/hts_close() no longer irretrievably close stdout. Instead fclose(stdout) explicitly just before the end of main() as it is a last chance to observe I/O errors.
Since PR samtools/htslib#1665, hts_open("-", "w") / hts_close() no longer actually closes stdout. Close it at the end of main() so there is an opportunity to detect I/O errors in previously-uncommitted writes. Ignore EBADF as other code may have already closed stdout, e.g., either particular subcommands or when (dynamically) linked against an older version of HTSlib.
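The close-and-check idiom described in these commit messages can be wrapped in a small helper. This is only a sketch: the name close_stream_checked() and its return convention are hypothetical and are not part of HTSlib or samtools; the only calls used are standard C (fclose, strerror, fprintf).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an HTSlib API): close a stdio stream and report
 * any I/O error detected while flushing or closing it.
 * EBADF is ignored because the underlying descriptor may already have been
 * closed and checked elsewhere, e.g. by an older HTSlib's hclose().
 * Returns 0 on success or ignored EBADF, -1 on any other error. */
static int close_stream_checked(FILE *fp, const char *name)
{
    if (fclose(fp) != 0 && errno != EBADF) {
        int saved_errno = errno;  /* keep the value before calling anything else */
        fprintf(stderr, "closing %s: %s\n", name, strerror(saved_errno));
        return -1;
    }
    return 0;
}
```

A program would call close_stream_checked(stdout, "stdout") as the last step of main() and turn a -1 result into a failing exit status. The stream must not be used again after the call.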
As noted on the previous htslib PR:
Alternatively, you need to explicitly
Having (However I have a plan to rewrite pysam's I/O redirection in an upcoming release so that it will be immune to whether samtools/bcftools closes these file descriptors or not anyway.)
I like this as a change. It does feel more natural to me, as you state: if the file wasn't opened by htslib then maybe it's going too far for it to be doing the closing. Also, given we're paranoid and use fdatasync / fsync already, I expect that even if people don't explicitly do a close-and-check themselves, almost all errors will have been spotted already (e.g. ENOSPC).
Hmmm… interesting point about You may also be amused to hear that POSIX and Linux have opposite answers to “what happens if during
Thinking on things that can go wrong - devil's advocate if you wish. (I do in fact like the idea in this PR.)
This would previously report the number of records written when given a file, or when given stdout it'd just lose the printf as stdout had already been closed. Now it wouldn't get closed and we'd have chit-chat appearing in the output file instead. This is quite similar to the occasional nohup bugs people hit due to mixing in stderr when using I'm not hugely concerned as I'd argue this sort of tooling is broken and also trivial to fix, but I wouldn't be surprised if this class of bug also exists in the wild too.
(Note this also hints that we maybe should fix
So we see the issue with fdatasync not working here. In this regard, perhaps close would spot errors. Then again it's possible they'd just get lost in exactly the same manner. We're checking our writes, but it's the internals of the kernel which matter here. So we've sent some data down a pipe or socket, which succeeded at that point - we put it in the "todo" buffer basically - but it is later found to fail. It's not something that is trivial to test either. Given how we prefer
We seem to be in agreement on the various points, including what is valid and what is broken code that we need not support. Thanks for clarification on the pipe case too. Agreed also on trying to override this with extra
Apply PR samtools/htslib#1665. At present, pysam would prefer that stdin/stdout were never closed from under it.
Revisiting the stdin/stdout/dup conversation from #1658:
Reconsidering this… In fact it has caused some problems over the years: htsfile.c needed some circumlocutions because it too wants to write several SAM/VCF (i.e. textual) htsFile streams to stdout, and pysam has had trouble due to stdout being closed by samtools and bcftools. Moreover, as stdin and stdout are already open and hence are not opened by hts_open/hopen, it's not really morally right for them to be closed by hts_close/hclose.
Hence this pull request, which adds a mode option letter to hdopen() to signal that the fd should not be closed by hclose(), and uses it in hopen_fd_stdinout() which underlies hopen("-", …).
This enables repeated hopen("-") / hclose() / hopen("-") where previously the underlying STDIN/OUT_FILENO would have been prematurely closed. This will mean that the linked bgzip.c pull request would not need to treat "-" specially.
This also means that stdout is never really closed and hclose()'s return value does not reflect closing the underlying fd. Hence particularly paranoid programs that have written significant data to stdout will want to close and check it themselves just before the end of main(): if (fclose(stdout) != 0 && errno != EBADF) perror("closing stdout")
(Ignoring EBADF as stdout may already have been closed and checked, e.g. if the program has been linked to an earlier HTSlib where hclose() still closes STDOUT_FILENO.)
The second commit simplifies htsfile.c's file opening and adds a final fclose(stdout) check accordingly.