RFC: Deprecate open(filename) mode strings ("r", "w+", etc...) ? #14844

Closed
samoconnor opened this issue Jan 29, 2016 · 10 comments
Labels
io Involving the I/O subsystem: libuv, read, write, etc.

Comments

@samoconnor
Contributor

Discussion of read and write functions yielded the observation that the "w" in open(filename, "w") seems awkward: #14608 (comment).

Thinking about this more, it seems that having POSIX fopen(3) mode strings in Julia's public IO API is a bit odd. It might be preferable to deprecate them in favour of something more consistent with Julia's other APIs, and to use defaults that remove the need to think about file modes in most cases.

Option 1
Always open read/write and replace "w" and "a" with new function names.
Rationale: clear and simple.

open(filename) = open(filename, "r+")
create(filename) = open(filename, "w+")
append(filename) = open(filename, "a+")
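A minimal sketch of Option 1, written on top of the keyword form of open that later Julia versions (1.x) provide. The module ProposedIO and the names openfile, createfile and appendfile are hypothetical stand-ins for the proposed open, create and append; they are not part of Base.

```julia
module ProposedIO

# Hypothetical stand-ins for the proposed open/create/append (not part of Base).
# Every handle is opened for both reading and writing.

# Like mode "r+": the file must already exist.
openfile(filename::AbstractString) = open(filename; read=true, write=true)

# Like mode "w+": create the file if needed and truncate it to zero length.
createfile(filename::AbstractString) =
    open(filename; read=true, write=true, create=true, truncate=true)

# Like mode "a+": create the file if needed; writes go to the end of the file.
appendfile(filename::AbstractString) =
    open(filename; read=true, write=true, create=true, append=true)

end # module

# Usage
path = tempname()
io = ProposedIO.createfile(path); write(io, "abc"); close(io)
io = ProposedIO.appendfile(path); write(io, "def"); close(io)
io = ProposedIO.openfile(path); contents = read(io, String); close(io)
```

One consequence of always opening read/write: a file the user may read but not write cannot be opened at all, even just to read it.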

Option 2
Deprecate open in favour of mmap.
Rationale: There are many things that behave like a stream (network sockets, pipes, serial ports, work queues etc...). However, local files are really much more like an array than a stream. So why not use an array interface by default for local files?

For large files, mmap is an obvious performance win (or for pure sequential access at least not any worse). For small files there is OS overhead in setting up the map but this could be optimised away. e.g. Julia's mmap could cache the whole file in a memory buffer for small files.
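A minimal sketch of the "files as arrays" idea using Julia's Mmap standard library: a file holding raw UInt32 values is viewed as a Vector{UInt32} and summed. The function name sum_u32 is illustrative only, and the concerns about I/O errors surfacing as signals and about address-space use, raised in the reply to this issue, apply to this code as well.

```julia
using Mmap

# Treat a file of native-endian UInt32 values as a read-only Vector{UInt32}.
function sum_u32(path::AbstractString)
    open(path, "r") do io
        n = filesize(io) ÷ sizeof(UInt32)        # number of complete UInt32 values
        A = Mmap.mmap(io, Vector{UInt32}, (n,))  # array backed by the file's pages
        return Int(sum(A))                       # sum of the mapped values
    end
end

# Usage: write four UInt32 values as raw bytes, then read them back via mmap.
path = tempname()
write(path, UInt32[1, 2, 3, 4])
total = sum_u32(path)
```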

Option 3
Remove the need for files to be "opened" for common sequential use cases (assumes using Option 2 for random access cases).
For sequential read use cases an iterator could return items of a specified type:

for x in each("filename", UInt32) ...

For sequential write use cases an append function could be used:

for v in calculate_values()
    append("filename", v)
end
sync("filename") # Optional.

An internal hash table mapping filenames to recently appended file handles could make this efficient.
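A minimal sketch of Option 3, assuming values are stored back to back as raw native-endian bits. The names append_value, sync_file and each_value are hypothetical stand-ins for the proposed append, sync and each, and APPEND_HANDLES plays the role of the internal hash table of recently appended file handles. Here "sync" only closes (and therefore flushes) the cached handle; it does not call fsync.

```julia
# Hypothetical helpers sketching Option 3; none of these names exist in Base.

# Cache of open append handles, keyed by filename (the "internal hash table").
const APPEND_HANDLES = Dict{String,IOStream}()

# Append the binary representation of `v` to `filename`, reusing a cached handle.
function append_value(filename::AbstractString, v)
    io = get!(APPEND_HANDLES, String(filename)) do
        open(filename, "a")  # the mode string is hidden from callers
    end
    write(io, v)
    return nothing
end

# Close (and thereby flush) the cached handle for `filename`, if any.
function sync_file(filename::AbstractString)
    io = pop!(APPEND_HANDLES, String(filename), nothing)
    io === nothing || close(io)
    return nothing
end

# Iterator over values of type T stored back to back in a file.
struct EachValue{T}
    filename::String
end
each_value(filename::AbstractString, ::Type{T}) where {T} = EachValue{T}(String(filename))

Base.IteratorSize(::Type{<:EachValue}) = Base.SizeUnknown()
Base.eltype(::Type{EachValue{T}}) where {T} = T

# The open handle is the iteration state. It is closed once the file is
# exhausted, but leaks if the loop exits early.
function Base.iterate(it::EachValue{T}, io::IOStream = open(it.filename, "r")) where {T}
    if eof(io)
        close(io)
        return nothing
    end
    return (read(io, T), io)
end

# Usage
path = tempname()
for v in UInt32[1, 2, 3]
    append_value(path, v)
end
sync_file(path)
vals = collect(each_value(path, UInt32))
```

Every distinct filename passed to append_value keeps a file handle open until sync_file is called for it.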

The overall goal is to minimise the degree to which the user has to think about file modes, opening and closing handles, when to use mmap, etc.

@vtjnash
Member

vtjnash commented Jan 29, 2016

these options sound like they are inviting all kinds of issues when dealing with anything other than local files on a posix system.

option 1:
this sounds like it would interact really badly with file locking. read-write by default is actually not a very good option (read-only default is sensible because it is more reliably possible). we shouldn't reinvent the wheel here. if we aren't going to follow the posix model, we should at least note that the underlying syscall (or CreateFileW on windows) is far more useful than fopen exposes – and these options would be poorly modeled by exposing one method for each combination.

option 2:
this works pretty well for local files, but it's not clear to me this is ever faster. if the IO subsystem encounters any intermittent contention or io failures while accessing the file, Julia will be killed with a SIGSEGV (or possibly another signal -- it's not defined). this also forces the kernel to allocate virtual addresses for the file, making it impossible to deal with large files, and it makes resizing of a file very buggy.

option 3:
this is non-obvious behavior, which doesn't strike me as particularly useful. it also consumes a very limited and valuable resource (file handles) without giving the user any ability to assert control.

@nalimilan
Member

What do other recent high-level languages do in that regard?

@StefanKarpinski
Member

What do other recent high-level languages do in that regard?

They have essentially the same API that we have. I have to say I'm not entirely clear what the problem with the current interface is. If we had different types for readable and writable IO objects, then there would be a type stability issue, but we don't have that, so...

@samoconnor
Contributor Author

I have to say I'm not entirely clear what the problem with the current interface is.

It is just a minor point of consistency. As far as I am aware there are no other Julia API examples of passing options to a function as little strings, "w", "a+", etc. It seems that separate functions, or keyword args, would be more Julia-ish.
It seems like a lot of effort has gone into making Julia's array/matrix/math APIs elegant, but the IO APIs seem to be good-enough rather than elegant.

[@vtjnash comments about locking, handles, performance, segv etc]

I don't want to seem dismissive; these are all real concerns. However, they are mostly implementation concerns. My reason for posting this issue is to ask "Is there a simpler interface for this stuff?". I believe the implementation details can be solved. In fact, it is kind of the point for the systems engineers to solve the implementation headaches once and hide them behind a simple API, so the scientific programmers don't have to think about them.

... we shouldn't reinvent the wheel here

I guess it seems to me more like hiding, as much as possible, all the cogs and grease that make the wheel go around.

What do other recent high-level languages do in that regard?

Not all of the examples below are recent, but there is plenty of precedent for high-level languages hiding the underlying POSIX file modes. The examples below include:

  • using different names, e.g. open / create,
  • names with suffixes, e.g. open_read / open_write and
  • named arguments, e.g. :direction :input / :direction :output
BASIC 
OPEN "INPUT.TXT" FOR INPUT
OPEN "OUTPUT.TXT" FOR OUTPUT

Ada
   Open (File => Input,
         Mode => In_File,
         Name => "input.txt");
   Create (File => Output,
           Mode => Out_File,
           Name => "output.txt");

C#
using (var reader = new StreamReader("input.txt"))
using (var writer = new StreamWriter("output.txt"))


Lisp
(with-open-file (in #p"input.txt" :direction :input)
  (with-open-file (out #p"output.txt" :direction :output)

Eiffel
            create input_file.make_open_read ("input.txt")
            create output_file.make_open_write ("output.txt")


Fortran
  open(out, file="output.txt", status="new", action="write", access="stream", iostat=err)
  if ( err == 0 ) then
     open(in, file="input.txt", status="old", action="read", access="stream", iostat=err)

Java
      FileInputStream in = new FileInputStream("input.txt");
      FileOutputStream out = new FileOutputStream("output.txt");


JavaScript
var f_in = fso.OpenTextFile('input.txt', ForReading);
var f_out = fso.OpenTextFile('output.txt', ForWriting, true);

Mercury
   io.open_input("input.txt", InputRes, !IO),
           io.open_output("output.txt", OutputRes, !IO),


Scheme
(define in-file (open-input-file "input.txt"))
(define out-file (open-output-file "output.txt"))

Smalltalk
in := FileStream open: 'input.txt' mode: FileStream read.
out := FileStream open: 'output.txt' mode: FileStream write.

ML
  val instream = TextIO.openIn from
  val outstream = TextIO.openOut to

http://rosettacode.org/wiki/File_input/output
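For comparison, the same Rosetta Code task in Julia, written once with today's mode strings and once with the keyword form of open that later Julia versions (1.x) accept. The function names are illustrative only.

```julia
# Copy src to dst using fopen(3)-style mode strings.
function copy_with_modes(src::AbstractString, dst::AbstractString)
    open(src, "r") do fin
        open(dst, "w") do fout
            write(fout, read(fin))
        end
    end
    return nothing
end

# The same copy, spelling out the intent with keyword arguments.
function copy_with_keywords(src::AbstractString, dst::AbstractString)
    open(src; read=true) do fin
        open(dst; write=true, create=true, truncate=true) do fout
            write(fout, read(fin))
        end
    end
    return nothing
end
```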

@kshyatt kshyatt added the io Involving the I/O subsystem: libuv, read, write, etc. label Jan 26, 2017
@samoconnor
Contributor Author

What about changing this:

open(filename::AbstractString, mode::AbstractString)
open(filename::AbstractString, read::Bool, write::Bool, create::Bool, truncate::Bool, append::Bool)

to this:

open(filename::AbstractString; read=true, write=false, create=false, truncate=false, append=false)

Using strings of single character codes seems odd when we have kw args...

Having 5 Bool arguments in a row seems odd when we have kw args...

Who will know what open("foo.txt", false, true, true, false, true) means in a code-review?
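To make the comparison concrete, a small hypothetical helper (mode_kwargs is not part of Base) that spells out each fopen(3)-style mode string as the five keyword flags open accepts in Julia 1.x. In those terms, the positional example above is the "a" mode: open("foo.txt"; read=false, write=true, create=true, truncate=false, append=true).

```julia
# Hypothetical helper (not part of Base): the five open keyword flags
# corresponding to each fopen(3)-style mode string.
const MODE_FLAGS = Dict(
    "r"  => (read=true,  write=false, create=false, truncate=false, append=false),
    "r+" => (read=true,  write=true,  create=false, truncate=false, append=false),
    "w"  => (read=false, write=true,  create=true,  truncate=true,  append=false),
    "w+" => (read=true,  write=true,  create=true,  truncate=true,  append=false),
    "a"  => (read=false, write=true,  create=true,  truncate=false, append=true),
    "a+" => (read=true,  write=true,  create=true,  truncate=false, append=true),
)

mode_kwargs(mode::AbstractString) = MODE_FLAGS[mode]  # throws KeyError for unknown modes

# Usage: the flags can be splatted into open as keyword arguments.
path = tempname()
open(io -> write(io, "hello"), path; mode_kwargs("w")...)
open(io -> write(io, " world"), path; mode_kwargs("a")...)
text = open(io -> read(io, String), path; mode_kwargs("r")...)
```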

@StefanKarpinski
Member

StefanKarpinski commented Aug 8, 2017

I definitely agree about replacing the five boolean version with keyword args – the open function was introduced before there were keyword args. I made a PR to do that quite a long time ago: #3558. There were some subtleties about how passing one keyword should affect another one (see the PR thread), but since #22958 was merged the other week, we can probably now actually do this correctly. If you'd like to make a PR replacing the positional boolean method with a keyword argument method, that would be great. I would separate that from changing the mode string method, however, since I think that will be more controversial.
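One way the interaction between keywords could be handled (e.g., should append=true on its own imply write=true?): each keyword defaults to nothing, meaning "not passed", and omitted flags are filled in from the ones the caller did pass. This is a hypothetical sketch; resolve_open_flags is not a Base function, and it is not claimed to reproduce the rules Julia eventually adopted.

```julia
# Hypothetical sketch: `nothing` marks a flag the caller did not pass.
function resolve_open_flags(; read=nothing, write=nothing, create=nothing,
                              truncate=nothing, append=nothing)
    # Writing without reading or appending behaves like "w": create and truncate.
    if write === true && read !== true && append !== true
        create === nothing && (create = true)
        truncate === nothing && (truncate = true)
    end
    # Asking to truncate or append implies writing, creating the file if needed.
    if truncate === true || append === true
        write === nothing && (write = true)
        create === nothing && (create = true)
    end
    write = something(write, false)
    read = something(read, !write)  # read-only when nothing else was requested
    return (read=read, write=write, create=something(create, false),
            truncate=something(truncate, false), append=something(append, false))
end
```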

@JeffBezanson JeffBezanson added the triage This should be discussed on a triage call label Sep 11, 2017
@JeffBezanson
Member

JeffBezanson commented Sep 14, 2017

We could potentially use EnumSet (#19470) for this.
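A rough sketch of the flag-set idea. EnumSet (#19470) is a proposal whose API is not reproduced here; a plain UInt8 bitmask stands in for it, and OpenFlag and open_with are hypothetical names.

```julia
# Hypothetical flag constants; a UInt8 bitmask stands in for an EnumSet.
module OpenFlag
const READ     = 0x01
const WRITE    = 0x02
const CREATE   = 0x04
const TRUNCATE = 0x08
const APPEND   = 0x10
end

# Open `path` with exactly the flags present in `flags`.
function open_with(path::AbstractString, flags::UInt8)
    has(bit) = (flags & bit) != 0
    return open(path; read=has(OpenFlag.READ), write=has(OpenFlag.WRITE),
                create=has(OpenFlag.CREATE), truncate=has(OpenFlag.TRUNCATE),
                append=has(OpenFlag.APPEND))
end

# Usage: the equivalent of mode "w", then mode "r".
path = tempname()
io = open_with(path, OpenFlag.WRITE | OpenFlag.CREATE | OpenFlag.TRUNCATE)
write(io, "abc"); close(io)
io = open_with(path, OpenFlag.READ)
contents = read(io, String); close(io)
```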

@JeffBezanson JeffBezanson added this to the 1.0 milestone Sep 14, 2017
@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Sep 14, 2017
@vtjnash vtjnash self-assigned this Sep 14, 2017
@JeffBezanson
Member

I'm going to remove this from the milestone; it should be easy to introduce a new syntax (since it would accept arguments of a new type) and eventually retire mode strings.

@JeffBezanson JeffBezanson removed this from the 1.0 milestone Sep 20, 2017
@vtjnash
Member

vtjnash commented Mar 26, 2018

fixed by #25696

@vtjnash vtjnash closed this as completed Mar 26, 2018
@samoconnor
Contributor Author

#25696 deprecates positional Bools, but it does not deprecate mode strings (the subject of this issue).

I don't see a pressing reason to deprecate the mode strings right now, but I think the original rationale still applies.

it seems that having POSIX fopen(3) mode strings in Julia's public IO API is a bit odd.

As far as I am aware there are no other Julia API examples of passing options to a function as little strings, "w", "a+", etc. It seems that separate functions, or keyword args, would be more Julia-ish.
It seems like a lot of effort has gone into making Julia's array/matrix/math APIs elegant, but the IO APIs seem to be good-enough rather than elegant.

Perhaps leave this open for consideration alongside other IO stuff that needs eventual cleanup #24526.


6 participants