
User-friendly input macros. #67

Closed
wants to merge 1 commit into from

nrc
Member

nrc commented May 4, 2014

Add `scan!`, `scanln!`, etc. macros which mirror `print!`, `println!`, etc. for doing straightforward input from stdin, a file, or other Reader.

I really miss having something like this. I think it would be really good for increasing uptake of Rust to have a good story here.

I was hacking on a prototype implementation over the weekend. It only supports `{}` holes (although some of the infrastructure for fixed widths is there) and doesn't work yet, but it's getting there. I'd like to land something like that and then extend the mini-language gradually, letting the design evolve.

@nrc
Member Author

nrc commented May 4, 2014

See also issue 6220.

@huonw
Member

huonw commented May 4, 2014

cc @lifthrasiir (since you wrote a (non-procedural!) macro for this.)

@lifthrasiir
Contributor

@huonw Actually I have my own experimental syntax extension named read.rs, and an associated draft design document.

In my opinion, the most problematic aspect of such a syntax extension is how to return the read values, if any. For example, if `scanln!("{} {} {}", a, b, c)` is not to fail silently on a parse error, then `a`, `b` and `c` must be initialized before the `scanln!` call, since `scanln!` may not assign to them in some cases. (My `lex!` macro also suffers from this problem, which I consider a bug.)
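The problem can be sketched in present-day Rust with an out-parameter style function. `scan_two` is a hypothetical stand-in for `scanln!("{} {}", a, b)`, not code from read.rs or from this PR, and it reads from a string rather than stdin to stay self-contained.

```rust
/// Hypothetical out-parameter scanner mirroring `scanln!("{} {}", a, b)`.
/// Returns false on a parse error, in which case one or both variables
/// may be left untouched (or only partly updated).
fn scan_two(line: &str, a: &mut i32, b: &mut i32) -> bool {
    let mut tokens = line.split_whitespace();
    match tokens.next().and_then(|t| t.parse::<i32>().ok()) {
        Some(v) => *a = v,
        None => return false,
    }
    match tokens.next().and_then(|t| t.parse::<i32>().ok()) {
        Some(v) => *b = v,
        None => return false, // `a` has already been overwritten at this point
    }
    true
}

fn main() {
    // Both variables must be initialized up front: borrowing an
    // uninitialized variable as `&mut` is a compile error, and the callee
    // may return early without assigning them.
    let mut a = 0;
    let mut b = 0;
    let ok = scan_two("7 x", &mut a, &mut b);
    println!("ok={} a={} b={}", ok, a, b);
}
```

Note the partial update on failure: the caller sees a fresh `a` next to a stale `b`.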

@milibopp

milibopp commented May 5, 2014

I find it very weird to have this pseudo-initialization code before the call to `scanln!`. I think it makes a lot more sense to have the macro expand to code that returns a tuple of the input values. It should probably be an `Option` tuple like `Option<(str, u64)>` to handle invalid input.
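A minimal sketch of the tuple-returning style, assuming whitespace-separated input. `scan_name_and_count` is a hypothetical name, and it uses `String` rather than `str` because a bare `str` cannot be stored by value in a tuple in current Rust.

```rust
/// Hypothetical tuple-returning scanner for input like "alice 42".
/// Returns None if a token is missing, fails to parse, or extra input remains.
fn scan_name_and_count(line: &str) -> Option<(String, u64)> {
    let mut tokens = line.split_whitespace();
    let name = tokens.next()?.to_string();
    let count = tokens.next()?.parse::<u64>().ok()?;
    if tokens.next().is_some() {
        return None; // trailing input is treated as invalid
    }
    Some((name, count))
}

fn main() {
    match scan_name_and_count("alice 42") {
        Some((name, count)) => println!("{} has {}", name, count),
        None => println!("invalid input"),
    }
}
```

The caller never sees half-assigned variables: it gets either every value or `None`.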

@nrc
Member Author

nrc commented May 5, 2014

@aepsil0n this is pretty much what scanf and streaming IO do, but now that you mention it, it is a little bit weird, or at least not very Rust-y. This does have the benefit of mirroring `println!`, too. I'm not sure how you would know what types to use for the variables if you returned them, rather than taking out params.

I think you could probably get away without the initialisation too, e.g.,

```rust
let x: int;
let y: int;
scanln!("{} {}", x, y);
```

I want to avoid returning a value wrapped in an Option or Result so that this is a very lightweight, easy to use API. There might need to be a recoverable form too which returns a Result. I kind of feel that if your software quality bar is high enough to check the result of io operations, you should be using a proper IO library and not scanln.
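Current Rust accepts this deferred-initialization pattern as long as the macro assigns every variable on every non-panicking path. The sketch below assumes a hypothetical `scan_str!` macro that reads from a string (not stdin) and panics on bad input, which is what lets the caller skip dummy initial values.

```rust
/// Hypothetical macro: assigns whitespace-separated tokens from `$input`
/// to the named variables in order, panicking on missing or bad tokens.
macro_rules! scan_str {
    ($input:expr, $($var:ident),+) => {{
        let mut tokens = $input.split_whitespace();
        $(
            // The target variable's declared type drives `parse`.
            $var = tokens
                .next()
                .expect("scan_str!: not enough tokens")
                .parse()
                .expect("scan_str!: could not parse token");
        )+
    }};
}

fn main() {
    // No dummy values needed: the macro either assigns every variable or
    // panics, so the compiler sees them as definitely initialized.
    let x: i32;
    let y: f64;
    scan_str!("3 2.5", x, y);
    println!("x={} y={}", x, y);
}
```

The trade-off is the one described above: no `Option` or `Result` at the call site, at the cost of a panic on malformed input.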

@milibopp

milibopp commented May 5, 2014

Would this work?

```rust
let (x, y): (f64, str) = scanln!("{} {}");
```

It's clear from the context, but I don't know how powerful the macro system is.
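A `macro_rules!` macro cannot inspect the `let` annotation itself, but if it expands to a call to a generic function, ordinary type inference can pick the element types from the annotation. A sketch assuming a hypothetical two-value helper `scan2`:

```rust
use std::str::FromStr;

/// Hypothetical two-value scanner whose element types are chosen by the
/// caller's type annotation rather than by the format string.
fn scan2<A: FromStr, B: FromStr>(line: &str) -> Option<(A, B)> {
    let mut tokens = line.split_whitespace();
    let a = tokens.next()?.parse::<A>().ok()?;
    let b = tokens.next()?.parse::<B>().ok()?;
    Some((a, b))
}

fn main() {
    // The annotation drives inference: A = f64, B = String.
    let (x, y): (f64, String) = scan2("1.5 hello").expect("bad input");
    println!("{} {}", x, y);
}
```

Supporting arbitrary arity this way would need the macro to count the `{}` holes, which `macro_rules!` cannot do inside a string literal; a procedural macro could.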

@pnkfelix
Member

pnkfelix commented May 5, 2014

> There might need to be a recoverable form too which returns a Result. I kind of feel that if your software quality bar is high enough to check the result of io operations, you should be using a proper IO library and not scanln.

This comment IMO ties into questions raised about the println! API's handling of errors like EPIPE on rust-lang/rust#13824.

More specifically: there is a viewpoint put forward in the comments for that ticket that if you want to handle errors printing to stdout, you should be using `io::stdout()` rather than `println!`.

  • Note that "errors" here includes broken pipes that one can get when piping output to a unix tool like `head`.
  • I do not necessarily disagree with that viewpoint, but I do suspect that we need to provide an easier middle ground between the nice-but-fail-sy `println!` macro and a series of method calls on stdout's `LineBufferedWriter<StdWriter>`.
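One possible middle ground, sketched with today's standard library rather than the 2014 `LineBufferedWriter` API: write through `io::stdout()` with `writeln!`, propagate errors with `?`, and treat a broken pipe as a normal reason to stop. By contrast, `println!` panics if writing to stdout fails. `print_lines` is a hypothetical helper.

```rust
use std::io::{self, Write};

/// Writes each line to stdout, returning the first I/O error instead of
/// panicking the way `println!` does.
fn print_lines(lines: &[&str]) -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    for line in lines {
        writeln!(out, "{}", line)?;
    }
    out.flush()
}

fn main() {
    match print_lines(&["first", "second"]) {
        Ok(()) => {}
        // e.g. output piped into `head`, which exits early
        Err(e) if e.kind() == io::ErrorKind::BrokenPipe => {}
        Err(e) => {
            eprintln!("error writing to stdout: {}", e);
            std::process::exit(1);
        }
    }
}
```

On Unix, Rust binaries ignore `SIGPIPE` by default, so piping into `head` shows up as an `ErrorKind::BrokenPipe` error rather than terminating the process.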

@nrc
Member Author

nrc commented May 5, 2014

@aepsil0n it might. I'm also not sure whether the macro system is powerful enough to do that. It works for `vec!`, but I'm not sure if the cases are comparable.

@pnkfelix yeah, I agree with that viewpoint on println!. I also think we need a nice middle ground. I'm not sure if that is a recoverable version of println!/scanln! or if we just need a better API for 'serious' IO.

@o11c
Contributor

o11c commented May 17, 2014

I disagree with several of the assumptions made in this RFC. `scan!` should not be perfectly symmetrical with the way `print!` works (though perhaps a variant of `print!` could be improved to follow `scan!`).

In my experience, there are only two kinds of input:

  • token-based input, where you only ever want one token at a time, then you switch on the type of token in your parser
  • line-based input, where you read a line at a time, then perform splitting operations on that line. In some cases, that may lead to requesting another line.

... and I've found that it makes for better error messages if I implement token-based input on top of line-based input.
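A rough Rust sketch of token-based input layered on line-based input, so that every token carries the line number it came from for use in error messages. The `Tokens` type is hypothetical and only splits on whitespace.

```rust
use std::io::{self, BufRead, Cursor, Lines};

/// Hypothetical token reader layered on top of line-based input.
/// Each token is returned with the 1-based number of the line it came from.
struct Tokens<R> {
    lines: Lines<R>,
    line_no: usize,
    pending: std::vec::IntoIter<String>,
}

impl<R: BufRead> Tokens<R> {
    fn new(reader: R) -> Self {
        Tokens { lines: reader.lines(), line_no: 0, pending: Vec::new().into_iter() }
    }
}

impl<R: BufRead> Iterator for Tokens<R> {
    type Item = io::Result<(usize, String)>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Hand out any tokens left over from the current line first.
            if let Some(tok) = self.pending.next() {
                return Some(Ok((self.line_no, tok)));
            }
            // Otherwise read the next line; `?` ends iteration at EOF.
            let line = match self.lines.next()? {
                Ok(line) => line,
                Err(e) => return Some(Err(e)),
            };
            self.line_no += 1;
            self.pending = line
                .split_whitespace()
                .map(String::from)
                .collect::<Vec<_>>()
                .into_iter();
        }
    }
}

fn main() {
    let input = Cursor::new("1 2\n\nthree\n");
    for item in Tokens::new(input) {
        let (line, tok) = item.expect("read error");
        match tok.parse::<i64>() {
            Ok(n) => println!("line {}: integer {}", line, n),
            Err(_) => println!("line {}: expected an integer, found {:?}", line, tok),
        }
    }
}
```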

In my C++ codebase, I currently have the following kinds of splitters:

  • an extract family that is recursive and intended for machine-produced data such as CSV (or any other separator ... space is confusing to humans since each space is its own split), but also contains the terminals for things like "extract just an int" (but Rust has FromStr for the latter), which are reused by the other families. Since Rust doesn't have variadic templates yet, it would have to use an array of trait objects for the recursive case, which is quite awkward, but still better than other approaches IMO.
  • an asplit family that splits on any run of whitespace, like a shell (I have a couple of variants of this actually, to handle different quoting styles). This family also supports only parsing the head part of a string and returning the unparsed portion.
  • a config_parse method that just splits off a leading `key:` and then converts the value to the type of the variable (this is an interesting exercise in "which is faster, linear scan or multiple allocation + virtual function", though since I only use it at program startup it doesn't matter ... currently I'm using a linear scan in code. Because of external vtables, Rust could avoid the per-variable allocation by doing a map of &mut Trait, but it still has to pay the cost of allocating the map data itself).
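The asplit head-parsing and config_parse ideas might look roughly like this in Rust. Both helpers are hypothetical illustrations of the descriptions above, not ports of the C++ code, and `parse_config_value` relies on `FromStr` for the value conversion.

```rust
use std::str::FromStr;

/// asplit-style head parsing (hypothetical helper): returns the first
/// whitespace-delimited token and the untouched remainder of the string.
fn split_head(s: &str) -> Option<(&str, &str)> {
    let s = s.trim_start();
    if s.is_empty() {
        return None;
    }
    match s.find(char::is_whitespace) {
        Some(end) => Some((&s[..end], &s[end..])),
        None => Some((s, "")),
    }
}

/// config_parse-style line handling (hypothetical helper): if `line` has the
/// form "key: value" with the expected key, parse the value as `T`.
fn parse_config_value<T: FromStr>(line: &str, key: &str) -> Option<T> {
    let (k, v) = line.split_once(':')?;
    if k.trim() != key {
        return None;
    }
    v.trim().parse().ok()
}

fn main() {
    let (cmd, rest) = split_head("  move 3 4").unwrap();
    println!("{:?} {:?}", cmd, rest);
    let port: Option<u16> = parse_config_value("port: 8080", "port");
    println!("{:?}", port);
}
```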

@nrc
Member Author

nrc commented May 17, 2014

@o11c I think you are right for a general purpose IO library. But I am proposing scan as a 'toy' IO library. The kind of thing which is suited for programming exercises (in the tutorials, or for a university course, for example) or programming competitions. We just want ease of use, really. Robustness, extensibility, and efficiency are not primary as they would be for a real-world IO library.

@o11c
Contributor

o11c commented May 17, 2014

@nick29581 I completely disagree that a toy library is a good idea - and that is certainly not something that parallels print!. Tutorials and university courses should not teach you to ignore sanity. All too often, they never get around to telling you that all the code you learned to use is a completely wrong approach - and that's not nearly as good as just teaching the right approach in the first place.

It's no harder to use my libraries than to use scanf, and it's much safer.

@nrc
Member Author

nrc commented May 18, 2014

@o11c I guess we just have to disagree. I can see the merit in your approach though. My feeling is that when teaching, you should teach one thing at a time, so when teaching about IO, you should teach the good stuff that matters, but when teaching something else where IO is peripheral and you just want to get some input to make a fun exercise, then you just want something that is as simple as possible. Any boilerplate at all is a distraction.

There are certainly pros and cons to matching scan with print: they are not doing the same thing, but the symmetry is appealing.

I'm not sure from your library description how they are used, but it seems more complex than scanf, which is just a single function call.

In the same way that println! exists just to do output in exercises/prototyping/debugging, I think there should be something similarly simple for input.

@o11c
Contributor

o11c commented May 18, 2014

@nick29581

> I'm not sure from your library description how they are used, but it seems more complex than scanf, which is just a single function call.

Nope. All of the below linked code is replacing calls to sscanf, and the new code is much more robust, shorter, and faster.

asplit is just as simple as sscanf, except there's no format string (it always splits on whitespace).

For extract, it depends on what you're doing:

  • If you're only extracting a single item, it's simpler than sscanf because there is no format string
  • If you're extracting non-nested csv-like data of known or unknown length, you specify the character to split on as a template argument (in Rust this would be a normal argument, since Rust doesn't have value template arguments yet) to the record or vrec factory functions, and pass the resulting object as the argument to extract (this is much simpler to do than to explain, see links below)
  • If you're extracting nested csv-like data, you just nest the record calls (but usually you don't need to nest them, because the inner objects have their own extract implementation)

(obviously in Rust these would be a trait implementation rather than overloaded functions)
(XString is basically &str, ZString is the same but with a guaranteed '\0', and LString (which is new) is &'static str. All my other string classes (excluding FormatString, which doesn't count) are owned.)
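A hedged Rust sketch of the `record`/`vrec` idea, with the separator passed as an ordinary argument. The function names are hypothetical, fields are parsed through `FromStr`, and nesting is done by splitting on an outer separator first.

```rust
use std::str::FromStr;

/// Variable-length record (hypothetical, loosely modelled on `vrec`):
/// split on `sep` and parse every field; any bad field fails the whole record.
fn extract_vrec<T: FromStr>(s: &str, sep: char) -> Option<Vec<T>> {
    s.split(sep).map(|field| field.parse::<T>().ok()).collect()
}

/// Fixed-length record (hypothetical, loosely modelled on `record`):
/// like `extract_vrec`, but the field count must match `len` exactly.
fn extract_record<T: FromStr>(s: &str, sep: char, len: usize) -> Option<Vec<T>> {
    let fields: Vec<T> = extract_vrec(s, sep)?;
    if fields.len() == len {
        Some(fields)
    } else {
        None
    }
}

fn main() {
    // Nesting: outer records split on ';', inner ones on ','.
    let rows: Option<Vec<Vec<i32>>> = "1,2;3,4"
        .split(';')
        .map(|row| extract_record::<i32>(row, ',', 2))
        .collect();
    println!("{:?}", rows);
}
```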

Links to how simple or complicated extract is for various purposes:

Also, the human-facing functions which are only called in the same file they're defined in:

I haven't linked config_parse functions since they don't yet do everything I want them to do - particularly, they are currently only robust against key errors, not value errors.

@nielsle

nielsle commented May 19, 2014

Perhaps:

```rust
let (x, y) = try!(scanln!("{:f} {:s}"));
```
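This suggestion maps onto today's Rust with the `?` operator, which replaced the `try!` macro. The sketch assumes a hypothetical `scan_float_word` helper standing in for `scanln!("{:f} {:s}")`, with errors reported as plain `String`s.

```rust
/// Hypothetical stand-in for `scanln!("{:f} {:s}")`: parse a float and a
/// word from one line, reporting which part failed.
fn scan_float_word(line: &str) -> Result<(f64, String), String> {
    let mut tokens = line.split_whitespace();
    let x = tokens
        .next()
        .ok_or("missing float")?
        .parse::<f64>()
        .map_err(|e| format!("bad float: {}", e))?;
    let s = tokens.next().ok_or("missing word")?.to_string();
    Ok((x, s))
}

fn run(line: &str) -> Result<(), String> {
    // `?` propagates the error to the caller, as `try!` did.
    let (x, y) = scan_float_word(line)?;
    println!("{} {}", x, y);
    Ok(())
}

fn main() {
    if let Err(e) = run("2.5 apples") {
        eprintln!("{}", e);
    }
}
```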

@alexcrichton
Member

This was discussed in today's meeting and it was decided that a feature such as this should bake in a library before being accepted into the main repo, so I'm going to close this for now.

This would be a nice feature to have though!

@uazu

uazu commented Jun 18, 2014

I don't know whether you've considered the possibility of matching/parsing either all or nothing (but nothing in between). If the whole thing matches, then all variables are assigned. Otherwise all variables are unset (or nil'd/zero'd) and all the characters read are 'ungot'. This means you can 'try' various patterns against the input 100% safely. This approach worked well for one parsing library I wrote.
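The all-or-nothing idea can be sketched by matching against a copy of the input position and committing only when the whole pattern succeeds. Because nothing is consumed on failure, nothing has to be ungot. `Input` and `try_scan` are hypothetical, and the pattern is limited to `n` values of a single type to keep the sketch short.

```rust
use std::str::FromStr;

/// Hypothetical cursor over input that supports all-or-nothing matches.
struct Input<'a> {
    rest: &'a str,
}

impl<'a> Input<'a> {
    /// Try to read exactly `n` whitespace-separated values of type `T`.
    /// On success the input is consumed; on any failure nothing is consumed.
    fn try_scan<T: FromStr>(&mut self, n: usize) -> Option<Vec<T>> {
        let mut probe = self.rest; // work on a copy of the position
        let mut values = Vec::with_capacity(n);
        for _ in 0..n {
            probe = probe.trim_start();
            let end = probe.find(char::is_whitespace).unwrap_or(probe.len());
            if end == 0 {
                return None; // ran out of input
            }
            values.push(probe[..end].parse::<T>().ok()?);
            probe = &probe[end..];
        }
        self.rest = probe; // commit only after every value parsed
        Some(values)
    }
}

fn main() {
    let mut input = Input { rest: "move 3 4" };
    // A first pattern (three ints) fails without consuming anything...
    assert!(input.try_scan::<i32>(3).is_none());
    // ...so a different pattern can be tried against the same input.
    let word: Vec<String> = input.try_scan(1).unwrap();
    let coords: Vec<i32> = input.try_scan(2).unwrap();
    println!("{:?} {:?}", word, coords);
}
```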

@gsingh93

Looks like there's a library that's doing this: https://github.com/mahkoh/scan

Also, should this be closed or closed as postponed?

withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
10 participants