@@ -42,13 +42,23 @@ follow-up PRs against this RFC.
* [Relation to the system-level APIs]
* [Platform-specific opt-in]
* [Proposed organization]
- * [Revising `Reader` and `Writer`] (stub)
+ * [Revising `Reader` and `Writer`]
+ * [Nonatomic results]
+ * [Reader]
+ * [Writer]
* [String handling] (stub)
* [Deadlines] (stub)
* [Splitting streams and cancellation] (stub)
* [Modules]
- * [core::io] (stub)
- * [The std::io facade] (stub)
+ * [core::io]
+ * [Adapters]
+ * [Seeking]
+ * [Buffering]
+ * [MemReader and MemWriter]
+ * [The std::io facade]
+ * [Errors]
+ * [Channel adapters]
+ * [stdin, stdout, stderr]
* [std::env] (stub)
* [std::fs] (stub)
* [std::net] (stub)
@@ -447,7 +457,173 @@ counts, arguments to `main`, and so on).
## Revising `Reader` and `Writer`
[Revising `Reader` and `Writer`]: #revising-reader-and-writer

- > To be added in a follow-up PR.
+ The `Reader` and `Writer` traits are the backbone of IO, representing
+ the ability to (respectively) pull bytes from and push bytes to an IO
+ object. The core operations provided by these traits follow a very
+ long tradition for blocking IO, but they are still surprisingly subtle,
+ and they need to be revised.
+
+ * **Atomicity and data loss**. As discussed
+   [above](#atomicity-and-the-reader-writer-traits), the `Reader` and
+   `Writer` traits currently expose methods that involve multiple
+   actual reads or writes, and data is lost when an error occurs after
+   some (but not all) operations have completed.
+
+   The proposed strategy for `Reader` operations is to return the
+   already-read data together with an error. For writers, the main
+   change is to make `write` perform only a single underlying write
+   (returning the number of bytes written on success), and to provide a
+   separate `write_all` method.
+
+ * **Parsing/serialization**. The `Reader` and `Writer` traits
+   currently provide a large number of default methods for
+   (de)serialization of various integer types to bytes with a given
+   endianness. Unfortunately, these operations pose atomicity problems
+   as well (e.g., a read could fail after reading two of the bytes
+   needed for a `u32` value).
+
+   Rather than complicate the signatures of these methods, the
+   (de)serialization infrastructure is removed entirely, in favor of
+   eventually introducing a much richer
+   parsing/formatting/(de)serialization framework that works seamlessly
+   with `Reader` and `Writer`.
+
+   Such a framework is out of scope for this RFC, but the
+   endian-sensitive functionality will be provided elsewhere
+   (likely out of tree).
+
+ * **The error type**. The traits currently use `IoResult` in their
+   return types, which ties them to `IoError` in particular. Besides
+   being an unnecessary restriction, this type prevents `Reader` and
+   `Writer` (and various adapters built on top of them) from moving to
+   `libcore`, because `IoError` currently requires the `String` type.
+
+   With associated types, there is essentially no downside in making
+   the error type generic.
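The claim that a generic error type costs essentially nothing can be made concrete. The following is a minimal sketch in present-day Rust, not the RFC's code: it uses `usize` for the RFC-era `uint`, and `SliceReader`, `NeverFails` and `count_bytes` are hypothetical names. An in-memory reader can pick an error type that needs no `String` (here, one with no values at all), while generic code still names the error as `R::Err`.

```rust
// Sketch in present-day Rust; `usize` stands in for the RFC-era `uint`.
// The trait mirrors the proposal: the error type is an associated type.
pub trait Reader {
    type Err;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Err>;
}

// Hypothetical error type with no values: reading from memory cannot fail,
// and nothing here needs `String` (or any allocation at all).
#[derive(Debug, PartialEq)]
pub enum NeverFails {}

// Hypothetical reader over a borrowed byte slice.
pub struct SliceReader<'a> {
    data: &'a [u8],
}

impl<'a> Reader for SliceReader<'a> {
    type Err = NeverFails;

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, NeverFails> {
        let n = buf.len().min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}

// Generic code refers to the error as `R::Err`, whatever each reader picks.
pub fn count_bytes<R: Reader>(r: &mut R) -> Result<usize, R::Err> {
    let mut buf = [0u8; 4];
    let mut total = 0;
    loop {
        let n = r.read(&mut buf)?;
        if n == 0 {
            return Ok(total); // `Ok(0)` signals end of input
        }
        total += n;
    }
}
```

Calling `count_bytes` on a `SliceReader` over `b"hello, world"` yields `Ok(12)`.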
+
+ With those general points out of the way, let's look at the details.
+
+ ### Nonatomic results
+ [Nonatomic results]: #nonatomic-results
+
+ To clarify how nonatomic operations are handled and to improve their
+ ergonomics, we introduce some new types into `std::error`:
+
+ ```rust
+ // The progress so far (T) paired with an error (Err)
+ struct PartialResult<T, Err>(T, Err);
+
+ // An operation that may fail after having made some progress:
+ // - S is what's produced on complete success,
+ // - T is what's produced if an operation fails part of the way through
+ type NonatomicResult<S, T, Err> = Result<S, PartialResult<T, Err>>;
+
+ // Ergonomically throw out the partial result
+ impl<T, Err> FromError<PartialResult<T, Err>> for Err { ... }
+ ```
+
+ The `NonatomicResult` type (which could use a shorter name)
+ encapsulates the common pattern of operations that may fail after
+ having made some progress. The `PartialResult` type then carries the
+ progress that was made along with the error, but with a `FromError`
+ implementation that makes it trivial to throw out the partial result
+ if desired.
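A minimal sketch of these types in present-day Rust, assuming `From` (the post-1.0 successor of `FromError`) and the hypothetical items `DemoError`, `copy_items` and `copied_len`. Because the orphan rules only let the crate that defines `From` write a blanket `impl` over an arbitrary error type, the sketch implements the conversion for one concrete error type.

```rust
// Sketch in present-day Rust: `From` replaces the pre-1.0 `FromError`.
#[derive(Debug, PartialEq)]
pub struct PartialResult<T, E>(pub T, pub E);

pub type NonatomicResult<S, T, E> = Result<S, PartialResult<T, E>>;

// Hypothetical error type for the example.
#[derive(Debug, PartialEq)]
pub enum DemoError {
    Interrupted,
}

// Discards the partial progress and keeps only the error, so `?` can
// convert a `PartialResult` into a plain `DemoError`.
impl<T> From<PartialResult<T, DemoError>> for DemoError {
    fn from(partial: PartialResult<T, DemoError>) -> DemoError {
        partial.1
    }
}

// Hypothetical nonatomic operation: copies `src`, but fails once it reaches
// index `fail_at`, handing back whatever it had copied so far.
pub fn copy_items(src: &[u8], fail_at: usize) -> NonatomicResult<Vec<u8>, Vec<u8>, DemoError> {
    let mut out = Vec::new();
    for (i, &b) in src.iter().enumerate() {
        if i == fail_at {
            return Err(PartialResult(out, DemoError::Interrupted));
        }
        out.push(b);
    }
    Ok(out)
}

// A caller that does not care about partial progress just uses `?`.
pub fn copied_len(src: &[u8], fail_at: usize) -> Result<usize, DemoError> {
    let items = copy_items(src, fail_at)?;
    Ok(items.len())
}
```

Here `copy_items(b"abc", 2)` returns `Err(PartialResult(b"ab".to_vec(), DemoError::Interrupted))`, keeping the two bytes already copied, while `copied_len(b"abc", 1)` simply propagates `DemoError::Interrupted` through `?`.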
+
+ ### `Reader`
+ [Reader]: #reader
+
+ The updated `Reader` trait (and its extension) is as follows:
+
+ ```rust
+ trait Reader {
+     type Err; // new associated error type
+
+     // unchanged except for error type
+     fn read(&mut self, buf: &mut [u8]) -> Result<uint, Self::Err>;
+
+     // these all return partial results on error
+     fn read_to_end(&mut self) -> NonatomicResult<Vec<u8>, Vec<u8>, Self::Err> { ... }
+     fn read_to_string(&mut self) -> NonatomicResult<String, Vec<u8>, Self::Err> { ... }
+     fn read_at_least(&mut self, min: uint, buf: &mut [u8]) -> NonatomicResult<uint, uint, Self::Err> { ... }
+ }
+
+ // extension trait needed for object safety
+ trait ReaderExt: Reader {
+     fn bytes(&mut self) -> Bytes<Self> { ... }
+     fn chars<'r>(&'r mut self) -> Chars<'r, Self, Self::Err> { ... }
+
+     ... // more to come later in the RFC
+ }
+ impl<R: Reader> ReaderExt for R {}
+ ```
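To show how a default method can return the already-read data on failure, here is a sketch of `read_to_end` in present-day Rust. It is an illustration under stated assumptions, not the proposed implementation: `usize` replaces `uint`, reading goes through a small stack buffer rather than the vector's spare capacity, and `Flaky` is a hypothetical test reader.

```rust
// Sketch in present-day Rust; `usize` replaces `uint`.
#[derive(Debug, PartialEq)]
pub struct PartialResult<T, E>(pub T, pub E);
pub type NonatomicResult<S, T, E> = Result<S, PartialResult<T, E>>;

pub trait Reader {
    type Err;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Err>;

    // Default method: keep reading until `read` reports end of input with
    // `Ok(0)`. If a read fails, the bytes gathered so far travel with the error.
    fn read_to_end(&mut self) -> NonatomicResult<Vec<u8>, Vec<u8>, Self::Err> {
        let mut out = Vec::new();
        let mut chunk = [0u8; 8];
        loop {
            match self.read(&mut chunk) {
                Ok(0) => return Ok(out),
                Ok(n) => out.extend_from_slice(&chunk[..n]),
                Err(e) => return Err(PartialResult(out, e)),
            }
        }
    }
}

// Hypothetical test reader: hands out its chunks in order, then either
// fails with `err` or reports end of input when `err` is `None`.
pub struct Flaky {
    pub chunks: Vec<Vec<u8>>,
    pub err: Option<&'static str>,
}

impl Reader for Flaky {
    type Err = &'static str;

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, &'static str> {
        if self.chunks.is_empty() {
            return match self.err {
                Some(e) => Err(e),
                None => Ok(0),
            };
        }
        let chunk = self.chunks.remove(0);
        // Assumes each chunk fits in `buf`, which holds for this example.
        buf[..chunk.len()].copy_from_slice(&chunk);
        Ok(chunk.len())
    }
}
```

A `Flaky` reader holding the chunks `ab` and `cd` followed by an error makes `read_to_end` return `Err(PartialResult(b"abcd".to_vec(), ...))`, so the four bytes read before the failure are not lost.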
+
+ #### Removed methods
+
+ The proposed `Reader` trait is much slimmer than today's. The vast
+ majority of removed methods are parsing/deserialization, which were
+ discussed above.
+
+ The remaining methods (`read_exact`, `push`, `push_at_least`) were
+ removed largely because they are *not memory safe*: they involve
+ extending a vector's capacity, and then *passing in the resulting
+ uninitialized memory* to the `read` method, which is not marked
+ `unsafe`! Thus the current design can lead to undefined behavior in
+ safe code.
+
+ The solution is to instead extend `Vec<T>` with a useful unsafe method:
+
+ ```rust
+ unsafe fn with_extra(&mut self, n: uint) -> &mut [T];
+ ```
+
+ This method is equivalent to calling `reserve(n)` and then providing a
+ slice to the memory starting just after `len()` entries. Using this
+ method, clients of `Reader` can easily recover the above removed
+ methods, while explicitly marking the unsafety of doing so.
+
+ (Note: `read_to_end` is currently not memory safe for the same reason,
+ but is considered a very important convenience. Thus, we will continue
+ to provide it, but will zero the slice beforehand.)
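The zeroing approach mentioned in the note can be sketched safely as follows. `push_zeroed` is a hypothetical helper, not part of the proposal, that recovers roughly what the removed `push` method did, trading the cost of zero-filling for never exposing uninitialized memory; `SliceReader` is likewise illustrative, and `usize` replaces `uint`.

```rust
// Sketch in present-day Rust; `usize` replaces `uint`.
pub trait Reader {
    type Err;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Err>;
}

// Hypothetical safe stand-in for the removed `push` method: grow `v` by `n`
// zeroed bytes, let the reader fill them, then cut off whatever it did not
// fill. Because the new bytes are zeroed first, `read` never sees
// uninitialized memory.
pub fn push_zeroed<R: Reader>(r: &mut R, v: &mut Vec<u8>, n: usize) -> Result<usize, R::Err> {
    let start = v.len();
    v.resize(start + n, 0);
    match r.read(&mut v[start..]) {
        Ok(got) => {
            v.truncate(start + got);
            Ok(got)
        }
        Err(e) => {
            v.truncate(start);
            Err(e)
        }
    }
}

// Minimal reader over a byte slice, for trying the helper out.
pub struct SliceReader<'a>(pub &'a [u8]);

impl<'a> Reader for SliceReader<'a> {
    type Err = ();

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, ()> {
        let n = buf.len().min(self.0.len());
        buf[..n].copy_from_slice(&self.0[..n]);
        self.0 = &self.0[n..];
        Ok(n)
    }
}
```

Starting from `b"ab"` and a reader over `b"xyz"`, `push_zeroed(&mut r, &mut v, 2)` returns `Ok(2)` and leaves `b"abxy"`; asking for 5 more bytes then yields only `Ok(1)`, and the vector is trimmed to `b"abxyz"`.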
+
+ ### `Writer`
+ [Writer]: #writer
+
+ The `Writer` trait is cut down to an even smaller size:
+
+ ```rust
+ trait Writer {
+     type Err;
+     fn write(&mut self, buf: &[u8]) -> Result<uint, Self::Err>;
+
+     fn write_all(&mut self, buf: &[u8]) -> NonatomicResult<(), uint, Self::Err> { ... }
+     fn write_fmt(&mut self, fmt: &fmt::Arguments) -> Result<(), Self::Err> { ... }
+     fn flush(&mut self) -> Result<(), Self::Err> { ... }
+ }
+ ```
+
+ The biggest change here is to the semantics of `write`. Instead of
+ repeatedly writing to the underlying IO object until all of `buf` is
+ written, it attempts a *single* write and on success returns the
+ number of bytes written. This follows the long tradition of blocking
+ IO, and is a more fundamental building block than the looping write we
+ currently have.
+
+ For convenience, `write_all` recovers the behavior of today's `write`,
+ looping until either the entire buffer is written or an error
+ occurs. In the latter case, however, it now also yields the number of
+ bytes that had been written prior to the error.
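The difference between the single-shot `write` and the looping `write_all` can be sketched in present-day Rust. `Trickle` is a hypothetical writer that accepts at most three bytes per call, and the panic on `Ok(0)` is an assumption of this sketch: with a fully generic error type there is no obvious value to report for a write that makes no progress.

```rust
// Sketch in present-day Rust; `usize` replaces `uint`.
#[derive(Debug, PartialEq)]
pub struct PartialResult<T, E>(pub T, pub E);
pub type NonatomicResult<S, T, E> = Result<S, PartialResult<T, E>>;

pub trait Writer {
    type Err;

    // One underlying write; it may accept fewer bytes than offered.
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Err>;

    // Loops over `write` until `buf` is exhausted. On failure, the error is
    // paired with the number of bytes accepted before it occurred.
    fn write_all(&mut self, buf: &[u8]) -> NonatomicResult<(), usize, Self::Err> {
        let mut written = 0;
        while written < buf.len() {
            match self.write(&buf[written..]) {
                // With a generic `Self::Err` there is no way to build a
                // "wrote zero bytes" error, so this sketch simply panics.
                Ok(0) => panic!("write accepted no bytes"),
                Ok(n) => written += n,
                Err(e) => return Err(PartialResult(written, e)),
            }
        }
        Ok(())
    }
}

// Hypothetical writer: takes at most 3 bytes per call and fails once it
// holds `limit` bytes.
pub struct Trickle {
    pub out: Vec<u8>,
    pub limit: usize,
}

impl Writer for Trickle {
    type Err = &'static str;

    fn write(&mut self, buf: &[u8]) -> Result<usize, &'static str> {
        let room = self.limit - self.out.len();
        if room == 0 {
            return Err("full");
        }
        let n = buf.len().min(3).min(room);
        self.out.extend_from_slice(&buf[..n]);
        Ok(n)
    }
}
```

On a roomy `Trickle`, a single `write(b"hello")` returns `Ok(3)`, whereas `write_all(b"hello")` keeps going until all five bytes are accepted. With room for only four bytes, `write_all(b"hello")` returns `Err(PartialResult(4, "full"))`.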
+
+ The `write_fmt` method, like `write_all`, will loop until its entire
+ input is written or an error occurs. However, it does not return a
+ `NonatomicResult` because the number of bytes written cannot be
+ straightforwardly interpreted: the actual byte sequence written is
+ determined by the formatting system.
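One way to implement `write_fmt` on top of `write_all` is to bridge to `std::fmt::Write`, which is what the formatting machinery drives, and to stash the first IO error along the way. The sketch below follows that approach in present-day Rust; `VecWriter` and `Full` are hypothetical writers, and the panics mark cases the sketch does not attempt to handle.

```rust
use std::fmt;

// Sketch in present-day Rust; `usize` replaces `uint`.
#[derive(Debug, PartialEq)]
pub struct PartialResult<T, E>(pub T, pub E);
pub type NonatomicResult<S, T, E> = Result<S, PartialResult<T, E>>;

pub trait Writer {
    type Err;
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Err>;

    // Same looping strategy as the `write_all` described above.
    fn write_all(&mut self, buf: &[u8]) -> NonatomicResult<(), usize, Self::Err> {
        let mut written = 0;
        while written < buf.len() {
            match self.write(&buf[written..]) {
                Ok(0) => panic!("write accepted no bytes"),
                Ok(n) => written += n,
                Err(e) => return Err(PartialResult(written, e)),
            }
        }
        Ok(())
    }

    fn write_fmt(&mut self, args: fmt::Arguments<'_>) -> Result<(), Self::Err> {
        // The formatting machinery drives a `fmt::Write`; this adapter forwards
        // each piece to `write_all` and stashes the first IO error, because
        // `fmt::Error` itself carries no data.
        struct Adapter<'a, W: ?Sized + Writer> {
            inner: &'a mut W,
            error: Option<W::Err>,
        }

        impl<'a, W: ?Sized + Writer> fmt::Write for Adapter<'a, W> {
            fn write_str(&mut self, s: &str) -> fmt::Result {
                match self.inner.write_all(s.as_bytes()) {
                    Ok(()) => Ok(()),
                    // The per-piece byte count is dropped: callers never see
                    // the formatted bytes, so a count would be hard to interpret.
                    Err(PartialResult(_, e)) => {
                        self.error = Some(e);
                        Err(fmt::Error)
                    }
                }
            }
        }

        let mut adapter = Adapter { inner: self, error: None };
        match fmt::write(&mut adapter, args) {
            Ok(()) => Ok(()),
            Err(_) => match adapter.error.take() {
                Some(e) => Err(e),
                // Only a misbehaving `Display`/`Debug` impl gets here.
                None => panic!("a formatting trait implementation returned an error"),
            },
        }
    }
}

// Hypothetical in-memory writer and an always-failing writer.
pub struct VecWriter(pub Vec<u8>);

impl Writer for VecWriter {
    type Err = ();
    fn write(&mut self, buf: &[u8]) -> Result<usize, ()> {
        self.0.extend_from_slice(buf);
        Ok(buf.len())
    }
}

pub struct Full;

impl Writer for Full {
    type Err = &'static str;
    fn write(&mut self, _buf: &[u8]) -> Result<usize, &'static str> {
        Err("no space left")
    }
}
```

With these definitions, `write!(w, "{} + {} = {}", 1, 2, 1 + 2)` on a `VecWriter` stores `1 + 2 = 3`, and the same macro on `Full` returns `Err("no space left")`, with no byte count attached.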
+
+ The other methods on today's `Writer` include endian conversions (covered by
+ serialization) and a few conveniences like `write_str` for other basic
+ types. The latter, at least, are already uniformly (and extensibly)
+ covered via the `write!` macro. The other helpers, as with `Reader`,
+ should migrate into a more general (de)serialization library.

## String handling
[String handling]: #string-handling
@@ -473,12 +649,229 @@ throughout IO, we can go on to explore the modules in detail.
### `core::io`
[core::io]: #coreio

- > To be added in a follow-up PR.
+ The `io` module is split into the parts that can live in `libcore`
+ (most of it) and the parts that are added in the `std::io`
+ facade. Being able to move components into `libcore` at all is made
+ possible through the use of
+ [associated error types](#revising-reader-and-writer) for `Reader` and
+ `Writer`.
+
+ #### Adapters
+ [Adapters]: #adapters
+
+ The current `std::io::util` module offers a number of `Reader` and
+ `Writer` "adapters". This RFC refactors the design to more closely
+ follow `std::iter`. Along the way, it generalizes the `by_ref` adapter:
+
+ ```rust
+ trait ReaderExt: Reader {
+     // already introduced above
+     fn bytes(&mut self) -> Bytes<Self> { ... }
+     fn chars<'r>(&'r mut self) -> Chars<'r, Self, Self::Err> { ... }
+
+     // Reify a borrowed reader as owned
+     fn by_ref<'a>(&'a mut self) -> ByRef<'a, Self> { ... }
+
+     // Read everything from `self`, then read from `next`
+     fn chain<R: Reader>(self, next: R) -> Chain<Self, R> { ... }
+
+     // Adapt `self` to yield only the first `limit` bytes
+     fn take(self, limit: u64) -> Take<Self> { ... }
+
+     // Whenever reading from `self`, push the bytes read to `out`
+     fn tee<W: Writer>(self, out: W) -> Tee<Self, W> { ... }
+ }
+ impl<T: Reader> ReaderExt for T {}
+
+ trait WriterExt: Writer {
+     // Reify a borrowed writer as owned
+     fn by_ref<'a>(&'a mut self) -> ByRef<'a, Self> { ... }
+
+     // Whenever bytes are written to `self`, write them to `other` as well
+     fn carbon_copy<W: Writer>(self, other: W) -> CarbonCopy<Self, W> { ... }
+ }
+ impl<T: Writer> WriterExt for T {}
+
+ // An adaptor converting an `Iterator<u8>` to a `Reader`.
+ pub struct IterReader<T> { ... }
+ ```
+
+ As with `std::iter`, these adapters are object unsafe and hence placed
+ in an extension trait with a blanket `impl`.
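The extension-trait pattern can be sketched with a single adapter. The names follow the RFC (`ReaderExt`, `Take`), but the bodies are illustrative, written in present-day Rust with `usize` for `uint`; the `impl Reader for &[u8]` anticipates the slice `impl` proposed under `MemReader` and `MemWriter`.

```rust
// Sketch in present-day Rust; `usize` replaces `uint`.
pub trait Reader {
    type Err;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Err>;
}

// By-value adapters go in an extension trait with a blanket impl, so the
// core `Reader` trait itself stays object safe (the layout the RFC proposes).
pub trait ReaderExt: Reader + Sized {
    fn take(self, limit: u64) -> Take<Self> {
        Take { inner: self, limit }
    }
}

impl<R: Reader> ReaderExt for R {}

// Yields at most `limit` bytes from `inner`, then reports end of input.
pub struct Take<R> {
    inner: R,
    limit: u64,
}

impl<R: Reader> Reader for Take<R> {
    type Err = R::Err;

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, R::Err> {
        if self.limit == 0 {
            return Ok(0);
        }
        // Never ask the inner reader for more than the remaining allowance.
        let max = (buf.len() as u64).min(self.limit) as usize;
        let n = self.inner.read(&mut buf[..max])?;
        self.limit -= n as u64;
        Ok(n)
    }
}

// Reading from a byte slice copies bytes out and advances the slice.
impl<'a> Reader for &'a [u8] {
    type Err = ();

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, ()> {
        let data: &'a [u8] = *self;
        let n = buf.len().min(data.len());
        buf[..n].copy_from_slice(&data[..n]);
        *self = &data[n..];
        Ok(n)
    }
}
```

For example, reading from `data.take(5)` with `data` bound to `b"hello world"` yields `hello` and then `Ok(0)`, even though the underlying slice has more bytes.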
+
+ Note that the same `ByRef` type is used for both `Reader` and `Writer`,
+ and this RFC proposes to use it for `std::iter` as well. The
+ insight is that there is no difference between the *type* used for
+ by-ref adapters in any of these cases; what changes is just which
+ trait defers through it. So, we propose to add the following to `core::borrow`:
+
+ ```rust
+ pub struct ByRef<'a, Sized? T: 'a> {
+     pub inner: &'a mut T
+ }
+ ```
+
+ which will allow `impl`s like the following in `core::io`:
+
+ ```rust
+ impl<'a, W: Writer> Writer for ByRef<'a, W> {
+     type Err = W::Err;
+
+     #[inline]
+     fn write(&mut self, buf: &[u8]) -> Result<uint, W::Err> { self.inner.write(buf) }
+
+     #[inline]
+     fn flush(&mut self) -> Result<(), W::Err> { self.inner.flush() }
+ }
+ ```
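To illustrate that one `ByRef` type can serve several traits, here is a sketch in present-day Rust: `T: ?Sized` is the current spelling of `Sized? T`, and `Fill` is a hypothetical reader. The same wrapper forwards `read` when the borrowed value is a `Reader` and `next` when it is an `Iterator`.

```rust
// Sketch in present-day Rust: `T: ?Sized` is today's spelling of `Sized? T`,
// and `usize` replaces `uint`.
pub struct ByRef<'a, T: ?Sized + 'a> {
    pub inner: &'a mut T,
}

pub trait Reader {
    type Err;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Err>;
}

// As a `Reader`, `ByRef` forwards `read` to the borrowed reader...
impl<'a, R: ?Sized + Reader> Reader for ByRef<'a, R> {
    type Err = R::Err;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, R::Err> {
        self.inner.read(buf)
    }
}

// ...and as an `Iterator`, the very same type forwards `next`.
impl<'a, I: ?Sized + Iterator> Iterator for ByRef<'a, I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        self.inner.next()
    }
}

// Hypothetical reader that fills every buffer with one byte value.
pub struct Fill(pub u8);

impl Reader for Fill {
    type Err = ();
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, ()> {
        for b in buf.iter_mut() {
            *b = self.0;
        }
        Ok(buf.len())
    }
}
```

Taking two items through `ByRef { inner: &mut it }` from the iterator `1..=6` leaves `it` positioned at `3`: the adapter consumes from the borrowed value rather than from a copy.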
+
+ #### Free functions
+ [Free functions]: #free-functions
+
+ The current `std::io::util` module also includes a number of primitive
+ readers and writers, as well as `copy`. These are updated as follows:
+
+ ```rust
+ // A reader that yields no bytes
+ fn empty() -> Empty;
+
+ // A reader that yields `byte` repeatedly (generalizes today's ZeroReader)
+ fn repeat(byte: u8) -> Repeat;
+
+ // A writer that ignores the bytes written to it (/dev/null)
+ fn sink() -> Sink;
+
+ // Copies all data from a Reader to a Writer
+ pub fn copy<E, R, W>(r: &mut R, w: &mut W) -> NonatomicResult<(), uint, E> where
+     R: Reader<Err = E>,
+     W: Writer<Err = E>;
+ ```
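A sketch of these free functions in present-day Rust follows. The error type `()` for the never-failing readers and writers, the 4-byte buffer in `copy`, and the hypothetical `Limited` writer are assumptions made for illustration; so is the choice to report, as the partial result, the number of bytes that reached the writer.

```rust
// Sketch in present-day Rust; `usize` replaces `uint`, and `()` is a
// placeholder error type for readers and writers that never fail.
#[derive(Debug, PartialEq)]
pub struct PartialResult<T, E>(pub T, pub E);
pub type NonatomicResult<S, T, E> = Result<S, PartialResult<T, E>>;

pub trait Reader { type Err; fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Err>; }
pub trait Writer { type Err; fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Err>; }

pub struct Empty;
pub fn empty() -> Empty { Empty }
impl Reader for Empty {
    type Err = ();
    fn read(&mut self, _buf: &mut [u8]) -> Result<usize, ()> { Ok(0) } // always at end of input
}

pub struct Repeat { byte: u8 }
pub fn repeat(byte: u8) -> Repeat { Repeat { byte } }
impl Reader for Repeat {
    type Err = ();
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, ()> {
        for b in buf.iter_mut() { *b = self.byte; } // never runs dry
        Ok(buf.len())
    }
}

pub struct Sink;
pub fn sink() -> Sink { Sink }
impl Writer for Sink {
    type Err = ();
    fn write(&mut self, buf: &[u8]) -> Result<usize, ()> { Ok(buf.len()) } // discard everything
}

// Shuttles data through a small buffer until `r` reports end of input.
// On error, reports how many bytes had already reached `w`.
pub fn copy<E, R, W>(r: &mut R, w: &mut W) -> NonatomicResult<(), usize, E>
where
    R: Reader<Err = E>,
    W: Writer<Err = E>,
{
    let mut buf = [0u8; 4];
    let mut copied = 0;
    loop {
        let n = match r.read(&mut buf) {
            Ok(0) => return Ok(()),
            Ok(n) => n,
            Err(e) => return Err(PartialResult(copied, e)),
        };
        let mut done = 0;
        while done < n {
            match w.write(&buf[done..n]) {
                Ok(0) => panic!("write accepted no bytes"),
                Ok(m) => { done += m; copied += m; }
                Err(e) => return Err(PartialResult(copied, e)),
            }
        }
    }
}

// Reading from a byte slice advances it; used as a finite source below.
impl<'a> Reader for &'a [u8] {
    type Err = ();
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, ()> {
        let data: &'a [u8] = *self;
        let n = buf.len().min(data.len());
        buf[..n].copy_from_slice(&data[..n]);
        *self = &data[n..];
        Ok(n)
    }
}

// Hypothetical writer that accepts `left` more bytes, then fails.
pub struct Limited { pub left: usize }
impl Writer for Limited {
    type Err = ();
    fn write(&mut self, buf: &[u8]) -> Result<usize, ()> {
        if self.left == 0 { return Err(()); }
        let n = buf.len().min(self.left);
        self.left -= n;
        Ok(n)
    }
}
```

Copying `b"abcdefghij"` into `sink()` returns `Ok(())`; copying it into a `Limited` writer with room for six bytes returns `Err(PartialResult(6, ()))`. Bytes that were read but never written (here `gh`) are not handed back; the count only tells the caller how far the copy got.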
+
+ #### Seeking
+ [Seeking]: #seeking
+
+ The seeking infrastructure is largely the same as today's, except that
+ `tell` is renamed to follow the RFC's design principles and the `seek`
+ signature is refactored with more precise types:
+
+ ```rust
+ pub trait Seek {
+     type Err;
+     fn position(&self) -> Result<u64, Self::Err>;
+     fn seek(&mut self, pos: SeekPos) -> Result<(), Self::Err>;
+ }
+
+ pub enum SeekPos {
+     FromStart(u64),
+     FromEnd(u64),
+     FromCur(i64),
+ }
+ ```
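A sketch of `Seek` on an in-memory cursor, in present-day Rust. `MemCursor` and `InvalidSeek` are hypothetical, and two behaviors are assumptions rather than anything the RFC specifies: `FromEnd(n)` counts `n` bytes backwards from the end, and seeking past the end is allowed, as it is for files.

```rust
// Sketch in present-day Rust of the proposed `Seek` trait on a hypothetical
// in-memory cursor (standing in for `MemReader`).
pub trait Seek {
    type Err;
    fn position(&self) -> Result<u64, Self::Err>;
    fn seek(&mut self, pos: SeekPos) -> Result<(), Self::Err>;
}

#[derive(Debug, Clone, Copy)]
pub enum SeekPos {
    FromStart(u64),
    FromEnd(u64), // assumed here to count backwards from the end
    FromCur(i64),
}

// Hypothetical error for seeks that would land before byte 0 (or beyond
// what a `u64` position can represent).
#[derive(Debug, PartialEq)]
pub struct InvalidSeek;

pub struct MemCursor<'a> {
    data: &'a [u8],
    pos: u64,
}

impl<'a> Seek for MemCursor<'a> {
    type Err = InvalidSeek;

    fn position(&self) -> Result<u64, InvalidSeek> {
        Ok(self.pos)
    }

    fn seek(&mut self, pos: SeekPos) -> Result<(), InvalidSeek> {
        let len = self.data.len() as i128;
        // Work in i128 so that every u64/i64 combination fits without overflow.
        let target = match pos {
            SeekPos::FromStart(n) => n as i128,
            SeekPos::FromEnd(n) => len - n as i128,
            SeekPos::FromCur(delta) => self.pos as i128 + delta as i128,
        };
        if target < 0 || target > u64::MAX as i128 {
            return Err(InvalidSeek); // position left unchanged
        }
        // Seeking past the end is allowed, as with files.
        self.pos = target as u64;
        Ok(())
    }
}
```

On a 10-byte buffer, `FromEnd(3)` moves to position 7, `FromCur(-2)` then moves to 5, and `FromCur(-10)` fails with `InvalidSeek` while leaving the position at 5.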
+
+ #### Buffering
+ [Buffering]: #buffering
+
+ The current `Buffer` trait will be renamed to `BufferedReader` for
+ clarity (and to open the door to `BufferedWriter` at some later
+ point):
+
+ ```rust
+ pub trait BufferedReader: Reader {
+     fn fill_buf(&mut self) -> Result<&[u8], Self::Err>;
+     fn consume(&mut self, amt: uint);
+
+     // This should perhaps yield an iterator
+     fn read_until(&mut self, byte: u8) -> NonatomicResult<Vec<u8>, Vec<u8>, Self::Err> { ... }
+ }
+
+ pub trait BufferedReaderExt: BufferedReader {
+     fn lines(&mut self) -> Lines<Self, Self::Err> { ... }
+ }
+ ```
+
+ In addition, `read_line` is removed in favor of the `lines` iterator,
+ and `read_char` is removed in favor of the `chars` iterator (now on
+ `ReaderExt`). These iterators will be changed to yield
+ `NonatomicResult` values.
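Because `read_until` is a default method, it can be written purely in terms of `fill_buf` and `consume`. The sketch below does so in present-day Rust (with `usize` for `uint`), returning the bytes gathered so far if `fill_buf` fails; `Chunked` is a hypothetical buffered reader that exposes only a couple of bytes per `fill_buf` call, to exercise the loop.

```rust
// Sketch in present-day Rust; `usize` replaces `uint`.
#[derive(Debug, PartialEq)]
pub struct PartialResult<T, E>(pub T, pub E);
pub type NonatomicResult<S, T, E> = Result<S, PartialResult<T, E>>;

pub trait Reader {
    type Err;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Err>;
}

pub trait BufferedReader: Reader {
    fn fill_buf(&mut self) -> Result<&[u8], Self::Err>;
    fn consume(&mut self, amt: usize);

    // Collects bytes up to and including `byte` (or until end of input),
    // using only `fill_buf` and `consume`.
    fn read_until(&mut self, byte: u8) -> NonatomicResult<Vec<u8>, Vec<u8>, Self::Err> {
        let mut out = Vec::new();
        loop {
            let (done, used) = match self.fill_buf() {
                Ok(available) => match available.iter().position(|&b| b == byte) {
                    Some(i) => {
                        out.extend_from_slice(&available[..=i]);
                        (true, i + 1)
                    }
                    // An empty buffer means end of input.
                    None => {
                        out.extend_from_slice(available);
                        (available.is_empty(), available.len())
                    }
                },
                Err(e) => return Err(PartialResult(out, e)),
            };
            self.consume(used);
            if done {
                return Ok(out);
            }
        }
    }
}

// Hypothetical buffered reader over a byte slice that exposes at most
// `chunk` bytes per `fill_buf` call.
pub struct Chunked<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl<'a> Reader for Chunked<'a> {
    type Err = ();
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, ()> {
        let n = {
            let available = self.fill_buf()?;
            let n = available.len().min(buf.len());
            buf[..n].copy_from_slice(&available[..n]);
            n
        };
        self.consume(n);
        Ok(n)
    }
}

impl<'a> BufferedReader for Chunked<'a> {
    fn fill_buf(&mut self) -> Result<&[u8], ()> {
        let n = self.chunk.min(self.data.len());
        Ok(&self.data[..n])
    }
    fn consume(&mut self, amt: usize) {
        self.data = &self.data[amt..];
    }
}
```

With a chunk size of 2 over `b"ab\ncd\nef"`, successive `read_until(b'\n')` calls return `ab\n`, `cd\n`, `ef`, and finally an empty vector at end of input.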
+
+ The `BufferedReader`, `BufferedWriter` and `BufferedStream` types stay
+ essentially as they are today, except that for streams and writers the
+ `into_inner` method yields any errors encountered when flushing,
+ together with the remaining data:
+
+ ```rust
+ // If flushing fails, you get the unflushed data back
+ fn into_inner(self) -> NonatomicResult<W, Vec<u8>, W::Err>;
+ ```
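The `into_inner` contract can be illustrated with a deliberately simplified buffered writer. This is a sketch in present-day Rust, not the real `BufferedWriter`: it buffers every write until `into_inner` flushes, and `Device` is a hypothetical writer that runs out of room.

```rust
// Sketch in present-day Rust; `usize` replaces `uint`.
#[derive(Debug, PartialEq)]
pub struct PartialResult<T, E>(pub T, pub E);
pub type NonatomicResult<S, T, E> = Result<S, PartialResult<T, E>>;

pub trait Writer {
    type Err;
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Err>;
}

// Hypothetical, heavily simplified buffered writer: every write lands in
// `buf`, and `inner` is only touched when `into_inner` flushes.
pub struct BufferedWriter<W> {
    inner: W,
    buf: Vec<u8>,
}

impl<W: Writer> BufferedWriter<W> {
    pub fn new(inner: W) -> Self {
        BufferedWriter { inner, buf: Vec::new() }
    }

    // Flush, then give back the inner writer. If flushing fails, the caller
    // receives the bytes that never reached `inner`, together with the error.
    pub fn into_inner(mut self) -> NonatomicResult<W, Vec<u8>, W::Err> {
        let mut flushed = 0;
        while flushed < self.buf.len() {
            match self.inner.write(&self.buf[flushed..]) {
                Ok(0) => panic!("write accepted no bytes"),
                Ok(n) => flushed += n,
                Err(e) => return Err(PartialResult(self.buf.split_off(flushed), e)),
            }
        }
        Ok(self.inner)
    }
}

impl<W: Writer> Writer for BufferedWriter<W> {
    type Err = W::Err;
    fn write(&mut self, data: &[u8]) -> Result<usize, W::Err> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }
}

// Hypothetical device that stores up to `left` more bytes, then fails.
pub struct Device {
    pub out: Vec<u8>,
    pub left: usize,
}

impl Writer for Device {
    type Err = &'static str;
    fn write(&mut self, buf: &[u8]) -> Result<usize, &'static str> {
        if self.left == 0 {
            return Err("device full");
        }
        let n = buf.len().min(self.left);
        self.out.extend_from_slice(&buf[..n]);
        self.left -= n;
        Ok(n)
    }
}
```

If `Device` has room for only three bytes, buffering `b"hello"` and calling `into_inner` returns `Err(PartialResult(b"lo".to_vec(), "device full"))`: the two unflushed bytes come back instead of being dropped.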
+
+ #### `MemReader` and `MemWriter`
+ [MemReader and MemWriter]: #memreader-and-memwriter
+
+ The various in-memory readers and writers available today will be
+ consolidated into just `MemReader` and `MemWriter`:
+
+ `MemReader` (like today's `BufReader`)
+ - construct from `&[u8]`
+ - implements `Seek`
+
+ `MemWriter`
+ - construct freshly, or from a `Vec`
+ - implements `Seek`
+
+ Both will allow decomposing into their inner parts, though the exact
+ details are left to the implementation.
+
+ The rationale for this design is that, if you want to read from a
+ `Vec`, it's easy enough to get a slice to read from instead; on the
+ other hand, it's rare to want to write into a mutable slice on the
+ stack, as opposed to an owned vector. So these two types
+ cover the vast majority of in-memory readers/writers for Rust.
+
+ In addition to these, however, we will have the following `impl`s
+ directly on slice/vector types:
+
+ * `impl Writer for Vec<u8>`
+ * `impl Writer for &mut [u8]`
+ * `impl Reader for &[u8]`
+
+ These `impl`s are convenient and efficient, but the slice and vector
+ types do not implement `Seek`.
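A sketch of the three slice and vector `impl`s in present-day Rust. The behavior of a full `&mut [u8]` (returning `Ok(0)`) and the placeholder error type `()` are assumptions for illustration; the RFC does not specify them.

```rust
// Sketch in present-day Rust; `usize` replaces `uint`, and `()` is a
// placeholder error type.
pub trait Writer {
    type Err;
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Err>;
}

pub trait Reader {
    type Err;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Err>;
}

// Appending to a vector always accepts the whole buffer.
impl Writer for Vec<u8> {
    type Err = ();
    fn write(&mut self, buf: &[u8]) -> Result<usize, ()> {
        self.extend_from_slice(buf);
        Ok(buf.len())
    }
}

// Writing into a mutable slice fills it from the front and then advances the
// slice past the written bytes; once it is full, writes return `Ok(0)`.
impl<'a> Writer for &'a mut [u8] {
    type Err = ();
    fn write(&mut self, buf: &[u8]) -> Result<usize, ()> {
        let n = buf.len().min(self.len());
        // Move the slice out of `self` so it can be split with its full lifetime.
        let (head, tail) = std::mem::take(self).split_at_mut(n);
        head.copy_from_slice(&buf[..n]);
        *self = tail;
        Ok(n)
    }
}

// Reading from a shared slice copies bytes out and advances the slice.
impl<'a> Reader for &'a [u8] {
    type Err = ();
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, ()> {
        let data: &'a [u8] = *self;
        let n = buf.len().min(data.len());
        buf[..n].copy_from_slice(&data[..n]);
        *self = &data[n..];
        Ok(n)
    }
}
```

With a four-byte backing array, writing `b"abc"` and then `b"def"` returns `Ok(3)` and then `Ok(1)`, leaving `abcd` in the array; any further write returns `Ok(0)`.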

### The `std::io` facade
[The std::io facade]: #the-stdio-facade

- > To be added in a follow-up PR.
+ The `std::io` module will largely be a facade over `core::io`, but it
+ will add some functionality that can live only in `std`.
+
+ #### Errors
+ [Errors]: #errors
+
+ The `IoError` type will be renamed to `std::io::Error`, following our
+ [non-prefixing convention](https://github.com/rust-lang/rfcs/pull/356).
+ It will remain largely as it is today, but its fields will be made
+ private. It may eventually grow a field to track the underlying OS
+ error code.
+
+ The `IoErrorKind` type will become `std::io::ErrorKind`, and
+ `ShortWrite` will be dropped (it is no longer needed with the new
+ `Writer` semantics), which should decrease the type's footprint. The
+ `OtherIoError` variant will become `Other` now that `enum`s are
+ namespaced.
+
+ #### Channel adapters
+ [Channel adapters]: #channel-adapters
+
+ The `ChanReader` and `ChanWriter` adapters will be kept exactly as they are today.
+
+ #### `stdin`, `stdout`, `stderr`
+ [stdin, stdout, stderr]: #stdin-stdout-stderr
+
+ Finally, `std::io` will provide a `stdin` reader and `stdout` and
+ `stderr` writers. These will largely work as they do today, except
+ that we will hew more closely to the traditional setup:
+
+ * `stderr` will be unbuffered, which makes `stderr_raw` unnecessary.
+ * `stdout` will be line-buffered when attached to a TTY, and fully buffered otherwise.
+ * Most TTY functionality in `StdReader` and `StdWriter` will be moved
+   to `os::unix`, since it's not yet implemented on Windows.
+ * `stdout_raw` and `stderr_raw` will be removed.

### `std::env`
[std::env]: #stdenv