Change EOL convention to LF? #2

Closed
@masinter

@rmkaplan reported:

Windows may still be the outlier; Mac OS X and Unix/Linux use LF.

It’s really a question of what the default EOL convention should be when output streams are created. It shouldn’t matter for input streams, at least for the operations that read characters and not bytes. The character-reading functions should all recognize any of the EOL conventions on files and map them into the internal CR (the value of (CHARCODE EOL)). (I remember setting it up that way; it is silly to have to know where or how a file was created before you can read it.)
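To illustrate the input-side behavior described above, here is a minimal Python sketch (not the actual Interlisp reader). It assumes a character stream and maps CR, LF, and CRLF to a single internal EOL character. The names `INTERNAL_EOL` and `read_chars` are hypothetical and chosen only for this example.

```python
import io

# Assumption: stands in for the internal EOL, i.e. the value of (CHARCODE EOL).
INTERNAL_EOL = "\r"

def read_chars(stream):
    """Yield characters from stream, mapping CR, LF and CRLF to INTERNAL_EOL."""
    pending_cr = False  # True when the previous character was a CR
    while True:
        ch = stream.read(1)
        if ch == "":
            return  # end of stream
        if pending_cr:
            pending_cr = False
            if ch == "\n":
                continue  # LF completes a CRLF pair whose EOL was already emitted
        if ch == "\r":
            pending_cr = True
            yield INTERNAL_EOL
        elif ch == "\n":
            yield INTERNAL_EOL
        else:
            yield ch

# newline="" keeps the sample text untranslated so the sketch sees the raw EOLs.
sample = io.StringIO("a\r\nb\nc\rd", newline="")
normalized = "".join(read_chars(sample))
```

With this approach the reader never needs to know which platform wrote the file; only the output side has to pick a convention.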

I thought that Unicode might have something to say about this, but they aren’t very helpful. They point out (in the Unicode 3.0 book that I have) that the CR/LF/CRLF conventions are confused…and then they add to the confusion. They define a new code U+2028 as the unambiguous “line separator” (also an unambiguous “paragraph separator” U+2029).

My Xerox XCCS book doesn’t say anything about this, so I’m not sure what the representation is in XCCS-compliant files (which would have run-codes for mixed character-set strings, but control characters are unique in any run).

My temptation is to change the default, so that we are more compatible with Unix/Mac files. We were never compatible with Windows/CRLF. If not for all files, then at least for UTF8 and UTF16 files.
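The narrower version of this proposal (LF by default only for UTF-8 and UTF-16 output) could look roughly like the following Python sketch. The function names are hypothetical, and the rule that every other encoding keeps CR is an assumption made for illustration, not a description of existing behavior.

```python
# Assumption: the internal EOL is CR, as the value of (CHARCODE EOL).
INTERNAL_EOL = "\r"

def default_output_eol(encoding):
    """Pick the EOL written to a new output stream, based on its encoding."""
    name = encoding.upper().replace("-", "").replace("_", "")
    # Proposed default: LF for UTF-8/UTF-16; assumed CR for everything else.
    return "\n" if name in ("UTF8", "UTF16") else "\r"

def encode_for_output(text, encoding):
    """Translate internal EOLs to the stream's convention, then encode."""
    eol = default_output_eol(encoding)
    return text.replace(INTERNAL_EOL, eol).encode(encoding)
```

For example, `encode_for_output("a\rb", "utf-8")` produces bytes with an LF between the two letters, while the same call with `"latin-1"` leaves the CR in place.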

In prowling around, I have also discovered that some of the low-level files got corrupted by the Japanese. A substantial part of the LLREAD file, for example, is filled with conversion tables for various Japanese coding systems, and this stuff is mixed in at a number of other places. It should have been in separate and later files; it is hard to imagine that these would be needed in the INIT.SYSOUT.
