Skip to content
This repository has been archived by the owner on Apr 13, 2023. It is now read-only.

Move the non-IO parts of ceylon.io into a cross-platform module #449

Closed
ASzc opened this issue Nov 2, 2015 · 9 comments · Fixed by #457
Closed

Move the non-IO parts of ceylon.io into a cross-platform module #449

ASzc opened this issue Nov 2, 2015 · 9 comments · Fixed by #457
Milestone

Comments

@ASzc
Copy link
Contributor

ASzc commented Nov 2, 2015

The data representation (Buffers) and text/data conversion (Charsets, Base64) features of ceylon.io could be adapted without much effort (most of the work is adding a pure Ceylon ByteBuffer) to use on the JS backend.

However, one of the limitations of the native annotation in a cross-platform module is that all shared things must implement both backends. As the actual Input/Output parts (FileDescriptors, etc.) of ceylon.io can't be made to work in browsers (though maybe they could work on node.js?), I propose splitting the module.

The new cross-platform module might be called ceylon.binary (that being the common theme), but maybe there is a better name. I don't think many modifications would be required to the remaining ceylon.io code to allow it to import/use ceylon.binary. One thing I did spot was some things relying on underlyingBuffer instead of implementation from the ByteBuffer interface, but that's a quick fix.

I'm willing to work on this myself, if people agree it should be done.

@tombentley
Copy link

How about ceylon.buffer? I know that doesn't quite cover the encoding bits, but it seems a little more obvious to me than ceylon.binary.

@quintesse
Copy link
Contributor

Btw, there are plans to allow shared native declarations that aren't cross-platform (see ceylon/ceylon-spec#1459) , although personally I'd only use it if the amount of non-crossplatform code would be small in comparison to the rest. Otherwise splitting it off into a separate module might not be a bad idea.

@FroMage FroMage added this to the 1.3 milestone Nov 2, 2015
@ASzc
Copy link
Contributor Author

ASzc commented Nov 2, 2015

@tombentley yes ceylon.buffer might be better, as "binary" doesn't have an obvious context to it. Buffers are (at least) one side to all operations done in the module (the other side being text/String within the module, and FileDescriptor in ceylon.io).

@quintesse Good, I see that making native more useful. Although my initial approach tried to keep ceylon.io together, I think a new module is clearer, as you say.

I will wait a few more days for further comment, then start working on the PR if there are no objections.

@ASzc
Copy link
Contributor Author

ASzc commented Nov 11, 2015

Started working on this here. Not quite ready for a PR yet, one of the tests fails on JavaScript, not sure why.

@ASzc
Copy link
Contributor Author

ASzc commented Nov 12, 2015

This may be a bug in the utf16 Charset, rather than the new ByteBuffer implementation. testUTF16Encoder produces:

ceylon.test::AssertionComparisonError "Encoding of z水𐀀𝄞􏿽
expected <[0, 122, 108, 52, 216, 0, 220, 0, 216, 52, 221, 30, 219, 255, 223, 253]>
 but was <[0, 122, 108, 52, 216, 0, 220, 0, 216, 52, 221, 30, 221, 30]>"

This doesn't look like something a ByteBuffer bug would case (and all its own tests pass). I see most of the Charset implementations do some bit manipulation. I know from Svalinn that this doesn't work reliably cross-backend, so that might be the source of the problem.

@ASzc
Copy link
Contributor Author

ASzc commented Nov 12, 2015

Similar problem with utf8 and that test string:

ceylon.test::AssertionComparisonError "Encoding of z水𐀀𝄞􏿽
expected <[122, 230, 176, 180, 240, 144, 128, 128, 240, 157, 132, 158, 244, 143, 191, 189]>
 but was <[122, 230, 176, 180, 240, 144, 128, 128, 240, 157, 132, 158, 237, 180, 158]>"

Expected is Python's encoding, generated with:

", ".join("#"+hex(x)[2:] for x in "z水𐀀𝄞􏿽".encode("utf-8"))

The bit manipulations I see in both seem fairly safe; they don't appear to rely on a fixed integerAddressableSize. Any ideas @FroMage?

@FroMage
Copy link
Contributor

FroMage commented Nov 13, 2015

Well it has to be a bit issue. You should be able to track it down using a small test (that one char that causes issues) and print statements. (or run under the browser's JS debugger).

@ASzc
Copy link
Contributor Author

ASzc commented Nov 13, 2015

I added print statements into CharacterBuffer.get and on encode's codepoint value. They're different between JVM and JS, so it's different before it gets to the bit manipulation. Will look deeper at CharacterBuffer...

Printing the String literal and printing CharacterBuffer.array after copyTo:

JVM:

𐀀􏿽􏿽
{ 𐀀, 􏿽, 􏿽 }

JS:

𐀀􏿽􏿽
{ 𐀀, 􏿽, � }

If I swap out string.copyTo(array); for:

for (i->c in string.indexed) {
    array.set(i, c);
}

Then JS returns (the same as JVM):

𐀀􏿽􏿽
{ 𐀀, 􏿽, 􏿽 }

With that change in place, all ceylon.buffer tests pass!

So, it looks like we've discovered a bug in whatever the native JS implementation is for copyTo?

@ASzc
Copy link
Contributor Author

ASzc commented Nov 14, 2015

Pull request filed (#457) with the workaround in place.

@ASzc ASzc closed this as completed in #457 Feb 13, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants