
Compile to javascript and use it as local storage #117

Closed
samoht opened this issue Jan 26, 2015 · 28 comments
Labels
type/feature Introduce a new feature

Comments

@samoht
Member

samoht commented Jan 26, 2015

It is theoretically possible to compile Irmin to Javascript, but in practice we are not there yet. Most of the C externals which appear in the bytecode are related to https://github.com/mirleft/ocaml-nocrypto. It might be worth looking at https://github.com/evanvosberg/crypto-js for the missing ones.

@samoht samoht added type/feature Introduce a new feature enhancement labels Jan 26, 2015
@talex5
Contributor

talex5 commented Feb 17, 2015

I also needed to provide JS helpers for:

  • caml_blit_string_to_bigstring
  • caml_blit_bigstring_to_string
  • bin_prot_blit_string_buf_stub
  • bin_prot_blit_buf_stub

@avsm
Member

avsm commented Feb 17, 2015

It would be most useful to separate the bin_prot and bigstring JS helpers to enable Core_kernel to be javascripted up easily. /cc @yminsky @agarwal

@dbuenzli

I think ocaml-nocrypto should provide a JS backend. Since its C code should be mostly libc-independent, it can likely go through emscripten easily.

@dbuenzli

Btw @samoht, if all the crypto you need is SHA1, there is a pure OCaml implementation in uuidm here.
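Since the problem in this thread is C externals, it may help to see that SHA-1 needs none. The following is a minimal sketch written from the FIPS 180-4 description, not uuidm's implementation, and `sha1` is an illustrative name rather than an existing API. It assumes OCaml >= 4.08 (for `Bytes.get_int32_be` and `Bytes.set_int64_be`) and keeps 32-bit words in `Int32.t`, so wrap-around does not depend on the width of native ints.

```ocaml
(* Minimal pure-OCaml SHA-1 sketch (FIPS 180-4), no C externals.
   32-bit words are Int32.t, whose arithmetic wraps modulo 2^32. *)
let sha1 (msg : string) : string =
  let rotl x n =
    Int32.logor (Int32.shift_left x n) (Int32.shift_right_logical x (32 - n))
  in
  let sum = List.fold_left Int32.add 0l in
  (* Padding: 0x80, zero fill, then the message length in bits as a
     64-bit big-endian integer, up to a multiple of 64 bytes *)
  let len = String.length msg in
  let padded = (((len + 8) / 64) + 1) * 64 in
  let buf = Bytes.make padded '\000' in
  Bytes.blit_string msg 0 buf 0 len;
  Bytes.set buf len '\x80';
  Bytes.set_int64_be buf (padded - 8) (Int64.mul (Int64.of_int len) 8L);
  let h = [| 0x67452301l; 0xefcdab89l; 0x98badcfel; 0x10325476l; 0xc3d2e1f0l |] in
  let w = Array.make 80 0l in
  for chunk = 0 to (padded / 64) - 1 do
    (* Message schedule: 16 big-endian words expanded to 80 *)
    for t = 0 to 15 do
      w.(t) <- Bytes.get_int32_be buf ((chunk * 64) + (4 * t))
    done;
    for t = 16 to 79 do
      let x =
        List.fold_left Int32.logxor 0l
          [ w.(t - 3); w.(t - 8); w.(t - 14); w.(t - 16) ]
      in
      w.(t) <- rotl x 1
    done;
    let a = ref h.(0) and b = ref h.(1) and c = ref h.(2)
    and d = ref h.(3) and e = ref h.(4) in
    for t = 0 to 79 do
      (* Round function and constant for each group of 20 rounds *)
      let f, k =
        if t < 20 then
          ( Int32.logor (Int32.logand !b !c) (Int32.logand (Int32.lognot !b) !d),
            0x5a827999l )
        else if t < 40 then (Int32.logxor !b (Int32.logxor !c !d), 0x6ed9eba1l)
        else if t < 60 then
          ( Int32.logor (Int32.logand !b !c)
              (Int32.logor (Int32.logand !b !d) (Int32.logand !c !d)),
            0x8f1bbcdcl )
        else (Int32.logxor !b (Int32.logxor !c !d), 0xca62c1d6l)
      in
      let temp = sum [ rotl !a 5; f; !e; k; w.(t) ] in
      e := !d; d := !c; c := rotl !b 30; b := !a; a := temp
    done;
    List.iteri (fun i r -> h.(i) <- Int32.add h.(i) !r) [ a; b; c; d; e ]
  done;
  String.concat "" (Array.to_list (Array.map (Printf.sprintf "%08lx") h))
```

For example, `sha1 "abc"` should return the standard test vector `a9993e364706816aba3e25717850c26c9cd0d89d`.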

@samoht
Member Author

samoht commented Feb 17, 2015

See mirleft/ocaml-nocrypto#48

For Irmin, it is already possible (but maybe not documented well enough) to change the Hash module. It is not yet possible to do this in ocaml-git, but it should be straightforward to add (see mirage/ocaml-git#69).

For instance, if you want a pure in-memory backend, with a custom hash function:

module My_hash: Irmin.Hash = struct ... end
module Store = Irmin_mem.Make(Irmin.Contents.String)(Irmin.Tag.String)(My_hash)

Note: in this case, none of the "basic" API is available anymore, and the user should use Store.<fn> instead of Irmin.<fn>.

@talex5
Contributor

talex5 commented Feb 17, 2015

@samoht yes, but Irmin still pulls in Nocrypto by default and crashes on startup (uncaught exception: 0,0,Failure,ml_z_init not implemented). This happens just by having the Irmin module linked in to the application, even if you don't use it.

@dbuenzli thanks - I was looking for a pure OCaml SHA1 and couldn't find one (was using Digest's MD5 instead, but having SHA1 available is much better).

@samoht
Member Author

samoht commented Feb 17, 2015

uncaught exception: 0,0,Failure,ml_z_init not implemented

This is due to the zarith auto-init code (Z.init). That's annoying... I'm fine with switching to Daniel's pure-OCaml implementation of SHA1 from now on. This needs to go into ocaml-git as well. Speaking of which: @dbuenzli, do you have a pure implementation of the zlib deflate algorithm by any chance? :p

@avsm
Member

avsm commented Feb 17, 2015

This one works pretty well in JS: https://github.com/imaya/zlib.js, although I too would prefer a zlibm from @dbuenzli :)

@dbuenzli

On Tuesday, 17 February 2015 at 13:05, Thomas Gazagnaire wrote:

@dbuenzli do you have any pure implementation of the deflate zlib algorithm by any chance?

Unfortunately I just have sketched a non-blocking API so far, implementing deflatem is still on my list at the moment…

Daniel

@pqwy

pqwy commented Feb 17, 2015

@samoht Zarith failure could conceivably be solved by moving nocrypto from packed format to a top-level module which exposes proper module aliases for the sub-parts. In that case, if you are using just the hashes, the rest shouldn't even be linked in. Thoughts?

@samoht
Member Author

samoht commented Feb 17, 2015

Unfortunately I just have sketched a non-blocking API so far, implementing deflatem is still on my list at the moment…

Note that a deflatem doing no compression (level=0) would be more than enough for our needs.
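A level-0 encoder really is small: a zlib stream (RFC 1950) whose deflate payload uses only "stored" blocks (RFC 1951, BTYPE = 00) needs no compression code at all, just framing and an Adler-32 checksum. The following is a minimal sketch written from those RFCs; it is not deflatem's API (which did not exist yet), `zlib_store` is a hypothetical name, it only covers writing (not inflating), and it assumes a 64-bit native OCaml.

```ocaml
(* Sketch: zlib stream (RFC 1950) whose deflate payload uses only
   "stored" blocks (RFC 1951, BTYPE = 00), i.e. compression level 0. *)

(* Adler-32 checksum of the uncompressed data (RFC 1950, section 9) *)
let adler32 s =
  let a = ref 1 and b = ref 0 in
  String.iter
    (fun c ->
      a := (!a + Char.code c) mod 65521;
      b := (!b + !a) mod 65521)
    s;
  (!b lsl 16) lor !a

let zlib_store s =
  let len = String.length s in
  let buf = Buffer.create (len + 16) in
  let add_byte n = Buffer.add_char buf (Char.chr (n land 0xff)) in
  (* CMF = 0x78 (deflate, 32 KiB window); FLG = 0x01 makes
     (CMF * 256 + FLG) a multiple of 31, as the header check requires *)
  add_byte 0x78;
  add_byte 0x01;
  (* Stored blocks carry at most 65535 bytes each; at least one block is
     emitted, so the empty string still gets a final (empty) block *)
  let rec blocks pos =
    let n = min 0xffff (len - pos) in
    let final = pos + n >= len in
    (* BFINAL bit + BTYPE = 00, padded to the byte boundary *)
    add_byte (if final then 1 else 0);
    (* LEN, then NLEN = one's complement of LEN, both little-endian *)
    add_byte n;
    add_byte (n lsr 8);
    add_byte (lnot n);
    add_byte (lnot n lsr 8);
    Buffer.add_substring buf s pos n;
    if not final then blocks (pos + n)
  in
  blocks 0;
  (* Trailer: Adler-32 of the uncompressed data, big-endian *)
  let a = adler32 s in
  add_byte (a lsr 24);
  add_byte (a lsr 16);
  add_byte (a lsr 8);
  add_byte a;
  Buffer.contents buf
```

For instance, `zlib_store "abc"` produces the bytes `78 01 01 03 00 fc ff 61 62 63 02 4d 01 27`, which any conforming inflater should accept. The output is slightly larger than the input (5 bytes per 64 KiB block plus 6 bytes of header and trailer), the price of skipping compression entirely.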

@pqwy in that case, we need to turn on the relevant option (-no-alias-deps) in the build system. I'm not sure how to do that properly (but it should be doable), but that seems like a good solution!

@avsm
Member

avsm commented Feb 17, 2015

Note that using module aliases will lift nocrypto to be 4.02.0+ only. Perhaps it's time to just switch...

@pqwy

pqwy commented Feb 17, 2015

Asking precisely because of 4.02.0+. Planned on doing that, was holding off until a major 4.01.0 deprecation MirageOS-wide.

@avsm
Member

avsm commented Feb 17, 2015

Every major deprecation starts with one small uprising. Perhaps nocrypto can lead the 4.01 retirement charge.

@pqwy

pqwy commented Feb 17, 2015

DEPRECATE ALL THE THINGS

@dsheets
Member

dsheets commented Feb 17, 2015

allthecamels

@dbuenzli

On Tuesday, 17 February 2015 at 18:45, David Kaloper wrote:

@samoht Zarith failure could conceivably be solved by moving nocrypto from packed format to a top-level module which exposes proper module aliases for the sub-parts.

At a certain point I'm sure we still want to be able to use nocrypto with js_of_ocaml…

Daniel

@ztlpn
Contributor

ztlpn commented Apr 27, 2015

I managed to get Irmin in the browser almost working (using @talex5's code as a guide ;)).

There is an ugly issue though. The setup I am trying to get working is an Irmin_mem store in the browser pulling from an Irmin_fs store via XHR requests. Pulling fails because of a mismatch between the commit hashes calculated by the different backends. I tracked the issue down to the bin_prot serialization of big integers, e.g. dates and uids (in particular integers bigger than 0x7fff, so not really that big ;)). For instance, the following code:
Cstruct.hexdump (Tc.write_cstruct (module Tc.Int64) (Int64.of_int 0x8000))
when run in the browser outputs fc 00 80 00 00 00 00 00 00 and not fd 00 80 00 00 as it should.

To be fair, js_of_ocaml spits out a bunch of warnings during compilation:

Warning: integer overflow: integer 0xffffffff truncated to 0xffffffff; the generated code might be incorrect.
Warning: integer overflow: integer 0x80000000 truncated to 0x80000000; the generated code might be incorrect.
Warning: integer overflow: integer 0x100000000 truncated to 0x0; the generated code might be incorrect.
Warning: integer overflow: integer 0x80000000 truncated to 0x80000000; the generated code might be incorrect.
Warning: integer overflow: integer 0x100000000 truncated to 0x0; the generated code might be incorrect.
Warning: integer overflow: integer 0x80000000 truncated to 0x80000000; the generated code might be incorrect.

and in this case the incorrect translation of this bin_prot code seems to be the culprit.

Not sure what to do with this bug though. Seems to be a bug/limitation of js_of_ocaml, but not easily fixable. Any suggestions?

@samoht
Member Author

samoht commented Apr 27, 2015

Thanks for the report, that's very interesting! It would be worth checking whether you can reproduce your example without Tc (normally it should not make any difference, see [1]) and, if so, reporting the issue upstream (to the bin_prot issue tracker).

As a short-term workaround, you can try to make your own serialisation format for the blobs, trees and commits. Something like this might work (i.e. adapting Ir_s.Make to serialise everything into JSON):

module JSON = struct
  let to_string t = Ezjsonm.(to_string (wrap t))
  let of_string s = Ezjsonm.(unwrap (from_string s))

  let write t buf =
    let str = to_string t in
    let len = String.length str in
    Cstruct.blit_from_string str 0 buf 0 len;
    Cstruct.shift buf len

  let read buf =
    Mstruct.get_string buf (Mstruct.length buf)
    |> of_string

  let size_of t =
    let str = to_string t in
    String.length str

  let make (type t) (module S: Tc.S0 with type t = t):
    t Tc.writer * t Tc.reader * t Tc.size_of =
    (fun t -> write (S.to_json t)),
    (fun b -> S.of_json (read b)),
    (fun t -> size_of (S.to_json t))

end

module Make (AO: Irmin.AO_MAKER) (RW: Irmin.RW_MAKER)
    (C: Irmin.Contents.S) (T: Irmin.Tag.S) (H: Irmin.Hash.S)
= struct
  module X = struct
    module Contents = Irmin.Contents.Make (struct
        include AO(H)(C)
        module Key = H
        module Val = struct
          include C
          let write, read, size_of = JSON.make (module C)
        end
      end)
    module Node = struct
      module Key = H
      module Val = struct
        module X = Irmin.Private.Node.Make (H)(H)(C.Path)
        include X
        let write, read, size_of = JSON.make (module X)
      end
      module Path = C.Path
      include AO(Key)(Val)
    end
    module Commit = struct
      module Key = H
      module Val  = struct
        module X = Irmin.Private.Commit.Make(H)(H)
        include X
        let write, read, size_of = JSON.make (module X)
      end
      include AO (Key)(Val)
    end
    module Tag = struct
      module Key = T
      module Val = H
      include RW (Key)(Val)
    end
    module Slice = Irmin.Private.Slice.Make(Contents)(Node)(Commit)
    module Sync = Irmin.Private.Sync.None(H)(T)
  end
  include Irmin.Make_ext(X)
end

A better, longer-term fix (but a bit more involved...) would be to use the Git format instead of bin_prot, which means porting ocaml-git to js_of_ocaml. The main blocker is that the library does not properly separate the zlib calls from the serialization code, so it is not easy to port without also porting zlib to js_of_ocaml.

[1] https://github.com/mirage/mirage-tc/blob/master/lib/tc.ml#L670

@avsm
Member

avsm commented Apr 27, 2015

Is there a js_of_ocaml bug upstream about the serialisation at all?

@ztlpn
Contributor

ztlpn commented Apr 27, 2015

I don't think it is a bin_prot issue at all, because the test case, stripped to its essence, is simply

if 0 < 0x80000000 then print_string "ok\n" else print_string "fail\n"

I will open an issue on the js_of_ocaml tracker, but given that the warning is emitted, the issue is known and not easy to deal with.

@samoht Thanks! Will try your workaround tomorrow.

@talex5
Contributor

talex5 commented May 7, 2015

@ztlpn is your code available publicly? I'd like to get remote sync working from the browser too.

@samoht
Member Author

samoht commented May 7, 2015

btw the issue of truncated 64-bit ints has been reported upstream: ocsigen/js_of_ocaml#285

@ztlpn
Contributor

ztlpn commented May 13, 2015

@talex5 Not at the moment: the setup is very hackish (e.g. I had to put an nginx proxy in front of the Irmin HTTP server to work around CORS issues), but I plan to clean it up and release a minimal example of JS <-> HTTP server sync, which would be useful for others.

@talex5
Contributor

talex5 commented May 14, 2015

Ah, OK. I'm working on adding support to CueKeeper at the moment. I'm using my own server code so that I can host the static files on the same server that stores the repository, and provide a more limited API (just fast-forward push and fetch).

@talex5
Contributor

talex5 commented May 14, 2015

So, to summarise the bin_prot issue:

When writing out an int, bin_prot does if n >= 0x80000000 to check whether it would overflow a 32-bit signed integer field. If so, it writes out a 1 byte tag followed by 8 bytes of data. If not, it writes out a different tag and 4 bytes of data.

On a 32-bit system 0x80000000 is negative, but the test is only present when compiled for a 64-bit target. However, for Javascript we usually compile for 64-bit and then run js_of_ocaml on the 64-bit code, which truncates everything to 32-bit (not sure why; Javascript uses doubles for everything and so should be OK up to about 52 bits).

When we ask to write out a smallish int value such as 0x10000 using bin_prot on Javascript, it therefore decides to use the longer 9 byte representation even though 5 bytes would have been fine.

When we sync, we send the data as JSON to the server, which then converts it back to binary, but this time choosing the shorter representation, which therefore has a different hash. Presumably, the same problem would happen in reverse when going the other way.
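The encoding decision and the resulting hash mismatch can be modelled in a few lines. This is a simplified sketch of bin_prot's size-dependent encoding of non-negative ints, not bin_prot's own code: the 0xfd and 0xfc tags and the little-endian payload match the hexdump earlier in the thread, and 0xfe for 16-bit values reflects my understanding of bin_prot's format. The hypothetical `int64_threshold` parameter stands in for the `0x80000000` constant, which after js_of_ocaml's 32-bit truncation behaves like `-0x80000000`. It assumes a 64-bit native OCaml (>= 4.06 for `List.init`) and uses the stdlib's MD5 `Digest` only as a stand-in for whatever hash the store actually uses.

```ocaml
(* Simplified model of bin_prot-style encoding of a non-negative int:
   tag 0xfe = int16, 0xfd = int32, 0xfc = int64, payload little-endian.
   [int64_threshold] models the 0x80000000 constant of the 64-bit path. *)
let le_bytes n width = List.init width (fun i -> (n lsr (8 * i)) land 0xff)

let encode ?(int64_threshold = 0x80000000) n =
  assert (n >= 0);
  if n < 0x80 then [ n ]
  else if n < 0x8000 then 0xfe :: le_bytes n 2
  else if n >= int64_threshold then 0xfc :: le_bytes n 8
  else 0xfd :: le_bytes n 4

let to_hex bytes = String.concat " " (List.map (Printf.sprintf "%02x") bytes)

(* Content addressing hashes the serialised bytes, so two encodings of
   the same value get different keys (MD5 used here only for brevity) *)
let key bytes =
  Digest.to_hex
    (Digest.string
       (String.concat "" (List.map (fun b -> String.make 1 (Char.chr b)) bytes)))

let () =
  let native = encode 0x10000 in
  (* With the truncated constant, every n >= 0x8000 takes the 9-byte path *)
  let truncated = encode ~int64_threshold:(-0x80000000) 0x10000 in
  print_endline (to_hex native);
  print_endline (to_hex truncated);
  print_endline (string_of_bool (key native = key truncated))
```

Run natively, this prints `fd 00 00 01 00` (5 bytes), then `fc 00 00 01 00 00 00 00 00` (9 bytes), then `false`: the same logical value, two byte strings, two different hashes.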

Using a 32-bit compiler would avoid the issue, but it's annoying to have to switch compilers when compiling different parts of the application.

If the problem is fixed in a later version, we should still be able to load data saved with the longer representation, but we won't be able to sync it (therefore, history must be lost or regenerated when upgrading).

@dsheets
Member

dsheets commented May 14, 2015

In JavaScript, bitwise integer ops are 32-bit so, e.g., x|0 is a 32-bit cast. JITs know this and optimize appropriately. js_of_ocaml uses this to attempt to get some semblance of sane numerical operations on its target.
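To make the 32-bit cast concrete, the sketch below emulates in native 64-bit OCaml the ToInt32 conversion that JavaScript applies for `x | 0`: keep the low 32 bits and read them as a signed two's-complement value. `to_int32` is a hypothetical helper, not part of js_of_ocaml; it reproduces the truncations reported in the warnings earlier in the thread and shows why ztlpn's reduced test case takes the "fail" branch.

```ocaml
(* Emulate JavaScript's ToInt32 (what `x | 0` does) on a 64-bit native
   OCaml int: keep the low 32 bits, reinterpret them as signed. *)
let to_int32 x =
  let low = x land 0xffff_ffff in
  if low >= 0x8000_0000 then low - 0x1_0000_0000 else low

let () =
  (* The constants from the js_of_ocaml warnings *)
  List.iter
    (fun c -> Printf.printf "0x%x -> %d\n" c (to_int32 c))
    [ 0xffffffff; 0x80000000; 0x100000000 ];
  (* The reduced test case, with the constant truncated as in JS *)
  print_endline (if 0 < to_int32 0x80000000 then "ok" else "fail")
```

Run natively, it prints `0xffffffff -> -1`, `0x80000000 -> -2147483648` and `0x100000000 -> 0`, followed by `fail`.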

@talex5
Contributor

talex5 commented Jun 19, 2015

Closing this bug, as the remaining fixes are not part of Irmin.

For reference, to use Irmin with js_of_ocaml you need this pin:

opam pin add bin_prot 'https://github.com/talex5/bin_prot.git#js_of_ocaml'

This removes the 64-bit optimisations that go wrong when recompiled to 32-bit Javascript.

You also need to pass helpers.js on the js_of_ocaml command line. This provides the stubs for blitting strings to/from cstruct/binprot buffers.


7 participants