dlopen(), random, I/O, eval, and co-expressions (co-routines) #1843

nicowilliams · 2019-02-25T03:43:46Z

This PR adds:

- access to the system RNG on *nix (via /dev/urandom) and Windows (via CNG)
- dynamically-loadable C-coded builtin support for modules
- support for "builtin modules" named "jq/..."
- support for "file handles"
- a builtin "jq/io" module that has more advanced I/O facilities using file handles (the user must use the -I/--io command-line option to enable this)
- a builtin "jq/proc" module that has bindings for popen(3) and system(3) (the user must use the -I/--io command-line option to enable this)
- unwind(protect) -- basically a finally
- fixes for "try/catch catches more than it should" (#1859) and "ER: parsing of try ... catch ..." (#1608), plus a new try-catch-finally
- eval
- co-expressions (co-routines)

See README.plugins.md for details on how to write C-coded functions for jq modules.

Windows builds, but who knows if it works.

TBD:

- add support for some sort of authorization policy for I/O (sandbox)
- code review

coveralls · 2019-02-25T04:18:48Z

Coverage decreased (-15.5%) to 68.633% when pulling f339f1d on nicowilliams:dlopen into e7d3798 on stedolan:master.

nicowilliams · 2019-03-05T06:02:13Z

I just wrote a little tool to extract function names and types from DWARF dumps, scripts/dwarf2h. I'm using it to generate the typedefs, v-tables, macros, and such for the plugin system.

We probably don't want to run this from the build, as it would add a dependency on llvm-6.0, though maybe in a clang build we could do it. Besides, we want to have a struct per-ABI number... So generating this will have to be a manual step to be run whenever a jv or jq function is added that plugins should be able to call.

nicowilliams · 2019-03-07T05:20:22Z

@elric1 here's dlopen() functionality for jq. @muhmuhten's work made this easier.

@muhmuhten maybe you'd like to take a look? Now we're not far from being able to have builtin modules.

@joelpurra I notice you have a jq-coded PRNG. Maybe now you can have a proper RNG. tests/modules/somod/ has a trivial function to read random numbers from /dev/urandom, and it wouldn't be hard to make this better and work on Windows, and then make a builtin proper.

nicowilliams · 2019-03-07T05:46:35Z

This needs a way to say: don't export the C-coded builtins. Then we could actually have something like C-coded builtins that provide file I/O with file handles and all. The jq-coded wrappers can use try/catch to clean up correctly on scope exit. We could have something like:

def appendfile(output):
    _openappend as $fh
  | try _writefh($fh; output) catch (_closefh($fh),error);

Here the file handle does not even leak, so this is very idiomatic. The value of $fh need not even be unpredictable, since _writefh wouldn't be exposed, but if file handle values are unpredictable then we can export writefh and have something of a with_open_files:

def with_open_files(body):
    [.[]|_open] as $fhs
  | $fhs
  | try body catch (($fhs[] | _closefh),error);

  [{name:"foo",write:true},{name:"bar",append:true}]
| with_open_files(. as $fhs | "hello world" | (writefh($fhs[0]), writefh($fhs[1])))

nicowilliams · 2019-03-07T06:28:14Z

And now we can haz private C-coded builtins.

muhmuhten · 2019-03-08T17:09:42Z

Huh. I would've expected private C-coded functions by tossing the corresponding C module on the libs list separately from the module's jq code. Then you'd always bind the jq code to the importer, you'd always bind the C module to the jq part, and optionally also bind the C functions to the importer based on the modulemeta.

Granted, that gets less export granularity, but pulls the export-or-not check up two levels. Not actually sure whether that's an improvement...

Another thought would be to add a significant "exported" key to the modulemeta and throw out non-exported top-level functions in bind_block_referenced where it sort of thematically belongs. The exported flag on cfunctions can be automatically added to that, perhaps.

nicowilliams · 2019-03-08T17:15:35Z

@muhmuhten Interesting idea. That would complicate the linker code a bit, since I'd have to add a pseudo-library with a name that cannot be imported in the middle of loading the shared object. The granularity, I think, is nice. I'll think about it.

muhmuhten · 2019-03-08T17:49:19Z

Third one sounded nice in my head but the real benefits out of it (unexported jq functions, building a list of exported functions for modulemeta to use) probably can't be realized until the builtins are a real module. I was looking into resolving library linking not being able to drop unuseds by deferring all binding until the end but that's a bit stalled on wrapping my head around import-as.

dynamically-loaded modules as a solution for the I/O problem looks pretty neat, actually.

(Seems like the easiest such interface to implement would basically be explicitly throwing around file descriptors, though, or reinventing them by passing around indices into a C-side array to make FILE pointers json-representable. That's not necessarily a bad thing if we can write multiple modules with the same high-level interface though.)

nicowilliams · 2019-03-08T19:21:34Z

Right, I'd want to use an object like so as a file handle: {kind:..., index:N, verifier:...}. The idea is that it should not be possible to guess a file handle value. The index would be an index into an array -- just like a file descriptor, or, in some cases, actually a file descriptor.
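
To make the handle idea concrete, here is a minimal Python sketch, not code from this PR. It assumes a hypothetical per-interpreter `HandleTable` whose handles have the `{kind, index, verifier}` shape described above. The class name, the method names, and the choice of a 128-bit hex verifier are all illustrative assumptions.

```python
import hmac
import secrets


class HandleTable:
    """Hypothetical table mapping JSON-representable handles to resources."""

    def __init__(self):
        # Each slot holds (verifier, resource), or None once closed.
        self._slots = []

    def register(self, kind, resource):
        verifier = secrets.token_hex(16)  # unguessable 128-bit value
        self._slots.append((verifier, resource))
        return {"kind": kind, "index": len(self._slots) - 1, "verifier": verifier}

    def lookup(self, handle):
        index = handle.get("index")
        if not isinstance(index, int) or not 0 <= index < len(self._slots):
            raise KeyError("bad handle index")
        slot = self._slots[index]
        if slot is None:
            raise KeyError("handle is closed")
        verifier, resource = slot
        presented = str(handle.get("verifier")).encode()
        # Constant-time comparison, so the verifier cannot be probed piecewise.
        if not hmac.compare_digest(presented, verifier.encode()):
            raise KeyError("handle verifier mismatch")
        return resource

    def close(self, handle):
        self.lookup(handle)  # validate before releasing the slot
        self._slots[handle["index"]] = None
```

Knowing the index alone is not enough to use a handle here: a caller must also present the matching verifier, which is what makes it safe to export functions that accept handles.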

For co-routines the handle namespace would have to be shared by all the related jq_states, but not global.

nicowilliams · 2019-03-08T22:13:28Z

Actually, I too dislike the new bindflag I added. I think instead I'll also add a hidden field to inst, and then mark all non-exported functions as hidden after the block_bind_self() step, and block_bind_subblock_inner() would be changed to ignore binders marked hidden.

(We could also not include non-exported functions from a library's defs blocks, but then we'd have to count references to insts via bound_by so that we don't leak in block_free(). I'd rather not do this.)

nicowilliams · 2020-01-25T18:33:33Z

I should point out that this has become a collaboration. @leonid-s-usov isn't just reviewing the code, but producing fixes. For example, in his branch he took my one-opcode-bytecoded-function-inliner and generalized it to support inlining of larger functions and even of functions that have params! That is truly fantastic! (Though, obviously, because of the 16-bit instruction count limit, this has to be used sparingly, and will be.)

nicowilliams · 2020-01-25T21:08:18Z

So the interesting case that UNWINDING (which will get renamed) creates is that we now will have this sort of thing:

lhs | try(stuff | protect(write_footer) | more_stuff; error_handler) | outside

and we want write_footer to run when backtracking normally, when more_stuff raises, and when outside raises, but we don't want error_handler to run if only outside raises. And we also have to consider the possibility that write_footer raises an error itself.

So, PROTECT and protect/1 will re-raise the error they catch (if they catch one) as long as the protect handler does not itself raise an error. And if the protect handler does raise an error, then PROTECT/protect/1 will let that one take over.
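
The error-handling half of these semantics matches what Python's `finally` does, so a rough analogy can be sketched there. This is only an illustration: the `protect` function below is hypothetical, and it does not model jq backtracking, only the rule for which error wins.

```python
def protect(body, handler):
    """Run body(), then always run handler().

    If body raises and handler does not, body's error propagates.
    If handler raises, handler's error takes over.
    """
    try:
        return body()
    finally:
        handler()


log = []


def failing_body():
    raise ValueError("more_stuff failed")


def write_footer():
    log.append("footer")


def failing_footer():
    raise RuntimeError("write_footer failed")
```

Calling `protect(failing_body, write_footer)` raises the `ValueError` after the footer has been written, while `protect(failing_body, failing_footer)` raises the `RuntimeError` instead.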

bb010g · 2022-04-09T02:16:14Z

Could the non-I/O parts of this PR be split out to another PR that would be easier to review & get merged?

thaliaarchi · 2024-02-08T00:47:47Z

What's blocking this PR? Does it have unfinished features? Does it have difficult merge conflicts? If it's too large to review, I could take a stab at splitting it into smaller, more focused PRs, via careful git rebase (I've done this kind of thing often).

I'd also suggest an execve API matching the system API. Then the argument parsing could be done by jq and passed raw without shell parsing.
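
As an illustration of why an argv-style interface avoids shell parsing, here is a small Python sketch that uses the standard `subprocess` module as a stand-in. Nothing in it is a jq API, and `run_argv` is a hypothetical helper name.

```python
import subprocess
import sys


def run_argv(argv):
    """Run a program from an argument vector, with no shell involved."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# An argument full of shell metacharacters reaches the child untouched,
# because no shell ever parses it.
tricky = "a b; echo injected $HOME"
child_code = "import sys; print(sys.argv[1])"
output = run_argv([sys.executable, "-c", child_code, tricky])
```

The child prints the argument verbatim. With a `system(3)`-style call the same string would have to be quoted correctly for the shell first.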

nicowilliams · 2024-02-08T04:37:31Z

> What's blocking this PR? Does it have unfinished features? Does it have difficult merge conflicts? If it's too large to review, I could take a stab at splitting it into smaller, more focused PRs, via careful git rebase (I've done this kind of thing often).

Mainly I think we need a careful review of the design (i.e., the signatures and semantics of the new builtins) and the code too, of course. And I need the energy to finish it, or someone who has that energy to step in.

> I'd also suggest an execve API matching the system API. Then the argument parsing could be done by jq and passed raw without shell parsing.

Oh yes, that's on my list of wants, and all of posix_spawn(), because I have this silly idea that one could build a shell in jq, and move all the jq command-line options parsing into jq code.

thaliaarchi · 2024-02-08T09:47:25Z

random, randombytes/1, and randomstring/1 seem to be functionally separated from the rest. They don't depend on the IO permission system and seem self-contained. Could that be split off? With your go-ahead, I'd be interested in taking that on.

I'm well familiar with the internals of MT19937, the PRNG used by Python and many others. Although it's not cryptographically secure, I have context from its API design.

random/0 produces a random int with 51 bits of precision. IEEE-754 (and thus JavaScript, JSON, and jq) float64 has 53 bits of precision, so the mask remainder bits should be changed from 0x7 to 0x1f. I assume those two missing bits were a typo. Since jq doesn't actually deal with ints, I think this should be renamed to randomint/0.

randombytes/1 looks fine. If jq arrays can have a pre-allocated capacity, that would make it faster, though.

randomstring/1 seems dubious to me. It tries to treat each codepoint as its own random unit, but its generated codepoints are in the wrong range. It takes two bytes as a uint16 from the random buffer and encodes them as a codepoint. This generates codepoints in the range U+0000–U+FFFF. It should really generate in the range U+0000–U+D7FF and U+E000–U+10FFFF, to exclude surrogate halves and include codepoints outside the Basic Multilingual Plane.
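
A minimal Python sketch of drawing from the corrected range. The function names are illustrative, and `secrets.randbelow` stands in for whatever RNG the builtin would use. It picks an index among the valid scalar values and skips over the surrogate gap.

```python
import secrets

SURROGATE_START = 0xD800
SURROGATE_COUNT = 0x800  # U+D800..U+DFFF
SCALAR_COUNT = 0x110000 - SURROGATE_COUNT  # number of valid scalar values


def random_scalar_value():
    """Return a uniformly random codepoint, excluding surrogate halves."""
    n = secrets.randbelow(SCALAR_COUNT)
    # Indices at or past the gap are shifted up to land in U+E000..U+10FFFF.
    return n + SURROGATE_COUNT if n >= SURROGATE_START else n


def random_string(length):
    """Return a string of `length` random Unicode scalar values."""
    return "".join(chr(random_scalar_value()) for _ in range(length))
```

Every string produced this way encodes to valid UTF-8, which is not true of strings that contain lone surrogates.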

I think the API should be:

randfloat: float in the range [0, 1) with 53 bits of precision
randint: non-negative integer in the range [0, 2^53) (the current random/0 with fixes)
randint(max): non-negative integer in the range [0, max)
randint(min; max): integer in the range [min, max)
randstring(len): string of random valid UTF-8 codepoints (the current randomstring/1 with fixes)
randbytes(len): array of random integers in the range [0, 256) (the current randombytes/1)

I think that randfloat would actually be the focal point of the random API, not randint, since jq is float-first for numbers. For reference, Python implements this in random.random, originally sourced from genrand_res53 in mt19937ar. (Note that they combine the uint32 halves with double arithmetic to avoid uint64, since the code is old, but we can just use uint64.)
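
A rough Python sketch of the two central functions, assuming the random bytes come from the system RNG via `os.urandom` and `secrets.randbelow`. The names follow the proposal above but are not an existing jq API.

```python
import os
import secrets


def rand_bits53():
    """Return a uniformly random integer in [0, 2**53)."""
    # 8 random bytes give 64 bits; dropping the low 11 leaves 53.
    return int.from_bytes(os.urandom(8), "big") >> 11


def randfloat():
    """Return a float in [0, 1) with 53 bits of precision."""
    # Dividing by 2**53 is exact, since the numerator fits in a double's mantissa.
    return rand_bits53() / (1 << 53)


def randint(low, high):
    """Return an integer in [low, high), without modulo bias."""
    if high <= low:
        raise ValueError("empty range")
    return low + secrets.randbelow(high - low)
```

Because every value of `rand_bits53()` is exactly representable as a float64, `randfloat()` can never round up to 1.0.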

Would it make sense to have a random API for jq's big decimals? How would that work with gojq?

With it using /dev/urandom, it's more like Go crypto/rand, but I hope to get a nice API like Go math/rand.

Outside of random, I think there's plenty of room to improve the UTF-8 and byte APIs. I've been thinking about this in other contexts, so I have lots of thoughts here. If that would be welcome, I could open an issue.

If splitting random off into a separate PR and polishing it would be welcome, I'd be willing.

thaliaarchi · 2024-02-08T10:01:20Z

Besides random, the rest seems tightly intertwined. eval defers to coeval, which uses coexpressions and requires IO permission. I didn't review the plugin system as closely, so I haven't determined how it's connected.

Why is eval defined in terms of coeval? I see that COEVAL creates a new jq instance, which I would assume to be fairly expensive. Since I assume eval wouldn't need to be concurrent, could it parse the filter expression in the current environment?

Would coexpressions be useful broadly outside working with file handles? It might be able to be isolated from the IO changes. With the amount of new syntax it introduces, I think it would benefit from its own PR, to be able to discuss its syntax and semantics.

nicowilliams · 2024-02-08T16:31:13Z

> Besides random, the rest seems tightly intertwined. eval defers to coeval, which uses coexpressions and requires IO permission. I didn't review the plugin system as closely, so I haven't determined how it's connected.

Pretty much. I suppose eval shouldn't need permission.

One thing I've wondered is whether we should try to do a Haskell IO monad like thing where only the main program can "do I/O", and all modules that want to do I/O need to get utility closures from the main program. But... in jq that would just be very unwieldy.

> Why is eval defined in terms of coeval? I see that COEVAL creates a new jq instance, which I would assume to be fairly expensive. Since I assume eval wouldn't need to be concurrent, could it parse the filter expression in the current environment?

To eval we need to compile and interpret the program. I took a short-cut and simply re-used the existing compiler and VM machinery, and so eval just... runs that machinery for the given program. And as it happens that's also the easiest way to get co-routines implemented, so the two share this.

> Would coexpressions be useful broadly outside working with file handles? It might be able to be isolated from the IO changes. With the amount of new syntax it introduces, I think it would benefit from its own PR, to be able to discuss its syntax and semantics.

Any time you want breadth-first recursive traversal you'll need coexpressions. Long ago when I used Icon I rarely used coexpressions, so they might not be that necessary most of the time.
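
As a loose analogy only, Python generators can play the role of co-expressions: each one is a suspended computation that can be resumed independently. That independence is what lets the outputs of several traversals be interleaved instead of being exhausted depth-first, one after another. The helper names below are hypothetical.

```python
from collections import deque


def countdown(label, n):
    """A toy generator standing in for one suspended traversal."""
    for i in range(n, 0, -1):
        yield f"{label}{i}"


def round_robin(*generators):
    """Resume each generator in turn, dropping the ones that finish."""
    queue = deque(generators)
    while queue:
        gen = queue.popleft()
        try:
            value = next(gen)
        except StopIteration:
            continue  # this generator is exhausted
        yield value
        queue.append(gen)


interleaved = list(
    round_robin(countdown("a", 3), countdown("b", 1), countdown("c", 2))
)
# interleaved == ["a3", "b1", "c2", "a2", "c1", "a1"]
```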

nicowilliams · 2024-02-08T17:15:54Z

Also, consider some options for implementing eval:

1. write an interpreter in jq (jqjq style)
2. compile the given program as usual but link it into the currently running program so as to reuse the existing VM
3. compile the given program as usual and run it in a new VM
4. a complete re-write that compiles to native code or something

(1) would be too slow.
(2) is reasonable, but since I was already doing (3) to make co-routines possible, I went with (3).
(4) is a great idea for someone with the time and energy to take it on.

nicowilliams · 2024-02-08T18:44:40Z

> random/0 produces a random int with 51 bits of precision. IEEE-754 (and thus JavaScript, JSON, and jq) float64 has 53 bits of precision, so the mask remainder bits should be changed from 0x7 to 0x1f. I assume those two missing bits were a typo. Since jq doesn't actually deal with ints, I think this should be renamed to randomint/0.

Well, jq 1.7 does have something of a bignum feature, so we could make this better. I agree that it should probably be named randomnum/0 or randomint/0.

nicowilliams · 2024-02-08T18:46:03Z

> randombytes/1 looks fine. If jq arrays can have a pre-allocated capacity, that would make it faster, though.

It would be good to finish the binary support branch and make randombytes/1 output binary.

> randomstring/1 seems dubious to me. It tries to treat each codepoint as its own random unit, but its generated codepoints are in the wrong range. It takes two bytes as a uint16 from the random buffer and encodes them as a codepoint. This generates codepoints in the range U+0000–U+FFFF. It should really generate in the range U+0000–U+D7FF and U+E000–U+10FFFF, to exclude surrogate halves and include codepoints outside the Basic Multilingual Plane.

I agree. I should remove it completely.

nicowilliams · 2024-02-08T18:47:13Z

> Outside of random, I think there's plenty of room to improve the UTF-8 and byte APIs. I've been thinking about this in other contexts, so I have lots of thoughts here. If that would be welcome, I could open an issue.

I've a branch that adds binary support :) fq-style.

nicowilliams force-pushed the dlopen branch from 83cc14b to 5b284bb on February 26, 2019 02:01

nicowilliams changed the title ~~WIP: add dlopen support~~ Add dlopen support Feb 26, 2019

nicowilliams changed the title ~~Add dlopen support~~ WIP: Add dlopen support Feb 26, 2019

nicowilliams force-pushed the dlopen branch 3 times, most recently from bc8386a to a5cad96 on February 26, 2019 17:18

nicowilliams force-pushed the dlopen branch 4 times, most recently from 935d382 to bc4e6a4 on March 6, 2019 20:16

nicowilliams changed the title ~~WIP: Add dlopen support~~ Add dlopen support Mar 6, 2019

nicowilliams force-pushed the dlopen branch 3 times, most recently from 68c0890 to a781366 on March 7, 2019 05:15

nicowilliams requested a review from wtlangford March 7, 2019 05:15

nicowilliams force-pushed the dlopen branch 2 times, most recently from f49f803 to 2ffa9d6 on March 7, 2019 06:27

nicowilliams mentioned this pull request Mar 8, 2019

try/catch catches more than it should #1859

Closed

nicowilliams force-pushed the dlopen branch 2 times, most recently from fb024cc to 46fb339 on March 9, 2019 20:16

leonid-s-usov and others added 5 commits January 26, 2020 03:19

WIP: coroutines++: runs but doesn't work

32de6a0

VM: improve debug trace

4f867be

jq_handle_write: fix a jv_free bug

d449484

coexp: ITS ALIVE!

f1f29bd

Add back the lost output builtin

5379ef2

nicowilliams mentioned this pull request Jan 29, 2020

Add random functions #1260

Closed

leonid-s-usov mentioned this pull request Jan 29, 2020

add command inside if in jq #2053

Closed

leonid-s-usov and others added 6 commits January 29, 2020 22:21

block_inline: account for the case when the inlined call is a branch …

b1d67fd

…target

compile: account for all backtracking instructions

48b3a75

when considering to add RET_JQ at the end of a block. currently two instructions backtrack: BACKTRACK and TAIL_OUT

preliminary optimization is the root of all evil

ad5405f

fix I/O policy leak

9fdf955

Fix jqlang#2051 -- iterate arrays backwards in path ctx

9c9013b

In case a path gets deleted, we should iterate arrays backwards in path/1 context.

No more EOF error

fd5dc00

pkoppstein mentioned this pull request Jul 19, 2021

[ER]: capturing the output of a subprocess itchyny/gojq#101

Closed

wader mentioned this pull request Nov 15, 2022

Read file list from file #2499

Open

liquidaty mentioned this pull request Mar 7, 2023

organizing to release JQ 1.6.2 and JQ 1.7: Your suggestions/comments would be appreciated #2550

Closed

itchyny added the feature request label Jun 3, 2023

wader mentioned this pull request Aug 6, 2024

Add builtin to output to file #3153

Closed
