ER: basic TCO #86

Closed
pkoppstein opened this issue Jul 10, 2021 · 26 comments

@pkoppstein commented Jul 10, 2021

Basic TCO support would substantially improve gojq's speed and memory efficiency for certain jq programs.

This note first considers a simple and direct test of recursion limits (see [*Recursion] below), and then a more practical test involving an optimized form of walk and a well-known large JSON file:
https://github.com/ryanwholey/jeopardy_bot/blob/master/JEOPARDY_QUESTIONS1.json

[*Recursion]

for jq in jq gojq ; do
    echo $jq ::
    /usr/bin/time -lp $jq --argjson max 100000000 -n '
    def zero_arity:
      if . == $max then "completed \($max)"
      else if (. % 100000 == 0) then . else empty end, ((.+1)| zero_arity)
      end;
    1 | zero_arity'
done

In abbreviated form, the output for jq is:

"completed 100000000"
user        90.48
sys          0.24
   1880064  maximum resident set size

For gojq, the program fails to complete in a reasonable amount of time. In fact, it takes many hours to reach 600,000.

For gojq, setting max to 100,000 gives these performance indicators:

user 78.83
sys 0.65
47173632 maximum resident set size

[*walk]

#!/bin/bash

for jq in jq gojq ; do
    echo $jq
  /usr/bin/time -lp $jq '
  def walk(f):
  def w:
    if type == "object"
    then . as $in
    | reduce keys[] as $key
        ( {}; . + { ($key):  ($in[$key] | w) } ) | f
    elif type == "array" then map( w ) | f
    else f
    end;
  w;

  walk(if type == "string" then 0 else 1 end)
  ' jeopardy.json > /dev/null

done

Results:

jq
user         6.44
sys          0.11
 226742272  maximum resident set size
gojq
user         9.99
sys          0.85
1861201920  maximum resident set size
@itchyny (Owner) commented Jul 11, 2021

The current TCO implementation of gojq limits the optimization to functions with no local variables. There might be something we can do here.

gojq/compiler.go, line 1393 (at 244f9f7):

if c.codes[i].v.([2]int)[1] == 0 {
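
For illustration, a minimal sketch of what that check means at the query level (hypothetical filter names): the first helper binds no variables and has the shape the jump-based optimization can target, while the second binds $x and falls outside it.

def countdown: if . <= 0 then "done" else . - 1 | countdown end;                  # no local variables
def countdown2: . as $x | if $x <= 0 then "done" else $x - 1 | countdown2 end;    # binds $x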

@itchyny (Owner) commented Jul 16, 2021

I fixed this, thanks for reporting.

@pkoppstein (Author) commented:

Despite this enhancement, gojq still seems to consume far more memory than one would expect. Compare:

./gojq --version
gojq 0.12.4 (rev: 45af604/go1.16)

/usr/bin/time -lp ./gojq -n 'def f: . as $x | if . == 10000000 then . else (.+1|f) end; 0|f'
10000000
user         3.89
sys          0.08
  10539008  maximum resident set size

$ /usr/bin/time -lp jq -n 'def f: . as $x | if . == 10000000 then . else (.+1|f) end; 0|f'
10000000
user         4.95
sys          0.01
   1859584  maximum resident set size

Is that what you'd expect?

@itchyny (Owner) commented Jul 17, 2021

Reusing the local variable would reduce the linear growth of memory consumption (which is what matters in ordinary languages), but it cannot always be applied in jq. For example, def f: . as $x | $x, (if $x < 1000000 then $x + 1 | f else empty end), $x; f cannot reuse the stack frame, because $x is still needed after the recursive call. So the problem here is that I need to implement some logic to judge whether the recursive call is in the last position or not.
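
With a small bound (3 instead of 1000000, purely for illustration), the output shape shows why: each $x is emitted again after the recursive call returns, so the frame has to stay alive.

$ gojq -n -c 'def f: . as $x | $x, (if $x < 3 then $x + 1 | f else empty end), $x; [1 | f]'
[1,2,3,3,2,1]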

@itchyny (Owner) commented Jul 17, 2021

Oops, 86ee519 breaks the behavior of the query in my comment...

itchyny added a commit that referenced this issue Jul 17, 2021
#86)"

This reverts commit 86ee519.

This commit causes critical behavior regression on tail call functions.
@itchyny (Owner) commented Jul 17, 2021

jq's optimization simply changes the return address to avoid the quadratic growth of instructions, and does not address the growth of the stack frames. On the other hand, gojq's optimization replaces the call instruction with a jump. This is why I limited it to functions with no variables, like _while and _until. The difference is larger than I thought, and rushing to extend the optimization targets causes various queries to fail to run correctly. I need more time to work out how to handle this.

@pkoppstein (Author) commented:

Since this issue was closed based on an update that has since been reverted, it would be helpful, at least for some people, if either the issue were reopened or some notes about TCO were added to the README.

@itchyny reopened this Jul 22, 2021
@itchyny (Owner) commented Jul 22, 2021

I improved the optimization algorithm.

@pkoppstein (Author) commented:

Yes, a huge improvement, but I was surprised that the zero_arity program still requires more than 4,000 times as much memory as jq, given that the recursive function has no local variables or data structures at all.

(Given that Go is supposed to be as fast as C for comparable programs, the CPU time difference is also a bit surprising, especially as gojq's CPU times are often lower than jq's when memory reclamation is not an issue.)

"completed 100000000"
user       159.04
sys         38.90
7845294080  maximum resident set size

jq ::
"completed 100000000"
user        87.91
sys          0.66
   1859584  maximum resident set size

@itchyny (Owner) commented Jul 23, 2021

The function is already optimized with a jump instruction to avoid stack frame allocation. However, there is a fork instruction, so the stack still grows. I'm not sure how jq handles this situation, but it's worth investigating.
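
The fork comes from the comma in the recursive body: emit a value, then recurse. A scaled-down sketch of that shape (a hypothetical variant of zero_arity with a small bound):

def f: if . == 5 then "done"
       else (if . % 2 == 0 then . else empty end), (. + 1 | f)
       end;
1 | f            # emits 2, 4, "done"; the comma before the recursive call compiles to a fork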

itchyny added a commit that referenced this issue Aug 12, 2021
The fork analysis in the tail call optimization, introduced by 97658f3, is incomplete, and produces critical behavior differences in some recursions. This commit now again limits tail calls using the jump instruction to functions with no local variables.
@itchyny (Owner) commented Aug 17, 2021

To summarize, one problem was the return address, resolved by the opcallrec instruction implemented in 97658f3 (which also introduced the broken fork analysis that was later reverted). Another problem, which I noticed today, was the variable lookup algorithm (especially for variables defined outside the recursive function), resolved by 932c65e. Tail calls using the opjump instruction are still limited to functions with no local variables (user defined or compiler emitted), just as they have been since v0.4.0 (I implemented this optimization for while and until).

@wader (Contributor) commented Aug 29, 2021

Hello, sorry for commenting on an old issue. I just wanted to get some clarification about the current state of TCO in gojq. If I understand it correctly, gojq can currently optimize stack use from quadratic to linear in some cases, but constant is not possible?

If that is the case, I'm willing to spend some time on making it constant, if that is possible and not too hard. In my case I would like to make range/*, repeat/1 and def loop(f): def _r: f | _r; _r; loop(.+1) (passing state to the next iteration) run in constant memory. Any pointers would be greatly appreciated.

@itchyny (Owner) commented Aug 29, 2021

Tail calls using opjump consume a constant amount of stack, so while, until, range and repeat run in constant memory. This has been the case since v0.4.0.
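
For reference, those builtins recurse through inner helpers that bind no variables, which is what makes the opjump optimization applicable; the jq manual defines while and until roughly like this:

def while(cond; update): def _while: if cond then ., (update | _while) else empty end; _while;
def until(cond; update): def _until: if cond then . else (update | _until) end; _until;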

@wader (Contributor) commented Aug 29, 2021

OK, that is good to hear. Maybe I'm doing something wrong, or there is some bug. For example, if I try this:

$ go run cmd/gojq/main.go -n 'range(100000000) | empty'

I see that the process keeps increasing in memory usage (RSS). I would have expected it to increase and then decrease from time to time when the GC runs.

I did some more digging after adding basic pprof code to cli.go (it dumps a memory profile if $MEMPROFILE is set):

$ go build -o gojq cmd/gojq/main.go && \
MEMPROFILE=gojq.mem.prof ./gojq -n 'range(10000000) | empty' && \
go tool pprof -http :5555 gojq gojq.mem.prof

In pprof I see this (sample "inuse_space"):
[two pprof screenshots attached]

If I print env.offset, it seems to keep increasing.

Sorry for my probably very basic understanding of this, but I want to learn and help if possible. I really appreciate that you are taking the time to help.

@itchyny (Owner) commented Aug 29, 2021

Okay, values grows, so range does not run in constant memory.

@wader (Contributor) commented Aug 29, 2021

That is true, but do you think that is the main part of the memory increase? For example, env.offset seems to stay constant at 4 for recurse(1) but keeps increasing for range(1000000).

@wader (Contributor) commented Aug 29, 2021

Interesting, env.offset is constant for recurse(0) but increases for recurse(.+0)

@itchyny (Owner) commented Aug 29, 2021

Here's a possible solution to suppress the growth of values, but it needs more investigation to be sure it doesn't introduce different bugs.

diff --git a/execute.go b/execute.go
index 6bba740..afb0cbe 100644
--- a/execute.go
+++ b/execute.go
@@ -228,7 +228,11 @@ loop:
 			if backtrack {
 				break loop
 			}
+			f := env.scopes.index > env.scopes.limit
 			s := env.scopes.pop().(scope)
+			if f {
+				env.offset = s.offset
+			}
 			pc, env.scopes.index = s.pc, s.saveindex
 			if env.scopes.empty() {
 				return env.pop(), true

@wader (Contributor) commented Aug 29, 2021

Thanks, I tried the patch and it seems to have fixed most of the memory increase. The allocations done by make in case opscope: look reasonable now. Still, there seem to be quite a lot of allocations done by *stack.push() that do not get GC'ed.

gojq master with offset patch and pprof (https://gist.github.com/wader/87b034dbc986b4d50c8ad5f41c39d2b7 and a defer maybeProfile()() in *cli.run())

go build -o gojq cmd/gojq/main.go && MEMPROFILE=gojq.mem.prof ./gojq -n 'range(20000000) | empty' && go tool pprof -http :5555 gojq gojq.mem.prof

[pprof screenshot attached]

@wader (Contributor) commented Aug 30, 2021

Nice, thanks! The script where I originally noticed this still seems to increase in memory usage. I managed to minimize it down to this: recurse(def f($a): 1; f(1)). Without the argument it is constant, but with the argument it keeps increasing. Is that expected with the current TCO implementation? I see the same increase using repeat/1 and while/2 as well.
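
A bounded way to compare the two shapes from the shell (the limit and its size are only so the commands terminate; the unbounded forms are what show the RSS growth):

/usr/bin/time -lp ./gojq -n 'limit(5000000; recurse(def f: 1; f)) | empty'
/usr/bin/time -lp ./gojq -n 'limit(5000000; recurse(def f($a): 1; f(1))) | empty'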

@itchyny (Owner) commented Aug 30, 2021

How about now? The problem is no longer in the tail call optimization. Maybe I need to stop somewhere; the code complexity is growing. I'll cut a new tag soon.

@wader (Contributor) commented Aug 30, 2021

Not much difference. I think the issue is still env.offset: with recurse(def f($a): 1; f(1)) it keeps increasing, which I think triggers the allocation in if env.offset > len(env.values) { ... make(...) ... }. But I understand it gets complicated; I'm trying to wrap my head around the current TCO code so that I can help, and I'm making some progress. It would be nice if recurse/1, repeat/1 and while/2 (and until/2?) could run in constant memory; if I understand the jq documentation, that is what jq can TCO.

@wader (Contributor) commented Aug 30, 2021

jq -n 'recurse(def f($a): 1; f(1))' seems to use constant memory; RSS stays constant. The TCO requirement for jq seems to be that the call must be last and have no arguments (jqlang/jq#437).
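
A small sketch of the distinction described there (hypothetical filter names):

def f: if . >= 1000000 then . else . + 1 | f end;            # tail call with no arguments: eligible
def g(x): if . >= 1000000 then . else . + x | g(x) end;      # tail call passes an argument: not eligible
0 | f, g(1)                                                   # both produce 1000000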

@wader (Contributor) commented Aug 30, 2021

I'm trying to understand the code related to opscope/opret and the various stacks. I did some tests to see how env.offset changes during calls and noticed this:

How come, after the first f(1) call, env.offset is not back to 3 again? I'm failing to understand the conditions in popscope() for when the offset is restored.

$ GOJQ_DEBUG=1 go run -tags debug cmd/gojq/main.go -n 'def f($v): 1; f(1) | f(1)'
	0	scope                    [0,3,0]
	1	store                    [0,0]	## $ARGS
	2	jump                     13
	3	scope                    [1,3,1]	## f
	4	store                    [1,0]
	5	store                    [1,1]	## v
	6	load                     [1,0]
	7	load                     [1,1]
	8	callpc                   null
	9	store                    [1,2]	## $v
	10	load                     [1,0]
	11	const                    1
	12	ret                      null	## end of f
	13	store                    [0,1]
	14	jump                     18
	15	scope                    [2,0,0]	## lambda:15
	16	const                    1
	17	ret                      null	## end of lambda:15
	18	pushpc                   15
	19	load                     [0,1]
	20	call                     3	## call f
	21	store                    [0,2]
	22	jump                     26
	23	scope                    [3,0,0]	## lambda:23
	24	const                    1
	25	ret                      null	## end of lambda:23
	26	pushpc                   23
	27	load                     [0,2]
	28	call                     3	## call f
	29	ret                      null
	----------------------------------------+
	0	scope                    [0,3,0]	|	null	{"named":{}}
2021/08/31 00:32:54 opscope start: env.offset=0
2021/08/31 00:32:54 opscope end: env.offset=3
	1	store                    [0,0]	|	null	{"named":{}}			## $ARGS
	2	jump                     13	|	null
	13	store                    [0,1]	|	null
	14	jump                     18	|
	18	pushpc                   15	|
	19	load                     [0,1]	|	[15,0]
	20	call                     3	|	[15,0]	null			## call f
	3	scope                    [1,3,1]	|	[15,0]	null			## f
2021/08/31 00:32:54 opscope start: env.offset=3
2021/08/31 00:32:54 opscope end: env.offset=6
	4	store                    [1,0]	|	[15,0]	null
	5	store                    [1,1]	|	[15,0]			## v
	6	load                     [1,0]	|
	7	load                     [1,1]	|	null
	8	callpc                   null	|	null	[15,0]
	15	scope                    [2,0,0]	|	null			## lambda:15
2021/08/31 00:32:54 opscope start: env.offset=6
2021/08/31 00:32:54 opscope end: env.offset=6
	16	const                    1	|	null
	17	ret                      null	|	1			## end of lambda:15
2021/08/31 00:32:54 opret start: env.offset=6
2021/08/31 00:32:54 opret end: env.offset=6
	9	store                    [1,2]	|	1			## $v
	10	load                     [1,0]	|
	11	const                    1	|	null
	12	ret                      null	|	1			## end of f
2021/08/31 00:32:54 opret start: env.offset=6
2021/08/31 00:32:54 opret end: env.offset=6
	21	store                    [0,2]	|	1
	22	jump                     26	|
	26	pushpc                   23	|
	27	load                     [0,2]	|	[23,0]
	28	call                     3	|	[23,0]	1			## call f
	3	scope                    [1,3,1]	|	[23,0]	1			## f
2021/08/31 00:32:54 opscope start: env.offset=6
2021/08/31 00:32:54 opscope end: env.offset=9
	4	store                    [1,0]	|	[23,0]	1
	5	store                    [1,1]	|	[23,0]			## v
	6	load                     [1,0]	|
	7	load                     [1,1]	|	1
	8	callpc                   null	|	1	[23,0]
	23	scope                    [3,0,0]	|	1			## lambda:23
2021/08/31 00:32:54 opscope start: env.offset=9
2021/08/31 00:32:54 opscope end: env.offset=9
	24	const                    1	|	1
	25	ret                      null	|	1			## end of lambda:23
2021/08/31 00:32:54 opret start: env.offset=9
2021/08/31 00:32:54 opret end: env.offset=9
	9	store                    [1,2]	|	1			## $v
	10	load                     [1,0]	|
	11	const                    1	|	1
	12	ret                      null	|	1			## end of f
2021/08/31 00:32:54 opret start: env.offset=9
2021/08/31 00:32:54 opret end: env.offset=9
	29	ret                      null	|	1
2021/08/31 00:32:54 opret start: env.offset=9
2021/08/31 00:32:54 opret end: env.offset=9
1
	29	ret <backtrack>          null	|

@itchyny (Owner) commented Sep 10, 2021

I understand the issue exists in calling the filter argument, and it needs to be fixed some day. It's not an issue of tail calls, and I think there are no critical issues in the tail call optimization (the zero_arity query runs in a duration comparable to jq), but it's OK to continue the discussion here. The problem is that updating the stack limit is actually needed when switching to the frame calling the argument, so the basic strategy should be saving the stack limit (well, saving it to limit is already done, but not used) and restoring it at the appropriate time. Just restoring the stack limit on opret breaks the execution of forks. I don't have a clue about the correct solution yet, but hopefully I'll come up with it just like the other fixes for tail call optimization.

@wader (Contributor) commented Sep 11, 2021

👍 I would like to know more about jq's stack model, so while learning more about it I will keep this issue in mind. Does jq's stack implementation differ a lot from gojq's in this case?
