
Memleak #3

Closed
thangld322 opened this issue Feb 24, 2020 · 12 comments
@thangld322

Hi,

I used this library to implement a grok filter to parse plaintext logs.

First, I compile the pattern to use later.

	re, err := rure.Compile(rurePattern)
	if err != nil {
		return nil, err
	}
	names := re.CaptureNames()
	gr.rure = re
	gr.names = names

Every time a log passes by, I call this function:

func (g *Grok) RureParseTypedCompiled(gr *GRegexp, text string) (map[string]interface{}, error) {
	captures := make(map[string]interface{})
	if !gr.rure.IsMatch(text) {
		return captures, nil
	}
	it := gr.rure.Iter(text)
	caps := gr.rure.NewCaptures()
	it.Next(caps)
	for i, segmentName := range gr.names {
		if segmentName != "" {
			begin, end, ok := caps.Group(i)
			if ok {
				if g.config.RemoveEmptyValues == true && begin == end {
					continue
				}
				captures[segmentName] = text[begin:end]
			}
		}
	}
	return captures, nil
}

After just one minute of running, the system's memory is full and the program crashes with these logs:

2020-02-24T05:06:49.482036293Z memory allocation of 85135520 bytes failedSIGABRT: abort
2020-02-24T05:06:49.488995649Z PC=0x7f8a2b223fff m=66 sigcode=18446744073709551610
2020-02-24T05:06:49.489023063Z 
2020-02-24T05:06:49.489036334Z goroutine 0 [idle]:
2020-02-24T05:06:49.489053444Z runtime: unknown pc 0x7f8a2b223fff
2020-02-24T05:06:49.489065969Z stack: frame={sp:0x7f8880d3e9d8, fp:0x0} stack=[0x7f888053f3a8,0x7f8880d3efa8)
2020-02-24T05:06:49.489077871Z 00007f8880d3e8d8:  00007f81e8f19d70  00007f81e8f19d78 
2020-02-24T05:06:49.489089394Z 00007f8880d3e8e8:  00007f8a2b5da8bb  00007f81e8f19a50 
2020-02-24T05:06:49.489101007Z 00007f8880d3e8f8:  0000000000000010  00007f8a2b334f80 
2020-02-24T05:06:49.489112534Z 00007f8880d3e908:  00007f8880d3e988  43c58fd9b1336d0c 
2020-02-24T05:06:49.489123995Z 00007f8880d3e918:  00007f8a2b5e1efa  00007f81e8f19d78 
2020-02-24T05:06:49.489135506Z 00007f8880d3e928:  0000000000000000  00007f81e8f197c0 
2020-02-24T05:06:49.489146943Z 00007f8880d3e938:  00007f81e8f19a50  0000000000000003 
2020-02-24T05:06:49.489159083Z 00007f8880d3e948:  00007f8a2c0251d9  0000000000000000 
2020-02-24T05:06:49.489170429Z 00007f8880d3e958:  00007f8a2b6647e1  0000000000000000 
2020-02-24T05:06:49.489181825Z 00007f8880d3e968:  0000000000000007  0000000000000000 
2020-02-24T05:06:49.489193368Z 00007f8880d3e978:  00007f8a2c0251d9  0000000000000000 
2020-02-24T05:06:49.489204760Z 00007f8880d3e988:  00007f8a2b6647e1  0000000000000001 
2020-02-24T05:06:49.489216462Z 00007f8880d3e998:  00007f8a2b901f48  0000000000000000 
2020-02-24T05:06:49.489227989Z 00007f8880d3e9a8:  000000000000000d  0000000000000000 
2020-02-24T05:06:49.489258227Z 00007f8880d3e9b8:  00007f8880d3eac8  0000000000000001 
2020-02-24T05:06:49.489268684Z 00007f8880d3e9c8:  00007f8a2b901f48  00007f8880d3ea28 
2020-02-24T05:06:49.489278698Z 00007f8880d3e9d8: <00007f8880d3eb48  fffffffe7fffffff 
2020-02-24T05:06:49.489442030Z 00007f8880d3e9e8:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489452214Z 00007f8880d3e9f8:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489462198Z 00007f8880d3ea08:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489472075Z 00007f8880d3ea18:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489482014Z 00007f8880d3ea28:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489491847Z 00007f8880d3ea38:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489501891Z 00007f8880d3ea48:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489511840Z 00007f8880d3ea58:  ffffffffffffffff  00000000051310a0 
2020-02-24T05:06:49.489521773Z 00007f8880d3ea68:  00007f8a2b22542a  0000000000000020 
2020-02-24T05:06:49.489531668Z 00007f8880d3ea78:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489541652Z 00007f8880d3ea88:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489551592Z 00007f8880d3ea98:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489561526Z 00007f8880d3eaa8:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489571482Z 00007f8880d3eab8:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489581413Z 00007f8880d3eac8:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489591405Z runtime: unknown pc 0x7f8a2b223fff
2020-02-24T05:06:49.489605674Z stack: frame={sp:0x7f8880d3e9d8, fp:0x0} stack=[0x7f888053f3a8,0x7f8880d3efa8)
2020-02-24T05:06:49.489616058Z 00007f8880d3e8d8:  00007f81e8f19d70  00007f81e8f19d78 
2020-02-24T05:06:49.489626062Z 00007f8880d3e8e8:  00007f8a2b5da8bb  00007f81e8f19a50 
2020-02-24T05:06:49.489635993Z 00007f8880d3e8f8:  0000000000000010  00007f8a2b334f80 
2020-02-24T05:06:49.489645938Z 00007f8880d3e908:  00007f8880d3e988  43c58fd9b1336d0c 
2020-02-24T05:06:49.489655942Z 00007f8880d3e918:  00007f8a2b5e1efa  00007f81e8f19d78 
2020-02-24T05:06:49.489665851Z 00007f8880d3e928:  0000000000000000  00007f81e8f197c0 
2020-02-24T05:06:49.489675738Z 00007f8880d3e938:  00007f81e8f19a50  0000000000000003 
2020-02-24T05:06:49.489685726Z 00007f8880d3e948:  00007f8a2c0251d9  0000000000000000 
2020-02-24T05:06:49.489695733Z 00007f8880d3e958:  00007f8a2b6647e1  0000000000000000 
2020-02-24T05:06:49.489705618Z 00007f8880d3e968:  0000000000000007  0000000000000000 
2020-02-24T05:06:49.489715566Z 00007f8880d3e978:  00007f8a2c0251d9  0000000000000000 
2020-02-24T05:06:49.489725549Z 00007f8880d3e988:  00007f8a2b6647e1  0000000000000001 
2020-02-24T05:06:49.489745215Z 00007f8880d3e998:  00007f8a2b901f48  0000000000000000 
2020-02-24T05:06:49.489755400Z 00007f8880d3e9a8:  000000000000000d  0000000000000000 
2020-02-24T05:06:49.489765643Z 00007f8880d3e9b8:  00007f8880d3eac8  0000000000000001 
2020-02-24T05:06:49.489776109Z 00007f8880d3e9c8:  00007f8a2b901f48  00007f8880d3ea28 
2020-02-24T05:06:49.489786367Z 00007f8880d3e9d8: <00007f8880d3eb48  fffffffe7fffffff 
2020-02-24T05:06:49.489796656Z 00007f8880d3e9e8:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489806765Z 00007f8880d3e9f8:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489816648Z 00007f8880d3ea08:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489826572Z 00007f8880d3ea18:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489836595Z 00007f8880d3ea28:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489846504Z 00007f8880d3ea38:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489856382Z 00007f8880d3ea48:  ffffffffffffffff  ffffffffffffffff 
2020-02-24T05:06:49.489866270Z 00007f8880d3ea58:  ffffffffffffffff  00000000051310a0 
2020-02-24T05:06:49.489876599Z 00007f8880d3ea68:  00007f8a2b22542a  0000000000000020 
2020-02-24T05:06:49.489886558Z 00007f8880d3ea78:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489896523Z 00007f8880d3ea88:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489906459Z 00007f8880d3ea98:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489916490Z 00007f8880d3eaa8:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489926421Z 00007f8880d3eab8:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489936369Z 00007f8880d3eac8:  0000000000000000  0000000000000000 
2020-02-24T05:06:49.489946360Z 
2020-02-24T05:06:49.489956150Z goroutine 623 [syscall]:
2020-02-24T05:06:49.489966314Z runtime.cgocall(0xaa9470, 0xc002c1ad40, 0xc000000000)
2020-02-24T05:06:49.489976528Z 	/usr/local/go/src/runtime/cgocall.go:128 +0x5b fp=0xc002c1ad10 sp=0xc002c1acd8 pc=0x454ebb

Can you tell me what is wrong, please?

@BurntSushi
Owner

BurntSushi commented Feb 24, 2020

Without a complete reproducible example (including some input to test it on), I really can't help you.

The most likely explanation is that the freeing logic in these bindings has some problem with it. So you might consider debugging that. This FFI binding isn't terribly complicated.

Another explanation is that you're just holding on to the captures (or regexes). But as I said, it's impossible for me to know whether this is the case because you haven't provided a reproducible example.

@thangld322
Author

I wrote a small program to reproduce the issue:

Directory structure:
.
├── grok
│   └── grok.go
├── main.go

Grok.go
Main.go

RUN

go build .
./grok-wrapper -memprofile memprofile
go tool pprof grok-wrapper memprofile
top10

RESULTS

      flat flat%   sum%        cum   cum%
  524.09kB 33.85% 33.85%   524.09kB 33.85%  time.startTimer
  512.05kB 33.07% 66.93%  1036.14kB 66.93%  main.loop_rure
  512.03kB 33.07%   100%   512.03kB 33.07%  github.com/BurntSushi/rure-go.newIter
         0     0%   100%   512.03kB 33.07%  github.com/BurntSushi/rure-go.(*Regex).Iter
         0     0%   100%   512.03kB 33.07%  github.com/BurntSushi/rure-go.(*Regex).IterBytes
         0     0%   100%   512.03kB 33.07%  grok-wrapper/grok.(*Grok).RureParseTypedCompiled
         0     0%   100%   512.03kB 33.07%  main.loop_rure.func1
         0     0%   100%   524.09kB 33.85%  time.After
         0     0%   100%   524.09kB 33.85%  time.NewTimer

Seems like the problem is the way I set a timeout for the parsing process:

	// Runnable reconstruction of the pseudocode; parse, done, and timeout are illustrative names.
	done := make(chan map[string]interface{}, 1)
	go func() {
		captures, _ := parse()
		done <- captures
	}()
	select {
	case captures := <-done:
		// Parse success before timeout -> Add captures to event as fields
		_ = captures
	case <-time.After(timeout):
		// Timeout: Add tag timeout to the event
	}

Sometimes it takes too long to parse a text, so I need to set a timeout for the parse process. If this is the problem, can you suggest the right way to implement a timeout? I need to get all the captured fields if the parse succeeds. Thank you!

@thangld322
Author

thangld322 commented Feb 25, 2020

I found out that time.After() isn't stopped after the parse succeeds,
so many time.After() timers pile up.
My workaround is to spawn short-lived time.After() timers:


TimeoutLoop:
	for i := 0; i < TIMEOUT; i++ {
		select {
		case values := <-tmpValues:
			// Parse success before timeout -> Add captures to event as fields
			if values == nil {
				fmt.Println("abc")
			}
			break TimeoutLoop
		case <-time.After(time.Millisecond):
			// Timeout: Add tag timeout to the event
			//fmt.Println("timeout")
		}
	}

Will this timeout implementation slow down my grok? Performance is a key factor of my program.
Can I pass a timeout config to the rure.Iter() function?

Off-topic question: Is using Iter() the best way to get all the captured fields? Or are there any parts of RureParseTypedCompiled() that can be improved?

Thank you!

@BurntSushi
Owner

I'm sorry, but I don't have the time to debug programs like this. It sounds like this isn't a memory leak with this library, so I'm going to close this issue. I think your question might be better suited for StackOverflow or some other general help forum.

Can I pass a timeout config to the rure.Iter() function?

As should be clear from the API docs, no, you cannot.

Is using Iter() the best way to get all the captured fields?

Is it the simplest? Sure. Is it the fastest? No. Because every call to match incurs FFI overhead. Does this matter? I don't know. You need to make that decision for yourself.

If performance is your first and only priority then you probably shouldn't be using Go.

Or are there any parts of RureParseTypedCompiled() that can be improved?

Uh... I mean, you're calling IsMatch and then you're creating an iterator to match again and then you're only reading the first match from the iterator. So yeah, I'd say that could be improved. Just use Regex.Captures.
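
For illustration, a minimal sketch of what that suggestion might look like, assuming rure-go's Regex.Captures(caps, text) fills the given Captures and reports whether the text matched; the function name rureParseWithCaptures is hypothetical, everything else mirrors the code from the first comment:

func (g *Grok) rureParseWithCaptures(gr *GRegexp, text string) map[string]interface{} {
	captures := make(map[string]interface{})
	caps := gr.rure.NewCaptures()
	// Assumed API: a single call that matches and fills the capture groups,
	// replacing the separate IsMatch + Iter round trips.
	if !gr.rure.Captures(caps, text) {
		return captures
	}
	for i, segmentName := range gr.names {
		if segmentName == "" {
			continue
		}
		if begin, end, ok := caps.Group(i); ok {
			if g.config.RemoveEmptyValues && begin == end {
				continue
			}
			captures[segmentName] = text[begin:end]
		}
	}
	return captures
}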

@thangld322
Author

I finally found out which line causes the memory leak: caps := gr.rure.NewCaptures()

This line allocates memory for Captures every time an event passes through RureParseTypedCompiled(), but that memory isn't freed after the function exits.
I can't call NewCaptures() once and then reuse it many times, since the code runs multithreaded.
Is there any way to free the captures memory before exiting that function?

@BurntSushi
Owner

Please provide a minimal program that reproduces the problem.

If you look at the source, NewCaptures creates a finalizer that will free the memory:

rure-go/rure.go

Lines 322 to 327 in 0338b65

	runtime.SetFinalizer(caps, func(caps *Captures) {
		if caps.p != nil {
			C.rure_captures_free(caps.p)
			caps.p = nil
		}
	})

I can't call NewCaptures() once and then reuse it many times, since the code runs multithreaded.

Sure you can. Just call NewCaptures() once per goroutine and reuse that.
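
A minimal sketch of that per-goroutine reuse, using only calls already shown in this thread and assuming Iter.Next reports whether it found a match; worker, jobs, and out are illustrative names:

func worker(gr *GRegexp, jobs <-chan string, out chan<- map[string]interface{}) {
	// One Captures per goroutine, allocated once and reused for every text.
	caps := gr.rure.NewCaptures()
	for text := range jobs {
		captures := make(map[string]interface{})
		it := gr.rure.Iter(text)
		if it.Next(caps) {
			for i, name := range gr.names {
				if name == "" {
					continue
				}
				if begin, end, ok := caps.Group(i); ok {
					captures[name] = text[begin:end]
				}
			}
		}
		out <- captures
	}
}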

Is there any way to free the captures memory before exiting that function?

Is there any way to free memory used by a Go slice? Nope. Welcome to life with a GC. :-) The only way such things get freed is by not using it and having the GC run.

@thangld322
Author

When I run my program on a test server, the EPS (events per second) is very high, so NewCaptures() is called very fast and continuously.

One way to reproduce this problem is to call NewCaptures() continuously, as fast as possible:
https://pastecode.xyz/view/0c067c70#L16

Creating a Captures once and reusing it many times does solve the problem:
https://pastecode.xyz/view/26098eff

It would be much better for me if I could free the Captures before leaving RureParseTypedCompiled(), because for my use case creating once and reusing later is possible but very complicated. :D

 

@BurntSushi
Owner

Unless you can show me evidence of a bug, I'm not making any changes here and I'm not going to add an explicit free function, sorry. If you need deterministic memory management, then don't use a language with a GC.

@thangld322
Author

thangld322 commented Mar 4, 2020

https://github.com/thangld322/rure-go-mem-test

Please check out this code. It reproduces the memory problem: memory usage increases continuously and dramatically.
Thanks!

Code details: I have 20,000 raw nginx access logs (plaintext) and 8 patterns to match these logs against.
I compiled each pattern once and appended it to a slice of *rure.Regex to use later.
Finally, I loop over all 20,000 logs and call Captures() to see whether each log matches any pattern.
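
For readers who do not open the repo, the loop described above is roughly the following (a sketch under the same Regex.Captures(caps, text) assumption as earlier; patterns and logs stand for the 8 patterns and the 20,000 log lines, and this is not the actual repository code):

// Compile each pattern once up front and keep the compiled regexes.
regexes := make([]*rure.Regex, 0, len(patterns))
for _, p := range patterns {
	re, err := rure.Compile(p)
	if err != nil {
		log.Fatal(err)
	}
	regexes = append(regexes, re)
}

// For every log line, try each pattern until one matches.
for _, line := range logs {
	for _, re := range regexes {
		caps := re.NewCaptures() // a fresh Captures per call, as in the leaking code
		if re.Captures(caps, line) {
			break
		}
	}
}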

@BurntSushi
Owner

It looks like this might be a poor interaction between Go's runtime and the thread local storage cache that regex uses internally. Try applying this patch to the regex crate:

diff --git a/regex-capi/Cargo.toml b/regex-capi/Cargo.toml
index 6d9807fe..7e53c0e7 100644
--- a/regex-capi/Cargo.toml
+++ b/regex-capi/Cargo.toml
@@ -18,4 +18,9 @@ crate-type = ["staticlib", "cdylib"]

 [dependencies]
 libc = "0.2"
-regex = { version = "1", path = ".." }
+
+[dependencies.regex]
+version = "1"
+path = ".."
+default-features = false
+features = ["std", "unicode", "perf-dfa", "perf-inline", "perf-literal"]

and then rebuild regex-capi and rebuild your Go program.

@thangld322
Author

thangld322 commented Mar 21, 2020

Thank you so much for this!

cargo build --release --manifest-path ./regex/regex-capi/Cargo.toml
warning: /home/thangld/Gitlab/regex/Cargo.toml: unused manifest key: http
   Compiling regex v1.3.4 (/home/thangld/Gitlab/regex)
   Compiling rure v0.2.1 (/home/thangld/Gitlab/regex/regex-capi)
    Finished release [optimized + debuginfo] target(s) in 17.05s

This is the memory monitor on the server where I run my application:
[screenshot: ruremem3]
This patch did fix that memory problem. Now my application can run a full day without crashing.
But it looks like it still uses a lot of RAM, and the usage keeps increasing slowly and continuously.
Hopefully this will improve further in the future!

@mjs

mjs commented May 13, 2020

I've been bitten by this as well in a situation where rure-go is used to process huge amounts of data each day. In some situations all memory on the host would be consumed, triggering the OOM killer. The patch to disable the perf-cache feature did the trick.

BurntSushi added a commit to rust-lang/regex that referenced this issue Mar 12, 2021
This commit removes the thread_local dependency (even as an optional
dependency) and replaces it with a more purpose driven memory pool. The
comments in src/pool.rs explain this in more detail, but the short story
is that thread_local seems to be at the root of some memory leaks
happening in certain usage scenarios.

The great thing about thread_local though is how fast it is. Using a
simple Mutex<Vec<T>> is easily at least twice as slow. We work around
that a bit by coding a simplistic fast path for the "owner" of a pool.
This does require one new use of `unsafe`, of which we extensively
document.

This now makes the 'perf-cache' feature a no-op. We of course retain it
for compatibility purposes (and perhaps it will be used again in the
future), but for now, we always use the same pool.

As for benchmarks, it is likely that *some* cases will get a hair
slower. But there shouldn't be any dramatic difference. A careful review
of micro-benchmarks in addition to more holistic (albeit ad hoc)
benchmarks via ripgrep seems to confirm this.

Now that we have more explicit control over the memory pool, we also
clean stuff up with respect to RefUnwindSafe.

Fixes #362, Fixes #576

Ref BurntSushi/rure-go#3