Intermittent Segfault with net.http.get_text compiled in gcc 13.2.1 #20506

Closed
mamoss-oss opened this issue Jan 12, 2024 · 7 comments · Fixed by #20660
Labels
Bug This tag is applied to issues which reports bugs. Compiler: GCC Bugs/feature requests, that are related to compiler GCC.

Comments

@mamoss-oss

mamoss-oss commented Jan 12, 2024

Describe the bug

Running the code example below, compiled with -prod -cc gcc, results in intermittent segfaults on openSUSE Tumbleweed. There is no segfault when compiling with clang or tcc.

Reproduction Steps

module main

import net.http

fn main() {
        println(http.get_text('https://www.vlang.io'))
}
v -prod -cc gcc .
./get_test 1> /dev/null

Expected Behavior

No Segfault

Current Behavior

./get_test 1> /dev/null
signal 11: segmentation fault
                                                        | 0x7f484f288e8c | /lib64/libc.so.6(_IO_default_xsputn+0x9e) 
                                                        | 0x7f484f287882 | /lib64/libc.so.6(_IO_file_xsputn+0x144) 
                                                        | 0x7f484f27a852 | /lib64/libc.so.6(_IO_fwrite+0xc4) 
                                                        |       0x481493 | ./get_test() 
                                                        |       0x4026b0 | ./get_test() 
                                                        | 0x7f484f2281b0 | /lib64/libc.so.6(+0x281b0) 
                                                        | 0x7f484f228279 | /lib64/libc.so.6(__libc_start_main+0x8b) 
/home/abuild/rpmbuild/BUILD/glibc-2.38/csu/../sysdeps/x86_64/start.S:117: |       0x402725 | ./get_test() 

Possible Solution

No response

Additional Information/Context

No response

V version

V 0.4.4 0c4611f

Environment details (OS name and version, etc.)

V full version: V 0.4.4 4640627.0c4611f
OS: linux, Linux version 6.6.10-1-default (geeko@buildhost) (gcc (SUSE Linux) 13.2.1 20231130 [revision 741743c028dc00f27b9c8b1d5211c1f602f2fddd], GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.41.0.20230908-1) #1 SMP PREEMPT_DYNAMIC Mon Jan 8 08:58:39 UTC 2024 (e04388e)
Processor: 32 cpus, 64bit, little endian, AMD Ryzen 9 7950X3D 16-Core Processor

getwd: /home/mario/code/v/get_test
vexe: /home/mario/bin/v/v
vexe mtime: 2024-01-12 07:40:07

vroot: OK, value: /home/mario/bin/v
VMODULES: OK, value: /home/mario/.vmodules
VTMP: OK, value: /tmp/v_1000

Git version: git version 2.43.0
Git vroot status: 0c4611f
.git/config present: true

CC version: cc (SUSE Linux) 13.2.1 20231130 [revision 741743c028dc00f27b9c8b1d5211c1f602f2fddd]
thirdparty/tcc status: thirdparty-linux-amd64 12f392c3


@mamoss-oss mamoss-oss added the Bug This tag is applied to issues which reports bugs. label Jan 12, 2024
@shove70
Contributor

shove70 commented Jan 12, 2024

[root@672dc452a866 123]# cat > a.v
module main

import net.http

fn main() {
	println(http.get_text('https://www.vlang.io'))
}

[root@672dc452a866 123]# v -prod -cc gcc a.v
[root@672dc452a866 123]# ./a 1> /dev/null

[root@672dc452a866 123]# gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[root@672dc452a866 123]# uname -a
Linux 672dc452a866 6.5.11-linuxkit #1 SMP PREEMPT_DYNAMIC Wed Dec  6 17:14:50 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Not reproduced with gcc 8.5.0. Possibly some factor triggers an out-of-bounds array access.

@mamoss-oss
Author

Issue reproduced on Arch Linux with gcc 13.2.1.
Not reproduced in the vlang dockerhub container with gcc 12.2.1.
I will try to reproduce it in some other environments.

./test 1> /dev/null
signal 11: segmentation fault
                                                        | 0x7fae60bd03ed | /usr/lib/libc.so.6(_IO_default_xsputn+0x9d) 
                                                        | 0x7fae60bcef02 | /usr/lib/libc.so.6(_IO_file_xsputn+0x142) 
                                                        | 0x7fae60bc2ad2 | /usr/lib/libc.so.6(_IO_fwrite+0xc2) 
                                                        | 0x5557b3e569a3 | ./test(+0x7f9a3) 
                                                        | 0x5557b3de040e | ./test(+0x940e) 
                                                        | 0x7fae60b71cd0 | /usr/lib/libc.so.6(+0x27cd0) 
                                                        | 0x7fae60b71d8a | /usr/lib/libc.so.6(__libc_start_main+0x8a) 
                                                        | 0x5557b3de0455 | ./test(+0x9455) 

v version
V 0.4.4 0c4611f

uname -a
Linux n100 6.6.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 05 Jan 2024 16:20:41 +0000 x86_64 GNU/Linux

gcc --version
gcc (GCC) 13.2.1 20230801
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@mamoss-oss
Author

I hope this can be used to reproduce the bug.

Dockerfile

FROM fedora:39
WORKDIR /
RUN dnf install -y git gcc libatomic
RUN git clone --depth 1 https://github.com/vlang/v
WORKDIR /v
RUN make
RUN mkdir /app
WORKDIR /app
COPY <<"EOT" /app/app.v
module main
import net.http
fn main() {
        println(http.get_text('https://www.vlang.io'))
}
EOT
RUN /v/v -prod -cc gcc -o app app.v
CMD /app/app 1> /dev/null;echo ;/v/v version;echo ;gcc --version
docker build -t v_test .
docker run --rm v_test

Reproduce the issue

The issue is intermittent, but overall I see more failed runs than successful runs.

# Failed

docker run --rm v_test  
signal 11: segmentation fault
                                                        | 0x7ff214991fac | /lib64/libc.so.6(_IO_default_xsputn+0xac) 
                                                        | 0x7ff214990a77 | /lib64/libc.so.6(_IO_file_xsputn+0x117) 
                                                        | 0x7ff2149841e2 | /lib64/libc.so.6(_IO_fwrite+0xd2) 
                                                        |       0x481473 | /app/app() 
                                                        |       0x4026b0 | /app/app() 
                                                        | 0x7ff21493214a | /lib64/libc.so.6(+0x2814a) 
                                                        | 0x7ff21493220b | /lib64/libc.so.6(__libc_start_main+0x8b) 
                                                        |       0x402715 | /app/app() 

V 0.4.4 0c4611f

gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

# Working

docker run --rm v_test

V 0.4.4 0c4611f

gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
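
Since the failure is intermittent, a small loop can help put a number on it. The following is only a sketch: `count_segfaults` is a hypothetical helper name, and it assumes the crash always prints the `signal 11: segmentation fault` line seen in the backtraces above. It matches on that text rather than on the exit status, because the Dockerfile's CMD ends with `gcc --version`, so the container's exit status does not reflect the crash.

```shell
#!/bin/sh
# Run a command several times and count the runs whose output contains
# the segfault message shown in the backtraces above.
# Usage: count_segfaults <runs> <command> [args...]
count_segfaults() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Merge stderr into stdout so the message is caught on either stream.
        if "$@" 2>&1 | grep -q 'signal 11: segmentation fault'; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed/$runs runs reported a segmentation fault"
}

# Example (assumes the v_test image from the Dockerfile above has been built):
# count_segfaults 20 docker run --rm v_test
```

The same helper can be pointed at a locally built binary, for example `count_segfaults 20 ./get_test`.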

@mamoss-oss mamoss-oss changed the title Intermittent Segfault with net.http.get_text compiled in gcc Intermittent Segfault with net.http.get_text compiled in gcc 13.2.1 Jan 12, 2024
@felipensp felipensp added the Compiler: GCC Bugs/feature requests, that are related to compiler GCC. label Jan 13, 2024
@GGRei
Contributor

GGRei commented Jan 24, 2024

After observing the problem on my side for a while:

Tested on: Ubuntu 23.10 with GCC 13.2.0

On my side, the problem only appears when GCC optimizes at -O3; at lower levels there are no issues. This suggests that GCC's -O3 optimization sometimes changes how memory is handled. The result is either the segmentation fault posted above (a panic on a malloc_noscan) or no error at all, but with the println in the user code outputting nil.
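
One way to check the optimization-level observation is to build the same program once per level and compare. This is a sketch under assumptions: `build_at_level`, `DRY_RUN` and the `app_<level>` output names are made up for illustration, and it relies on V's `-cflags` option to pass the level through to GCC. Whether a build without -prod but with -O3 reproduces the crash is not confirmed in this thread, since -prod may change more than the optimization level.

```shell
#!/bin/sh
# Print the build command for one GCC optimization level and, unless
# DRY_RUN=1 is set, run it. Requires `v` and `gcc` on PATH when not a dry run.
# Usage: build_at_level <O1|O2|O3> <source.v>
build_at_level() {
    level=$1
    src=$2
    # -cflags hands the given flag to the C compiler as is.
    set -- v -cc gcc -cflags "-$level" -o "app_$level" "$src"
    echo "$@"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Example: build app.v at three levels, then run each binary several times.
# for level in O1 O2 O3; do build_at_level "$level" app.v; done
```

Each resulting binary can then be run repeatedly to see at which level the segfault starts to appear.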

In taking a closer look at the 'http.get_text()' function and many of the functions that it calls, I noticed that a 'Response' instance is returned, and from this instance, a string ('Response.body') is extracted.
This string is what is returned to the user's code, and that's where the problem begins.

The 'Response' instance is destroyed at the end of 'http.get_text()' execution, and I suspect this leads to the memory of all its fields being freed. Unfortunately, 'Response.body' is no exception and gets released, causing either a segmentation fault or a nil value by the time execution reaches the user's code, depending on the situation.

Could it be that in certain specific scenarios, aggressive optimization like -O3 can lead to complex memory management issues? It's possible.

To continue my testing, I then cloned the function's return:

pub fn get_text(url string) string {
    resp := fetch(url: url, method: .get) or { return '' }
    return resp.body.clone()
}

Of course, this resolves the issue. Using 'autofree' as a compilation parameter also solves the problem. (I believe 'autofree' does what I did manually in my tests, that is, cloning the string so that it does not point to freed memory, but I am not sure.)

Additionally, using the 'noinline' attribute on the 'http.fetch()' function allows us to 'circumvent' the problem since it forces GCC to reorganize its optimization and treat 'http.fetch()' as a distinct function to be called. This results in a different management of memory and the lifespan of objects.

To conclude, I am still too new to the Vlang community and to its compiler to say what the most effective response should be. I believe the solution should not involve direct intervention in GCC, but rather an adaptation of how V or the module manages memory under aggressive GCC optimization.

This is just my opinion, and I hope that my comment will help in resolving the issue.

### Edit 01-26-2024: After some performance tests conducted by spytheman and Casper64, it was determined that the workaround with the least negative impact is the 'noinline' attribute, to be used for the time being until a more permanent solution is found.

@spytheman
Member

I could reproduce it with gcc 13.2.0 on Windows too.

I tried various other versions of gcc and clang on Linux, FreeBSD and macOS, but could not reproduce it with them.

@spytheman
Member

This fixes it for me on windows with gcc 13.2.0, and -prod:

#0 14:32:09 ᛋ xvweb_fixes /d/programs/v❱git diff
diff --git a/vlib/net/http/response.v b/vlib/net/http/response.v
index 6ec143c1d..900aca602 100644
--- a/vlib/net/http/response.v
+++ b/vlib/net/http/response.v
@@ -7,6 +7,7 @@ import net.http.chunked
 import strconv

 // Response represents the result of the request
+@[heap]
 pub struct Response {
 pub mut:
        body         string

@GGRei
Contributor

GGRei commented Jan 25, 2024

I confirm the fix on ubuntu 23.10 with GCC 13.2.0 and -prod. Good job!
