Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error receiving function response after update to latest version #1470

Closed
tuempeltaucher opened this issue Apr 12, 2019 · 29 comments
Closed
Assignees

Comments

@tuempeltaucher
Copy link

Today I updated to the latest version (docker fnproject/fnserver).

Everything is working fine when I'm invoking the function every few seconds, but if I wait roughly 10 seconds I'm getting an error for the first request.

$ fn invoke helloworld helloworld && date
{"message":"Hello World"}
Fri Apr 12 12:52:16

$ fn invoke helloworld helloworld && date
Error invoking function. status: 502 message: error receiving function response
Fri Apr 12 12:52:25

Logs:
https://pastebin.com/pEefeNAs

Another example:

$ fn invoke helloworld helloworld && date
{"message":"Hello World"}
Fri Apr 12 14:58:39 CEST 2019

$ fn invoke helloworld helloworld && date
{"message":"Hello World"}
Fri Apr 12 14:58:40 CEST 2019

$ fn invoke helloworld helloworld && date
{"message":"Hello World"}
Fri Apr 12 14:58:40 CEST 2019

$ fn invoke helloworld helloworld && date
{"message":"Hello World"}
Fri Apr 12 14:58:43 CEST 2019

$ fn invoke helloworld helloworld && date
Error invoking function. status: 502 message: error receiving function response
Fri Apr 12 14:59:05 CEST 2019

$ fn invoke helloworld helloworld && date
{"message":"Hello World"}
Fri Apr 12 14:59:08 CEST 2019

$ fn invoke helloworld helloworld && date
{"message":"Hello World"}

func.js:
const fdk=require('@fnproject/fdk');

fdk.handle(function(input){
let name = 'World';
if (input.name) {
name = input.name;
}
console.log("ljadslfjlsadkfjklsdj");
return {'message': 'Hello ' + name}
})

$ fn version
Client version is latest version: 0.5.74
Server version: 0.3.693

@Toommi
Copy link

Toommi commented Apr 12, 2019

I've experienced exactly the same issue here.

@carimura
Copy link
Member

thanks for the reports. we're looking into this asap.

@denismakogon
Copy link
Member

Hi! Can you please check if it happens only with node.js functions?

Also would like to know what’s your setup is.

@tuempeltaucher
Copy link
Author

We're using the unmodified docker fnserver image in a nomad cluster. But that is nothing fancy and it has been working in production for months.

I can not reproduce this with a java based function.

@denismakogon
Copy link
Member

Just to make sure, can you please try go, Python as well?

@rdallman
Copy link
Contributor

@denismakogon seems like it's a node fdk issue if they can't repro with java, no need to try other fdks.

any info on fdk version? just double checking that was updated, as well. I think we can try to repro from this info and debug, thanks. it looks like the container is exiting and being removed for whatever reason - it could be a race [in fn], but it would probably also happen with other fdks if that was the case. have not seen the 'unknown container' bug yet... hopefully we can repro, we might need to get more logs about what's going on.

@rdallman
Copy link
Contributor

i can't reproduce locally with go fwiw (invoke...wait 22s.... invoke... both work)

@denismakogon
Copy link
Member

Hi, again. Unfortunately I can reproduce as well. But that doesn’t mean we quit solving this problem, so, what we need:

  1. Docker version.
  2. NodeJS runtime version.
  3. Execution logs (please try setting up syslog for your applications, see fn create app -h).
  4. Ideally, We/I would like to see traces from OpenTracing during a function’s execution.

@tuempeltaucher
Copy link
Author

There are no logs. I installed a syslog server since you removed fn get logs...but that didn't work very well either.

I execed into the running fnserver docker image:

/app # docker version
Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:05:38 2017
OS/Arch: linux/amd64

Server:
Engine:
Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:12:29 2017
OS/Arch: linux/amd64
Experimental: false

I switched debug level on.

Here is the log from fnserver:
https://pastebin.com/raw/9BVWFsGz

And here is the output of docker events:
https://pastebin.com/raw/xZqq12py

I googled and docker exit code 137 seems to have something to do with the OOM killer. I updated func.yaml to memory:1024. My fnserver has 12 GB Ram (and there is nothing in dmesg).

I had @fnproject/fdk 0.0.11 and now tried 0.0.13.

How can I produce traces?

@denismakogon
Copy link
Member

According to your logs

https://pastebin.com/raw/9BVWFsGz

How can I produce traces?

Since Fn supports OpenTracking through jaeger binding you can collect traces by setting up the following env configuration options: https://github.com/jaegertracing/jaeger-client-go#environment-variables

More about jaeger you may find here: https://www.jaegertracing.io/docs/1.6/

@tuempeltaucher
Copy link
Author

$ fn invoke helloworld helloworld && date
{"message":"Hello World"}
Mon Apr 15 10:30:28 CEST 2019

https://pastebin.com/EVWi9Nvv

$ fn invoke helloworld helloworld && date
Error invoking function. status: 502 message: error receiving function response
Mon Apr 15 10:30:37 CEST 2019

https://pastebin.com/9zqbYkMh

Do you need anything else?

@denismakogon
Copy link
Member

Hm, nothing unusual there which means that issue is in FDK itself (in its underlying http server, probably), need to investigate it.

@tuempeltaucher
Copy link
Author

But it did not happen with an old version of fnserver.

I do not understand why you can not reproduce it. It does happen on my local dev maschine too (ubuntu 18.04).

I'm going to test it with a clean ubuntu vm.

@tuempeltaucher
Copy link
Author

Ok...I can reproduce it on a vanilla ubuntu 18.04...not that hard...

  1. Install Ubuntu 18.04
  2. apt-get install docker.io
  3. curl -LSs https://raw.githubusercontent.com/fnproject/cli/master/install | sh
  4. docker login (docker hub account)
  5. docker run --privileged -p 8080:8080 fnproject/fnserver

new console:

  1. docker ps (look for the container id)
  2. docker exec -it $container_id /bin/sh
  3. docker login

new console:

  1. cd helloworld (app from above)
  2. FN_REGISTRY=$username fn deploy

Updating function helloworld using image $username/helloworld:0.0.15...
root@fnserver-test:/helloworld# fn invoke helloworld helloworld
{"message":"Hello World"}
root@fnserver-test:
/helloworld# fn invoke helloworld helloworld
{"message":"Hello World"}
root@fnserver-test:~/helloworld# fn invoke helloworld helloworld
Error invoking function. status: 502 message: error receiving function response

I can give you a ssh access on this server. You can send me your public key.

@denismakogon
Copy link
Member

docker run --privileged -p 8080:8080 fnproject/fnserver

is that the exact way you run Fn server?

@tuempeltaucher
Copy link
Author

root@fnserver-test:~# history | grep docker
2 apt-get install docker
5 docker run --privileged -p 8080:8080 fnproject/fnserver
6 apt install docker.io
7 docker run --privileged -p 8080:8080 fnproject/fnserver

@denismakogon
Copy link
Member

denismakogon commented Apr 15, 2019

Okay, i see what's going on. You've done an upgrade but probably didn't notice that the whole Fn changed a lot (hopefully, all changes documented).

docker run --privileged -p 8080:8080 fnproject/fnserver

Short story: with the following command you get a dead Fn server that basically doesn't work.

from operation docs https://github.com/fnproject/docs/blob/master/fn/operate/options.md :

docker run --privileged \
  --rm \
  --name fns \
  -it \
  -v $PWD/data:/app/data \
  -v $PWD/data/iofs:/iofs \
  -e "FN_IOFS_DOCKER_PATH=$PWD/data/iofs" \
  -e "FN_IOFS_PATH=/iofs" \
  -p 80:8080 \
  fnproject/fnserver

that's the bare minimum command to run Fn server as a container.

@denismakogon
Copy link
Member

that's why for a single instance of the Fn we strongly recommend to use fn start.

@tuempeltaucher
Copy link
Author

Ok, but that did not change anything.

I reinstalled Ubuntu 18.04, just to be sure.

  1. Install Ubuntu 18.04
  2. apt-get install docker.io
  3. curl -LSs https://raw.githubusercontent.com/fnproject/cli/master/install | sh
  4. docker login (docker hub account)
  5. fn start

root@fnserver-test:/helloworld# FN_REGISTRY=$username fn deploy
root@fnserver-test:
/helloworld# fn invoke helloworld helloworld
{"message":"Hello World"}
root@fnserver-test:/helloworld# fn invoke helloworld helloworld
{"message":"Hello World"}
root@fnserver-test:
/helloworld# fn invoke helloworld helloworld
Error invoking function. status: 502 message: error receiving function response

But the log changed:
https://pastebin.com/70zfHzir

@denismakogon
Copy link
Member

Can you make sure that SElinux is disabled on your host?

@denismakogon
Copy link
Member

Still, can't reproduce on local Fn, ubuntu VM, k8s Fn deployment:

fn invoke node helloworld && date
{"message":"Hello World"}
Mon Apr 15 13:30:42 EEST 2019
$ fn invoke node helloworld && date
{"message":"Hello World"}
Mon Apr 15 13:30:51 EEST 2019
$ fn invoke node helloworld && date
{"message":"Hello World"}
Mon Apr 15 13:30:54 EEST 2019
$ fn invoke node helloworld && date
{"message":"Hello World"}
Mon Apr 15 13:30:56 EEST 2019
$ fn invoke node helloworld && date
{"message":"Hello World"}
Mon Apr 15 13:30:57 EEST 2019
$ fn invoke node helloworld && date
{"message":"Hello World"}
Mon Apr 15 13:30:58 EEST 2019
$ fn invoke node helloworld && date
{"message":"Hello World"}
Mon Apr 15 13:30:59 EEST 2019
$ fn invoke node helloworld && date
{"message":"Hello World"}
Mon Apr 15 13:31:00 EEST 2019
$ fn invoke node helloworld && date
{"message":"Hello World"}
Mon Apr 15 13:31:10 EEST 2019

@tuempeltaucher
Copy link
Author

What version do you have?

I narrowed it down to version 0.3.690.

v0.3.689:
root@fnserver-test:~/helloworld# sh test.sh
{"message":"Hello World"}
{"message":"Hello World"}

v0.3.690
root@fnserver-test:~/helloworld# sh test.sh
{"message":"Hello World"}
Error invoking function. status: 502 message: error receiving function response

test.sh:
#!/bin/sh
fn invoke helloworld helloworld &&
sleep 5 &&
fn invoke helloworld helloworld

@denismakogon
Copy link
Member

Okay, now I see the error appears, true, since 0.3.89 node functions are not running flawlessly. Weird that only node functions affected.
The only changes were introduced to UDS management were a part of ae36053
@rdallman would be good to get some help here.

@rdallman
Copy link
Contributor

thanks for all the info here

I googled and docker exit code 137 seems to have something to do with the OOM killer. I updated func.yaml to memory:1024. My fnserver has 12 GB Ram (and there is nothing in dmesg).

this is an interesting wrinkle. it's also interesting that it's happening on 0.3.690 and not 0.3.689, the traces shouldn't be propagated into the container even when they're turned on, so i'd be surprised if it was that, though it would make sense as those headers could be pretty large - I've tested this and confirmed that I'm not getting the headers. I'm not sure what else would be effected there.

are we working off the theory that this is related to the function hitting oom from the node fdk? I see earlier that memory was raised but it's not clear to me from the comments whether this fixed anything or not?

I'm not sure that traces will prove very useful for this case, i don't think we need to get into that here. the logs are pretty useful (especially with fn start --log-level=debug usually)

I am yet to try to repro with nodejs, I can give this a whirl, also, however my machine notably doesn't throw off 137 when it should (the tests on master fail for me locally), so I am not expecting much if that's what's going on here, which may/may not be useful to figure out.

@rdallman
Copy link
Contributor

rdallman commented Apr 15, 2019

i got a repro with the node fdk from the cli hello world function and fn 0.3.690 just now after doing the wait thing:

INFO[2019-04-13T00:00:31-07:00] starting call                                 action="server.(*Server).handleFnInvokeCall-fm" app_id=01D722GHXY180043RZJ0000001 call_id=01D8AQYSD3180043RZJ000000C container_id=01D8AQXVT6180043RZJ000000A fn_id=01D722GS89180043RZJ0000002
DEBU[2019-04-13T00:00:31-07:00] docker unpause                                app_id=01D722GHXY180043RZJ0000001 call_id=01D8AQXVT6180043RZJ000000A container_id=01D8AQXVT6180043RZJ000000A cpus= fn_id=01D722GS89180043RZJ0000002 idle_timeout=30 image="rdallman/hello:0.0.2" memory=128 stack=Unfreeze
ERRO[2019-04-13T00:00:31-07:00] api error                                     action="server.(*Server).handleFnInvokeCall-fm" code=502 error="error receiving function response" fn_id=01D722GS89180043RZJ0000002
INFO[2019-04-13T00:00:31-07:00] hot function terminating                      app_id=01D722GHXY180043RZJ0000001 container_id=01D8AQXVT6180043RZJ000000A cpus= error="Post http://localhost/call: read unix @->/tmp/iofs/lsnr.sock.tmp: read: connection reset by peer" fn_id=01D722GS89180043RZJ0000002 idle_timeout=30 image="rdallman/hello:0.0.2" memory=128

well, good news to confirm at least. need to get some node logs I think, can turn on debug mode on fdk I think...

usually this error is from the container exiting, which looks like what's going on here, we just need to figure out why the node fdk is exiting

@rdallman
Copy link
Contributor

rdallman commented Apr 16, 2019

I got container stats out of here and only see about 9-10MB of usage after 1 invocation (I wish we made this easier to do... alas). my docker kill event looks like this:

2019-04-13T00:11:57.131178484-07:00 container kill d32e00e8ac1d890679bc5c7b7547039037756b178472001da59002d7fca5e58c (image=rdallman/hello:0.0.2, name=01D8ARJVTG180043RZJ0000006, signal=9) 

^ is when the function invocation fails. if I invoke 'quickly' this also doesn't happen, ie I can run the function in the same container multiple times in a row until I wait.

now that I think about it, I think changing idle to 2 minutes is what did the trick. we're expecting to re-use the connection but the fdk server* has closed it. the node fdk needs to respect the idle timeout.

  • edit: fixed definitions of client/server

@rdallman
Copy link
Contributor

@tuempeltaucher @denismakogon I believe I have posted a fix for this fnproject/fdk-node#26 - see PR for github links to issues, seems like node made a baddie.

@rdallman
Copy link
Contributor

I confirmed this is fixed with 0.0.14 of the node fdk. thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants