Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky Test: TestFullHandshake #6294

Closed
arvindbr8 opened this issue May 17, 2023 · 9 comments
Closed

Flaky Test: TestFullHandshake #6294

arvindbr8 opened this issue May 17, 2023 · 9 comments
Assignees

Comments

@arvindbr8
Copy link
Member

Saw this flake recently on a presubmit run. A re-run made it go away. Opening this issue to track if this comes up again.

--- FAIL: Test/FullHandshake (20.12s)
        tlogger.go:116: INFO server.go:629 [core] [Server #1] Server created  (t=+20.193184ms)
        tlogger.go:116: INFO server.go:817 [core] [Server #1 ListenSocket #2] ListenSocket created  (t=+23.45913ms)
        tlogger.go:116: INFO server.go:629 [core] [Server #3] Server created  (t=+25.477758ms)
        tlogger.go:116: INFO server.go:817 [core] [Server #3 ListenSocket #4] ListenSocket created  (t=+25.974065ms)
        tlogger.go:116: INFO clientconn.go:177 [core] [Channel #5] Channel created  (t=+27.240883ms)
        tlogger.go:116: INFO clientconn.go:1628 [core] [Channel #5] original dial target is: "127.0.0.1:34515"  (t=+28.171396ms)
        tlogger.go:116: INFO clientconn.go:1633 [core] [Channel #5] dial target "127.0.0.1:34515" parse failed: parse "127.0.0.1:34515": first path segment in URL cannot contain colon  (t=+28.847306ms)
        tlogger.go:116: INFO clientconn.go:1649 [core] [Channel #5] fallback to scheme "passthrough"  (t=+29.002208ms)
        tlogger.go:116: INFO clientconn.go:1657 [core] [Channel #5] parsed dial target is: {Scheme:passthrough Authority: URL:{Scheme:passthrough Opaque: User: Host: Path:/127.0.0.1:34515 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+30.920035ms)
        tlogger.go:116: INFO clientconn.go:1735 [core] [Channel #5] Channel authority set to "127.0.0.1:34515"  (t=+31.404241ms)
        tlogger.go:116: INFO resolver_conn_wrapper.go:192 [core] [Channel #5] Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "127.0.0.1:34515",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Type": 0,
                  "Metadata": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+37.599329ms)
        tlogger.go:116: INFO balancer_conn_wrappers.go:165 [core] [Channel #5] Channel switches to new LB policy "pick_first"  (t=+39.878261ms)
        tlogger.go:116: INFO clientconn.go:760 [core] [Channel #5 SubChannel #6] Subchannel created  (t=+41.824788ms)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to CONNECTING  (t=+42.595399ms)
        tlogger.go:116: INFO clientconn.go:1146 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to CONNECTING  (t=+43.230008ms)
        tlogger.go:116: INFO clientconn.go:1260 [core] [Channel #5 SubChannel #6] Subchannel picks a new address "127.0.0.1:34515" to connect  (t=+43.988118ms)
        tlogger.go:116: INFO clientconn.go:177 [core] [Channel #7] Channel created  (t=+49.337894ms)
        tlogger.go:116: INFO clientconn.go:1628 [core] [Channel #7] original dial target is: "127.0.0.1:39445"  (t=+50.690513ms)
        tlogger.go:116: INFO clientconn.go:1633 [core] [Channel #7] dial target "127.0.0.1:39445" parse failed: parse "127.0.0.1:39445": first path segment in URL cannot contain colon  (t=+51.101518ms)
        tlogger.go:116: INFO clientconn.go:1649 [core] [Channel #7] fallback to scheme "passthrough"  (t=+51.21942ms)
        tlogger.go:116: INFO clientconn.go:1657 [core] [Channel #7] parsed dial target is: {Scheme:passthrough Authority: URL:{Scheme:passthrough Opaque: User: Host: Path:/127.0.0.1:39445 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+51.514424ms)
        tlogger.go:116: INFO clientconn.go:1735 [core] [Channel #7] Channel authority set to "127.0.0.1:39445"  (t=+51.660626ms)
        tlogger.go:116: INFO resolver_conn_wrapper.go:192 [core] [Channel #7] Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "127.0.0.1:39445",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Type": 0,
                  "Metadata": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+56.711497ms)
        tlogger.go:116: INFO balancer_conn_wrappers.go:165 [core] [Channel #7] Channel switches to new LB policy "pick_first"  (t=+57.075602ms)
        tlogger.go:116: INFO clientconn.go:760 [core] [Channel #7 SubChannel #8] Subchannel created  (t=+57.505509ms)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #7] Channel Connectivity change to CONNECTING  (t=+57.71[79](https://github.com/grpc/grpc-go/actions/runs/4988867731/jobs/8932073496#step:8:80)12ms)
        tlogger.go:116: INFO clientconn.go:1146 [core] [Channel #7 SubChannel #8] Subchannel Connectivity change to CONNECTING  (t=+61.245961ms)
        tlogger.go:116: INFO clientconn.go:1260 [core] [Channel #7 SubChannel #8] Subchannel picks a new address "127.0.0.1:39445" to connect  (t=+61.760068ms)
        tlogger.go:116: INFO clientconn.go:1146 [core] [Channel #7 SubChannel #8] Subchannel Connectivity change to READY  (t=+81.970553ms)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #7] Channel Connectivity change to READY  (t=+82.49296ms)
        tlogger.go:116: INFO server.go:932 [core] [Server #3] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:58176\") failed: rpc error: code = Unknown desc = processNext failure: unexpected in bytes: got: [67 108 105 101 110 116 70 105 110 105 115 104 101 100 44 0 0 0 6 0 0 0 224 132 120 35 97 38 210 56 213 235 167 222 29 147 103 49 0 227 247 52 40 168 203 65 87 134 168 121 61 133 108 52 235 103 208 244 154 79 210 166 29 0 0 0 6 0 0 0 113 90 216 11 71 147 44 214 233 189 226 211 166 249 231 235 195 168 133 179 109 161 147 152 87], want: [67 108 105 101 110 116 70 105 110 105 115 104 101 100]"  (t=+151.47433ms)
        tlogger.go:116: WARNING clientconn.go:1321 [core] [Channel #5 SubChannel #6] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:34515", ServerName: "127.0.0.1:34515", }. Err: connection error: desc = "error reading server preface: EOF"  (t=+154.133467ms)
        tlogger.go:116: INFO clientconn.go:1148 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to TRANSIENT_FAILURE, last error: connection error: desc = "error reading server preface: EOF"  (t=+154.753376ms)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to TRANSIENT_FAILURE  (t=+155.00218ms)
        tlogger.go:116: INFO clientconn.go:1148 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to IDLE, last error: connection error: desc = "error reading server preface: EOF"  (t=+1.15557845s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to IDLE  (t=+1.156143158s)
        tlogger.go:116: INFO clientconn.go:1146 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to CONNECTING  (t=+1.156450862s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to CONNECTING  (t=+1.156[80](https://github.com/grpc/grpc-go/actions/runs/4988867731/jobs/8932073496#step:8:81)3967s)
        tlogger.go:116: INFO clientconn.go:1260 [core] [Channel #5 SubChannel #6] Subchannel picks a new address "127.0.0.1:34515" to connect  (t=+1.157490577s)
        tlogger.go:116: INFO server.go:932 [core] [Server #3] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:5[81](https://github.com/grpc/grpc-go/actions/runs/4988867731/jobs/8932073496#step:8:82)90\") failed: rpc error: code = Unknown desc = processNext failure: unexpected in bytes: got: [67 108 105 101 110 116 70 105 110 105 115 104 101 100 44 0 0 0 6 0 0 0 224 132 120 35 97 38 210 56 213 235 167 222 29 147 103 49 0 227 247 52 40 168 203 65 87 134 168 121 61 133 108 52 235 103 208 244 154 79 210 166 29 0 0 0 6 0 0 0 113 90 216 11 71 147 44 214 233 189 226 211 166 249 231 235 195 168 133 179 109 161 147 152 87], want: [67 108 105 101 110 116 70 105 110 105 115 104 101 100]"  (t=+1.163057155s)
        tlogger.go:116: WARNING clientconn.go:1321 [core] [Channel #5 SubChannel #6] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:34515", ServerName: "127.0.0.1:34515", }. Err: connection error: desc = "error reading server preface: EOF"  (t=+1.163527261s)
        tlogger.go:116: INFO clientconn.go:1148 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to TRANSIENT_FAILURE, last error: connection error: desc = "error reading server preface: EOF"  (t=+1.163651663s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to TRANSIENT_FAILURE  (t=+1.163792865s)
        tlogger.go:116: INFO clientconn.go:1148 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to IDLE, last error: connection error: desc = "error reading server preface: EOF"  (t=+2.745313905s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to IDLE  (t=+2.745486907s)
        tlogger.go:116: INFO clientconn.go:1146 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to CONNECTING  (t=+2.750087372s)
        tlogger.go:116: INFO clientconn.go:1260 [core] [Channel #5 SubChannel #6] Subchannel picks a new address "127.0.0.1:34515" to connect  (t=+2.750340575s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to CONNECTING  (t=+2.750[82](https://github.com/grpc/grpc-go/actions/runs/4988867731/jobs/8932073496#step:8:83)1882s)
        tlogger.go:116: INFO server.go:932 [core] [Server #3] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:58206\") failed: rpc error: code = Unknown desc = processNext failure: unexpected in bytes: got: [67 108 105 101 110 116 70 105 110 105 115 104 101 100 44 0 0 0 6 0 0 0 224 132 120 35 97 38 210 56 213 235 167 222 29 147 103 49 0 227 247 52 40 168 203 65 87 134 168 121 61 133 108 52 235 103 208 244 154 79 210 166 29 0 0 0 6 0 0 0 113 90 216 11 71 147 44 214 233 189 226 211 166 249 231 235 195 168 133 179 109 161 147 152 87], want: [67 108 105 101 110 116 70 105 110 105 115 104 101 100]"  (t=+2.756089656s)
        tlogger.go:116: WARNING clientconn.go:1321 [core] [Channel #5 SubChannel #6] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:34515", ServerName: "127.0.0.1:34515", }. Err: connection error: desc = "error reading server preface: EOF"  (t=+2.756556863s)
        tlogger.go:116: INFO clientconn.go:1148 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to TRANSIENT_FAILURE, last error: connection error: desc = "error reading server preface: EOF"  (t=+2.756684965s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to TRANSIENT_FAILURE  (t=+2.756800666s)
        tlogger.go:116: INFO clientconn.go:1148 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to IDLE, last error: connection error: desc = "error reading server preface: EOF"  (t=+5.630322174s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to IDLE  (t=+5.630603778s)
        tlogger.go:116: INFO clientconn.go:1146 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to CONNECTING  (t=+5.63860729s)
        tlogger.go:116: INFO clientconn.go:1260 [core] [Channel #5 SubChannel #6] Subchannel picks a new address "127.0.0.1:34515" to connect  (t=+5.638802793s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to CONNECTING  (t=+5.638978996s)
        tlogger.go:116: INFO server.go:932 [core] [Server #3] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:58220\") failed: rpc error: code = Unknown desc = processNext failure: unexpected in bytes: got: [67 108 105 101 110 116 70 105 110 105 115 104 101 100 44 0 0 0 6 0 0 0 224 132 120 35 97 38 210 56 213 235 167 222 29 147 103 49 0 227 247 52 40 168 203 65 87 134 168 121 61 133 108 52 235 103 208 244 154 79 210 166 29 0 0 0 6 0 0 0 113 90 216 11 71 147 44 214 233 189 226 211 166 249 231 235 195 168 133 179 109 161 147 152 87], want: [67 108 105 101 110 116 70 105 110 105 115 104 101 100]"  (t=+5.643249056s)
        tlogger.go:116: WARNING clientconn.go:1321 [core] [Channel #5 SubChannel #6] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:34515", ServerName: "127.0.0.1:34515", }. Err: connection error: desc = "error reading server preface: EOF"  (t=+5.643613961s)
        tlogger.go:116: INFO clientconn.go:1148 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to TRANSIENT_FAILURE, last error: connection error: desc = "error reading server preface: EOF"  (t=+5.643856564s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to TRANSIENT_FAILURE  (t=+5.643987266s)
        tlogger.go:116: INFO clientconn.go:1148 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to IDLE, last error: connection error: desc = "error reading server preface: EOF"  (t=+10.047904767s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to IDLE  (t=+10.04809967s)
        tlogger.go:116: INFO clientconn.go:1146 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to CONNECTING  (t=+10.053256342s)
        tlogger.go:116: INFO clientconn.go:1260 [core] [Channel #5 SubChannel #6] Subchannel picks a new address "127.0.0.1:34515" to connect  (t=+10.053455945s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to CONNECTING  (t=+10.053603947s)
        tlogger.go:116: INFO clientconn.go:1146 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to READY  (t=+10.060843349s)
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to READY  (t=+10.061084552s)
        alts_test.go:356: c.UnaryCall() failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
        tlogger.go:116: INFO clientconn.go:451 [core] [Channel #5] Channel Connectivity change to SHUTDOWN  (t=+10.062399471s)
        tlogger.go:116: INFO clientconn.go:1146 [core] [Channel #5 SubChannel #6] Subchannel Connectivity change to SHUTDOWN  (t=+10.064078995s)
        tlogger.go:116: INFO clientconn.go:1496 [core] [Channel #5 SubChannel #6] Subchannel deleted  (t=+10.064369099s)
        tlogger.go:116: INFO clientconn.go:1096 [core] [Channel #5] Channel deleted  (t=+10.064527001s)
        tlogger.go:116: INFO server.go:768 [core] [Server #3 ListenSocket #4] ListenSocket deleted  (t=+10.066885534s)
        tlogger.go:116: INFO server.go:768 [core] [Server #1 ListenSocket #2] ListenSocket deleted  (t=+10.067759146s)
        tlogger.go:116: INFO clientconn.go:1146 [core] [Channel #7 SubChannel #8] Subchannel Connectivity change to IDLE  (t=+10.06[83](https://github.com/grpc/grpc-go/actions/runs/4988867731/jobs/8932073496#step:8:84)[90](https://github.com/grpc/grpc-go/actions/runs/4988867731/jobs/8932073496#step:8:91)655s)
        tlogger.go:[116](https://github.com/grpc/grpc-go/actions/runs/4988867731/jobs/8932073496#step:8:117): INFO clientconn.go:451 [core] [Channel #7] Channel Connectivity change to IDLE  (t=+10.074070535s)
        grpctest.go:39: Leaked goroutine: goroutine 21 [select]:
            google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run(0x400008da60, {0x602e58, 0x40007c8050})
            	/home/runner/work/grpc-go/grpc-go/internal/grpcsync/callback_serializer.go:67 +0xdc
            created by google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer
            	/home/runner/work/grpc-go/grpc-go/internal/grpcsync/callback_serializer.go:52 +0x13c
        grpctest.go:39: Leaked goroutine: goroutine 22 [select]:
            google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run(0x400008dab0, {0x602e58, 0x40007c80a0})
            	/home/runner/work/grpc-go/grpc-go/internal/grpcsync/callback_serializer.go:67 +0xdc
            created by google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer
            	/home/runner/work/grpc-go/grpc-go/internal/grpcsync/callback_serializer.go:52 +0x13c
        grpctest.go:60: Leak check disabled for future tests
FAIL

run:
https://github.com/grpc/grpc-go/actions/runs/4988867731/jobs/8932073496

@dfawley
Copy link
Member

dfawley commented May 17, 2023

This looks wrong:

once.Do(func() {
vmOnGCP = true
})
vmOnGCP = true

Can you take a look, please @matthewstevenson88? This test is flaking which can delay our development cycles.

@matthewstevenson88
Copy link
Contributor

Thanks for flagging this, I will take a look today.

@matthewstevenson88
Copy link
Contributor

I've sent #6300 to fix this issue. Thanks again for finding the flake!

@codyoss
Copy link

codyoss commented Jun 23, 2023

I see a similar/same leak in an upgrade deps PR in our project: googleapis/gapic-generator-go#1362

--- FAIL: TestComplianceSuite (5.07s)
    leakcheck.go:111: Leaked goroutine: goroutine 4 [select]:
        google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run(0xc000080480, {0xdc0458, 0xc000037540})
        	/home/runner/go/pkg/mod/google.golang.org/grpc@v1.56.1/internal/grpcsync/callback_serializer.go:83 +0x12a
        created by google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer
        	/home/runner/go/pkg/mod/google.golang.org/grpc@v1.56.1/internal/grpcsync/callback_serializer.go:55 +0x138
    leakcheck.go:111: Leaked goroutine: goroutine 5 [select]:
        google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run(0xc0000804e0, {0xdc0458, 0xc000037580})
        	/home/runner/go/pkg/mod/google.golang.org/grpc@v1.56.1/internal/grpcsync/callback_serializer.go:83 +0x12a
        created by google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer
        	/home/runner/go/pkg/mod/google.golang.org/grpc@v1.56.1/internal/grpcsync/callback_serializer.go:55 +0x138

@matthewstevenson88
Copy link
Contributor

Sorry this issue should have probably been closed when #6300 got merged.

@codyoss Is there anything in your failing test that indicates that the problem is related to ALTS? Or is the similarity just that the failure logs look similar?

@codyoss
Copy link

codyoss commented Jun 26, 2023

Is there anything in your failing test that indicates that the problem is related to ALTS?

Not that I know of, but I also don't know what that stands for.

Or is the similarity just that the failure logs look similar?

I had just noticed the leak referenced NewCallbackSerializer which was also in the call stack of this issue. I believe you should be able to view our test logs in the link, but I posted the full context we had from our leak checker, at least for this specific test(many failed do to what I assume is one issue.)

@matthewstevenson88
Copy link
Contributor

Thanks @codyoss. Could you open a separate issue for this failure to better get the attention of the gRPC-Go team? I don't want the leak you noticed to get buried in this flaky test issue, which I think is unrelated.

@codyoss
Copy link

codyoss commented Jun 26, 2023

@matthewstevenson88 done.

@matthewstevenson88
Copy link
Contributor

Closing this issue since #6300 is merged.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants