
fix: use socket from the request #1939

Merged: 1 commit merged on Feb 23, 2021


@mzahor (Contributor) commented Feb 17, 2021

Fixes #1377

@blumamir and I investigated this issue and were able to reproduce it.
It happens when a client sends multiple requests over the same TCP connection (for example, pipelining requests in keep-alive mode), so that several requests are active on the same socket at once.
In this scenario, one socket serves two request/response pairs, but at any given time it is attached to only one of the responses.
So the OutgoingMessage instance (the response, in our case) may have no socket attached when the handler calls response.end(). In that case, the data is pushed into an internal buffer (see https://github.com/nodejs/node/blob/75fd447/lib/_http_outgoing.js#L334).
However, the http plugin reads properties of the response's socket. Since that socket is null, this throws, the error surfaces as an uncaught exception, and the server crashes.
Because the request and the response share the same socket instance, using request.socket instead is enough to fix the issue while still producing the same instrumentation data.
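As a rough illustration of the mechanism described above, here is a minimal Go model (written in the language of the reproduction tool, not the plugin's TypeScript). Everything in it is hypothetical: `socket`, `request`, `response`, `end`, `assignSocket`, `attrsFromResponse`, and `attrsFromRequest` only mimic the behavior described in this PR, and the `remoteAddress`/`remotePort` fields are assumed stand-ins for whichever socket properties the plugin reads. It shows a pipelined response buffering data while its socket is nil, the old lookup through the response failing, and the lookup through the request returning the same peer data.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// socket is a hypothetical stand-in for Node's net.Socket: the peer info the
// plugin reads, plus a buffer recording what was written to the connection.
type socket struct {
	remoteAddress string
	remotePort    int
	written       bytes.Buffer
}

// request models IncomingMessage: its socket stays set for its whole lifetime.
type request struct{ socket *socket }

// response models ServerResponse/OutgoingMessage: socket is nil while an
// earlier pipelined response still owns the connection.
type response struct {
	socket  *socket
	pending [][]byte // data queued while no socket is attached
}

// end writes directly when a socket is attached, otherwise buffers the data.
func (r *response) end(data []byte) {
	if r.socket == nil {
		r.pending = append(r.pending, data)
		return
	}
	r.socket.written.Write(data)
}

// assignSocket attaches the connection and flushes any queued data.
func (r *response) assignSocket(s *socket) {
	r.socket = s
	for _, chunk := range r.pending {
		s.written.Write(chunk)
	}
	r.pending = nil
}

type peerAttrs struct {
	addr string
	port int
}

// attrsFromResponse mirrors the old approach of reading response.socket.
func attrsFromResponse(res *response) (peerAttrs, error) {
	if res.socket == nil {
		// In JavaScript, destructuring null throws a TypeError here instead.
		return peerAttrs{}, errors.New("response has no socket attached")
	}
	return peerAttrs{res.socket.remoteAddress, res.socket.remotePort}, nil
}

// attrsFromRequest mirrors the fix: same connection, but always attached.
func attrsFromRequest(req *request) peerAttrs {
	return peerAttrs{req.socket.remoteAddress, req.socket.remotePort}
}

func main() {
	conn := &socket{remoteAddress: "127.0.0.1", remotePort: 51234}
	first := &response{socket: conn} // currently owns the connection
	second := &response{}            // pipelined: no socket attached yet
	secondReq := &request{socket: conn}

	second.end([]byte("second\n")) // buffered, not lost
	if _, err := attrsFromResponse(second); err != nil {
		fmt.Println("old approach:", err)
	}
	fmt.Printf("fixed approach: %+v\n", attrsFromRequest(secondReq))

	first.end([]byte("first\n"))
	second.assignSocket(conn) // connection handed over, buffer flushed
	fmt.Print(conn.written.String())
}
```

In Node the unguarded lookup fails with a thrown exception rather than an error value; the sketch returns an error only so both paths can be printed side by side.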

This is a small tool that we used to simulate this scenario:

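package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
)

func main() {
	// HTTP/1.1 connections are keep-alive by default, so both requests
	// below travel over the same TCP connection.
	message := "GET / HTTP/1.1\r\nHost: localhost:8088\r\n\r\n"

	c, err := net.Dial("tcp", "localhost:8088")
	check(err)
	defer c.Close()

	fmt.Printf(">> sending first message\n")
	_, err = fmt.Fprintf(c, "%s", message)
	check(err)

	// don't read the response and send another message
	fmt.Printf(">> sending second message\n")
	_, err = fmt.Fprintf(c, "%s", message)
	// without the fix, the instrumented server crashes at this point
	check(err)

	fmt.Printf("== reading responses\n")
	// create the reader once; a new reader per iteration would drop buffered bytes
	reader := bufio.NewReader(c)
	for {
		response, err := reader.ReadString('\n')
		if len(response) > 0 {
			fmt.Printf("<< %s", response)
		}
		if err == io.EOF {
			// the server closed the connection
			break
		}
		check(err)
	}
}

func check(e error) {
	if e != nil {
		panic(e)
	}
}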

@linux-foundation-easycla bot commented Feb 17, 2021

CLA Signed

The committers are authorized under a signed CLA.

@codecov bot commented Feb 17, 2021

Codecov Report

Merging #1939 (48cd0db) into main (9437d7e) will increase coverage by 1.73%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1939      +/-   ##
==========================================
+ Coverage   92.62%   94.35%   +1.73%     
==========================================
  Files         170       26     -144     
  Lines        5733     2020    -3713     
  Branches     1192      477     -715     
==========================================
- Hits         5310     1906    -3404     
+ Misses        423      114     -309     
Impacted Files Coverage Δ
...es/opentelemetry-instrumentation-http/src/utils.ts 99.03% <100.00%> (+0.48%) ⬆️
packages/opentelemetry-plugin-http/src/utils.ts 99.03% <100.00%> (+0.49%) ⬆️
...ges/opentelemetry-instrumentation-http/src/http.ts 94.69% <0.00%> (-0.82%) ⬇️
...s/opentelemetry-metrics/src/UpDownCounterMetric.ts
.../opentelemetry-api/src/platform/node/globalThis.ts
...entelemetry-metrics/src/UpDownSumObserverMetric.ts
...telemetry-tracing/src/export/BatchSpanProcessor.ts
...src/platform/node/instrumentationNodeModuleFile.ts
...sync-hooks/src/AbstractAsyncHooksContextManager.ts
packages/opentelemetry-api/src/api/context.ts
... and 139 more

@mzahor (Contributor, Author) commented Feb 17, 2021

I tested it manually and everything worked. I'll check what's wrong with the unit tests tomorrow.

@dyladan (Member) commented Feb 17, 2021

Awesome, thanks for the contribution. It looks like a simple enough change, and my guess is that the test failures happen because the mocks assume the socket is taken from the response. I'll hold off until the tests are fixed.

@mzahor (Contributor, Author) commented Feb 18, 2021

@dyladan I've fixed the tests and also added an integration test for this case. Please take another look.

@obecny (Member) left a comment

lgtm

@vmarchaud (Member)

@mzahor FYI, the PR needs to be rebased, and it seems we can't do it ourselves from the GitHub UI.

@mzahor force-pushed the fix/http-keepalive branch 2 times, most recently from d1a2810 to 3cb569b on February 23, 2021 09:46
Co-authored-by: Daniel Dyla <dyladan@users.noreply.github.com>
Co-authored-by: Valentin Marchaud <contact@vmarchaud.fr>
@mzahor (Contributor, Author) commented Feb 23, 2021

@vmarchaud Rebased.

@vmarchaud merged commit 9a67045 into open-telemetry:main Feb 23, 2021
@mzahor deleted the fix/http-keepalive branch February 23, 2021 10:27
@dyladan added the bug (Something isn't working) label Mar 1, 2021
Labels: bug (Something isn't working)
Projects: None yet
Development: successfully merging this pull request may close these issues:
[plugin-http] null-check socket before destructuring
5 participants