-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP - gRPC Plugin framework #1214
WIP - gRPC Plugin framework #1214
Conversation
Great start. My main comment is about avoiding duplicating the trace model. |
plugin/storage/grpc/options.go
Outdated
"github.com/jaegertracing/jaeger/pkg/grpc/config" | ||
) | ||
|
||
const pluginBinary = "grpc-plugin.binary" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should refer to storage since we may want to use grpc plugins for other needs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you propose? I'm afraid to come up with something thats too long
.editorconfig
Outdated
@@ -0,0 +1,4 @@ | |||
# Override for Makefile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't include this. You can add it to your global gitignore.
cmd/memory-grpc-plugin/main.go
Outdated
plugin.Serve(&plugin.ServeConfig{ | ||
HandshakeConfig: shared.Handshake, | ||
VersionedPlugins: map[int]plugin.PluginSet{ | ||
18: map[string]plugin.Plugin{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is 18?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meant for 1.8 (current version). Should we use something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the semantics here. Is this the version of the plugin API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sourcegraph helped me find some docs here:
VersionedPlugins is a map of PluginSets for specific protocol versions. These can be used to negotiate a compatible version between client and server. If this is set, Handshake.ProtocolVersion is not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the version of the plugin API. I changed it to 1
glide.yaml
Outdated
@@ -76,3 +76,4 @@ import: | |||
- package: golang.org/x/sys | |||
subpackages: | |||
- unix | |||
- package: github.com/hashicorp/go-plugin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a major version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No there is none
plugin/storage/grpc/plugin.go
Outdated
VersionedPlugins: map[int]plugin.PluginSet{ | ||
18: shared.PluginMap, | ||
}, | ||
Cmd: exec.Command("sh", "-c", configuration.PluginBinary), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need sh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not, good point
@@ -0,0 +1,90 @@ | |||
syntax = "proto3"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this copied just to remove gogoproto options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it was
"github.com/jaegertracing/jaeger/plugin/storage/grpc/proto" | ||
) | ||
|
||
func DependencyLinkSliceFromProto(p []*proto.DependencyLink) []model.DependencyLink { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid the need for this file. One of the main reasons for generating domain model from proto IDL was to be able to reuse those types in the grpc plugin interfaces
Hi @olivierboucher, we're keen to use the work that you have started here, thanks! Do you need any assistance in pushing it to a point where it can be considered complete? |
@chvck I have been a bit busy lately but I still plan on finishing this. I might need some help with the proto generation in order to use gogoproto and remove the mapping code |
I am trying to add support for configuration and I may need advice. I am doing the following to read from the viper context:
However I am confused as what to add in the My goal is for people to use the args like this:
and have Not sure if this conversation should be moved to an issue either. Thanks |
@olivierboucher making CLI flags dynamic is pretty complicated. The way we did this for the main SPAN_STORAGE_TYPE is that even though this flag is accepted on the command line, it is actually parsed manually (not via library), then used to instantiate specific storage factory, which is then responsible for registering the actual flags. Only after that the command line is parsed by the flags library. I think the same process will be pretty complicated to reproduce for grpc-plugin. There is a way, but I think it's way more complicated than it is worth. Do you have any concerns with only doing a config-file approach? Then the only CLI flag we need to define is the path to the config file. The parsing of the config file can be still done via Viper inside the plugin main(), this way some config params can still be overwritten with environment variables. |
Makes sense ! I just committed the implementation. I think it's time to benchmark and write tests against the new code. I'm not familiar with code benchmarking, any method you would suggest? Is there existing code somewhere ? Thanks |
This is where #1221 might come in. If you can figure out crossdock, try adding an integration test. It is definitely tricky so I'll see if I have some free time to help. Benchmarking can be as simple as measuring the latency in requests and perhaps throughput. Not sure how formal the process needs to be. |
Well, crossdock is more for correctness integration testing, we never used it for performance. For benchmarking storage I don't think the integration tests are important. We just need a data generator pushing data into SpanWriter interface, one running in-process, another running via grpc-plugin. |
We have a trace generator in the internal repo that I used to stress test the agents back in the days. Let me see if I can move it to oss. |
I know it's not top quality data but I re-used the createTrace method from the integration tests and called it in a loop. I compared memory vs grpc-memory-plugin and it's about the same numbers. Keep me posted regarding that trace generator |
Wow, same numbers? That makes me suspicious of the methodology. Did you really stress the system? The dual serialization of grpc version alone should show some difference |
Did you use the standard Go benchmarking from the testing package? |
trace generator also creates fairly simple traces, it's purpose is to generate load of on agents or collectors. |
I did not, what I did is probably meaningless. I will manage to checkout the tracegen and run proper tests |
I ran the tracegen tool and those are the results:
First is using the grpc plugin, second is using memory. It looks like it has a huge performance impact but I think that was expected. Is this tolerable ? |
Could you elaborate how you ran this? Did you just run the binary twice against different backends? The tracegen is capable of saturating the UDP stream, so the numbers it outputs are not meaningful for this benchmark, what we actually need to look for is the write latency in the collectors and the number of spans saved (both should be available via Prometheus metrics). Another type of test we could do is to try to saturate the storage, i.e. vary the |
You're correct I did ran tracegen against the 2 backends. Can you help me by telling me which metrics are meaningful? I still have all the prometheus metrics stored locally. |
These are the metrics that should be coming out of any storage jaeger/storage/spanstore/metrics/write_metrics.go Lines 24 to 30 in 50429cb
But, for some reason, I cannot find them from all-in-one. Try looking for something like these:
|
I have limited testing capabilities on my MBP (16GB ram fills up quickly) Here is a graph, my prometheus instance scrape rate is tuned to 2s for this test: First I ran grpc, then memory. From my understanding there only seem to be a ~1ms difference which leaves me perplex EDIT: more graphs Looks like most of the spans were rejected |
I think I understand what to do in order to make it work. I'll define trace ids as bytes just like it is done in |
I believe the Protobuf compiles for any language (despite gogo-specific annotations), as long as the gogo Protobuf files are available for inclusion. See this C example I was playing with: https://github.com/isaachier/jaeger-client-c/tree/master/idl. |
Added proper annotations to storage.proto instead of exposing TraceID in model.proto. Tests are back to green. Signed-off-by: Olivier Boucher <info@olivierboucher.com>
Moved DependencyLink to model.proto since it had to be used and the existing struct did not implement any Marshaler/Unmarshaler methods. Changed storage configuration to allow testing via mocks. Added DependencyReader interface to the StoragePlugin interface. Signed-off-by: Olivier Boucher <info@olivierboucher.com>
I fixed the issue by reverting changes to TraceID but I had to move EDIT: @isaachier I meant that other languages have to look at the go implementation because they only see |
That should be fine, although we may want to do that as a separate PR. Also, there seem to be any changes in the generated file, something is off, most likely a different version of the generator. |
What versions of protoc/protoc-gen-go should I run locally? Maybe this is off, I'm running 3.6.1. So if we want to do this in 2 separate PR I would have to do the DependencyLink one first because this one depends on it. |
Yes, we can do proto change as a separate (first) PR. I don't think the version of protoc matters much, it's the version of gogo that's important. The |
this seems to be a blocking issue #1258 |
I fixed this issue adding the prune settings in Gopkg.toml. I will make a PR for those changes EDIT: may have spoken too fast after overlooking the issue. I added prune settings that brought back the missing proto files, it fixed´make proto’ |
I'm not sure if this is the right place for this but it might be worth consideration. I've had a quick look at implementing a plugin using this and it seems nice, I have an issue trying to create a plugin from outside of jaeger due to vendoring though. In my own package if I do
then I see
I could be doing this wrong/jumping the gun but I thought that it was worth bringing it up just in case. |
@chvck this often happens if your code is pulling jaegertracing/jaeger from GOPATH instead of a flat vendor dir inside your repo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olivierboucher I think in this PR you have a proof of concept that the approach works, so I would really like to see it move forward. I have a proposal - could you pull all proto-related changes into a separate PR and let's try to merge it first? It should include changes to the Makefile & Gopkg. Once it's merged it will be easier to iterate on the code. The reason I prefer it to be separate is because (a) it's the part that creates the most conflicts with master at the moment, and (b) see my comment about Badger/#760 PR - once we have the IDL merged we could proceed on that direction independently of the plugin work.
The only thing to do before splitting proto-changes is to decide on my comment about splitting the Storage proto service into underlying reader/writer/etc services, similar to how Jaeger code is structured. If we don't do that split, then we're creating a monolith storage, which may not be always practical, like those methods about sampling storage that you commented out. So maybe worth trying the split first in this PR, and if it works, then fork a new PR just for proto changes.
-I vendor/github.com/grpc-ecosystem/grpc-gateway \ | ||
-I vendor/github.com/gogo/googleapis \ | ||
-I vendor/github.com/gogo/protobuf/protobuf \ | ||
-I vendor/github.com/gogo/protobuf \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use $(PROTO_INCLUDES)
?
|
||
.PHONY: plugin-proto | ||
plugin-proto: | ||
protoc \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is not an independent model, let's add the compile step to the main proto
target, e.g. before model/proto/model_test.proto
@@ -0,0 +1,201 @@ | |||
syntax = "proto3"; | |||
|
|||
package jaeger.api_v2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this correct?
option go_package = "proto"; | ||
|
||
import "gogoproto/gogo.proto"; | ||
import "model.proto"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make it the last import, separated by a blank line
// | ||
//message WriteDependenciesResponse { | ||
// | ||
//} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's remove dead code
TraceID: traceID, | ||
}) | ||
if err != nil { | ||
return nil, fmt.Errorf("grpc error: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use "github.com/pkg/errors"
to wrap the error instead of converting to a string
@@ -0,0 +1,335 @@ | |||
// Copyright (c) 2018 The Jaeger Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please spit this file into client.go and server.go
string message = 1; | ||
} | ||
|
||
service StoragePlugin { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should rethink this naming. Once #760 is merged, it can only be used with all-in-one because collector and query service still need to share the in-memory storage. It would be nice to build a standalone "remote storage" component that can be run as a separate service and accessed by collector/query via gRPC API. This file is that API, but it wouldn't be an API of just the harshicorp-plugin, it's an abstract gRPC Storage API .
// dependencystore/Writer | ||
// rpc WriteDependencies(WriteDependenciesRequest) returns (WriteDependenciesResponse); | ||
// dependencystore/Reader | ||
rpc GetDependencies(GetDependenciesRequest) returns (GetDependenciesResponse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a strong reason for merging all different types of storage into a single proto service
? Could we not have the same component implement different services?
"time" | ||
) | ||
|
||
type Store struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this struct? Given its 1-1 delegation pattern, couldn't shared.StoragePlugin
type used directly in place of wherever this Store
type was meant to use?
@olivierboucher - do you need any assistance in completing this work? Also, just to raise this for awareness as I'm not sure it's an issue for this PR or should be addressed once this is in but when I request ~200+ traces I'm hitting gRPC message size limits.
|
@chvck could you kindly open an issue with your description? We would investigate it independently of this PR :) |
@yurishkuro am I correct in thinking that a rough plan here would be:
Sound about right? Also, what would be the best way for me to go about doing this? I think that I'm going to have to fork and create new PRs even for just updating the Go plugin part. @annanay25 I'd be happy to create an issue but does it make sense outside of this PR? I think that it's specific to the span data volume returned over the grpc service implemented in the PR. |
@chvck yes, although I'm not sure about the difference between 1 and 2. |
Relating to issue jaegertracing#422 and review jaegertracing#1214, add proto changes to later allow for gRPC plugins to be used. Signed-off-by: Charles Dixon <chvckd@gmail.com>
Relating to issue jaegertracing#422 and review jaegertracing#1214, add proto changes to later allow for gRPC plugins to be used. Signed-off-by: Charles Dixon <chvckd@gmail.com>
What's the state of this PR? |
* Add gRPC plugin proto changes Relating to issue #422 and review #1214, add proto changes to later allow for gRPC plugins to be used. Signed-off-by: Charles Dixon <chvckd@gmail.com> * Change FindTraces signature to return a stream. FindTraces can hit grpc message size limits if a large number of spans are requested, using a stream mitigates this issue. Signed-off-by: Charles Dixon <chvckd@gmail.com> * Satisfy gofmt tool Signed-off-by: Charles Dixon <chvckd@gmail.com> * Change proto package and service names Signed-off-by: Charles Dixon <chvckd@gmail.com> * Delete commented out spanstorage Signed-off-by: Charles Dixon <chvckd@gmail.com> * Change FindTraces response to be a stream of spans Signed-off-by: Charles Dixon <chvckd@gmail.com> * Change from EmptyMessage to google.protobuf.Empty Signed-off-by: Charles Dixon <chvckd@gmail.com> * Move from using StoragePluginError to google.rpc.Status Signed-off-by: Charles Dixon <chvckd@gmail.com> * Remove commented code and clean up proto formatting Signed-off-by: Charles Dixon <chvckd@gmail.com> * Remove protobuf responses and only return successes, rely on Status for errors Signed-off-by: Charles Dixon <chvckd@gmail.com> * Update Gopkg lockfile Signed-off-by: Charles Dixon <chvckd@gmail.com> * Update Span type to come from model.proto Signed-off-by: Charles Dixon <chvckd@gmail.com> * Update storage proto file Signed-off-by: Charles Dixon <chvckd@gmail.com> * Add generated storage plugin file to lint ignores Signed-off-by: Charles Dixon <chvckd@gmail.com> * Lint ignore grpc plugin generated code by name not path Signed-off-by: Charles Dixon <chvckd@gmail.com> * Rename FindTracesResponseChunk to SpansResponseChunk Signed-off-by: Charles Dixon <chvckd@gmail.com> * Add marshal/unmarshal tests for DependencyLink Signed-off-by: Charles Dixon <chvckd@gmail.com> * Add tests for storage protos with custom types Signed-off-by: Charles Dixon <chvckd@gmail.com> * Run fmt and ignore storage_test for linting Signed-off-by: Charles Dixon <chvckd@gmail.com> * Remove DependencyLinkSource and use string Signed-off-by: Charles Dixon <chvckd@gmail.com> * Update headers Signed-off-by: Charles Dixon <chvckd@gmail.com> * Add SpansChunkResponse test Signed-off-by: Charles Dixon <chvckd@gmail.com> * Update makefile protoc calls Signed-off-by: Charles Dixon <chvckd@gmail.com> * Update proto generated files and update license script Signed-off-by: Charles Dixon <chvckd@gmail.com> * Update generated storage file to new proto layout Signed-off-by: Charles Dixon <chvckd@gmail.com> * Add storage generated files to import order cleanup ignores Signed-off-by: Charles Dixon <chvckd@gmail.com> * Move storage generated file to proto-gen dir Signed-off-by: Charles Dixon <chvckd@gmail.com> * Remove generated plugin storage file from script ignores Signed-off-by: Charles Dixon <chvckd@gmail.com> * Fix copyright headers Signed-off-by: Charles Dixon <chvckd@gmail.com> * Update storage_test generated file Signed-off-by: Charles Dixon <chvckd@gmail.com>
@jpkrohling It seems to be stale now. I've just created #1461 which builds on top of the work in this PR. |
Signed-off-by: Yuri Shkuro <ys@uber.com>
superseded by #1461 |
Which problem is this PR solving?
Short description of the changes