Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

+ kamon-armeria module #854

Open
wants to merge 34 commits into
base: master
Choose a base branch
from

Conversation

lucasamoroso
Copy link

Armeria http server and client instrumentation.

I tested it manually and added tests based on okhttp and akka-http instrumentation modules tests.

Still there is work to do like the https/grpc server instrumention, but i think this PR is a good Armeria module starting point.

Thanks to @dpsoft for his help :)

@lucasamoroso lucasamoroso force-pushed the kamon-armeria-instrumentation branch 2 times, most recently from d3473fd to 9eaf0cd Compare September 14, 2020 22:14
@lucasamoroso lucasamoroso force-pushed the kamon-armeria-instrumentation branch 3 times, most recently from d80e40b to 7128c27 Compare September 26, 2020 14:29
Copy link
Contributor

@SimunKaracic SimunKaracic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nitpicks, but otherwise looking good.

Do try and split it into smaller commits in the future!

Copy link
Contributor

@SimunKaracic SimunKaracic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@SimunKaracic
Copy link
Contributor

I tried this out with
https://github.com/line/armeria-examples/tree/master/annotated-http-service
The example breaks, and I can't get the /docs endpoint. Stacktrace:

java.lang.NullPointerException
        at kamon.armeria.instrumentation.server.ServeMethodAdvisor$.around(ArmeriaHttpServerInstrumentation.scala:79)
        at com.linecorp.armeria.server.FallbackService.serve(FallbackService.java:40)
        at com.linecorp.armeria.server.HttpServerHandler.handleRequest(HttpServerHandler.java:384)
        at com.linecorp.armeria.server.HttpServerHandler.channelRead(HttpServerHandler.java:249)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at com.linecorp.armeria.server.ArmeriaHttpServerRequestHandler.channelRead(ArmeriaHttpServerRequestHandler.scala:36)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at com.linecorp.armeria.server.Http1RequestDecoder.channelRead(Http1RequestDecoder.java:215)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
        at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.codec.ByteToMessageDecoder.handlerRemoved(ByteToMessageDecoder.java:253)
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)

I'll try out the other examples as well, and report back.

@ivantopo
Copy link
Contributor

ivantopo commented Oct 7, 2020

Hey @lucasamoroso,

I took a deeper look at this PR today and found three things that are important to address:

  • Handling the context scope properly. I noticed that the ArmeriaHttpServerRequestHandler is storing the incoming Context instance, and then it gets cleared out on ArmeriaHttpServerResponseHandler, but as far as I understand, there is no guarantee that those two calls will happen on the same thread. I started scratching the surface of Armeria's threading model and so far, I can't say for sure that the behavior in this PR is appropriate. My guess is that there is a separation between Armeria's threads (running the event loops and all Armeria-specific code) and the threads that will actually run the user code (i.e. the service we are implementing), but I will need more research and testing to be sure.
  • Attaching Kamon's Context to Armeria's Channels. I'm not entirely sure if that is a good idea, specially because of the previous point. Also, I noticed that Armeria's RequestContext can carry additional attributes and that makes me wonder: is there any particular reason why the Kamon context is attached to the channels rather than Armeria's RequestContext?
  • Operation Naming. We should probably be moving to a more gRPC-like naming scheme, where the operation names (and endpoints) look more like ServiceClassName.methodName rather than using the HTTP path.

Now, we love Armeria and we want this thing merged and ready to roll! So here is what we will be doing in the next couple days:

  • Try to understand the threading model inside Armeria and make sure that we are not leaving dirty threads nor opening race conditions with Kamon's Context propagation.
  • Run all Armeria examples with this instrumentation and uncover any issues that might come up (@SimunKaracic already dropped some of that above).
  • Investigate further how the Prometheus and Zipkin integrations are working with naming and metering. There might be a couple interesting details to learn from there!

Looking forward to your ideas and feedback 😃

@lucasamoroso
Copy link
Author

@ivantopo,@SimunKaracic Thanks for the review!

I'll be doing some testing with Armeria examples, for now i've no problems with the Operation Name refactor but i need to take a closer look at the other comments.
I'll working on this in the next couple of days!

@lucasamoroso
Copy link
Author

Hi!
In the last couple of days, with @dpsoft help, we've been changing the current implementation. I've been doing this in another repo because is simple to me to do some manual testing the repo is https://github.com/lucasamoroso/kamon-armeria-test.
I'am currently doing manual testing with this armeria-examples taken from Armeria. Until now everithing is working fine so as soon as i finish this manual testing i'll push the changes here and i think with this changes we'll be able to work with grpc and https (but i've to test it).

@dpsoft
Copy link
Contributor

dpsoft commented Oct 21, 2020

@ivantopo about:

  • ** Operation Naming **. We should probably be moving to a more gRPC-like naming scheme, where the operation names (and endpoints) look more like ServiceClassName.methodName rather than using the HTTP path.

I'm not sure about moving to a gRPC-like naming scheme for all Armeria services. Is really common practice have gRPC services mixed with plain http services in an Armeria application. I think that we need other strategy for gRPC naming services in addition to the current one.

@lucasamoroso
Copy link
Author

Hi!
I pushed a reimplementation of this module, i think is a better approach. I've been doing manual testing with armeria-examples no big issue found.
In the next couple of days i'll work with these things:

  • There is a problem with some armeria-examples and the default unhandled operation name, i left a comment in the code.
  • gRPC and https, i think with this new implementation it should be easy to work with this protocols (until now i only tested http).

@ivantopo About this:

Investigate further how the Prometheus and Zipkin integrations are working with naming and metering. There might be a couple interesting details to learn from there!

I haven't had time to investigate yet but I will.

Looking for a feedback about the work done and if there is something more to do in this PR 😄

Copy link
Contributor

@SimunKaracic SimunKaracic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I have a couple of questions:

  1. You are storing the context in 4 different places, but never closing the returning scope. Won't that leave dirty threads around?
  2. You are setting the CLIENT_TRACE_SCOPE_KEY in Armerias context, but not reading it from anywhere?

Generally the code is in good shape, but I'd like to answer these before continuing with the PR.

@lucasamoroso
Copy link
Author

lucasamoroso commented Nov 9, 2020

1. You are storing the context in 4 different places, but never closing the returning scope. Won't that leave dirty threads around?

The contexts are stored inside a try-with-resources, i understand that the returned scope will be closed after the execution of the try block.

2. You are setting the `CLIENT_TRACE_SCOPE_KEY` in Armerias context, but not reading it from anywhere?

I'm using this in here https://github.com/kamon-io/Kamon/pull/854/files#diff-051d01483bf35bea603907712f76837cc9fd2e519e321daa3bb86229753865dfR49 but i'm not sure it's needed. For sure we need to propagate the Kamon context before call unwrap(), like this:

   try (Storage.Scope scope = Kamon.storeContext(Kamon.currentContext())) {
      ctx.setAttr(CLIENT_TRACE_SCOPE_KEY, scope);
      return unwrap().execute(ctx, requestHandler.request());
    } catch (Exception exception) {
      requestHandler.span().fail(exception.getMessage(), exception).finish();
      throw exception;
    }

But when the request is complete:

    ctx.log()
        .whenComplete()
        .thenAccept(log -> {
          try (Storage.Scope ignored = Kamon.storeContext(ctx.attr(CLIENT_TRACE_SCOPE_KEY).context())) {
            Timing.takeTimings(log, requestHandler.span());
            requestHandler.processResponse(KamonArmeriaMessageConverter.toKamonResponse(log));
          }
        });

I'm not sure if it's necessary to run

     Timing.takeTimings(log, requestHandler.span());
     requestHandler.processResponse(KamonArmeriaMessageConverter.toKamonResponse(log));

with Kamon context.

As an example, that I hope clarifies what I am trying to say, if i ran this code:

    ctx.log()
        .whenComplete()
        .thenAccept(log -> {
          try (Storage.Scope ignored = Kamon.storeContext(ctx.attr(CLIENT_TRACE_SCOPE_KEY).context())) {
            logger.info("processing kamon context " + Kamon.currentContext().hashCode());
            Timing.takeTimings(log, requestHandler.span());
            requestHandler.processResponse(KamonArmeriaMessageConverter.toKamonResponse(log));
          }
        });

The log line will show that the Kamon context is the same that we've configured in the Armerias context. But if i ran this code

    ctx.log()
        .whenComplete()
        .thenAccept(log -> {
            logger.info("processing kamon context " + Kamon.currentContext().hashCode());
            Timing.takeTimings(log, requestHandler.span());
            requestHandler.processResponse(KamonArmeriaMessageConverter.toKamonResponse(log));
        });

The log line will show that the Kamon context is a different from what we've configured in the Armerias context.

In this case the requestHandler was created with a Kamon context final HttpClientInstrumentation.RequestHandler<HttpRequest> requestHandler = clientInstrumentation.createHandler(KamonArmeriaMessageConverter.getRequestBuilder(req), Kamon.currentContext()); so i'm not entirly sure if that context will'be used always, no matter which Kamon context we're running in always that requestHandler's context will be used, so this try (Storage.Scope ignored = Kamon.storeContext(ctx.attr(CLIENT_TRACE_SCOPE_KEY).context())) inside thenAccept will be unnecessary.
In the the server decorator i've the same question, i hope someone with more experience with Kamon can help me.

@lucasamoroso
Copy link
Author

Hello!!
I think this pr is ready, I have been making a lot of changes in the last time because the first approach I gave to the instrumentation did not convince us much. I think this new approach is much better and it helped us add grpc and https in a very simple way, I just have a doubt regarding the propagation of the Kamon context that I left in one of the comments in response to @SimunKaracic

@ivantopo
Copy link
Contributor

Hey @lucasamoroso, I wanted to get into this PR and tried to rebase it to master locally but it didn't go well. There are some minor changes to tests due to the Scalatest version upgrade, but after finishing those changes the tests are not passing anymore 😢

Do you have time to try rebasing and pushing the updates? Otherwise let me know and I can move it forward 💪

@lucasamoroso
Copy link
Author

Hi @ivantopo! Sure I'll be working on this

@lucasamoroso
Copy link
Author

Hey @ivantopo!

I did a rebase and fixed the tests. The problem with the tests was that they must extend from InitAndStopKamonAfterAll to start and stop Kamon.

@ivantopo
Copy link
Contributor

Hey @lucasamoroso, thanks for the update!

Generally speaking it looks fine and tests pass locally, but I couldn't make it work with one of my Armeria applications by publishing locally and trying it out 🤷

I spent some time trying to debug it and figured out that the ArmeriaServerBuilderAdvisor code is not really running locally (tried both using -javaagent and with the bundle). I did some more digging and realized that if I disable the bridge on ServerBuilder then the instrumentation gets picked up:

class ArmeriaHttpServerInstrumentation extends InstrumentationBuilder {
  onType("com.linecorp.armeria.server.ServerBuilder")
    .advise(method("build"), classOf[ArmeriaServerBuilderAdvisor])
//    .bridge(classOf[ServerBuilderInternalState])

  onType("com.linecorp.armeria.server.FallbackService")
    .advise(method("handleNotFound"), classOf[HandleNotFoundMethodAdvisor])
}

But then I had to patch the server ports for it not to blow up. My guess is that when the bridge is in place the entire ServerBuilder class gets loaded before the advise is applied. The only difference I found is that all Kamon tests are running with Java 8, but the app I'm running is using Java 17. Also tried with Java 11 and it didn't work either.

Have you tried it with newer java versions?

@lucasamoroso
Copy link
Author

Hi @ivantopo !

I've tried with all java versions but the problem was with the attach mode. This commit should fix that issue.

I'll stay tuned for comments 😄

@yarosman
Copy link
Contributor

yarosman commented Apr 3, 2022

So what is the status of armeria module ?

@lucasamoroso
Copy link
Author

So what is the status of armeria module ?

Hi @yarosman!

I've resolved all @ivantopo's comments and I'm waiting for more comments or maybe this can be merged.

Are you interested in using this module?

@yarosman
Copy link
Contributor

Hello. @ivantopo you wrote that you migrate several services to armeria. Are you going to add kamon supporting ?

@ivantopo
Copy link
Contributor

Hey @yarosman. Yes! I'm going to make it my new year's resolution to get this PR merged and released ASAP!

We indeed have almost all of our services on Armeria now, but we did manual instrumentation on them long ago so kind of lost the urgency to do this for our own use case. But now is the time 🙏 (I might have said that before, hopefully this will REALLY be the time!)

@ivantopo ivantopo self-assigned this Jan 12, 2023
@trustin
Copy link

trustin commented Mar 20, 2023

We indeed have almost all of our services on Armeria now

@ivantopo, how about adding you to the adopters list and get some goods, then? 😉 line/armeria#4626

@oskoi
Copy link

oskoi commented Apr 10, 2023

@ivantopo Hi! Any ETA to merge PR?
Friendly ping

@hughsimpson
Copy link
Contributor

Commented some comments. Lot went on since then - if anybody has some context on those places, would appreciate it - I don't use armeria

@lucasamoroso
Copy link
Author

Commented some comments. Lot went on since then - if anybody has some context on those places, would appreciate it - I don't use armeria

Sadly I don't remember where are we right now, I stopped using Armeria about 2/3 years ago, because I left the company where I was using it, so I also lost track of the new features that Armeria added that could help us to resolve those issues.

If we want this feature I could help updating the library and trying to understand where are we right now.

@yarosman
Copy link
Contributor

...

@lucasamoroso
Copy link
Author

lucasamoroso commented Jun 20, 2024

Hi @yarosman, I'm not sure if this PR will reach master, I'll need to revisit the implementation and also I lost track of the new Armeria features (I'm working with Go now) .
Probably I could dedicate some time to this but I have the impression that I don't have enough knowledge about Armeria's internals in order to provide support and work on a better implementation and that is why I'm not sure if Kamon wants this PR. Do you need this instrumentation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants