Request tracing #91
base: release/8.8
String reqId = MDC.get(_REQ_ID);
if (reqId == null) return;
if (log.isInfoEnabled()) {
  String type = IS_FORWARDED.get() ? "forwarded" : "external";
Using this, can we identify the top level of a request (the top-level query request vs. a sub-request to a shard)?
yes
Would it be possible to add a unit test for it?
@noblepaul follow-up
done
This has very little impact on CPU/memory. My own perf tests showed an extra 0.5 milliseconds of latency.
Thanks @noblepaul. I will merge this further!
Hi 👋🏼, it's nice to see more tracing using a standardized framework like OpenTracing! :) Since I am very new to the team, I tried to follow the tracing logic and have some thoughts. It's quite likely that I overlooked some logic or concerns, as all of this Solr magic is pretty new to me 😓:
The rest of the comments are mostly just my opinions; I come from a very different background that uses OT traces quite differently 😊. The current implementation works fine as is, but there are some elements that are a bit confusing to me:
I understand we do not want a complete rewrite, as the current implementation works completely fine. I have tried a little refactoring that avoids impact on other existing logic, and here is what I tried:
And for the distant future, if Solr wants to do more tracing, I would definitely recommend giving OpenTelemetry a look, as it offers more tracing capability and is an open standard that is actively being worked on 🎉
Sorry about my long write-up, I just get a bit too excited about tracing... 😅
Thanks @patsonluk
Actually, we weren’t really looking to fully embrace the OpenTracing framework. The requirement was just to log the request ID on each node and use BigQuery for all the analysis, so this is not a tracing implementation. In this case I could have used MDC alone to set and get the request ID. However, we chose to piggyback on the Jaeger support to avoid making further changes to Solr for propagating the _req_id in internode requests. The objectives were:
Yes, Solr already logs the QTime at the end of each request out of the box. This whole exercise was aimed at ensuring that the request ID is propagated and logged, and that the start of each request is logged. We will need to edit our log4j.xml to add the _req_id to the logs. It is not required to be in the code.
Your observation is correct: we’re using MDC to set the request IDs instead of the span context. We needed the values in MDC for logging purposes, so we didn’t take the extra step of populating the span context with them or reading them back from it. We weren’t looking for a full-fledged tracing implementation, and we achieved the objective using MDC alone.
MDC uses thread-local storage, but its values can still be propagated to async requests by using the MDC-aware thread pools: https://github.com/apache/lucene-solr/blob/branch_8x/solr/solrj/src/java/org/apache/solr/common/util/ExecutorUtil.java#L102
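As a rough illustration of the capture-and-restore idea behind an MDC-aware thread pool, here is a minimal sketch in plain Java. It is not Solr's ExecutorUtil code: the class RequestIdPropagation, its ThreadLocal map (a stand-in for the logging MDC), the wrap helper, and the request ID value "req-123" are all hypothetical and exist only for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a thread-local context map standing in for the logging MDC.
final class RequestIdPropagation {
    private static final ThreadLocal<Map<String, String>> CTX =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }

    static String get(String key) { return CTX.get().get(key); }

    // Snapshot the submitting thread's context now; install it around the task
    // when the task later runs on a pool thread, then restore the old context.
    static <T> Callable<T> wrap(Callable<T> task) {
        Map<String, String> captured = new HashMap<>(CTX.get()); // submitting thread
        return () -> {
            Map<String, String> previous = CTX.get();            // worker thread
            CTX.set(new HashMap<>(captured));
            try {
                return task.call();
            } finally {
                CTX.set(previous);
            }
        };
    }

    // Reads "_req_id" on a pool thread, with or without context propagation.
    static String readOnWorker(boolean wrapped) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = () -> get("_req_id");
            return pool.submit(wrapped ? wrap(task) : task).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        put("_req_id", "req-123");
        System.out.println(readOnWorker(false)); // prints "null": a plain thread-local is not inherited
        System.out.println(readOnWorker(true));  // prints "req-123": the context was captured and restored
    }
}
```

The snapshot is taken at submit time rather than at run time, because by the time the task executes, the submitting thread may already be handling a different request.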
Since Solr 8x is pinned to Jaeger, we assume we can rely on Jaeger’s behaviour for GlobalTracer.getTracer(), and hence we used it. There seems to be no plan yet to move Solr away from Jaeger to another implementation (one that implements GlobalTracer.getTracer() differently), so we think it is safe to assume the behaviour will remain the same for Solr 8x or 9x.
Do you have a branch/patch where you made these changes?
Yes, I agree that the current implementation absolutely fulfills the requirements, and likely with fewer changes than using full OpenTracing for tracing right now. :) We can always come back to this if in the future we want to use OpenTracing/OpenTelemetry for fuller tracing. My experimental implementation can be found in https://github.com/fullstorydev/lucene-solr/compare/noble/req_trace...patsonluk:patson/req_trace?expand=1 . It's a pretty rough draft and I have only run the unit test cases 😓
Absolutely. We did not want a complicated and costly implementation.
@patsonluk @noblepaul should we close this for now?
I think it's fine to close it, as the tracing requirement is fulfilled by using rid for now. If we want to trace non-select queries, then we might need something like this. @noblepaul, thoughts please? 😊
Please refer to https://docs.google.com/document/d/1-U0x2Y_m-2TemzBgryYzy7JyLEl3FcBN-G5Tftrr634/edit?ts=60e61406# for more details.