-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: provide skywalking logger plugin #5478
Conversation
apisix/plugins/skywalking-logger.lua
Outdated
trace_context = { | ||
traceId = base64.decode_base64url(ids[2]), | ||
traceSegment = base64.decode_base64url(ids[3]), | ||
spanId = tonumber(ids[4]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The span id is missing in the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I update the comment. Currently, the 4th part, 3
means span id.
4. Parent span ID. Must be an integer. It begins with 0. This span ID points to the parent span in parent trace segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better if you also add the link to the skywalking's doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
apisix/plugins/skywalking-logger.lua
Outdated
local ids = ngx_re.split(headers["sw8"], '-') | ||
|
||
trace_context = { | ||
traceId = base64.decode_base64url(ids[2]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check the length of ids and the result of decode_base64url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update
apisix/plugins/skywalking-logger.lua
Outdated
end | ||
|
||
if not log_body.route_id then | ||
log_body.route_id = "no-matched" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the no-matched
necessary for the skywalking? If not, we can leave it empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
Co-authored-by: 罗泽轩 <spacewanderlzx@gmail.com>
@wu-sheng please take a look, and welcome feedback:) |
apisix/plugins/skywalking-logger.lua
Outdated
local headers = ngx.req.get_headers() | ||
if headers then | ||
-- 1-TRACEID-SEGMENTID-3-PARENT_SERVICE-PARENT_INSTANCE-PARENT_ENDPOINT-IPPORT | ||
local ids = ngx_re.split(headers["sw8"], '-') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you want to do this? SkyWalking has tracer for APISIX, which has been integrated. Why don't you try to read it from the context?
Decoding the header is fine from tech perspective, but it could be a risk when one day, Lua agent supports namespace
concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And all following decode(twice for trace id and span id) and splitting cost unnecessary resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am considering about it still works when tracer plugin is not enabling.
If we give the version matching matric, it looks not very dangerous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your call, SkyWalking wouldn't change v3 format for a long time(already years, expect more) AFAIK. namespace
would be an issue, but not very serious, just duplicate configurations.
I just think resource costing is very sensitive. And this is not that risk, as when the incompatible issue happens, it happens at the first time both plugin activated. So anyone will notice this immediately, they could do fix or disable one/both quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
But now, skywalking-nginx-lua does not expose trace context. I would like to get merged this pr first. And keep going on polishing.
We need to enhancement skywalking-nginx-lua and release a new version. And then, I back to polishing this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General review from design perspective(nothing related to code details).
apisix/plugins/skywalking-logger.lua
Outdated
local headers = ngx.req.get_headers() | ||
if headers then | ||
-- 1-TRACEID-SEGMENTID-3-PARENT_SERVICE-PARENT_INSTANCE-PARENT_ENDPOINT-IPPORT | ||
local ids = ngx_re.split(headers["sw8"], '-') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And all following decode(twice for trace id and span id) and splitting cost unnecessary resources.
} | ||
|
||
if not stale_timer_running then | ||
-- run the timer every 30 mins if any log is present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every 30 mins? For?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a huge latency in low traffic scenario. No one is willing to wait for 30 mins for log collecting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it sweeps the stale log data in every 30m.
When the data transport fails, the data will remain in memory. Therefore, regular cleaning is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not very general way to process logs. Usually, unless you have local cache or mmap(like SkyWalking Satellite), logs should be abandoned quickly once retry threshold reached.
The reason behind this logic is, observability presents a large data set but isn't always very useful, especially for logs. Metrics are small, and tolerable to keep latest values, but different from logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to me. @spacewander how do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err. @dmsolr's explain is incorrect. This timer doesn't clean up the stale log. It clean up the stale buffers when they are no longer holding the log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is about clean up the stale buffers
only, than, this is APISIX's call. Nothing relates to SkyWalking or observability preference.
apisix/plugins/skywalking-logger.lua
Outdated
-- 1-TRACEID-SEGMENTID-SPANID-PARENT_SERVICE-PARENT_INSTANCE-PARENT_ENDPOINT-IPPORT | ||
local ids = ngx_re.split(sw_header, '-') | ||
if #ids == 8 then | ||
local trace_id, err = base64.decode_base64url(ids[2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decode_base64url
only returns str or nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if so, we don't need to check the result of decode_base64url, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am correct, we should check if the trace_id is nil or not? The trace_id is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trace_context is optional. If trace_id is missing, it cannot relate to trace. But missing trace_id log can be found in log
view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I prefer to collect the missing trace_context log.
} | ||
_EOC_ | ||
|
||
$block->set_value("http_config", $http_config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New tests should contain a common section, like
apisix/t/plugin/limit-count2.t
Line 38 in f06f6cc
if (!$block->request) { |
There is no need to repeat them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
@@ -0,0 +1,117 @@ | |||
--- | |||
title: http-logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right title?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Co-authored-by: 吴晟 Wu Sheng <wu.sheng@foxmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to me. Thanks.
--- response_body | ||
opentracing | ||
--- error_log eval | ||
qr/.*\{\\\"json\\\":\\\"\{(\\\\\\\"\@timestamp\\\\\\\":\\\\\\\".*\\\\\\\"|\\\\\\\"client_ip\\\\\\\":\\\\\\\"127\.0\.0\.1\\\\\\\"|\\\\\\\"host\\\\\\\":\\\\\\\"localhost\\\\\\\"|\\\\\\\"route_id\\\\\\\":\\\\\\\"1\\\\\\\"|,){7}\}/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not easy to read, can there be a more readable way? @dmsolr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is raw skywalking log format.
Seem like have some approach to simplify it. I will have a try later.
|
||
=== TEST 7: test wrong trace context header | ||
--- request | ||
GET /opentracing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I suddenly noticed this, why /opentracing
? Rather than /request
.
What this PR does / why we need it:
Pre-submission checklist:
provide a new logger plugin to support log access log to SkyWalking in SkyWalking Log Format. If skywalking trace plugin is enabled or requests with SkyWalking Trace context header, this plugin will parse the trace context header and print trace context in log. So that the access log can relate to trace.