Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the req_id attribute on the decision logs #5006

Closed
humbertoc-silva opened this issue Aug 12, 2022 · 19 comments · Fixed by #5196
Closed

Add the req_id attribute on the decision logs #5006

humbertoc-silva opened this issue Aug 12, 2022 · 19 comments · Fixed by #5196

Comments

@humbertoc-silva
Copy link
Contributor

What is the underlying problem you're trying to solve?

I'm working on OPA logs (request/response logs, decision logs, status logs, etc.). I've noticed that the request/response logs have a correlation through the req_id attribute. I've seen that debug logging with the print built-in function uses this attribute too. This is useful as I can correlate all logs related to a specific request/response.

I would like to know if is it a good idea to have the req_id attribute on decision logs too. So the correlation would be complete and I could analyze all the logs generated by OPA when handling a single request/response.

Describe the ideal solution

The ideal solution is to add the attribute req_id to the decision log.

Additional Context

I am working with the console decision log but maybe the req_id could be useful when the decision log is configured to send the logs to a control plane too because I can search and correlate the other logs based on this attribute.

@anderseknert
Copy link
Member

Hi @humbertoc-silva ! And thanks for opening this. Maybe I'm missing something obvious here :) What does this add compared to the decision ID generated for a decision and added to the decision log entry? The request ID would still be generated by the OPA server, right?

It might also be worth to point out that the req_id attribute is currently coupled to the "request logging" facility, which is only enabled at log level INFO or below. If you'd set the log level to ERROR — which is the recommended level for production — you'll neither see this type of logs in the console, or have a req_id created. There's an open issue to allow configuring that logger independently from other logging, so perhaps we'd need to do that first if we want to have this implemented.

@humbertoc-silva
Copy link
Contributor Author

Hi @anderseknert,

The decision log ID has its own value, generally, a client application will get it when receiving a response from OPA and the responsible team will use it to filter the logs. But it can not help when we need to correlate the other logs generated by the server (request/response/print), e.g. I don't know which print precisely occurs in a certain decision. today it is not possible to correlate these other logs with the decision log.

I agree that this correlation just makes sense if the log level is INFO or higher because if the log level is lower there is nothing to correlate. So, maybe the log level needs to be considered when adding the req_id to the decision logs.

@anderseknert
Copy link
Member

Thanks @humbertoc-silva! I guess the next question then would be — when decision logging is enabled, is there any real value in keeping the "request log" enabled too? AFAICS, there's not much added in terms of data not already present in the decision log itself. Firing off a request with OPA running as opa run --server --set decision_logs.console=true:

request log

{
  "client_addr": "127.0.0.1:53452",
  "level": "info",
  "msg": "Sent response.",
  "req_id": 1,
  "req_method": "GET",
  "req_path": "/v1/data/policy/response",
  "resp_bytes": 54,
  "resp_duration": 0.6725,
  "resp_status": 200,
  "time": "2022-08-12T23:22:27+02:00"
}

decision log

{
  "decision_id": "163be9bb-b8e7-4324-a99b-9263e498a116",
  "labels": {
    "id": "d71df7ba-b0ba-40ca-a3de-4e127158c7cd",
    "version": "0.44.0-dev"
  },
  "level": "info",
  "metrics": {
    "counter_server_query_cache_hit": 0,
    "timer_rego_external_resolve_ns": 250,
    "timer_rego_input_parse_ns": 1708,
    "timer_rego_query_compile_ns": 79958,
    "timer_rego_query_eval_ns": 14125,
    "timer_rego_query_parse_ns": 58542,
    "timer_server_handler_ns": 257583
  },
  "msg": "Decision Log",
  "path": "policy/response",
  "requested_by": "127.0.0.1:53452",
  "time": "2022-08-12T23:22:27+02:00",
  "timestamp": "2022-08-12T21:22:27.575339Z",
  "type": "openpolicyagent.org/decision_logs"
}

If there are some error conditions that would stop e.g. the decision logger from logging at all, the normal error logging of OPA would hopefully be clear enough on what caused that error. Could definitely be some cases I'm not thinking of though 🙂

@humbertoc-silva
Copy link
Contributor Author

Some information is present on the decision log, some not, but when we have debugging log turned on via the print built-in function it definitely won't be present, and in some cases, it could be valuable to have some data printed to help on troubleshooting or to understand more about the decision in complex policies and all the logs connected to facilitate searching and filtering.

Maybe I am doing misuse of the print function, but this was a way I found to get more information about the decision instead return more data in the Data API response. I think that is more correct to do this using logs than putting extra information on client responses.

What do you think about this?

@anderseknert
Copy link
Member

Seems like a simple addition to make, so if it helps you with debugging, I don't see why not :)

@humbertoc-silva
Copy link
Contributor Author

humbertoc-silva commented Aug 15, 2022

Do you think that the req_id attribute always must be included on the decision log or when the log level is lower or equal info?

@humbertoc-silva
Copy link
Contributor Author

Hi @anderseknert,

I took some time to understand the decision log plugin and how req_id is generated, it is my first contact with the OPA source code. The request ID is an attribute of LogginHandler and the requestContext.ReqID attribute is set when the server handles a new request.

I saw that we just have req_id on request context when the log level is lower than or equal to info, as you said before.

I think that this method must be updated to include the re_id on the event as well as update the EventV1 type, and the AST method too, right?

@anderseknert
Copy link
Member

Seems about right @humbertoc-silva 👍 I guess we'd want to create the req_id outside of the "request logger", or else it will only be seen in the decision logs if that's enabled. OTOH, perhaps that's the only time it adds value, since it's not seen anywhere else?

@humbertoc-silva
Copy link
Contributor Author

humbertoc-silva commented Aug 18, 2022

Let me see if I got it. To always have the req_id on decision logs the ID always needs to be propagated instead to propagate it when the log level is lower or equal to info, so some change is necessary to make this independent of the log level. Here is where a new request is set with extra information based on the info log level.

At a first glance, it makes sense to have the req_id just when the other logs exist too, so we gonna have data to correlate to the decision log, and the req_id will be the attribute that joins all together.

@anderseknert
Copy link
Member

Agreed, if it just starts showing up in decision logs for everyone, there's probably going to be a lot of questions asked about its purpose 😅 So maybe we can start by only having it logged when the "request logger" is enabled. We can always change that later if we want.

@humbertoc-silva
Copy link
Contributor Author

Hi @anderseknert,

I will need help to get the request ID value on the decision log plugin. I will need to do a type assertion to extract the value from the context but the type requestContext is not exported from runtime package.

Are there other places that use this information that could be used as an example?
Do you have any suggestions on how I can do this?

Thank you!

@humbertoc-silva
Copy link
Contributor Author

humbertoc-silva commented Aug 19, 2022

In the meantime, I tried something here. Please, check my branch out and take a look.

I moved the request context temporarily to the logging package to do a test.

@humbertoc-silva
Copy link
Contributor Author

Hi @anderseknert,

Did you have a chance to take a look at my branch? As I mentioned earlier, I moved the request context to do a test but I think it is not the best place to move this code. I would like your opinion and orientation about this.

Thank you very much!

@anderseknert
Copy link
Member

Hey @humbertoc-silva! And sorry for the delay. Busy week with 2 releases.. 😅
Code looks like a good start to me, but @srenatus might be better equipped to say if there are problems with the approach.

@srenatus
Copy link
Contributor

srenatus commented Sep 8, 2022

👀 It looks OK-ish, but I wonder if we couldn't make the RequestID part of the decision struct (*server.Info).... putting values on the context always feels a tiny bit underhanded to me 😅 Or have you already considered that approach, and encountered blockers?

@humbertoc-silva
Copy link
Contributor Author

Hi guys,

I understand @anderseknert. @srenatus on the weekend I will try to implement using this approach, making the RequestID part of the decision log struct. Can I use the contributor's channel to ask questions about the implementation?

Thank you!

@srenatus
Copy link
Contributor

srenatus commented Sep 9, 2022

Can I use the contributor's channel to ask questions about the implementation?

Of course, that's what it's for 😃

@humbertoc-silva
Copy link
Contributor Author

Hi guys,

I created a new branch and implemented the feature following the @srenatus tip (making the RequestID part of the decision struct).

Could you, @anderseknert and @srenatus, check my new branch out, please?

Thank you.

@srenatus
Copy link
Contributor

You could also open a PR right away. It's easier to comment on that etc

humbertoc-silva added a commit to humbertoc-silva/opa that referenced this issue Oct 14, 2022
Today it is not possible to correlate the decision log with
other types of logs (server, print, etc.) when the server log
level is >= INFO. The log correlation could be helpful in
troubleshooting.

A solution is to add a common attribute in all logs to make
the log correlation possible, so adding the req_id attribute
on decision logs, when server log level is >= INFO, will make it
possible.

Fixes: open-policy-agent#5006

Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
humbertoc-silva added a commit to humbertoc-silva/opa that referenced this issue Oct 14, 2022
The documentation purpose is to explain the relation with
others logs, how it could be used, and when it is included
on decision logs.

Fixes: open-policy-agent#5006

Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
humbertoc-silva added a commit to humbertoc-silva/opa that referenced this issue Oct 14, 2022
Explain that request ID is incremental and unique by OPA instance.

Define RequestContext key as an unexported const to avoid colision and
provide RequestContext type-safe accessors.
More info: https://pkg.go.dev/context#Context.Value

Fixes: open-policy-agent#5006

Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
srenatus pushed a commit that referenced this issue Oct 14, 2022
…5196)

Today it is not possible to correlate the decision log with
other types of logs (server, print, etc.) when the server log
level is >= INFO. The log correlation could be helpful in
troubleshooting.

A solution is to add a common attribute in all logs to make
the log correlation possible, so adding the req_id attribute
on decision logs, when server log level is >= INFO, will make it
possible.

Fixes: #5006

* Add documentation about decision log req_id attribute

The documentation purpose is to explain the relation with
others logs, how it could be used, and when it is included
on decision logs.

Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
byronic pushed a commit to byronic/opa that referenced this issue Oct 17, 2022
…pen-policy-agent#5196)

Today it is not possible to correlate the decision log with
other types of logs (server, print, etc.) when the server log
level is >= INFO. The log correlation could be helpful in
troubleshooting.

A solution is to add a common attribute in all logs to make
the log correlation possible, so adding the req_id attribute
on decision logs, when server log level is >= INFO, will make it
possible.

Fixes: open-policy-agent#5006

* Add documentation about decision log req_id attribute

The documentation purpose is to explain the relation with
others logs, how it could be used, and when it is included
on decision logs.

Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
Signed-off-by: Byron Lagrone <byron.lagrone@seqster.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants