[for devs] New logging infrastructure #8457
jmazzitelli started this conversation in General
Reply: See this PR for new docs that are coming that discuss analyzing the new logs: kiali/kiali.io#881
This post is mainly for Kiali developers.
TL;DR
I wanted to give some details of the new logging infrastructure in use inside the Kiali Server. In the past we used our simple, easy-to-use `log.Debug` API (which we can still use today; it is not going away), but it does not let us attach much detail to our logs. We are going to start using structured logging, and use it to track logs across requests, via the zerolog API that we already have (but never fully utilized). See the "How to use the new logger" section below for the important part.
What we used in the past
In the past, logging messages in Kiali has been simple and straightforward -- using `log.Debugf()` and related functions (e.g. `log.Tracef()`, `log.Info()`, etc.). Under the covers, this uses the zerolog logger, but all it allowed us to do was log a string message at a specific level (debug, info, etc.). At most we were able to build the string dynamically using the "f" versions of the API (which are essentially Sprintf-style expressions).

What we can use today
Today we are exposing more of the zerolog logger API to get better structured logging, along with being able to trace logs throughout an individual request. This is going to make it easier to debug and diagnose problems within the server.
What this means is that we will now have individual zerolog loggers given to our HTTP handlers, with those loggers attached to the HTTP request objects passed to the handlers. That logger can then be passed around inside `context.Context` objects. These are actual zerolog loggers, so you need to use the zerolog API to log messages (which is not horrible, but slightly different from the simple `log.Debugf` we have been using).

How to use the new logger
Here's the important part for devs to know. To use this new logger infrastructure, you now have to obtain the logger in some way - usually either from a context within an `http.Request` (i.e. `http.Request.Context()`) or from a `context.Context` argument passed to a function. Here's an example in our graph code of how to get the context from a request. Notice it simply gets the context from the request and passes it down to another function - that function will then use the context to obtain the logger so it can log messages.
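A minimal sketch of that pattern (the handler and function names here are hypothetical stand-ins, not Kiali's actual source):

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// graphNamespacesHandler stands in for the real graph handler: it only
// grabs the context off the incoming request (a request-scoped zerolog
// logger was attached to that context upstream) and passes it down.
func graphNamespacesHandler(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	generateGraph(ctx, w)
}

// generateGraph stands in for the called API function: it receives the
// context and would use it to obtain the logger for its log messages.
func generateGraph(ctx context.Context, w http.ResponseWriter) {
	fmt.Fprintf(w, "graph generated (ctx present: %v)", ctx != nil)
}

// runDemo exercises the handler with a synthetic request.
func runDemo() string {
	req := httptest.NewRequest("GET", "/api/namespaces/graph", nil)
	rec := httptest.NewRecorder()
	graphNamespacesHandler(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(runDemo())
}
```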
At that point, `api.GraphNamespaces` has a context that contains the logger it should use to log messages. That function should pass the context to other functions it calls, so those other functions can use the same logger to log messages. Here's an example of how to get the logger from a context, and then use that logger to log a message.
Notice the slightly different API used to log messages. Rather than the global `log.Tracef()` function we have been using, you now use the zerolog API directly. First call the log level method (in this case `Trace()`) and then call `Msgf()`, passing in the same dynamic Sprintf-style expressions we have already been using. So `log.Tracef()` should now be `zl.Trace().Msgf()`. Slightly different, but not much, so we can easily refactor our logging code when switching over to the new logging API.

NOTE ABOUT THE VARIABLE NAME zl: I use the variable name `zl` (rather than `log`) because it makes it easy to visually see where code has been refactored to use the new logging API. It is much easier to see `zl` in the code and immediately know, "Ah, this is already using zerolog and the new logger stuff". Seeing `log` as a variable name as well as the package name would be confusing and make the devs think too hard :) It's just a little thing that makes it slightly easier when refactoring and adding this code.

What do the logs look like now (i.e. "Why are we doing this?")
Not only does this add structured data to the logs (providing easier ways to search and filter log messages), but it also allows us to track log messages for a particular request, allowing for easier debugging and diagnosis of problems.
Here's an example of what a log message now looks like for the graph generation. (Note: I have enabled JSON logging (`spec.deployment.logger.log_format=json`) because it is easier to read the data for documentation, but everything works basically the same with our default text format; you just use grep expressions rather than jq to filter and search the logs.)

Notice an important piece of data we now have: `request-id` (this log came from the request identified as `d0or9k4dbm7c739fv8jg`). If I then filter my logs by searching for that request-id, I can follow that graph generation request from the handler all the way down to the appenders and Prometheus queries, and see what logs were generated just for that request. Below are only some of the logs so you get the idea; I did not show a bunch of messages here so I don't spam this thread as much as I already have ;).

```shell
kubectl logs -n istio-system deployments/kiali | jq -rR 'fromjson? | select(."request-id" == "d0or9k4dbm7c739fv8jg")'
```

Here you see the entire graph generation for this particular request took 17.66ms. You can see what Prometheus queries were performed to build this specific graph, and how long each of those queries took.
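grep can follow a single request in either log format; a quick sketch of the mechanics (the sample log lines below are fabricated for illustration only):

```shell
# Stand-in for `kubectl logs -n istio-system deployments/kiali` output
# (these lines are made up; they are not real Kiali log output).
logs='{"level":"trace","request-id":"d0or9k4dbm7c739fv8jg","message":"graph generation started"}
{"level":"debug","request-id":"zzzzzzzzzzzzzzzzzzzz","message":"some other request"}'

# Follow a single request by grepping for its request-id:
echo "$logs" | grep 'd0or9k4dbm7c739fv8jg'
```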
What to do now?
Only parts of the server code have been converted to use this new logging infrastructure. The graph generation code and some of the validation code have been converted because those have the most impact on performance (and thus are the places where this new data matters most). Other parts of the code have been converted, too - like the Prometheus and Tracing code.
We can still use the original `log.Debugf()` APIs. Those still work. You just won't get all the nice request-level logging. If you find code where you think it will be helpful to have that, you can convert it to the new logging API at that time.