Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common Logging Format for Zowe Services #90

Closed
plavjanik opened this issue Mar 19, 2019 · 24 comments
Closed

Common Logging Format for Zowe Services #90

plavjanik opened this issue Mar 19, 2019 · 24 comments

Comments

@plavjanik
Copy link

plavjanik commented Mar 19, 2019

History

Each component has a different format (layout) of log messages that are stored in the log. They have different origin and there is no standard.

The major messages are written to the STDOUT DDNAME of the Zowe address space and the output looks inconsistent.

There following types:

  1. No format - plain message (shell scripts that are starting Zowe processes or services prior their logging library is initialized):

    Show location of node
    node is /z/masserv/node/node-v6.14.4-os390-s390x/bin/node
    Starting node
    Appending ZOWE_JAVA_HOME/bin to the PATH...
    Done.
    
  2. APIML - Spring Boot default format (via Logback - https://github.com/spring-projects/spring-boot/blob/master/spring-boot-project/spring-boot/src/main/resources/org/springframework/boot/logging/logback/defaults.xml)

    2019-03-19 11:24:25.242  INFO 50662274 --- .           main. o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 7
    2019-03-19 11:24:25.249  INFO 50662274 --- .           main. c.c.m.p.s.ServiceStartupEventHandler     : Discovery Service has been s
    2019-03-19 11:24:27.122  WARN 17105192 --- .           main. o.apache.tomcat.util.net.jsse.JSSEUtil   : Some of the specified .proto
    2019-03-19 11:24:27.786  INFO 17105192 --- .           main. o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 7
    2019-03-19 11:24:27.791  INFO 17105192 --- .           main. c.c.m.p.s.ServiceStartupEventHandler     : Gateway Service has been sta
    
  3. z/OS Services (Explorers)

    ..2m2019-03-19 11:24:05.250..0;39m ..32m INFO..0;39m ..35m17105200..0;39m ..2m---..0;39m ..2m.           main...0;39m ..36mo.s.b.w.e
    ..2m2019-03-19 11:24:05.330..0;39m ..32m INFO..0;39m ..35m17105200..0;39m ..2m---..0;39m ..2m.           main...0;39m ..36mo.apache.
    ..2m2019-03-19 11:24:05.330..0;39m ..32m INFO..0;39m ..35m17105200..0;39m ..2m---..0;39m ..2m.           main...0;39m ..36morg.apach
    
  4. zLUX:

    .2019-03-19 11:23:57.776 _zsf.bootstrap INFO. - Authentication plugin org.zowe.zlux.auth.apiml added to category apiml
    .2019-03-19 11:23:57.777 _zsf.bootstrap INFO. - Authentication plugin org.zowe.zlux.auth.zss added to category zss
    

Problems:

  1. Each format is different
  2. Characters that are not displayable both in IBM-037 and IBM-1047
  3. The service instance is not obvious from the log message
  4. Minor inconsistencies in log levels (WARN, WARNING)

Proposal

Example:

2019-03-19 11:23:57.776 <ZWEAGW1:threadInformation> userID INFO (locationInformation) ZWEA001I Message text

Components of the message:

  • Log level (see below) - 5 characters
  • Service name (up to 8 characters)
    • Each service that is a different source of the message should have a different value
    • In the format pfxSnnnI where pfx is Zowe prefix (ZWE, S is the subsystem identifier as defined in https://github.com/zowe/zlc/blob/master/process/messageManagement.md (e.g. A for API ML), nnn is the service ID (e.g. DS
      for Discovery Service, GW for the gateway), I is the number of the instance (in high-available more) - 1 by default
  • Thread name and/or thread ID - this part is customizable by the application (it can be ASIC, ASCB, or TCB for native applications)
  • User ID - the effective user ID under which is the code executed
  • Date and time in the format yyyy-MM-dd HH:mm:ss.SSS in the timezone of the z/OS system (put it first)
  • Location information - logger name, filename, and/or line number + component ID - for easier identification of the source (load module + offset in case of HLASM) - customizable, each service can use what makes sense in its context, if empty parentheses should be used
    • For Java: (filename:line)
  • (Optional) Message ID - see https://github.com/zowe/zlc/blob/master/process/messageManagement.md
  • Message text
  • Standardized log levels:
    • TRACE - Very detailed information, used only when diagnosing difficult problems. Not printed by default
    • DEBUG - Detailed information, typically of interest only when diagnosing problems. Not printed by default
    • INFO - Confirmation that things are working as expected
    • WARN - An indication that something unexpected happened, or indicative of some problem in the near future but the service is still working as expected
    • ERROR - Due to a more serious problem, the service has not been able to perform some function
    • FATAL - A serious error, indicating that the service itself may be unable to continue running

Recommended libraries:

No banners.

Related to #73 and #75

@plavjanik
Copy link
Author

Hello @MarkAckert, @1000TurquoisePogs, @stevenhorsman, @Joe-Winchester, @JirkaAichler!

I have tried to draft a common logging format that we could use in our services.

What do you think about it?

@plavjanik
Copy link
Author

@vvvlc

@vvvlc
Copy link

vvvlc commented Mar 27, 2019

what about

  • filename:line, this was useful for sustaining to speedup finding source file and line.
  • java thread name/id

@JirkaAichler
Copy link

I like it!
Do you have any idea when a message id should be present?

@stevenhorsman
Copy link
Member

I think it looks good - it seems to have all the info and then consistency is the main thing.

@plavjanik
Copy link
Author

I like it!
Do you have any idea when a message id should be present?

My take on this is that messages that require user action should have message IDs. Typically, all errors should have the message ID and most of the warnings.

Currently, we put all the messages to the spool via stdout. But we want to move most of the logs to log files and keep only important messages (errors, important warnings) related to the operational aspect of the services in the spool. In the end, all messages in the spool should have message IDs. In case of API gateway, it makes sense to have the log files split in the same way as other proxies or HTTP servers do (e.g. Nginx or Apache HTTP Server) and have "access log" with every access logged an error log (only the failed requests: 4xx, 5xx).

In any case, a consistent format will be useful and especially now when everything is in one place.

@plavjanik
Copy link
Author

I have made few updates to the format according to Vitek's suggestions. They were very useful in a product that we worked on in the past.

@hogstrom
Copy link
Contributor

hogstrom commented Apr 9, 2019

@plavjanik I like your recommended formats. Some other items for when a msgid is required.

Console
JES Job Log
User Interaction

Data that is trace data doesn't need message ids, but, it would be good to have a common format for all components so that log aggregation would be possible and ultimately tools like Splunk / ELK would be easier to exploit

@hogstrom
Copy link
Contributor

hogstrom commented May 1, 2019

@plavjanik to update and present at the next architecture call.

@ifakhrutdinov
Copy link

@plavjanik how flexible is this new format planned to be? For example, in the cross memory server messages that go to SYSPRINT and SYSLOG, we have additional components we need, like the ASCB, ASID and user ID of the caller application. On the other hand, the severity is expressed by the message ID suffix (E, W, I).

DATESTAMP              JOBNAME  ASCB    (ASID) TCB       MSGID     MSGTEXT                                                               
2019/04/29-15:43:16.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0001I ZSS Cross-Memory Server starting, version is 1.0.5                    
2019/04/29-15:43:16.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0002I Input parameters at 0x6FE6:                                           
2019/04/29-15:43:16.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0101I 00000000  0016D5C1 D4C57EE9 E6C5E2C9 E26DE2E3 |..NAME=ZWESIS_ST|      
2019/04/29-15:43:16.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0101I 00000010  C46BD4C5 D47EF0F0 00000000 0000     |D,MEM=00......|        
2019/04/29-15:43:16.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0004I Server name is 'ZWESIS_STD      '                                     
2019/04/29-15:43:16.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0103I Initialization step 'push recovery state' successfully completed      
2019/04/29-15:43:16.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0103I Initialization step 'verify STEPLIB' successfully completed           
2019/04/29-15:43:16.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0104I About to start console task                                           
2019/04/29-15:43:21.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0103I Initialization step 'load server' successfully completed              
2019/04/29-15:43:21.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0105I Core server initialization started                                    
2019/04/29-15:43:21.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0103I Initialization step 'init security' successfully completed            
2019/04/29-15:43:21.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0103I Initialization step 'allocate global resources' successfully completed
2019/04/29-15:43:21.97 ZWESIS01 00FB9700(00B2) 008F8588  ZWES0103I Initialization step 'established PC routines' successfully completed  
...
2019/04/30-01:18:25.24 ZWESIS01 00FB9700(00B2) 008F8098  ZWES0220I Modify LOG command received                                                            
2019/04/30-01:21:00.32 SSUSER46 00FA7800(00A7) 008F8368  ZWES0100I in authServiceFunction, parm = 0x1260E808                                              
2019/04/30-01:21:00.32 SSUSER46 00FA7800(00A7) 008F8368  ZWES0100I handleVerifyPassword(): username = SSUSER4, password = ******                          
2019/04/30-01:21:00.32 SSUSER46 00FA7800(00A7) 008F8368  ZWES0100I safVerify(VERIFY_CREATE) safStatus = 0, RACF RC = 0, RSN = 0, ACEE=0x7FF769E0          
2019/04/30-01:21:00.32 SSUSER46 00FA7800(00A7) 008F8368  ZWES0100I safVerify(VERIFY_DELETE) safStatus = 0, RACF RC = 0, RSN = 0, ACEE=0x7FF769E0          
2019/04/30-01:21:00.32 SSUSER46 00FA7800(00A7) 008F8368  ZWES0100I handleVerifyPassword() done                                                            
2019/04/30-01:22:25.21 ZWESIS01 00FB9700(00B2) 008F8230  ZWES0100I command handler called, type=1, command=0xFA7CA0, commandLength=1, userData=0x12672630 

@hogstrom
Copy link
Contributor

Move to community for implementation

@1000TurquoisePogs
Copy link
Member

Here is a prototype I have:
zowe app server:
2019-06-20 15:43:29.441 <ZWED:28753> 5070 INFO (org.zowe.terminal.proxy.tn3270data) Total TN3270 terminals connected: 1
before:
[458892] [2019-06-05 07:33:42.215 org.zowe.terminal.proxy.tn3270data INFO] - Total TN3270 terminals connected: 1

tn3270-ng2 app in browser:
image
Before:
image

PR: zowe/zlux-shared#17 and zowe/zlux-server-framework#116 and zowe/zlux-app-manager#119

I took liberties on the browser side: there is no thread or user ID to give, and since the entirety of the desktop logging would be about desktop apps, wouldn't the ID always start with ZWED?
So... I omitted them.

@1000TurquoisePogs
Copy link
Member

Updated to swap out user ID for name... process.geteuid versus os.userInfo.username

@1000TurquoisePogs
Copy link
Member

1000TurquoisePogs commented Jun 26, 2019

Let's use userID = username, and locationInformation = loggername,filename:linenumber
Colors should be omitted in logs that are plaintext, but if possible to conditionally display colors, colors are nice for active terminals.
Uncertain: should we have WARN, ERROR, and FATAL go to stderr instead of stdout? We must at the least have a log that contains all info, for support.

Examples

Java:
2019-03-19 11:23:57.776 <ZWEAGW1:threadInformation> myuser INFO (logger1,myfile.java:101) ZWEA001I Message text

nodejs Server:
2019-06-20 15:43:29.441 <ZWED:28753> myuser INFO (org.zowe.terminal.proxy.tn3270data,:) ZWED001I Total TN3270 terminals connected: 1

Browser:
2019-06-20 15:43:29.441 <ZWED:> myuser INFO (org.zowe.terminal.proxy.tn3270data,:) ZWED002I Total TN3270 terminals connected: 1

In the case of the node server & browser, we don't know filename or line number from the logger.
And, in the case of the browser, we lack thread & process info

@1000TurquoisePogs
Copy link
Member

Time: we should use UTC.
Remote logging alignment: should clients bootstrap by contacting server with their known time and the server respond with a time offset to get client timestamps pretty close to server timestamps?

How does imperative & browser align with server logs?

@1000TurquoisePogs
Copy link
Member

1000TurquoisePogs commented Jul 3, 2019

Though there is no threadID, for browser I started to think that browser+version would be important to know right in the log, so it could be printed at the beginning - not really important to know in each line.

@vvvlc
Copy link

vvvlc commented Jul 4, 2019

Here is my alternative to threadID in browser world,

  • I would consider individual http requests as individual threads. So a log message issued by a code participating in setting up data for http request or processing response of such request could have a unique ID (threadID=httpXXXXX). This way we could back track messages belonging to individual requests.

    2019-06-20 15:43:29.441 <ZWED:http1> myuser INFO (org.zowe.terminal.proxy.tn3270data,:) preparing data for /zss/getdataset
    2019-06-20 15:43:39.441 <ZWED:http1> myuser INFO (org.zowe.terminal.proxy.tn3270data,:) header asdv added
    2019-06-20 15:43:49.441 <ZWED:http1> myuser INFO (org.zowe.terminal.proxy.tn3270data,:) request started
    2019-06-20 15:44:29.441 <ZWED:http2> myuser INFO (org.zowe.terminal.proxy.tn3270data,:) preparing data for /zss/ussfiledataset
    ....
    2019-06-20 15:44:19.441 <ZWED:http1> myuser INFO (org.zowe.terminal.proxy.tn3270data,:) response from server came
    2019-06-20 15:44:29.441 <ZWED:http1> myuser INFO (org.zowe.terminal.proxy.tn3270data,:) widget updated
    
  • Another parallel to threadID is in messaging (that is among ZAF applications). So any code that sends a message and code that handles that message would have threadID=messageID

I can imagine implementation of such IDs might take some time so we can start with browser+version as a threadID.

@1000TurquoisePogs
Copy link
Member

I found that v8 (chrome, nodejs) does have a way of getting file & line number, etc, for location. Firefox doesn't as far as I can tell. Updated PR with nodejs specific coding for the JS logger.

@1000TurquoisePogs
Copy link
Member

Had a review of this concept and one point brought up was that use of the term "FATAL" instead of similar-meaning words could be taken the wrong way. The issue is that hospitals (some are big DB2 users) take the meaning of FATAL very seriously. A less sensitive term to use would be "SEVERE" or "UNRECOVERABLE", or "CRITICAL".

@1000TurquoisePogs
Copy link
Member

Let's use "CRITICAL" above "ERROR".

@DivergentEuropeans
Copy link
Member

@1000TurquoisePogs Good point.

Had a review of this concept and one point brought up was that use of the term "FATAL" instead of similar-meaning words could be taken the wrong way. The issue is that hospitals (some are big DB2 users) take the meaning of FATAL very seriously. A less sensitive term to use would be "SEVERE" or "UNRECOVERABLE", or "CRITICAL".

vvvlc pushed a commit to zowe/api-layer that referenced this issue Jul 24, 2019
Signed-off-by: Vitek Vlcek <vitezslav@vvvlcek.info>
vvvlc added a commit to vvvlc/data-sets that referenced this issue Jul 24, 2019
Signed-off-by: vvvlc  <vitezslav+github@vvvlcek.info>
disable banner based on zowe/zac#90 (comment)
vvvlc added a commit to vvvlc/data-sets that referenced this issue Jul 24, 2019
Signed-off-by: vvvlc  <vitezslav+github@vvvlcek.info>
disable banner based on zowe/zac#90 (comment)
Signed-off-by: Vitek Vlcek <vitezslav@vvvlcek.info>
vvvlc pushed a commit to zowe/api-layer that referenced this issue Jul 31, 2019
Signed-off-by: Vitek Vlcek <vitezslav@vvvlcek.info>
vvvlc pushed a commit to zowe/api-layer that referenced this issue Jul 31, 2019
banner cannot be enabled because it is displayed twice see spring-projects/spring-boot#17746

Signed-off-by: Vitek Vlcek <vitezslav@vvvlcek.info>
ilkinabdullayev pushed a commit to zowe/api-layer that referenced this issue Aug 1, 2019
banner cannot be enabled because it is displayed twice see spring-projects/spring-boot#17746

Signed-off-by: Vitek Vlcek <vitezslav@vvvlcek.info>
JirkaAichler pushed a commit to zowe/api-layer that referenced this issue Aug 1, 2019
#341)

* disable banner based on proposal in zowe/zac#90 (comment)
banner cannot be enabled because it is displayed twice see spring-projects/spring-boot#17746
@hogstrom
Copy link
Contributor

Need to create a process of clearing information to the sub-projects.

This was referenced Sep 24, 2019
@Joe-Winchester
Copy link
Member

@1000TurquoisePogs - with the move to make the Zowe runtime read only (so that we can use USS mountpoints for SMP/E and also wipe and replace with PTFs) there is a configuration directory that is introduced in 1.5. This is where the api-defs for static definition has moved to. It'd be great to move the zlux-app-server/log files also here. Is there a way to specify a parm to nodeserver.sh and nodecluster.sh to use an alternative log folder ?

@armstro
Copy link
Contributor

armstro commented Mar 18, 2020

After ZLC discussion today (March 18 2020) it was decided to close this issue as complete - will reopen or create new issues if additional work needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants