Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFOS stop logging after some time #70

Closed
petemm123 opened this issue Feb 22, 2022 · 14 comments
Closed

RFOS stop logging after some time #70

petemm123 opened this issue Feb 22, 2022 · 14 comments

Comments

@petemm123
Copy link

Your library is great. I have one issue - after some time, no information is written to log, actual file is zero size, rotation trigger is invoked but it skips rotation and writes nothing.
It is size based policy, number (bytecount) is increasing after every attempt to write but nothing is written into file (attempt to reopen file was made but it does not help).

Two notes:

  • many concurrent threads are writing to RFOS, it might occur another write attempt during rotation in place
  • currently, byteCount=2318270262 [it had been writing half day and next 5 days there is no write]

Please could you suggest solution? I believe that your library is superb, this is minor bug (but for me important).

Thanks,

MM

@vy
Copy link
Owner

vy commented Feb 22, 2022

Hello @petemm123! Thanks so much for your kind words! ❤️ Mind helping me with the following questions, please?

  • Which version do you use?
  • Could you share your complete RotationConfig, please?
  • "many concurrent threads are writing to RFOS" How do they write? Which RFOS methods do they access to?
  • "rotation trigger is invoked" How did you derive this?
  • "number (bytecount) is increasing after every attempt to write" How did you derive this?
  • Do you have LoggingRotationCallback in place? If so, mind sharing those logs please?

@petemm123
Copy link
Author

Hello @vy ,

thanks, config is de facto copied from some example (in fact paths and size is from parameters - I've replaced it with constants here)

	    	   config = RotationConfig
	    		        .builder()
	    		        .file("/tmp/abc/app.log")
	    		        .filePattern("/tmp/abc/app-%d{yyyyMMdd-HHmmss.SSS}.log")
	    		        .policy(new SizeBasedRotationPolicy(1024 * 1024 * 10 ))
	    		        .compress(true)
	    		        .policy(DailyRotationPolicy.getInstance())
	    		        .build();

                                  rfos = new RotatingFileOutputStream(config);
						
Write is using method

                        rfos.write(jo.toString().getBytes(StandardCharsets.UTF_8));
    		    rfos.write("\r\n".getBytes(StandardCharsets.UTF_8));

Everything worked ok but after some time (few days or few hours, last time it was 19th rotation, previously 90+th rotation), zero length app.log appeared and in log, there was an error that (I'm not sure exactly) rfos cannot write to closed stream.

Note that many threads share one rfos, so approx 1-2 lines per second are written.

So I've bounded rfos.write with try-catch block and when message contain "losed" (both "closed" and "Closed"), new rfos with same config is initialized
(rfos = new RotatingFileOutputStream(config);), but this didn't help.
[note that it is made multithread-safely that after "losed" catch, lock is applied, new rfos is initialized and lock is released]

As logging is set to debug, these lines appear in log:

triggering {byteCount=2428461623}\n","stream":"stdout","time":"2022-02-23T05:34:56.180176085Z"}
rotation trigger {policy=SizeBasedRotationPolicy{maxByteCount=10000000}, instant=2022-02-23T05:34:56.171219671Z}\n","stream":"stdout","time":"2022-02-23T05:34:56.180179185Z"}
empty file, skipping rotation {file=/tmp/abc/app.log}\n","stream":"stdout","time":"2022-02-23T05:34:56.180182429Z"}

and /tmp/abc/app.log is still 0 byte size.

You will understand this better.

Sorry I don't have LoggingRotationCallback in place but if I should, I can put it there and log anything and share with you, please just write me how to put this in place and what to do/log.

Thanks a lot!

MM

@vy
Copy link
Owner

vy commented Feb 23, 2022

Which rotating-fos version, operating system, and JDK?

So I've bounded rfos.write with try-catch block and when message contain "losed" (both "closed" and "Closed"), new rfos with same config is initialized
(rfos = new RotatingFileOutputStream(config);), but this didn't help.

What do you exactly mean by it didn't help? Would you also share the stack trace of the error you have mentioned?

Do you also have a thread dump for me when you are writing but nothing gets either written or rotated?

Would you also share some more DEBUG logs from com.vlkan.rfos package?

@petrmasa
Copy link

Hi,
I can share a lot, first questions:

version:

com.vlkan.rfos
rotating-fos
0.9.3

OS version, JDK, etc. - it is docker, namely "FROM tomcat:9.0-jdk17-openjdk-buster"

try - catch & "didn't help" - from some time, stream became closed and new file has zero size. I've tried to try-catch block and when it gets "closed" error, I've tried to reopen (

if (rfos!=null &!this.locked)
{
try
{
rfos.write(jo.toString().getBytes(StandardCharsets.UTF_8));
rfos.write("\r\n".getBytes(StandardCharsets.UTF_8));
}
catch (Exception E)
{
if (E.getMessage().contains("losed"))
{
if (!locked)
{
this.locked=true;
rfos = new RotatingFileOutputStream(config);
this.locked=false;
}
}
}
}
}

the log - these 3 lines in log repeats every time I try to write to file and it seems that this is the only what I get from log when rfos is not writing to file (debug log rotates quickly so I haven;t catched the moment it stopped writing)

If you suggest how, I will log any information received throught callback and can send it to you. But I need what I should log.

Thanks a lot!

@petrmasa
Copy link

Maybe one more thing - if you suggest how to "restore" operation, e.g. when I find the stream closed but rfos not null - how to release the file and "restart" (reinitialize) rfos, it would be solution for me. Having not null rfos and reinitializing by
rfos = new RotatingFileOutputStream(config);
does not work.
Thanks

@vy
Copy link
Owner

vy commented Feb 24, 2022

Thanks for the prompt and elaborate responses, @petrmasa.

File#length() madness

Judging from the following log lines you have shared,

triggering {byteCount=2428461623}
rotation trigger {policy=SizeBasedRotationPolicy{maxByteCount=10000000}, instant=2022-02-23T05:34:56.171219671Z}
empty file, skipping rotation {file=/tmp/abc/app.log}

what is going on is that:

  1. RotatingFileOutputStream#write(byte[] b) gets called.
  2. write() calculates the target size byteCount = stream.size() + b.length and notifies size-sensitive policies. Here b is the input buffer, stream is the internal FileOutputStream wrapped by a ByteCountingOutputStream.
  3. SizeBasedRotationPolicy gets notified, detects the size violation, and calls RotatingFileOutputStream#rotate(RotationPolicy, Instant).
  4. rotate() skips rotation since config.getFile().length() == 0 condition holds.

Now this is weird. stream.size() + b.length returns 2428461623, whereas config.getFile().length() returns 0. stream.size() cannot return a value greater than config.getFile().length(), since it is created as follows:

long size = config.isAppend() ? config.getFile().length() : 0;
return new ByteCountingOutputStream(fileOutputStream, size);

and I assume your b.length is not greater then 2428461623, so... something doesn't add up with File#length(), which is apparently known to be unreliable. I have pushed a smarter readFileLength() change commit for RotatingFileOutputStream. Would you mind giving this a try, please? You can either download and compile the sources yourself or use the Sonatype Snapshot repository.

Compression?

If the above fix doesn't work for you, as a last resort, could you also try with compress(false), please? Do you still observe the problem.

@petrmasa
Copy link

@vy Thanks a lot for solution proposal. I'll ask colleagues to encorporate these sources (I'm not sure how but they will).
Note that turning compression off didn't help. The only change is that zero length file rotated one time.

Where I see potential source of the problem:

  • file has been somehow closed - error it returned (maybe stuck and maybe lock on it??? or something other???), we only tried to open it again (not closing or releasing anything - by rfos = new ...)
  • situation seems to be that (as I read the log) byteCount is being increased by every write attempt but no data is written into file - so file has 0 length as number of written (attempted written) bytes increases

My opinion is that on error I should somehow close/destroy original file handles before initializing new one. But this is only idea.

Thanks a lot!
MM

@vy
Copy link
Owner

vy commented Feb 25, 2022

situation seems to be that (as I read the log) byteCount is being increased by every write attempt but no data is written into file - so file has 0 length as number of written (attempted written) bytes increases

I don't think this can happen, since ByteCountingOutputStream increments the counter once write() succeeds. Though the stream might indeed need to be flushed – I've committed a small fix for that, should be available in the SNAPSHOT artifact within a minute or two.

I will try to put together something for your reopen-on-I/O-error case. Though as you might imagine, I want to get RFOS fixed first rather than handing users workarounds.

@petrmasa
Copy link

petrmasa commented Mar 4, 2022

Hi @vy thanks for code, sorry for late answer, we tried this code and unfortunately the problem still persists. Note that we observed that it became stuck at 1:00:00 AM GMT+1 in all cases (app.log file timestamp) that is 0:00:00 GMT+0. Sometimes it did rotation of old file even it was between 5-6MB, very small or even zero size and rotation was set to 10MB (we observed it several times). Can this fact play a role?

Information from past:
Feb 5 1:00 - last rotated file (with datetime in its name) had 0 bytes

Feb 10 started over, worked fine till Feb 17 1:00 (last rotated file - i.e. file with date and time in its name is approx 5-6MB) and app.log zero length

And then, all files (compressed/uncompressed) at 1:00 typically rotates file which is empty (maybe it somehow resets the file and content).

Maybe it will help.

Note we also looked at scheduled tasks like system backup and it takes absolutely different timeslots.

Thanks a lot

MM

@vy
Copy link
Owner

vy commented Mar 11, 2022

Sometimes it did rotation of old file even it was between 5-6MB, very small or even zero size and rotation was set to 10MB (we observed it several times).

You are telling me that, using the 0.9.5-SNAPSHOT version, you still observe zero-length file rotations. That is, there are app.log and app-202202...log files and app-202202...log has zero bytes! Could you please confirm that I get it right, please? If so, this is baffling given the following preamble in the rotation method:

stream.flush();
if (readFileLength() == 0) {
    LOGGER.debug("empty file, skipping rotation {file={}}", config.getFile());
    return;
}

This clearly indicates we are not able obtain the correct file length from the file system properly.

@petrmasa, shall we have a video call where I can get the picture better? If so, would you mind emailing me which of the following time slots suit you best, please: next Monday, Tuesday, Wednesday, or Thursday at 10:00 CET.

@petrmasa
Copy link

Thanks a lot, we've tried to look what is triggering daily rotation (made many debug logs inside) and probably found the reason

	   config = RotationConfig
		        .builder()
		        .file("/tmp/abc/app.log")
		        .filePattern("/tmp/abc/app-%d{yyyyMMdd-HHmmss.SSS}.log")
		        .policy(new SizeBasedRotationPolicy(1024 * 1024 * 10 ))
		        .compress(true)
		        .policy(DailyRotationPolicy.getInstance())
		        .build();

                          rfos = new RotatingFileOutputStream(config);

We've either copypasted example or maybe colleague ment rotate daily or by size what comes earlier so we had two rotation policies in place. Now we have it in testing and seems to work ok (for last 2 nigths).
Can triggering Daily rotation be a reason to spoil Size Based rotation?

Thanks a lot,

@vy
Copy link
Owner

vy commented Mar 14, 2022

I hope the problem is resolved for you. If you still experience issues, let's try to arrange a video call as I have indicated earlier.

Can triggering Daily rotation be a reason to spoil Size Based rotation?

No, it cannot. They operate in exclusive blocks, any number of rotation policies can (and should!) coexist.

Mind closing the ticket if you don't have issues anymore, please?

@petrmasa
Copy link

Thanks a lot! Currently it is working, if something happens I'll let you know. Thanks a lot for your support.
PM

@petemm123
Copy link
Author

Thanks a lot for your support!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants