-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use '--cursor-after' flag to get recent journal messages #51366
Use '--cursor-after' flag to get recent journal messages #51366
Conversation
When we get Elasticsearch logs from journald, we want to fetch only log messages from the last run. There are two reasons for this. First, if there are many logs, we might get a string that's too large for our utility methods. Second, when we're looking for a specific message or error, we almost certainly want to look only at messages from the last execution. Previously, we've been trying to do this by clearing out the physical files under the journald process. But there seems to be some contention over these directories: if journald writes a log file in between when our deletion command deletes the file and when it deletes the log directory, the deletion will fail. It seems to me that we might be able to use journald's "--since" flag to retrieve only log messages from the last run, and that this might be less likely to fail due to race conditions in file deletion. Unfortunately, it looks as if the "--since" flag has a granularity of one-second. I've added a two-second sleep to make sure that there's a sufficient gap between the test that will read from journald and the test before it.
Pinging @elastic/es-core-infra (:Core/Infra/Packaging) |
@@ -342,10 +341,9 @@ public void test90DoNotCloseStderrWhenQuiet() throws Exception { | |||
append(tempConf.resolve("elasticsearch.yml"), "discovery.zen.ping.unicast.hosts:15172.30.5.3416172.30.5.35, 172.30.5.17]\n"); | |||
|
|||
// Make sure we don't pick up the journal entries for previous ES instances. | |||
clearJournal(sh); | |||
String elasticsearchStartTime = sh.run("sleep 2 && date +\"%Y-%m-%d %H:%M:%S\"").stdout.trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just adjust the since time by 2 seconds instead of sleeping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that would guarantee a gap between Elasticsearch journal messages that's wider than the one-second granularity that the --since
flag gives us.
If we have test A and test B and both check for the same error message, but test B is broken, we want to make sure that test A and test B's journal messages don't occur in the same second. I'm not sure how to guarantee that except by introducing a pause.
Another approach might be finding a way to tail journal messages to a test-specific file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some versions of journalctl provide for navigating output with a cursor; I'm going to see if that looks like something we'll have on all platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have a solution that avoids invoking sleep
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much better than what we did before 👍
* when instantiated, and advancing that cursor when the {@code clear()} | ||
* method is called. | ||
*/ | ||
public static class JournaldWrapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: even though this is only invoked in one test on the master branch, I'm going to need it for keystore tests on the feature branch, and it seems like something that could be generally useful later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
When we get Elasticsearch logs from journald, we want to fetch only log messages from the last run. There are two reasons for this. First, if there are many logs, we might get a string that's too large for our utility methods. Second, when we're looking for a specific message or error, we almost certainly want to look only at messages from the last execution. Previously, we've been trying to do this by clearing out the physical files under the journald process. But there seems to be some contention over these directories: if journald writes a log file in between when our deletion command deletes the file and when it deletes the log directory, the deletion will fail. Instead, we can use the cursor capablity of journald to retrieve journal entries that occur only after a certain cursor. This avoids any effort to interfere with the underlying file operations of journald.
When we get Elasticsearch logs from journald, we want to fetch only log messages from the last run. There are two reasons for this. First, if there are many logs, we might get a string that's too large for our utility methods. Second, when we're looking for a specific message or error, we almost certainly want to look only at messages from the last execution. Previously, we've been trying to do this by clearing out the physical files under the journald process. But there seems to be some contention over these directories: if journald writes a log file in between when our deletion command deletes the file and when it deletes the log directory, the deletion will fail. Instead, we can use the cursor capablity of journald to retrieve journal entries that occur only after a certain cursor. This avoids any effort to interfere with the underlying file operations of journald.
…1445) When we get Elasticsearch logs from journald, we want to fetch only log messages from the last run. There are two reasons for this. First, if there are many logs, we might get a string that's too large for our utility methods. Second, when we're looking for a specific message or error, we almost certainly want to look only at messages from the last execution. Previously, we've been trying to do this by clearing out the physical files under the journald process. But there seems to be some contention over these directories: if journald writes a log file in between when our deletion command deletes the file and when it deletes the log directory, the deletion will fail. Instead, we can use the cursor capablity of journald to retrieve journal entries that occur only after a certain cursor. This avoids any effort to interfere with the underlying file operations of journald.
…1447) When we get Elasticsearch logs from journald, we want to fetch only log messages from the last run. There are two reasons for this. First, if there are many logs, we might get a string that's too large for our utility methods. Second, when we're looking for a specific message or error, we almost certainly want to look only at messages from the last execution. Previously, we've been trying to do this by clearing out the physical files under the journald process. But there seems to be some contention over these directories: if journald writes a log file in between when our deletion command deletes the file and when it deletes the log directory, the deletion will fail. Instead, we can use the cursor capablity of journald to retrieve journal entries that occur only after a certain cursor. This avoids any effort to interfere with the underlying file operations of journald.
When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.
Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.
It seems to me that we might be able to use journald's "--since" flag to
retrieve only log messages from the last run, and that this might be
less likely to fail due to race conditions in file deletion.
Unfortunately, it looks as if the "--since" flag has a granularity of
one-second. I've added a two-second sleep to make sure that there's a
sufficient gap between the test that will read from journald and the
test before it.
Specifically, I hope this will fix issues with
test90DoNotCloseStderrWhenQuiet
inPackageTests
. The two error messages related to this issue are:and
See a build-stats report here, if you have access.