Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use '--cursor-after' flag to get recent journal messages #51366

Merged
merged 3 commits into from
Jan 24, 2020

Conversation

williamrandolph
Copy link
Contributor

When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.

Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.

It seems to me that we might be able to use journald's "--since" flag to
retrieve only log messages from the last run, and that this might be
less likely to fail due to race conditions in file deletion.

Unfortunately, it looks as if the "--since" flag has a granularity of
one-second. I've added a two-second sleep to make sure that there's a
sufficient gap between the test that will read from journald and the
test before it.

Specifically, I hope this will fix issues with test90DoNotCloseStderrWhenQuiet in PackageTests. The two error messages related to this issue are:

java.lang.AssertionError: 
Expected: a string containing "Failed to load settings from [elasticsearch.yml]"
     but: was "<<Too large to read: 119812 bytes>>"

and

java.lang.RuntimeException: Command was not successful: [bash -c rm -rf /run/log/journal/*]
   result: exitCode = [1] stdout = [] stderr = [rm: cannot remove '/run/log/journal/1c27adbf8016dcaa47e05e24d8b6a7a7': Directory not empty]

See a build-stats report here, if you have access.

When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.

Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.

It seems to me that we might be able to use journald's "--since" flag to
retrieve only log messages from the last run, and that this might be
less likely to fail due to race conditions in file deletion.

Unfortunately, it looks as if the "--since" flag has a granularity of
one-second. I've added a two-second sleep to make sure that there's a
sufficient gap between the test that will read from journald and the
test before it.
@williamrandolph williamrandolph added >non-issue >test Issues or PRs that are addressing/adding tests :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts v8.0.0 v7.7.0 v7.6.1 v7.5.3 labels Jan 23, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Packaging)

@@ -342,10 +341,9 @@ public void test90DoNotCloseStderrWhenQuiet() throws Exception {
append(tempConf.resolve("elasticsearch.yml"), "discovery.zen.ping.unicast.hosts:15172.30.5.3416172.30.5.35, 172.30.5.17]\n");

// Make sure we don't pick up the journal entries for previous ES instances.
clearJournal(sh);
String elasticsearchStartTime = sh.run("sleep 2 && date +\"%Y-%m-%d %H:%M:%S\"").stdout.trim();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just adjust the since time by 2 seconds instead of sleeping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that would guarantee a gap between Elasticsearch journal messages that's wider than the one-second granularity that the --since flag gives us.

If we have test A and test B and both check for the same error message, but test B is broken, we want to make sure that test A and test B's journal messages don't occur in the same second. I'm not sure how to guarantee that except by introducing a pause.

Another approach might be finding a way to tail journal messages to a test-specific file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some versions of journalctl provide for navigating output with a cursor; I'm going to see if that looks like something we'll have on all platforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a solution that avoids invoking sleep.

Copy link
Contributor

@pugnascotia pugnascotia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much better than what we did before 👍

* when instantiated, and advancing that cursor when the {@code clear()}
* method is called.
*/
public static class JournaldWrapper {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: even though this is only invoked in one test on the master branch, I'm going to need it for keystore tests on the feature branch, and it seems like something that could be generally useful later on.

@williamrandolph williamrandolph changed the title Use '--since' flag to get recent journal messages Use '--cursor-after' flag to get recent journal messages Jan 24, 2020
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@williamrandolph williamrandolph merged commit 8b8bff6 into elastic:master Jan 24, 2020
@williamrandolph williamrandolph deleted the os-test-readjournald branch January 24, 2020 20:12
williamrandolph added a commit to williamrandolph/elasticsearch that referenced this pull request Jan 24, 2020
When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.

Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.

Instead, we can use the cursor capablity of journald to retrieve journal
entries that occur only after a certain cursor. This avoids any effort
to interfere with the underlying file operations of journald.
williamrandolph added a commit to williamrandolph/elasticsearch that referenced this pull request Jan 24, 2020
When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.

Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.

Instead, we can use the cursor capablity of journald to retrieve journal
entries that occur only after a certain cursor. This avoids any effort
to interfere with the underlying file operations of journald.
williamrandolph added a commit that referenced this pull request Jan 24, 2020
…1445)

When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.

Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.

Instead, we can use the cursor capablity of journald to retrieve journal
entries that occur only after a certain cursor. This avoids any effort
to interfere with the underlying file operations of journald.
williamrandolph added a commit that referenced this pull request Jan 24, 2020
…1447)

When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.

Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.

Instead, we can use the cursor capablity of journald to retrieve journal
entries that occur only after a certain cursor. This avoids any effort
to interfere with the underlying file operations of journald.
@polyfractal polyfractal added v7.6.0 and removed v7.6.1 labels Feb 7, 2020
@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts >non-issue Team:Delivery Meta label for Delivery team >test Issues or PRs that are addressing/adding tests v7.6.0 v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants