Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition to JSON logging for the Linux Dedicated plan #1721

Merged
merged 18 commits into from
Apr 27, 2021

Conversation

davidmrdavid
Copy link
Contributor

@davidmrdavid davidmrdavid commented Mar 11, 2021

Previously, logs on Linux Dedicated were interfacing with the legacy regex endpoint. This meant that some of our logs required extra post-processing for them to behave as expected. With this change, we'll be writing to the new JSON endpoint which should just work.

Also, this change will need to be backported

Issue describing the changes in this PR

N/A

Pull request checklist

  • My changes do not require documentation changes
    • Otherwise: Documentation PR is ready to merge and referenced in pending_docs.md
  • My changes should not be added to the release notes for the next release
    • Otherwise: I've added my notes to release_notes.md
  • My changes do not need to be backported to a previous version
  • I have added all required tests (Unit tests, E2E tests)

@davidmrdavid davidmrdavid marked this pull request as ready for review March 13, 2021 00:20
@davidmrdavid
Copy link
Contributor Author

I tested this on a linux dedicated instance already. Will test again early next week so we can merge this. I believe this needs to be backported to V1 as well though

Copy link
Contributor

@ConnorMcMahon ConnorMcMahon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

test/Common/DurableTaskEndToEndTests.cs Outdated Show resolved Hide resolved
test/Common/DurableTaskEndToEndTests.cs Outdated Show resolved Hide resolved
test/Common/TestHelpers.cs Show resolved Hide resolved
@davidmrdavid
Copy link
Contributor Author

Hey @ConnorMcMahon, in my latest commit I refactored the JSON-logging tests to use the WaitUntilTrue approach and also simplify some of their shared logic. I'd love to get a final LGTM on it, thanks!

@davidmrdavid
Copy link
Contributor Author

Also, since this needs to be backported to V1, should we wait on the merge until we have a V1 PR? Just to make sure we don't forget about it.

@ConnorMcMahon
Copy link
Contributor

For V1, just open a ticket, and reference the ticket in this PR (and mention this PR in the ticket). Then you can check off the sub-box in that category.

Copy link
Contributor

@ConnorMcMahon ConnorMcMahon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

If we have tested this end-to-end before we start the validation step, I am 100% on board for merging this in for v2.4.2.

@davidmrdavid
Copy link
Contributor Author

Thanks! I don't plan to modify the code further, but this could use just a tad more testing so I won't be merging this for 2.4.2 :)

Copy link
Contributor

@ConnorMcMahon ConnorMcMahon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits on tests, but otherwise looks good.

foundRelatedActivityId = true;
}
// recording EventSource providers seen
int eventId = (int)json.GetValue("EventId");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is EventId the best way of identifying provider? I get that these event ids are unique across providers (at least I believe so), but there should be a Provider field that is more conclusive right?

predicate: () =>
{
bool foundEtwEventSourceLog, foundAzureStorageLog, foundDurableTaskCoreLog, foundActivityId, foundRelatedActivityId;
foundEtwEventSourceLog = foundAzureStorageLog = foundDurableTaskCoreLog = foundActivityId = foundRelatedActivityId = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we prioritized fields like activity ID and related activity ID over other fields like "InstanceId", "Partition", "MessageId", "Message", etc?

Also, it may be slightly more performant (and imo easier to read) if you perform checks one by one.

For instance:

var jsonLines = TestHelpers.WriteSafeReadAllLines(LinuxAppServiceLogger.LogginPath).Select(line => JObject.Parse(line));

if (jsonLines.Any(json => !TestHelpers.IsValidJson(json))
{
    return false;
}

if (!jsonLines.Any(json => json.GetValue<string>("ProviderName") == "DurableTask.AzureStorage"))) {
    return false;
}

//etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only reason why I prioritized the ActivityID fields is that, in the past, during development, we had received faulty logs which missed them. I realize all fields are relevant, this test is just particularly informed by a previous bug. To summarize: it was an arbitrary decision.

I also find it surprising that doing a .Any() test might be faster than the current approach, since with the current approach we only iterate through the data once. Regardless, since this is a test, I do think readability beats performance, so I'll incorporate your suggestion :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to say whether it is more performant or not.

My expectation is right now we are doing ~5x checks over 1 iteration, and we have to complete the entire iteration before returning.

With this approach, while we are iterating 5 times, which introduces overhead, we are only doing 1x check per iteration, so in the case where one of the earlier cases fails, we catch it earlier and don't have to do subsequent iterations/checks.

.Any() also shortcircuits as soon as any value matches that condition, so if the logs have 50 lines, and all the conditions are met by the first 10 lines, then it should in theory be faster.

Overall, like we said though, this is a test, so perf isn't the top priority (at least not in the order of a few million CPU cycles).

@davidmrdavid
Copy link
Contributor Author

davidmrdavid commented Apr 26, 2021

@ConnorMcMahon, only requesting a re-review since I changed the testing code after your last approval. A quick skim should do. Thanks!

// variable below is internal static for testing and other convenient purposes
// we need to be able to change the logging path for a windows-based CI
// the logger being internal static is convenient for flushing it
#pragma warning disable SA1401 // Fields should be private
internal static string LoggingPath = "/var/log/functionsLogs/durableevents.log";
internal static string LoggingPath = "/var/log/functionsLogs/durableeventsJSON.log";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we merge, we should just 100% verify that this is indeed the file path listened to by linux app services.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed and e2e tested

Copy link
Contributor

@ConnorMcMahon ConnorMcMahon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants