Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binlogs Redacting support + Binlogs forward-compatibility reading support #9307

Merged
merged 52 commits into from
Jan 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
a54016c
Add embedded content transfering and build events binary equality
JanKrivanek Sep 12, 2023
1ed8999
Added support for embedded files; fixing, cleaning
JanKrivanek Sep 15, 2023
699e5d4
Decrease scope where accidentally exposed
JanKrivanek Sep 18, 2023
2a75fb6
Sealing types
JanKrivanek Sep 19, 2023
2b83376
Adjust visibility, add comments
JanKrivanek Sep 20, 2023
b1c340e
Cache encoding
JanKrivanek Sep 21, 2023
cdca4e8
Fix code after merge
JanKrivanek Oct 2, 2023
c8bf570
Fix after merge
JanKrivanek Oct 4, 2023
b51f320
Add Binlog BuildEvents offsets for forward compatibility and event bu…
JanKrivanek Sep 22, 2023
5ae9bb1
Add tests and fixes for forward compatibility reading
JanKrivanek Oct 5, 2023
4726733
Bugfix failing tests
JanKrivanek Oct 6, 2023
d4e72f6
Adjust unit test to compare unpacked binlogs
JanKrivanek Oct 6, 2023
3d947b8
Add version description
JanKrivanek Oct 9, 2023
dd6b0ad
Reflecting PR comments
JanKrivanek Oct 10, 2023
0d2317e
Add support for minimum supported reader version
JanKrivanek Oct 10, 2023
651ab34
Support ProjectImports=ZipFile in Replay mode
JanKrivanek Oct 11, 2023
37f74fb
Add more granular error information about recoverable errors
JanKrivanek Oct 11, 2023
0741167
Fix tests
JanKrivanek Oct 11, 2023
d328c42
Localizing strings
JanKrivanek Oct 11, 2023
79339e1
Add tests and documentation for the forward compatibility reading
JanKrivanek Oct 20, 2023
9c2f863
Reflect review comments
JanKrivanek Oct 20, 2023
ab7ca51
Add/Improve tests
JanKrivanek Oct 23, 2023
2b57002
Fix test
JanKrivanek Oct 23, 2023
bb20a69
Improve test
JanKrivanek Oct 23, 2023
cfae1f7
Reflect code review comments
JanKrivanek Oct 24, 2023
4fa413a
Comment removal
JanKrivanek Oct 24, 2023
88c1113
Auto-apply suggestions from code review
JanKrivanek Oct 30, 2023
f3b2088
Apply code review suggestions
JanKrivanek Oct 30, 2023
d9eed77
Rework and simplify the ArchiveFile OM and event
JanKrivanek Oct 30, 2023
9b2cbc2
Clarified comment
JanKrivanek Oct 30, 2023
330f842
Improve documenting comments
JanKrivanek Oct 30, 2023
3b1e76b
Shouldly update adjust
JanKrivanek Nov 2, 2023
3e175ef
Reflect review comments
JanKrivanek Nov 3, 2023
709cdac
Fix issues after integrating with redactor
JanKrivanek Nov 14, 2023
4e2af43
Apply suggestions from code review
JanKrivanek Nov 14, 2023
2a65f64
Merge branch 'proto/binlog-events-offsets' of https://github.com/JanK…
JanKrivanek Nov 14, 2023
986b2f2
Reflect PR comments
JanKrivanek Nov 14, 2023
8ae922b
Reflect PR feedback
JanKrivanek Nov 15, 2023
7655a08
Merge remote-tracking branch 'upstream/main' into proto/binlog-events…
JanKrivanek Nov 23, 2023
95cada7
Regenerate resources
JanKrivanek Nov 23, 2023
2e8df18
Add suppressions for extending the newly introduced interface
JanKrivanek Nov 23, 2023
f5c654b
Reflect PR commnets
JanKrivanek Nov 23, 2023
0c19bfd
Bugfixes
JanKrivanek Nov 23, 2023
bb7696e
Fix log wording
JanKrivanek Dec 8, 2023
61e933d
Cleaning up the api (with single batch of breaking changes in the fea…
JanKrivanek Dec 8, 2023
eb7933a
Preserve version on raw events replay
JanKrivanek Dec 11, 2023
0301224
Reflecting PR comments
JanKrivanek Dec 21, 2023
3c73e15
Merge remote-tracking branch 'upstream/main' into proto/binlog-events…
JanKrivanek Dec 21, 2023
fce1a77
Reintroduce suppressions for the binlog forward compatible reading API
JanKrivanek Dec 21, 2023
b97a080
Fix raw replay and redact perf
JanKrivanek Dec 27, 2023
ad4bf54
Forward compatible reading is off by default
JanKrivanek Jan 9, 2024
38f5791
Bugfix the reading of old version of logs
JanKrivanek Jan 11, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 96 additions & 2 deletions documentation/wiki/Binary-Log.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ It is a `GZipStream`-compressed binary stream of serialized `BuildEventArgs` obj

## Incrementing the file format

Every .binlog file has the first three bytes that indicate the file version. The current file format version is 9 (`00 00 09`).
Every .binlog file has the first four bytes that indicate the file version. The current file format is indicated in [`BinaryLogger.cs`](/src/Build/Logging/BinaryLogger/BinaryLogger.cs).

When incrementing the file format, keep this in mind:
* Increment the version and add a summary of the changes: https://github.com/dotnet/msbuild/blob/main/src/Build/Logging/BinaryLogger/BinaryLogger.cs#L22
Expand All @@ -94,4 +94,98 @@ When incrementing the file format, keep this in mind:

The format is backwards compatible, i.e. MSBuild will be able to play back .binlog files created with an older version of MSBuild. The Viewer will also be able to open files of any older version. Since the viewer updates automatically and I can push out updates easily, we can consider the Viewer is always able to read all .binlogs.

However MSBuild of version 15.3 won't be able to read .binlogs created with MSBuild version 15.6. This means the format is unfortunately not forwards-compatible. It is not self-describing, i.e. it doesn't carry its schema around for performance and compactness reasons. This is not a problem with a Viewer because Viewer is always up-to-date (there isn't an "old version" of the Viewer unless people go to great lengths to prevent it from auto-updating).
## Forward compatibility reading

From version 18, the binlog contains as well the minimum version of reader that can interpret it (stored in bytes 4 to 8). Support for best effort forward compatibility is added by this version. It is “best effort” only because the binlog format is not self-describing, i.e. it doesn't carry its schema around for performance and compactness reasons.

This is not of a high importance for users of the Viewer because Viewer is always up-to-date (there isn't an "old version" of the Viewer unless people go to great lengths to prevent it from auto-updating).

## Reading API

We recommend usage of `BinaryLogReplayEventSource`. It provides simplified helpers for creating and configuring `BuildEventArgsReader` and subscribing to the events.

```csharp
var logReader = new BinaryLogReplayEventSource()
{
AllowForwardCompatibility = true
};

// Handling of the structured events contained within the log
logReader.AnyEventRaised += (_, e) =>
{
if (e is BuildErrorEventArgs error)
{
//...
}

// ...
};

// Starts the synchronous log reading loop.
logReader.Replay(path_to_binlog_file);

```

### Handling the recoverable reading errors

In compatibility mode (default for `BinaryLogReplayEventSource`. Only supported for binlogs of version 18 and higher) reader is capable of skipping unknown event types and unknown parts of known events (`BuildEventArgsReader` can configure the behavior via 2 separate properties - `SkipUnknownEvents` and `SkipUnknownEventParts`).

The unknown events and event parts are regarded as recoverable errors, since the reader is able to continue reading subsequent records in the binlog. However the specific user logic should have the last call in deciding whether errors are really recoverable (e.g. is presence of unrecognized or unparseable event ok? It might be fine when searching only for specific events - e.g. errors but not acceptable when trying to provide definitive overview of the built).

To allow the calling code to decide - based on the type of error, type of events getting the error, or the number of errors - the `RecoverableReadError` event is exposed (from both `BinaryLogReplayEventSource` and `BuildEventArgsReader`).

```csharp
/// <summary>
/// An event args for <see cref="IBinaryLogReaderErrors.RecoverableReadError"/> event.
/// </summary>
public sealed class BinaryLogReaderErrorEventArgs : EventArgs
{
/// <summary>
/// Type of the error that occurred during reading.
/// </summary>
public ReaderErrorType ErrorType { get; }

/// <summary>
/// Kind of the record that encountered the error.
/// </summary>
public BinaryLogRecordKind RecordKind { get; }

/// <summary>
/// Materializes the error message.
/// Until it's called the error message is not materialized and no string allocations are made.
/// </summary>
/// <returns>The error message.</returns>
public string GetFormattedMessage() => _formatErrorMessage();
}

/// <summary>
/// Receives recoverable errors during reading.
/// Communicates type of the error, kind of the record that encountered the error and the message detailing the error.
/// In case of <see cref="ReaderErrorType.UnknownEventData"/> this is raised before returning the structured representation of a build event
/// that has some extra unknown data in the binlog. In case of other error types this event is raised and the offending build event is skipped and not returned.
/// </summary>
event Action<BinaryLogReaderErrorEventArgs>? RecoverableReadError;
```

Our sample usage of the [Reading API](#reading-api) can be enhanced with recoverable errors handling e.g. as such:

```csharp

// Those can be raised only during forward compatibility reading mode.
logReader.RecoverableReadError += errorEventArgs =>
{
// ...

// e.g. we can decide to ignore the error and continue reading or break reading
// based on the type of the error or/and type of the record or/and the frequency of the error

// Would we decide to completely ignore some errors - we can aid better performance by not materializing the actual error message.
// Otherwise the error message can be materialized via the provided method on the event argument:
Console.WriteLine($"Recoverable reader error: {errorEventArgs.GetFormattedMessage()}");
JanKrivanek marked this conversation as resolved.
Show resolved Hide resolved
};

```

When authoring changes to the specific BuildEventArg types - it is always strongly recommended to **prefer append-only changes**.

This prevents the possibility of collision where some fields are removed in one version and then different fields with same binary size are added in future version. Such a sequence of format changes might not be caught by the decoder and might lead to unnoticed corrupt interpretation of data. For this reason the author of specific OM changes should always check whether there is a possibility of unrecognizable format collision (same binary size, different representation) within binlog versions of a same [minimum reader version support](#forward-compatibility-reading). If this is possible, the [minimum reader version support](#forward-compatibility-reading) should be incremented.
188 changes: 183 additions & 5 deletions src/Build.UnitTests/BinaryLogger_Tests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,10 @@
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Reflection;
using System.Text;

using FluentAssertions;
using Microsoft.Build.BackEnd.Logging;
using Microsoft.Build.Evaluation;
using Microsoft.Build.Execution;
Expand Down Expand Up @@ -83,10 +85,21 @@ public BinaryLoggerTests(ITestOutputHelper output)
_logFile = _env.ExpectFile(".binlog").Path;
}

public enum BinlogRoundtripTestReplayMode
{
NoReplay,
Structured,
RawEvents
}

[Theory]
[InlineData(s_testProject)]
[InlineData(s_testProject2)]
public void TestBinaryLoggerRoundtrip(string projectText)
[InlineData(s_testProject, BinlogRoundtripTestReplayMode.NoReplay)]
[InlineData(s_testProject, BinlogRoundtripTestReplayMode.Structured)]
[InlineData(s_testProject, BinlogRoundtripTestReplayMode.RawEvents)]
[InlineData(s_testProject2, BinlogRoundtripTestReplayMode.NoReplay)]
[InlineData(s_testProject2, BinlogRoundtripTestReplayMode.Structured)]
[InlineData(s_testProject2, BinlogRoundtripTestReplayMode.RawEvents)]
public void TestBinaryLoggerRoundtrip(string projectText, BinlogRoundtripTestReplayMode replayMode)
{
var binaryLogger = new BinaryLogger();

Expand All @@ -110,6 +123,36 @@ public void TestBinaryLoggerRoundtrip(string projectText)
project.Build(new ILogger[] { binaryLogger, mockLogFromBuild, serialFromBuild, parallelFromBuild }).ShouldBeTrue();
}

string fileToReplay;
switch (replayMode)
{
case BinlogRoundtripTestReplayMode.NoReplay:
fileToReplay = _logFile;
break;
case BinlogRoundtripTestReplayMode.Structured:
case BinlogRoundtripTestReplayMode.RawEvents:
{
var logReader = new BinaryLogReplayEventSource();
fileToReplay = _env.ExpectFile(".binlog").Path;
if (replayMode == BinlogRoundtripTestReplayMode.Structured)
{
// need dummy handler to force structured replay
logReader.BuildFinished += (_, _) => { };
}

BinaryLogger outputBinlog = new BinaryLogger()
{
Parameters = fileToReplay
};
outputBinlog.Initialize(logReader);
logReader.Replay(_logFile);
outputBinlog.Shutdown();
}
break;
default:
throw new ArgumentOutOfRangeException(nameof(replayMode), replayMode, null);
}

var mockLogFromPlayback = new MockLogger();

var serialFromPlaybackText = new StringBuilder();
Expand All @@ -126,7 +169,10 @@ public void TestBinaryLoggerRoundtrip(string projectText)
parallelFromPlayback.Initialize(binaryLogReader);

// read the binary log and replay into mockLogger2
binaryLogReader.Replay(_logFile);
binaryLogReader.Replay(fileToReplay);
mockLogFromPlayback.Shutdown();
serialFromPlayback.Shutdown();
parallelFromPlayback.Shutdown();

// the binlog will have more information than recorded by the text log
mockLogFromPlayback.FullLog.ShouldContainWithoutWhitespace(mockLogFromBuild.FullLog);
Expand All @@ -140,6 +186,138 @@ public void TestBinaryLoggerRoundtrip(string projectText)
parallelActual.ShouldContainWithoutWhitespace(parallelExpected);
}

/// <summary>
/// This test validate that binlog file content is identical upon replaying.
/// The identity can be defined via 3 ways:
/// * byte-for-byte equality
/// * byte-for-byte equality of unzipped content
/// * structured equality of events
///
/// They are ordered by their strength (the byte-for-byte equality implies the other two, etc.),
/// but we mainly care about the structured equality. If the more strong equalities are broken -
/// the assertions can be simply removed.
/// However the structured equality is important - it guarantees that binlog reading and writing functionality
/// is not dropping or altering any information.
/// </summary>
/// <param name="projectText"></param>
/// <param name="replayMode"></param>
[Theory]
[InlineData(s_testProject, BinlogRoundtripTestReplayMode.Structured)]
[InlineData(s_testProject, BinlogRoundtripTestReplayMode.RawEvents)]
[InlineData(s_testProject2, BinlogRoundtripTestReplayMode.Structured)]
[InlineData(s_testProject2, BinlogRoundtripTestReplayMode.RawEvents)]
public void TestBinaryLoggerRoundtripEquality(string projectText, BinlogRoundtripTestReplayMode replayMode)
JanKrivanek marked this conversation as resolved.
Show resolved Hide resolved
{
var binaryLogger = new BinaryLogger();

binaryLogger.Parameters = _logFile;

// build and log into binary logger
using (ProjectCollection collection = new())
{
Project project = ObjectModelHelpers.CreateInMemoryProject(collection, projectText);
// make sure the project file makes it to the binlog (it has file existence check)
File.WriteAllText(project.FullPath, projectText);
project.Build(new ILogger[] { binaryLogger }).ShouldBeTrue();
File.Delete(project.FullPath);
}

var logReader = new BinaryLogReplayEventSource();
string replayedLogFile = _env.ExpectFile(".binlog").Path;
if (replayMode == BinlogRoundtripTestReplayMode.Structured)
{
// need dummy handler to force structured replay
logReader.BuildFinished += (_, _) => { };
}

BinaryLogger outputBinlog = new BinaryLogger()
{
Parameters = $"LogFile={replayedLogFile};OmitInitialInfo"
};
outputBinlog.Initialize(logReader);
logReader.Replay(_logFile);
outputBinlog.Shutdown();

AssertBinlogsHaveEqualContent(_logFile, replayedLogFile);
// If this assertation complicates development - it can possibly be removed
// The structured equality above should be enough.
AssertFilesAreBinaryEqualAfterUnpack(_logFile, replayedLogFile);
}

private static void AssertFilesAreBinaryEqualAfterUnpack(string firstPath, string secondPath)
{
using var br1 = BinaryLogReplayEventSource.OpenReader(firstPath);
using var br2 = BinaryLogReplayEventSource.OpenReader(secondPath);
const int bufferSize = 4096;

int readCount = 0;
while (br1.ReadBytes(bufferSize) is { Length: > 0 } bytes1)
{
var bytes2 = br2.ReadBytes(bufferSize);

bytes1.SequenceEqual(bytes2).ShouldBeTrue(
$"Buffers starting at position {readCount} differ. First:{Environment.NewLine}{string.Join(",", bytes1)}{Environment.NewLine}Second:{Environment.NewLine}{string.Join(",", bytes2)}");
readCount += bufferSize;
}

br2.ReadBytes(bufferSize).Length.ShouldBe(0, "Second buffer contains bytes after first file end");
}

private static void AssertBinlogsHaveEqualContent(string firstPath, string secondPath)
{
using var reader1 = BinaryLogReplayEventSource.OpenBuildEventsReader(firstPath);
using var reader2 = BinaryLogReplayEventSource.OpenBuildEventsReader(secondPath);

Dictionary<string, string> embedFiles1 = new();
Dictionary<string, string> embedFiles2 = new();

reader1.ArchiveFileEncountered += arg
=> AddArchiveFile(embedFiles1, arg);

// This would be standard subscribe:
// reader2.ArchiveFileEncountered += arg
// => AddArchiveFile(embedFiles2, arg);

// We however use the AddArchiveFileFromStringHandler - to exercise it
// and to assert it's equality with ArchiveFileEncountered handler
string currentFileName = null;
reader2.ArchiveFileEncountered +=
((Action<StringReadEventArgs>)AddArchiveFileFromStringHandler).ToArchiveFileHandler();

int i = 0;
while (reader1.Read() is { } ev1)
{
i++;
var ev2 = reader2.Read();

ev1.Should().BeEquivalentTo(ev2,
$"Binlogs ({firstPath} and {secondPath}) should be equal at event {i}");
}
// Read the second reader - to confirm there are no more events
// and to force the embedded files to be read.
reader2.Read().ShouldBeNull($"Binlogs ({firstPath} and {secondPath}) are not equal - second has more events >{i + 1}");

Assert.Equal(embedFiles1, embedFiles2);

void AddArchiveFile(Dictionary<string, string> files, ArchiveFileEventArgs arg)
{
ArchiveFile embedFile = arg.ArchiveData.ToArchiveFile();
files.Add(embedFile.FullPath, embedFile.Content);
}

void AddArchiveFileFromStringHandler(StringReadEventArgs args)
{
if (currentFileName == null)
{
currentFileName = args.OriginalString;
return;
}

embedFiles2.Add(currentFileName, args.OriginalString);
currentFileName = null;
}
}

[Fact]
public void BinaryLoggerShouldSupportFilePathExplicitParameter()
{
Expand Down
Loading
Loading