Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono] Simplify file comparisons #112295

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion src/tasks/Common/FileCache.cs
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,8 @@ public FileCache(string? cacheFilePath, TaskLoggingHelper log)
Enabled = true;
if (File.Exists(cacheFilePath))
{
_oldCache = JsonSerializer.Deserialize<CompilerCache>(File.ReadAllText(cacheFilePath), s_jsonOptions);
using FileStream fs = File.OpenRead(cacheFilePath);
_oldCache = JsonSerializer.Deserialize<CompilerCache>(fs, s_jsonOptions);
}

_oldCache ??= new();
Expand Down
81 changes: 78 additions & 3 deletions src/tasks/Common/Utils.cs
Original file line number Diff line number Diff line change
Expand Up @@ -214,14 +214,89 @@ public static (int, string) TryRunProcess(
return (process.ExitCode, outputBuilder.ToString().Trim('\r', '\n'));
}

internal abstract class Readable<TContent, TReader> :
IDisposable
where TContent : IEquatable<TContent>
where TReader : IDisposable
{
protected abstract int Read(TContent[] buffer, int offset, int count);
protected TReader _reader;
protected Readable(TReader reader) => _reader = reader;
public void Dispose() => _reader.Dispose();

protected static bool ContentEqual(Readable<TContent, TReader> streamA, Readable<TContent, TReader> streamB, int bufferSize)
{
TContent[] bufferA = new TContent[bufferSize];
TContent[] bufferB = new TContent[bufferSize];

int readA = 0;
int readB = 0;
int consumedA = 0;
int consumedB = 0;

while (true)
{
if (consumedA == readA)
{
readA = streamA.Read(bufferA, 0, bufferSize);
consumedA = 0;
}

if (consumedB == readB)
{
readB = streamB.Read(bufferB, 0, bufferSize);
consumedB = 0;
}

if (readA == 0 && readB == 0)
return true;

if (readA == 0 || readB == 0)
return false;

int overlap = Math.Min(readA - consumedA, readB - consumedB);
if (!bufferA.AsSpan(consumedA, overlap).SequenceEqual(bufferB.AsSpan(consumedB, overlap)))
return false;

consumedA += overlap;
consumedB += overlap;
}
}
}

private sealed class TextFileComparer : Readable<char, StreamReader>
{
private const int _charCount = 8192 / sizeof(char);
private TextFileComparer(string path) : base(new StreamReader(path, Encoding.UTF8, true, _charCount)) { }
protected override int Read(char[] buffer, int offset, int count) => _reader.Read(buffer, offset, count);
public static bool ContentEqual(string filePath1, string filePath2)
{
using var streamA = new TextFileComparer(filePath1);
using var streamB = new TextFileComparer(filePath2);
return ContentEqual(streamA, streamB, bufferSize: _charCount);
}
}

private sealed class BinaryFileComparer : Readable<byte, FileStream>
{
private const int _byteCount = 8192;
private BinaryFileComparer(string path) : base (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
bufferSize: _byteCount, FileOptions.SequentialScan)) {}
protected override int Read(byte[] buffer, int offset, int count) => _reader.Read(buffer, offset, count);
public static bool ContentEqual(string filePath1, string filePath2)
{
using var streamA = new BinaryFileComparer(filePath1);
using var streamB = new BinaryFileComparer(filePath2);
return ContentEqual(streamA, streamB, bufferSize: _byteCount);
}
}

public static bool CopyIfDifferent(string src, string dst, bool useHash)
{
if (!File.Exists(src))
throw new ArgumentException($"Cannot find {src} file to copy", nameof(src));

bool areDifferent = !File.Exists(dst) ||
(useHash && ComputeHash(src) != ComputeHash(dst)) ||
(File.ReadAllText(src) != File.ReadAllText(dst));
Comment on lines -222 to -224
Copy link
Member Author

@lewing lewing Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition appears to be incorrect given that even when useHash is true and they the hashes differ it will fall back to checking the text. It can either be dropped completely to compare the text or it can skip the text check if use hash is true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think the intention was to use the (fast) hash comparison and only fallback to comparing the text if the hash differs.

so this could likely be

bool areSame = File.Exists(dst) && (useHash ? ComputeHash(src) != ComputeHash(dst) : File.ReadAllText(src) != File.ReadAllText(dst);

but I wonder too why we don't just always compare the hash.

Copy link
Member Author

@lewing lewing Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but the way it is implemented it does the following:

  1. reading the file completely to create the hash for both files so there is no benefit to using a (cryptographic) hash rather than just comparing the bytes. (I suppose If the stream hashing is optimized on framework where we aren't unsing the optimized SequenceEquals from core there could be some value there)
  2. always falling back to comparing the decoded string if the hash comparison fails which means in the casse where the files differ and useHash is true we'll read the entirety of both files twice.

So if it was just using ReadAllText as simple way to compare bytes we don't need to do that at all. If on the other hand there was some value in not overwriting the file if the decoded text rather than the bytes then we can just always compare Text. In either case the PR will avoid making large file sized allocations over and over again.

bool areDifferent = !File.Exists(dst) || (useHash && !BinaryFileComparer.ContentEqual(src, dst)) || (!useHash && !TextFileComparer.ContentEqual(src, dst));

if (areDifferent)
File.Copy(src, dst, true);
Expand Down
4 changes: 2 additions & 2 deletions src/tasks/WasmAppBuilder/WasmAppBuilderBaseTask.cs
Original file line number Diff line number Diff line change
Expand Up @@ -105,8 +105,8 @@ protected virtual void UpdateRuntimeConfigJson()
if (matchingAssemblies.Length > 1)
throw new LogAsErrorException($"Found more than one assembly matching the main assembly name {MainAssemblyName}: {string.Join(",", matchingAssemblies)}");

var rootNode = JsonNode.Parse(File.ReadAllText(RuntimeConfigJsonPath),
new JsonNodeOptions { PropertyNameCaseInsensitive = true });
using FileStream rcs = File.OpenRead(RuntimeConfigJsonPath);
var rootNode = JsonNode.Parse(rcs, new JsonNodeOptions { PropertyNameCaseInsensitive = true });
if (rootNode == null)
throw new LogAsErrorException($"Failed to parse {RuntimeConfigJsonPath}");

Expand Down
Loading