Please support a modern very high performance version of System.IO.Log #24038

Closed
AceHack opened this issue Nov 3, 2017 · 13 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-Meta untriaged New issue has not been triaged by the area owner

@AceHack

AceHack commented Nov 3, 2017

I'm looking for a very high-performance, ARIES-like (Algorithms for Recovery and Isolation Exploiting Semantics) .NET Standard/Core namespace. It does not need to be exactly ARIES, but it would need to be sufficient to serve as the starting point for a number of different data-oriented projects that choose .NET. I would use this, and I think others would use it to start a new ecosystem of databases, NoSQL, messaging, streaming, and data-oriented .NET Core open source projects.

@AceHack
Author

AceHack commented Nov 3, 2017

Basically, what I'm looking for is the best Write Ahead Log (WAL) possible in a cross-platform language.

@karelz
Member

karelz commented Nov 3, 2017

Your request seems to be a reasonable building block for database-like systems. However, it does not seem to fit the scope of CoreFX/BCL and would, IMO, be better served as an independent library.

In general we consider extending CoreFX with APIs which:

  1. Extend existing APIs/libraries
  2. Expose low-level runtime capabilities (e.g. Threading)
  3. Would be used or exposed by other BCL/CoreFX libraries

See #22228 for examples of APIs we consider out of scope of CoreFX.

@AceHack
Author

AceHack commented Nov 3, 2017

There are a few reasons I think this belongs in CoreFX/BCL instead of a library.

  1. It was part of the BCL of the full .NET Framework on Windows. (This is not a very good reason.)
  2. It requires low-level access below FileStream to things like overlapped unbuffered I/O, I/O completion ports, or maybe just the native Win32 APIs for CLFS (Common Log File System). Linux would require access below FileStream as well, for best performance and truly durable compliance with things like write-through and no caching, so you can be sure writes are committed and durable and not sitting in hardware or software buffers. Flushing occasionally is not an option either, because operating in that manner is so slow.
  3. To really have the base needed for the performance, quality, and durability guarantees, having a high-quality team like Microsoft as the curators would be wonderful.
  4. If the SQL Server team were willing to share some of its techniques (maybe not), that could bring in the latest techniques for a high-quality transaction log.

Thanks.

@AceHack
Author

AceHack commented Nov 4, 2017

Here is some example low-level code that would likely be used on Windows. This is not a full WAL or ARIES yet, but I imagine it would use a lot of the same constructs. You can see how many low-level OS constructs are required, which is one reason why I think it would be best to have this in CoreFX.

This code allows fast sequential writes to a file while making sure every write is "committed" and durable all the way through all buffers, including hardware, without the need for constant flushing. It also does a lot of work to create page-aligned memory buffers (i.e. the VirtualAlloc buffer) and to figure out the sector size of the disk. All of these things are needed for optimal write performance.

Without a better abstraction layer for library writers, like the one Mono provided, writing new cross-platform libraries that need to get into the guts of the different OSs involved is much harder with CoreFX. See https://github.com/dotnet/coreclr/issues/930 for examples of the kind of abstractions I'm talking about.

Also, if FileStream is somehow much better in .NET Core than in the full .NET Framework and allows fast sequential writes with no buffers in between all the way to the hardware, page-aligned writes, and discovery of the sector size, then I would love to understand it better. Thanks.
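
For comparison, the portable options that FileStream itself offers here are `FileOptions.WriteThrough` and `Flush(true)`. The following is a minimal sketch under that assumption; the `DurableAppend` helper is a hypothetical name and is not part of the code in this thread. It asks the OS to write through its cache and to flush to disk, but it gives no control over unbuffered I/O, sector size, or buffer alignment, which is the gap described above.

```csharp
using System;
using System.IO;

// Hypothetical helper (illustration only): appends a record using only
// the portable FileStream API.
public static class DurableAppend
{
    // Appends one record and returns the file length afterwards.
    public static long Append(string path, byte[] record)
    {
        if (record == null) throw new ArgumentNullException("record");

        // WriteThrough asks the OS to bypass its write-back cache for this handle.
        using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write,
                                           FileShare.Read, 4096, FileOptions.WriteThrough))
        {
            stream.Write(record, 0, record.Length);
            // Flush(true) also asks the OS to flush its own buffers to the device.
            stream.Flush(true);
            return stream.Length;
        }
    }
}
```

Flushing after every record is exactly the pattern the comment above calls too slow; the sketch only shows what is reachable without P/Invoke.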

Here is some more info on why page and sector alignment are so important.
http://programmingaddicted.blogspot.com/2011/05/unbuffered-overlapped-io-in-net.html
https://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
https://arxiv.org/ftp/cs/papers/0502/0502012.pdf
http://www.installsetupconfig.com/win32programming/windowsfileapis4_13.html
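
The alignment rule those links describe (for unbuffered I/O, file offsets and transfer sizes must be multiples of the sector size) can be sketched as a small helper. This is an illustration only: `SectorAlignment` and its members are hypothetical names, and the sector size is assumed to come from something like the `GetSectorSize` method in the code below.

```csharp
using System;

// Hypothetical helper (illustration only): sector alignment arithmetic.
public static class SectorAlignment
{
    // True when value is a whole multiple of sectorSize.
    public static bool IsAligned(long value, int sectorSize)
    {
        if (sectorSize <= 0) throw new ArgumentOutOfRangeException("sectorSize");
        return value % sectorSize == 0;
    }

    // Rounds value up to the next multiple of sectorSize (unchanged if already aligned).
    public static long RoundUp(long value, int sectorSize)
    {
        if (value < 0) throw new ArgumentOutOfRangeException("value");
        if (sectorSize <= 0) throw new ArgumentOutOfRangeException("sectorSize");
        long remainder = value % sectorSize;
        return remainder == 0 ? value : value + (sectorSize - remainder);
    }
}
```

For example, a 1,000-byte record written to a disk with 512-byte sectors would need a 1,024-byte transfer, with the tail padded by the caller.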

    [SecuritySafeCritical]
    public sealed class UnbufferedFile : IDisposable
    {
        [SecurityCritical]
        private readonly SafeFileHandle _safeFileHandle;
        private readonly int _sectorSize;
        private long _fileSize;
        private volatile bool _disposed;
        public SafeFileHandle SafeHandle
        {
            [SecurityCritical]
            [SecurityPermission(SecurityAction.InheritanceDemand, Flags = SecurityPermissionFlag.UnmanagedCode)]
            get { return _safeFileHandle; }
        }
        public int SectorSize { get { return _sectorSize; } }
        [SecuritySafeCritical]
        public UnbufferedFile(string path)
        {
            _sectorSize = GetSectorSize(path);
            _safeFileHandle = WinAPI.CreateFile(path, FileAccess.ReadWrite, FileShare.None, IntPtr.Zero, FileMode.OpenOrCreate, FileOptions.WriteThrough | FileOptions.Asynchronous | WinAPI.NoBuffering, IntPtr.Zero);
            if (_safeFileHandle.IsInvalid)
                throw new Win32Exception();
            bool successfulBind = false;
            try
            {
                _fileSize = GetFileSize();
                successfulBind = ThreadPool.BindHandle(_safeFileHandle);
            }
            finally
            {
                if (!successfulBind)
                    _safeFileHandle.Close();
            }
            if (!successfulBind)
                throw new IOException("Binding to handle failed.");
        }
        [SecuritySafeCritical]
        public void Dispose()
        {
            _disposed = true;
            if (_safeFileHandle != null) _safeFileHandle.Dispose();
        }
        public long Length
        {
            [SecuritySafeCritical]
            get
            {
                if (_disposed)
                    throw new ObjectDisposedException("UnbufferedFile", "UnbufferedFile already disposed.");
                return _fileSize;
            }
            [SecuritySafeCritical]
            set
            {
                if (_disposed)
                    throw new ObjectDisposedException("UnbufferedFile", "UnbufferedFile already disposed.");
                if (value < 0L)
                    throw new ArgumentOutOfRangeException("value", "File size cannot be negative.");
                if (!WinAPI.SetFilePointerEx(_safeFileHandle, value, IntPtr.Zero, SeekOrigin.Begin))
                    throw new Win32Exception();
                if (!WinAPI.SetEndOfFile(_safeFileHandle))
                    throw new Win32Exception();
                _fileSize = value;
            }
        }
        [SecuritySafeCritical]
        public Task ReadAsync(long position, VirtualAllocBuffer buffer, long offset, int count)
        {
            return ReadAsync(position, buffer, offset, count, new CancellationToken());
        }
        [SecuritySafeCritical]
        public Task ReadAsync(long position, VirtualAllocBuffer buffer, long offset, int count, CancellationToken cancellationToken)
        {
            EnsureReadWriteParameters(position, buffer, offset, count);
            return ReadWriteCoreAsync(_safeFileHandle, position, buffer, offset, count, true, cancellationToken);
        }
        [SecuritySafeCritical]
        public Task WriteAsync(long position, VirtualAllocBuffer buffer, long offset, int count)
        {
            return WriteAsync(position, buffer, offset, count, new CancellationToken());
        }
        [SecuritySafeCritical]
        public Task WriteAsync(long position, VirtualAllocBuffer buffer, long offset, int count, CancellationToken cancellationToken)
        {
            EnsureReadWriteParameters(position, buffer, offset, count);
            return ReadWriteCoreAsync(_safeFileHandle, position, buffer, offset, count, false, cancellationToken);
        }

        [SecurityCritical]
        private static int GetSectorSize(string path)
        {
            string rootPath = Path.GetPathRoot(path);
            if (string.IsNullOrEmpty(rootPath))
                rootPath = Path.GetPathRoot(Environment.CurrentDirectory);
            if (string.IsNullOrEmpty(rootPath))
                throw new InvalidOperationException("Can't get root path");
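            uint size;
            uint i;
            // Only bytesPerSector is needed; the other out values are discarded into i.
            if (!WinAPI.GetDiskFreeSpace(rootPath, out i, out size, out i, out i))
                throw new Win32Exception();
            return (int)size;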
        }
        [SecurityCritical]
        private long GetFileSize()
        {
            int highSize;
            var lowSize = WinAPI.GetFileSize(_safeFileHandle, out highSize);
            if (lowSize == -1 && Marshal.GetLastWin32Error() != 0)
                throw new Win32Exception();
            return (long)highSize << 32 | (uint)lowSize;
        }
        [SecurityCritical]
        private unsafe static Task ReadWriteCoreAsync(SafeFileHandle fileHandle, long position, VirtualAllocBuffer buffer, long offset, int count, bool isRead, CancellationToken cancellationToken)
        {
            var memoryStartLocation = new IntPtr(unchecked((long)((ulong)buffer.SafeHandle.DangerousGetHandle() + (ulong)offset)));
            var taskCompletionSource = new TaskCompletionSource<object>();
            if (cancellationToken.IsCancellationRequested)
            {
                taskCompletionSource.SetCanceled();
            }
            else
            {
                var cancellationTokenRegistration = default(CancellationTokenRegistration);
                var ioCompletionCallback = new IOCompletionCallback((code, bytes, overlap) =>
                {
                    Overlapped.Free(overlap);
                    cancellationTokenRegistration.Dispose();
                    if (code == 0)
                        taskCompletionSource.TrySetResult(null);
                    else if (code == WinAPI.OperationAborted)
                        taskCompletionSource.TrySetCanceled();
                    else
                        taskCompletionSource.TrySetException(new Win32Exception((int)code));
                });
                var overlapped = new Overlapped()
                {
                    OffsetLow = unchecked((int)position),
                    OffsetHigh = unchecked((int)(position >> 32)),
                };
                var nativeOverlapped = overlapped.UnsafePack(ioCompletionCallback, null);
                cancellationTokenRegistration = cancellationToken.Register(() =>
                {
                    if (WinAPI.CancelIoEx(fileHandle, nativeOverlapped))
                        return;
                    var lastError = Marshal.GetLastWin32Error();
                    if (lastError == WinAPI.NotFound)
                        return;
                    taskCompletionSource.TrySetException(new Win32Exception(lastError));
                });
                var completedSynchronously = isRead
                    ? WinAPI.ReadFile(fileHandle, memoryStartLocation, count, IntPtr.Zero, nativeOverlapped)
                    : WinAPI.WriteFile(fileHandle, memoryStartLocation, count, IntPtr.Zero, nativeOverlapped);
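                // The handle is bound to the thread pool's I/O completion port, so a completion
                // packet is still queued when the call succeeds synchronously. The callback above
                // then frees the overlapped structure and completes the task; freeing it here as
                // well would release it twice.
                if (!completedSynchronously)
                {
                    var lastError = Marshal.GetLastWin32Error();
                    if (lastError != WinAPI.IOPending)
                    {
                        // The operation never started, so the callback will not run: clean up here.
                        Overlapped.Free(nativeOverlapped);
                        cancellationTokenRegistration.Dispose();
                        taskCompletionSource.TrySetException(new Win32Exception(lastError));
                    }
                }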
            }
            return taskCompletionSource.Task;
        }
        private void EnsureReadWriteParameters(long position, VirtualAllocBuffer buffer, long offset, int count)
        {
            if (_disposed)
                throw new ObjectDisposedException("UnbufferedFile", "UnbufferedFile already disposed.");
            if (buffer == null)
                throw new ArgumentNullException("buffer", "buffer cannot be null.");
            if (offset < 0)
                throw new ArgumentOutOfRangeException("offset", "offset cannot be negative.");
            if (count < 0)
                throw new ArgumentOutOfRangeException("count", "count cannot be negative.");
            if (position < 0L)
                throw new ArgumentOutOfRangeException("position", "position cannot be negative.");
            if (count > _fileSize - position)
                throw new ArgumentException("Combination of position and count is too large for file.");
            if (count > buffer.BufferSize - offset)
                throw new ArgumentException("Combination of offset and count is too large for buffer.");
        }
    }
    [SecuritySafeCritical]
    public sealed class VirtualAllocBuffer : IDisposable
    {
        private static readonly ulong MaxAddressSpace = UIntPtr.Size == 4 ? uint.MaxValue : ulong.MaxValue;
        public static readonly int PageSize;
        public static readonly int LargePageSize;
        public static readonly int AllocationGranularity;
        public static readonly bool OSHasLargePageSupport;
        public static readonly bool ProcessHasLargePagePrivilege;
        public static readonly bool CanUseLargePages;
        public static readonly Exception LargePageException;
        [SecurityCritical]
        private readonly SafeVirtualAllocHandle _safeVirtualAllocHandle;
        private readonly long _bufferSize;
        private volatile bool _disposed;
        public SafeVirtualAllocHandle SafeHandle
        {
            [SecurityCritical]
            [SecurityPermission(SecurityAction.InheritanceDemand, Flags = SecurityPermissionFlag.UnmanagedCode)]
            get { return _safeVirtualAllocHandle; }
        }
        [SecuritySafeCritical]
        static VirtualAllocBuffer()
        {
            PageSize = WinAPI.PageSize;
            LargePageSize = WinAPI.LargePageSize;
            AllocationGranularity = WinAPI.AllocationGranularity;
            OSHasLargePageSupport = WinAPI.OSHasLargePageSupport;
            LargePageException = TryAdjustLargePagePrivilege();
            ProcessHasLargePagePrivilege = LargePageException == null;
            CanUseLargePages = OSHasLargePageSupport && ProcessHasLargePagePrivilege;
        }
        public long BufferSize { get { return _bufferSize; } }
        [SecuritySafeCritical]
        public VirtualAllocBuffer() : this(AllocationGranularity) { }
        [SecuritySafeCritical]
        public VirtualAllocBuffer(long bufferSize) : this(bufferSize, false) { }
        [SecuritySafeCritical]
        public VirtualAllocBuffer(long bufferSize, bool useLargePages)
        {
            if (bufferSize < 0)
                throw new ArgumentOutOfRangeException("bufferSize", "bufferSize cannot be negative.");
            if ((ulong)bufferSize >= MaxAddressSpace)
                throw new ArgumentOutOfRangeException("bufferSize", "bufferSize is larger than address space.");
            if (useLargePages && !WinAPI.OSHasLargePageSupport)
                throw new InvalidOperationException("useLargePages is not supported on this system.");
            Contract.EndContractBlock();
            _bufferSize = bufferSize;
            _safeVirtualAllocHandle = CreateVirtualAllocHandle(unchecked((UIntPtr)bufferSize), useLargePages);
            _safeVirtualAllocHandle.Initialize(unchecked((ulong)bufferSize));
        }
        [SecuritySafeCritical]
        public void Dispose()
        {
            _disposed = true;
            if (_safeVirtualAllocHandle != null) _safeVirtualAllocHandle.Dispose();
        }
        [SecuritySafeCritical]
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        public void Read(long position, byte[] array, int offset, int count)
        {
            EnsureReadWriteParameters(position, array, offset, count);
            ReadWriteCore(_safeVirtualAllocHandle, position, array, offset, count, true);
        }
        [SecuritySafeCritical]
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        public void Write(long position, byte[] array, int offset, int count)
        {
            EnsureReadWriteParameters(position, array, offset, count);
            ReadWriteCore(_safeVirtualAllocHandle, position, array, offset, count, false);
        }

        [SecurityCritical]
        private static SafeVirtualAllocHandle CreateVirtualAllocHandle(UIntPtr bufferSize, bool tryUseLargePages)
        {
            var allocationType = AllocationType.Reserve | AllocationType.Commit;
            if (tryUseLargePages && CanUseLargePages) allocationType |= AllocationType.LargePages;
            var safeVirtualAllocHandle = WinAPI.VirtualAlloc(IntPtr.Zero, bufferSize, allocationType, MemoryProtection.ReadWrite);
            if (safeVirtualAllocHandle.IsInvalid)
                throw new Win32Exception();
            return safeVirtualAllocHandle;
        }
        [SecurityCritical]
        private static Win32Exception TryAdjustLargePagePrivilege()
        {
            GenericSafeHandle tokenHandle;
            if (!WinAPI.OpenProcessToken(WinAPI.GetCurrentProcess(), TokenAccessLevels.AdjustPrivileges | TokenAccessLevels.Query, out tokenHandle))
                return new Win32Exception();
            try
            {
                LUID luid;
                if (!WinAPI.LookupPrivilegeValue(null, "SeLockMemoryPrivilege", out luid))
                    return new Win32Exception();
                var tokenPrivileges = new TokenPrivileges() { PrivilegeCount = 1, Luid = luid, Attributes = WinAPI.PrivilegeEnabled };
                WinAPI.AdjustTokenPrivileges(tokenHandle, false, ref tokenPrivileges, 0, IntPtr.Zero, IntPtr.Zero);
                if (Marshal.GetLastWin32Error() != 0)
                    return new Win32Exception();
            }
            finally
            {
                if (tokenHandle != null) tokenHandle.Dispose();
            }
            return null;
        }
        [SecurityCritical]
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        private static void ReadWriteCore(SafeVirtualAllocHandle safeHandle, long position, byte[] array, int offset, int count, bool isRead)
        {
            var mustReleaseSafeHandle = false;
            var memoryStartLocation = new IntPtr(unchecked((long)((ulong)safeHandle.DangerousGetHandle() + (ulong)position)));
            RuntimeHelpers.PrepareConstrainedRegions();
            try
            {
                safeHandle.DangerousAddRef(ref mustReleaseSafeHandle);
                if (mustReleaseSafeHandle)
                {
                    if (isRead)
                        Marshal.Copy(memoryStartLocation, array, offset, count);
                    else
                        Marshal.Copy(array, offset, memoryStartLocation, count);
                }
            }
            finally
            {
                if (mustReleaseSafeHandle) safeHandle.DangerousRelease();
            }
            if (!mustReleaseSafeHandle)
                throw new InvalidOperationException("Problem accessing memory.");
        }
        private void EnsureReadWriteParameters(long position, byte[] array, int offset, int count)
        {
            if (_disposed)
                throw new ObjectDisposedException("VirtualAllocBuffer", "VirtualAllocBuffer already disposed.");
            if (array == null)
                throw new ArgumentNullException("array", "array cannot be null.");
            if (offset < 0)
                throw new ArgumentOutOfRangeException("offset", "offset cannot be negative.");
            if (count < 0)
                throw new ArgumentOutOfRangeException("count", "count cannot be negative.");
            if (position < 0L)
                throw new ArgumentOutOfRangeException("position", "position cannot be negative.");
            if (count > _bufferSize - position)
                throw new ArgumentException("Combination of position and count is too large for buffer.");
            if (count > array.Length - offset)
                throw new ArgumentException("Combination of offset and count is too large for array.");
        }
    }
    [SecurityCritical]
    [SecurityPermission(SecurityAction.InheritanceDemand, UnmanagedCode = true)]
    public sealed class SafeVirtualAllocHandle : SafeBuffer
    {
        public static SafeVirtualAllocHandle InvalidHandle
        {
            [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
            get { return new SafeVirtualAllocHandle() { handle = IntPtr.Zero }; }
        }
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        private SafeVirtualAllocHandle() : base(true) { }
        [SecurityCritical]
        protected override bool ReleaseHandle()
        {
            return WinAPI.VirtualFree(base.handle, UIntPtr.Zero, FreeType.Release);
        }
    }
    [SecurityCritical]
    public sealed class GenericSafeHandle : SafeHandleZeroOrMinusOneIsInvalid
    {
        public static GenericSafeHandle InvalidHandle
        {
            [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
            get { return new GenericSafeHandle() { handle = IntPtr.Zero }; }
        }
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        private GenericSafeHandle() : base(true) { }
        [SecurityCritical]
        protected override bool ReleaseHandle()
        {
            return WinAPI.CloseHandle(base.handle);
        }
    }
    [SuppressUnmanagedCodeSecurity]
    [SecurityCritical]
    class WinAPI
    {
        public const int PrivilegeEnabled = 2;
        public const FileOptions NoBuffering = (FileOptions)0x20000000;
        public const uint OperationAborted = 995;
        public const uint IOPending = 997;
        public const uint NotFound = 1168;
        public static readonly int PageSize;
        public static readonly int LargePageSize;
        public static readonly int AllocationGranularity;
        public static readonly bool OSHasLargePageSupport;
        static WinAPI()
        {
            SystemInfo sysInfo;
            GetSystemInfo(out sysInfo);
            PageSize = checked((int)sysInfo.PageSize);
            AllocationGranularity = checked((int)sysInfo.AllocationGranularity);
            LargePageSize = checked((int)GetLargePageMinimum());
            OSHasLargePageSupport = LargePageSize != 0;
        }
        [DllImport("kernel32.dll")]
        private static extern UIntPtr GetLargePageMinimum();
        [DllImport("kernel32.dll", SetLastError = true)]
        private static extern void GetSystemInfo(out SystemInfo lpSystemInfo);
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern SafeVirtualAllocHandle VirtualAlloc(IntPtr address, [In] UIntPtr numBytes, [In] AllocationType commitOrReserve, [In] MemoryProtection pageProtectionMode);
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern bool VirtualFree(IntPtr address, [In] UIntPtr numBytes, [In] FreeType pageFreeMode);
        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern SafeFileHandle CreateFile([In] string fileName, [In] FileAccess desiredAccess, [In] FileShare shareMode, [In] IntPtr securityAttrs, [In] FileMode creationDisposition, [In] FileOptions flagsAndAttributes, [In] IntPtr templateFile);
        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
        public static extern int memcmp([In] byte[] b1, [In] byte[] b2, [In] long count);
        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
        public static extern IntPtr memset([In] byte[] dest, [In] int c, [In] int count);
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        [DllImport("advapi32.dll", SetLastError = true)]
        public static extern bool OpenProcessToken([In] IntPtr processToken, [In] TokenAccessLevels desiredAccess, out GenericSafeHandle tokenHandle);
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern IntPtr GetCurrentProcess();
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        [DllImport("kernel32.dll", SetLastError = true)]
        [return: MarshalAs(UnmanagedType.Bool)]
        public static extern bool CloseHandle([In] IntPtr handle);
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        [DllImport("advapi32.dll", SetLastError = true)]
        public static extern bool LookupPrivilegeValue([In] string systemName, [In] string name, out LUID luid);
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        [DllImport("advapi32.dll", SetLastError = true)]
        public static extern bool AdjustTokenPrivileges([In] GenericSafeHandle tokenHandle, [In] bool disableAllPrivileges, [In] ref TokenPrivileges newState, [In] uint bufferLength, IntPtr previousState, IntPtr returnLength);
        [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
        public static extern bool GetDiskFreeSpace([In] string rootPathName, out uint sectorsPerCluster, out uint bytesPerSector, out uint numberOfFreeClusters, out uint totalNumberOfClusters);
        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern int GetFileSize([In] SafeFileHandle file, out int highSize);
        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern bool SetEndOfFile([In] SafeFileHandle file);
        [DllImport("kernel32.dll", SetLastError = true)]
        public static unsafe extern bool ReadFile([In] SafeFileHandle handle, IntPtr bytes, [In] int numBytesToRead, IntPtr numBytesReadMustBeZero, NativeOverlapped* overlapped);
        [DllImport("kernel32.dll", SetLastError = true)]
        public static unsafe extern bool WriteFile([In] SafeFileHandle handle, IntPtr bytes, [In] int numBytesToWrite, IntPtr numBytesWrittenMustBeZero, NativeOverlapped* overlapped);
        [DllImport("kernel32.dll", SetLastError = true)]
        public static unsafe extern bool CancelIoEx([In] SafeFileHandle handle, NativeOverlapped* lpOverlapped);
        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern bool SetFilePointerEx([In] SafeFileHandle handle, [In] long distanceToMove, IntPtr newFilePointer, [In] SeekOrigin moveMethod);
    }
    struct TokenPrivileges
    {
        public int PrivilegeCount;
        public LUID Luid;
        public int Attributes;
    }

    struct LUID
    {
        public uint LowPart;
        public uint HighPart;
    }
    struct SystemInfo
    {
        public int OemId;
        public int PageSize;
        public IntPtr MinimumApplicationAddress;
        public IntPtr MaximumApplicationAddress;
        public IntPtr ActiveProcessorMask;
        public int NumberOfProcessors;
        public int ProcessorType;
        public int AllocationGranularity;
        public short ProcessorLevel;
        public short ProcessorRevision;
    }
    [Flags]
    enum FreeType : uint
    {
        Release = 0x8000,
    }
    [Flags]
    enum AllocationType : uint
    {
        Commit = 0x1000,
        Reserve = 0x2000,
        LargePages = 0x20000000,
    }
    [Flags]
    enum MemoryProtection : uint
    {
        ReadWrite = 0x04,
    }
    public enum SeekOrigin
    {
        Begin = 0,
        Current = 1,
        End = 2,
    }

[EDIT] Moved code into details, added C# syntax highlighting by @karelz

@karelz
Member

karelz commented Nov 4, 2017

Here's my view on your reasons to add the library into CoreFX:

  1. System.IO.Log was part of .NET Framework
    • This is a good argument for porting the code over "as is", but it does not require making the code a very high-performance implementation. If there are enough upvotes, we can definitely consider bringing it into future versions of the .NET Framework Compatibility Pack - see Ship .NET Framework compatibility pack #23974.
  2. It requires low-level file I/O
    • Any library can make low-level OS API calls. That by itself does not mean it has to be part of CoreFX.
  3. Microsoft ownership guarantees high quality
    • This is a common (mostly hidden) reason behind requests to add APIs to the CoreFX repo (and other MS repos). Obviously, it would not scale if we said yes to every such addition - CoreFX would become the kitchen-sink repo of everything low-level enough. That's why we have to prioritize based on customer demand (e.g. upvotes) and make sure added libraries are in line with the rest of the CoreFX repo.
    • Also note that there are PLENTY of high-quality libraries maintained by other companies or the community - see e.g. the list of .NET Foundation projects or the most popular NuGet libraries. Delivering a high-quality library is hard and a lot of work, but there are MANY communities and companies, not just Microsoft, capable of delivering such value.
  4. SQL Server team could contribute
    • I can't speak on behalf of the SQL Server organization. It depends on their business priorities. Note that their ability to contribute should not be limited to Microsoft-owned repos.

I would recommend splitting this issue into 2 separate items:

  • A CoreFX issue to port System.IO.Log from .NET Framework as is, for compatibility reasons. We would monitor demand via upvotes / replies.
  • A CoreFX-independent effort to build a high-performance ARIES library, probably with involvement from database folks. It might be good to check whether there were similar efforts in the past or in progress.

@sakno
Contributor

sakno commented Aug 4, 2019

I'm voting for a Write Ahead Log as part of .NET because I have an implementation of the Raft consensus algorithm for ASP.NET Core that requires a persistent replication log. At the moment the log is represented by an interface, and its actual implementation is delegated to the consumer.

@danmoseley
Member

@davidfowl what are your thoughts from the ASP.NET perspective?

@sakno
Contributor

sakno commented Nov 7, 2019

I would like to share, as a proof of concept, an implementation of a Write Ahead Log that is suitable for general-purpose use. Sources are here and licensed under MIT.
What was done:

  • Search, drop, append, commit core operations for manipulations with log entries
  • Implementation is fully asynchronous
  • Reduced memory allocations: they are replaced by renting where possible
  • Thread safety
  • Parallel reads
  • Consumers can define their own binary format for log records
  • Configurable caches for fast lookup
  • Log compaction using snapshotting approach
  • Logical partitioning
  • O(1) random access to log records
    The implementation relies only on existing .NET Standard API so it is portable across OSes and .NET runtimes.

What was not done (and probably won't):

  • Low-level optimizations that were mentioned by @AceHack, because they cannot be ported across different platforms

Originally, it was developed for log replication as described by the Raft consensus algorithm, so it has some Raft-specific API such as Raft node state. However, you can just ignore this part of the API.
I can say that developing a fully general-purpose WAL is not easy, because the underlying database engine may require some specific features.
Also, I don't have benchmarks, because the various C# implementations of OSS databases have very specific log engines, so their performance is not comparable.
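
To make the append, commit, and drop operations listed above concrete, here is a minimal in-memory sketch. It is not the linked implementation and does no I/O; `InMemoryLog` and all of its members are hypothetical names chosen for illustration, and the rule that committed entries can never be dropped is an assumption borrowed from how Raft-style logs usually behave.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of an append/commit/drop log (illustration only).
public sealed class InMemoryLog
{
    private readonly List<byte[]> _entries = new List<byte[]>();

    // Number of entries that are committed and can no longer be dropped.
    public int CommittedCount { get; private set; }

    // Total number of entries, committed or not.
    public int Count { get { return _entries.Count; } }

    // Appends an uncommitted entry and returns its zero-based index.
    public int Append(byte[] payload)
    {
        if (payload == null) throw new ArgumentNullException("payload");
        _entries.Add(payload);
        return _entries.Count - 1;
    }

    // Marks every entry up to and including lastIndex as committed.
    public void Commit(int lastIndex)
    {
        if (lastIndex < 0 || lastIndex >= _entries.Count)
            throw new ArgumentOutOfRangeException("lastIndex");
        CommittedCount = Math.Max(CommittedCount, lastIndex + 1);
    }

    // Removes uncommitted entries from startIndex to the end and returns how many were removed.
    public int Drop(int startIndex)
    {
        if (startIndex < CommittedCount || startIndex > _entries.Count)
            throw new ArgumentOutOfRangeException("startIndex");
        int removed = _entries.Count - startIndex;
        _entries.RemoveRange(startIndex, removed);
        return removed;
    }

    // Random access to an entry by its index.
    public byte[] Read(int index)
    {
        return _entries[index];
    }
}
```

A durable version would persist each entry before acknowledging the append, which is where the low-level I/O concerns discussed earlier in the thread come in.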

@hiteshmadan

@AceHack @sakno Please check out FasterLog here: https://github.com/microsoft/FASTER

FASTER is a really fast cross-platform embedded key-value store, and its internal log implementation was upgraded into a public API in the last few weeks. Super impressive performance characteristics IMHO - try out the FasterLogSample csproj on a machine with an NVMe SSD and hopefully you'll be impressed with what you see!

cc @badrishc

(disclaimer - I've been using it for a work project and actively collaborating with the main author)

@AceHack
Author

AceHack commented Dec 3, 2019

@hiteshmadan This seems pretty amazing, thanks for pointing this out.

@danmoseley
Member

@badrishc

badrishc commented Dec 4, 2019

Submitted a PR as suggested, here. To expand a bit, FasterLog (docs) is a latch-free concurrent persistent log library for .NET. It supports concurrent appends, group commit, multiple persistent iterators (each of which may itself be concurrently used), random reads, and log truncation from head. It can run over sharded and tiered storage backend devices (we have out-of-the-box device implementations for local storage and Azure Page Blobs). Our benchmarks are able to easily saturate 2+ GB/sec on a single local NVMe SSD, and 4+ GB/sec when run over two (sharded) SSDs.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@stephentoub stephentoub modified the milestones: 5.0, Future Feb 15, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@stephentoub
Member

Closing as being addressed by an existing library in the ecosystem.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 19, 2020