Large parts cannot be written on .NET Core due to OutOfMemoryException #807
Comments
I closed the issue by mistake, so I am reopening it. While trying to find more information about this problem, I found some related issues with System.IO.Packaging. Is there some workaround to temporarily fix my problem? I really need to generate large Excel files in .NET Core. Related issues:
This is definitely dotnet/runtime#1544. I am unsure of a good workaround at this time.
I am having the same problem. Has anyone found a workaround for this issue? Thank you in advance.
Hello @twsouthwick. Did you see the workaround that was proposed in the dotnet/runtime#1544 discussion?
Hi @Viktor-36, how have you fixed or worked around this issue? I need to generate large Excel files in .NET Core too. Hi @twsouthwick, have you read the message from @nike61?
We are also running into that exact same issue, even though we call
Our company is currently being hit hard by this issue for WordProcessing, Presentation, and Spreadsheet documents. We are seeing big spikes of memory consumption in our production environment when modifying the OpenXml of certain documents. This causes significant problems for our enterprise customers.
@sorensenmatias @clement911 @rsdelapaz Sorry for the spam; I got something working and thought it might help someone looking for a workaround. I don't love my solution, but it did work. Since I wasn't able to write to a second
What I found was that I was only able to use
Outside of the acrobatics, the limitation of this approach seems to be that you can only have one large sheet (but from the examples I've seen, many people only write one large sheet per spreadsheet, so this might help someone).
Here's more of a working sample for .NET Core that should get people started on the workaround. I just wrote out a 500,000 row dataset and the memory footprint stayed pretty low. There are a few places where extension methods are used (like SafeLeft); you can remove those and put in what you need (GetCell in particular isn't super clean; keep in mind this is a proof of concept). What you'll really be interested in is the order of the write calls. The biggest limitation I can see is that you can only write one large sheet to the document (after that, ReadWrite is required; I could never get a second write-only stream to work). https://gist.github.com/blakepell/8fe938624f1dad8c28ff93a334687d77
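In case the gist link ever goes stale, here is a minimal sketch of that order of operations as I understand it (my own condensed version, not blakepell's exact code; it assumes an older SDK release such as 2.16, since newer releases reject write-only packages, as discussed further down):
using System.IO;
using System.IO.Packaging;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

static void WriteLargeWorkbook(string path)
{
    // Open the package write-only so System.IO.Packaging streams each part to disk
    // instead of buffering its uncompressed content in a MemoryStream.
    using var fileStream = File.Create(path);
    using var package = Package.Open(fileStream, FileMode.Create, FileAccess.Write);
    using var document = SpreadsheetDocument.Create(package, SpreadsheetDocumentType.Workbook);

    var workbookPart = document.AddWorkbookPart();
    var worksheetPart = workbookPart.AddNewPart<WorksheetPart>();

    // 1) Stream the single large sheet first.
    using (var writer = OpenXmlWriter.Create(worksheetPart))
    {
        writer.WriteStartElement(new Worksheet());
        writer.WriteStartElement(new SheetData());
        // ... write each Row and its Cells here with WriteStartElement/WriteElement ...
        writer.WriteEndElement(); // SheetData
        writer.WriteEndElement(); // Worksheet
    }

    // 2) Only then write the workbook part that references the sheet.
    using (var writer = OpenXmlWriter.Create(workbookPart))
    {
        writer.WriteStartElement(new Workbook());
        writer.WriteStartElement(new Sheets());
        writer.WriteElement(new Sheet
        {
            Id = workbookPart.GetIdOfPart(worksheetPart),
            SheetId = 1,
            Name = "Sheet1"
        });
        writer.WriteEndElement(); // Sheets
        writer.WriteEndElement(); // Workbook
    }
}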
Thanks @blakepell, this is very helpful. Unfortunately, doing Write instead of Read/Write would require a huge refactoring (months) for our use case, so our hope is still that the root issue can be fixed.
One other note: with the workaround approach I posted, you can add additional sheets later in
@blakepell's solution gave us an error ("Cannot retrieve parts of writeonly container") until we updated to the latest version of the
Any updates on that?
Is there any update on when it will be fixed? My .NET Core migration for a few projects has been on hold for the last two years due to this issue. I need to generate a large Excel file, and it throws the exception "Stream was too long".
Well, I was able to fix it, but I don't remember exactly what I did. What I can say is that back then I fixed this issue by doing something like the code below: for some reason, if you first write the data to a WorksheetPart, close the stream, and only then write the Sheet attributing the WorksheetPart to it, it runs a lot smoother.
Note that I'm using System.IO.Packaging there; as far as I read (and remember), there were some issues with OpenXML's internal packaging on .NET 5 and 6, and some people showed me this workaround using the native System.IO.Packaging. We still need to verify whether that's broken on .NET 7, though.
Doing this, I was able to generate an approximately 1M-row Excel file using 1-2 GiB of RAM, with high CPU usage but a LOT faster. Also note that I'm using an IAsyncEnumerable to stream data to the file; I was using RabbitMQ to stream batches of 1000 lines per message, so it's a good idea to stream it from the database as well.
/// <summary>
/// Writes .xlsx file to a given path using the provided data stream.
/// </summary>
/// <param name="filePath"></param>
/// <param name="columns"></param>
/// <param name="dataStream"></param>
public async ValueTask GenerateFile(
string filePath,
IDictionary<string, string> columns,
IAsyncEnumerable<IEnumerable<T>> dataStream)
{
var worksheetPartId = await WriteInitialDocument(filePath, columns, dataStream);
await WriteSheetToDocumentAsync(filePath, worksheetPartId);
}
private async Task<string> WriteInitialDocument(
string filePath,
IDictionary<string, string> columns,
IAsyncEnumerable<IEnumerable<T>> dataStream)
{
await using var fileStream = File.Create(filePath);
using var package = Package.Open(fileStream, FileMode.Create, FileAccess.Write);
using var excel = SpreadsheetDocument.Create(package, SpreadsheetDocumentType.Workbook);
excel.AddWorkbookPart();
ApplyStyles(excel);
var worksheetPart = excel.WorkbookPart.AddNewPart<WorksheetPart>();
var workbookId = excel.WorkbookPart.GetIdOfPart(worksheetPart);
await WriteStreamDataToFileAsync(columns, dataStream, worksheetPart);
return workbookId;
}
/// <summary>
/// Finishes the process of writing data files.
/// </summary>
/// <param name="filePath"></param>
/// <param name="worksheetPartId"></param>
private static async Task WriteSheetToDocumentAsync(string filePath, string worksheetPartId)
{
await using var fileStream = File.Open(filePath, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
using var package = Package.Open(fileStream, FileMode.Open, FileAccess.ReadWrite);
using var excel = SpreadsheetDocument.Open(package);
if (excel.WorkbookPart is null)
throw new InvalidOperationException("Workbook part cannot be null!");
var xmlWriter = OpenXmlWriter.Create(excel.WorkbookPart);
xmlWriter.WriteStartElement(new Workbook());
xmlWriter.WriteStartElement(new Sheets());
xmlWriter.WriteElement(new Sheet {Id = worksheetPartId, Name = "Sheet 1", SheetId = 1});
// For Sheets
xmlWriter.WriteEndElement();
// For Workbook
xmlWriter.WriteEndElement();
xmlWriter.Close();
xmlWriter.Dispose();
}
/// <summary>
/// Streams and writes data to file.
/// </summary>
/// <param name="columns"></param>
/// <param name="dataStream"></param>
/// <param name="worksheetPart"></param>
private async Task WriteStreamDataToFileAsync(
IDictionary<string, string> columns,
IAsyncEnumerable<IEnumerable<T>> dataStream,
OpenXmlPart worksheetPart)
{
using var xmlWriter = OpenXmlWriter.Create(worksheetPart);
xmlWriter.WriteStartElement(new Worksheet());
xmlWriter.WriteStartElement(new SheetData());
var rowIndex = 1;
var attributes = new List<OpenXmlAttribute> {new OpenXmlAttribute("r", string.Empty, rowIndex.ToString())};
xmlWriter.WriteStartElement(new Row(), attributes);
var columnList = columns.Values.ToList();
foreach (var column in columnList)
{
var cellValue = column.StripHexadecimalSymbols();
var cell = new Cell {CellValue = new CellValue(cellValue), StyleIndex = 1, DataType = CellValues.String};
xmlWriter.WriteElement(cell);
}
// For Header Row
xmlWriter.WriteEndElement();
var columnKeysList = columns.Keys.ToList();
await foreach (var itemEnumerable in dataStream)
{
var itemList = itemEnumerable.ToList();
foreach (var t in itemList)
{
rowIndex += 1;
// Starts a new row.
xmlWriter.WriteStartElement(
new Row(),
new List<OpenXmlAttribute> {new("r", string.Empty, rowIndex.ToString())});
foreach (var columnKey in columnKeysList)
{
var cellValue = _cellFormatterStrategy.GetCellValue(ReportFormats.Xlsx, t, columnKey);
var cell = _cellFormatterStrategy.GetXlsxCell(columnKey, cellValue);
xmlWriter.WriteElement(cell);
}
// Closes the row.
xmlWriter.WriteEndElement();
}
}
// For SheetData
xmlWriter.WriteEndElement();
// For Worksheet
xmlWriter.WriteEndElement();
}
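To make the calling convention clearer, a hypothetical call site might look like the snippet below; the XlsxReportGenerator<Order> type, the orderRepository, and its GetBatchesAsync method are my own placeholders, not part of the original code:
// Hypothetical caller: stream batches of rows (e.g. from a database or a queue)
// into the two-pass writer above.
var columns = new Dictionary<string, string>
{
    ["OrderId"] = "Order Id",
    ["CustomerName"] = "Customer",
    ["Total"] = "Total"
};

async IAsyncEnumerable<IEnumerable<Order>> StreamBatchesAsync()
{
    // Placeholder data source; in the original comment this was fed by RabbitMQ
    // in batches of 1000 rows per message.
    await foreach (var batch in orderRepository.GetBatchesAsync(batchSize: 1000))
        yield return batch;
}

var generator = new XlsxReportGenerator<Order>();
await generator.GenerateFile(@"C:\temp\orders.xlsx", columns, StreamBatchesAsync());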
I'd like to get this fixed (at least a workaround) for v3.0. I've created a set of abstractions that allow for more control over things, and I think we could automate some of the workarounds here (at least in an opt-in way). For the abstractions, see #1295. My thought would be to model what @M4urici0GM did, but in a more transparent way. Of course, it would be better to have this fixed in the underlying package model, but that hasn't gone anywhere in too many years. My initial thought for implementing this would be: (1) provide an abstraction of IPackage that intercepts calls to GetStream and writes them to some temporary location. The abstractions I have should allow building this, except we'd need a way to "Reload" the underlying package. Building off of the abstractions, I'm thinking of enabling the following:
public interface IPackageFeature
{
    IPackage Package { get; }
+   bool CanReload { get; }
+   void Reload();
}
This could automatically be supported for files opened with paths or streams, but if a package is given, then it would not be supported (since we didn't manage the package) without additional information from the user.
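To illustrate the "intercept GetStream and spill to a temporary location" idea in isolation, here is a purely illustrative sketch of my own; this is not the SDK's actual IPackage/IPackagePart API:
using System;
using System.IO;

// Illustrative part wrapper: hand callers a FileStream over a temp file so large
// part content never has to live in a MemoryStream, then copy it into the real
// package part stream when the document is saved or reloaded.
public sealed class TempFileBackedPartStream : IDisposable
{
    private readonly string _tempPath = Path.GetTempFileName();

    public Stream GetStream() =>
        new FileStream(_tempPath, FileMode.Open, FileAccess.ReadWrite, FileShare.None);

    public void CopyTo(Stream packagePartStream)
    {
        using var source = File.OpenRead(_tempPath);
        source.CopyTo(packagePartStream);
    }

    public void Dispose() => File.Delete(_tempPath);
}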
I made a hacky solution for this. Obviously not production-ready, but the idea itself, I think, is good. Instead of creating the
EDIT: Run it with @Viktor-36's parameters from the original post (100k rows, 40 cells each, with the text "test"). The standard save used up 2 GB of RAM; the hacky save used 321 MB.
Created branches with support for the common data types, and one that has the
If anyone is facing the same problem nowadays: the workaround by @blakepell doesn't work in newer releases.
@twsouthwick Will this issue be resolved in v3.0? When can we expect v3.0? We tested with the v3 beta, and this issue still exists.
Unfortunately, I wasn't able to get it to work and had other priorities. I'll move this to v3.1 to see what can be done.
I was wondering why blakepell's workaround no longer works in OpenXml 2.20 and up. It seems a validation check was added preventing the use of
I tried a piece of blakepell's workaround, but I came across a validation check that throws if the zip package has no read access. In v2.20.0, see https://github.com/dotnet/Open-XML-SDK/blob/release/v2.20.0/src/DocumentFormat.OpenXml/Packaging/OpenXmlPackage.cs:
Or, if I try in OpenXml 3.0.0, Package.cs checks and throws with
Why was this validation check added when it wasn't there in previous versions? Would removing that check in newer versions of OpenXml cause other issues?
@carlossanlop Would you have any thoughts on my comment above? #807 (comment) This seems related to the following dotnet issues that have been open for a long time. They cause memory usage issues with MemoryStream whenever we generate a large Excel file using OpenXml, which in turn uses System.IO.Packaging (that goes down to System.IO.Compression; the OpenInUpdateMode method in ZipArchiveEntry.cs keeps uncompressed data in a MemoryStream), and we're hoping this might be resolved on the .NET side.
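For anyone unfamiliar with the System.IO.Compression detail being referenced, here is a small illustration of the difference (my own sketch, not code from this thread): update mode buffers an entry's uncompressed bytes in memory, while create mode writes straight through to the output stream.
using System.IO;
using System.IO.Compression;

// Update mode: each opened entry is backed by a MemoryStream until the archive
// is disposed, which is where the OutOfMemoryException / "Stream was too long"
// errors come from for very large worksheet parts.
using (var archive = ZipFile.Open("large.xlsx", ZipArchiveMode.Update))
{
    var entry = archive.GetEntry("xl/worksheets/sheet1.xml")
                ?? archive.CreateEntry("xl/worksheets/sheet1.xml");
    using var stream = entry.Open(); // in-memory buffer in update mode
}

// Create mode: entries are compressed and written directly to the output stream,
// so memory stays flat no matter how large the entry is.
using (var output = File.Create("large-streamed.xlsx"))
using (var archive = new ZipArchive(output, ZipArchiveMode.Create))
{
    var entry = archive.CreateEntry("xl/worksheets/sheet1.xml");
    using var stream = entry.Open(); // writes through to 'output'
}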
The .NET team still hasn't resolved this issue. Is there any workaround or something?
@kotofsky The only workaround I know of so far is the one I've published above.
@blakepell I've expanded your workaround: just generate a worksheetPartId and create the WorksheetPart using it. There are also no restrictions on the number of document sheets with this approach.
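The code from that comment didn't survive the thread, but the idea, as I read it, is roughly the following sketch (the relationship ids, sheet names, and the elided row-writing loop are my assumptions):
// Create each WorksheetPart with an explicit relationship id up front, stream its
// rows with OpenXmlWriter, and only then write the workbook that references those ids.
var workbookPart = document.AddWorkbookPart();
var sheetIds = new[] { "rIdSheet1", "rIdSheet2", "rIdSheet3" };

for (var i = 0; i < sheetIds.Length; i++)
{
    var worksheetPart = workbookPart.AddNewPart<WorksheetPart>(sheetIds[i]);
    using var writer = OpenXmlWriter.Create(worksheetPart);
    writer.WriteStartElement(new Worksheet());
    writer.WriteStartElement(new SheetData());
    // ... stream the rows for sheet i here ...
    writer.WriteEndElement(); // SheetData
    writer.WriteEndElement(); // Worksheet
}

using (var workbookWriter = OpenXmlWriter.Create(workbookPart))
{
    workbookWriter.WriteStartElement(new Workbook());
    workbookWriter.WriteStartElement(new Sheets());
    for (var i = 0; i < sheetIds.Length; i++)
    {
        workbookWriter.WriteElement(new Sheet
        {
            Id = sheetIds[i],
            SheetId = (uint)(i + 1),
            Name = $"Sheet{i + 1}"
        });
    }
    workbookWriter.WriteEndElement(); // Sheets
    workbookWriter.WriteEndElement(); // Workbook
}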
Sorry, I used Google Translate for the text.
To get @blakepell's solution to work, is it necessary to use only the older version 2.16.0? @twsouthwick Will this issue be resolved in version 3.2.0?
Closing as duplicate.
Description
I was using ClosedXML to create large Excel files (100k rows with 63 columns) and I ran into an OutOfMemoryException. I found some examples of OpenXML using SAX, so I tried to switch to OpenXML, but it didn't help. I tried removing all my code (reading from the DB, etc.) to see if that works, but I still get an OutOfMemoryException.
My code is based on http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/
Information
EDIT: The same code works fine on .NET Framework 4.7.2 with the same DocumentFormat.OpenXml version.
Repro
This is the simple code I am using at the moment. I am testing with RAM limited to 500 MB (for testing purposes). I don't think this code should be able to consume that much RAM.
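The repro code itself was lost from this thread; based on the linked article, my reconstruction of the failing pattern looks roughly like this (not the reporter's exact code), writing 100k rows of 40 "test" cells through OpenXmlWriter:
// Reconstruction: a plain SpreadsheetDocument.Create, so the worksheet part is
// buffered in a MemoryStream by System.IO.Packaging until the document is saved.
using var document = SpreadsheetDocument.Create(filePath, SpreadsheetDocumentType.Workbook);
var workbookPart = document.AddWorkbookPart();
var worksheetPart = workbookPart.AddNewPart<WorksheetPart>();

using (var writer = OpenXmlWriter.Create(worksheetPart))
{
    writer.WriteStartElement(new Worksheet());
    writer.WriteStartElement(new SheetData());
    for (var row = 0; row < 100_000; row++)
    {
        writer.WriteStartElement(new Row());
        for (var column = 0; column < 40; column++)
        {
            writer.WriteElement(new Cell
            {
                CellValue = new CellValue("test"),
                DataType = CellValues.String
            });
        }
        writer.WriteEndElement(); // Row
    }
    writer.WriteEndElement(); // SheetData
    writer.WriteEndElement(); // Worksheet
}
// ... write the Workbook/Sheets elements referencing worksheetPart, then dispose ...
// With the process limited to ~500 MB, this throws OutOfMemoryException on .NET Core.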
Observed
Expected
An Excel file at filePath with 100k rows and 40 columns, with the string "test" in all cells.