-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WordprocessingDocument.Open is very slow #628
Comments
Thanks for the issue. What OS are you running on? |
Here's a benchmark version of it: using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using DocumentFormat.OpenXml.Packaging;
namespace OpenSample
{
[CoreJob]
public class OpenBenchmark
{
[Benchmark]
public WordprocessingDocument Open()
{
var path = ...;
return WordprocessingDocument.Open(path, false);
}
}
class Program
{
static void Main(string[] args)
{
var summary = BenchmarkRunner.Run<OpenBenchmark>();
}
}
} And the results:
I agree that seems like a lot of time. I'll look into it. |
I tried it on windows 10 and MacOS. Results are same. |
I'm new to C# so i don't know how to profile it and find the problem, but after looking for source code it seems to me that |
ah that is a good observation. I'll have to see if there's a way to lazily load those (I've never actually explored that scenario) |
@amanbolat Sorry for the delay. I've got some time now to look into some of these issues. However, lazily loading this is trickier than I thought. Have you tried loading the package into memory and then accessing it that way? If streaming from disk, it will have some overhead from disk IO. You could try the following: using(var ms = new MemoryStream())
{
using(var fs = File.OpenRead("path to file"))
{
fs.CopyTo(ms);
}
ms.Position(0);
// Open document from memory stream
} |
@twsouthwick Hi. Yes, I tried it and result is same, it will take about 1 minute. I tested on my new MacBook with ssd so the overhead shouldn’t be so big, I think so. |
Ok. I'll try to make this a higher priority, but to set expectations, I work on this as a side project for work so I'm able to do feature work here somewhat sporadically. If you have any desire to take a stab at it, feel free and I'm happy to provide pointers |
Thank you. |
@amanbolat, try the following method from my CodeSnippets repository: public static MemoryStream ReadAllBytesToMemoryStream(string path)
{
byte[] buffer = File.ReadAllBytes(path);
var destStream = new MemoryStream(buffer.Length);
destStream.Write(buffer, 0, buffer.Length);
destStream.Seek(0, SeekOrigin.Begin);
return destStream;
} I've benchmarked a number of ways to clone Office Open XML documents and this came out as the fastest. |
@ThomasBarnekow method doesn't matter. Try to open my file and you will see that it will take about 1 or 2 minutes. Link to the file: https://drive.google.com/file/d/1_InQLbZ19KCUgkuePAiLXvUuLcZl6Qu7/view?usp=sharing |
Stale issue message |
I'm going to open as I recently made a change to System.IO.Packaging that may help this and I want to verify. See dotnet/runtime#35978 |
FYI the fix that I got into System.IO.Packaging fixes this! Before:
After:
Since the package is still in preview, we won't be upgrading it for the project at this time. Note, since the major version of this is changing, we won't actually be able to bring it into the repo until v3.0 (due to semantic versioning). That'll probably happen sometime soon, but not for a bit. However, you may manually bring in 5.0.0-preview.6.20305.6+ of the package to get the benefit (although it will only help you if you are on .NET Core) |
I'm going to close this for now, as it is fixed where it needs to be. Once we move to v3.0, we'll update to the latest System.IO.Packaging so it'll come by default. |
Description
WordprocessingDocument.Open
is very very slow when reading big .docx document.i'm trying to read 10 mb sized .docx document and it takes about 1 minute just to open it.
Information
Repro
Link to the file: https://drive.google.com/file/d/1_InQLbZ19KCUgkuePAiLXvUuLcZl6Qu7/view?usp=sharing
Uploaded to GitHub: 10mb_file.docx
Observed
I put to lines of
Console.WriteLine
so the time between "Creating filter" and "Creating BodyReader" is about 1 min. It doesn't matter if i opening file from memory stream or just giving it a real path to the file.Expected
Instant open expected :)
The text was updated successfully, but these errors were encountered: