-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
One illegal uri in Relationship will destroy the document parsing #715
Comments
Maybe we can define an interface and a PackageAdapt class. And the developer can input the IPackage and custom processing data. And we add the API that can input the IPackage package. public partial class PresentationDocument
{
// The New API
public static PresentationDocument Open(DocumentFormat.OpenXml.Packaging.IPackage package)
{
return PresentationDocument.Open(package);
}
public static PresentationDocument Open(System.IO.Packaging.Package package)
{
IPackage packageAdapt = new PackageAdapt(package);
return PresentationDocument.Open(packageAdapt);
}
} And this is the IPackage code public interface IPackage
{
/// <summary>
/// Gets the FileAccess with which the package was opened. This is a read only property.
/// This property gets set when the package is opened.
/// </summary>
/// <value>FileAccess</value>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
FileAccess FileOpenAccess { get; }
/// <summary>
/// The package properties are a subset of the standard OLE property sets
/// SummaryInformation and DocumentSummaryInformation, and include such properties
/// as Title and Subject.
/// </summary>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
PackageProperties PackageProperties { get; }
/// <summary>
/// Creates a new part in the package. An empty stream corresponding to this part will be created in the
/// package. If a part with the specified uri already exists then we throw an exception.
/// This methods will call the CreatePartCore method which will create the actual PackagePart in the package.
/// </summary>
/// <param name="partUri">Uri of the PackagePart that is to be added</param>
/// <param name="contentType">ContentType of the stream to be added</param>
/// <returns></returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is readonly, it cannot be modified</exception>
/// <exception cref="ArgumentNullException">If partUri parameter is null</exception>
/// <exception cref="ArgumentNullException">If contentType parameter is null</exception>
/// <exception cref="ArgumentException">If partUri parameter does not conform to the valid partUri syntax</exception>
/// <exception cref="InvalidOperationException">If a PackagePart with the given partUri already exists in the Package</exception>
PackagePart CreatePart(Uri partUri, string contentType);
/// <summary>
/// Creates a new part in the package. An empty stream corresponding to this part will be created in the
/// package. If a part with the specified uri already exists then we throw an exception.
/// This methods will call the CreatePartCore method which will create the actual PackagePart in the package.
/// </summary>
/// <param name="partUri">Uri of the PackagePart that is to be added</param>
/// <param name="contentType">ContentType of the stream to be added</param>
/// <param name="compressionOption">CompressionOption describing compression configuration
/// for the new part. This compression apply only to the part, it doesn't affect relationship parts or related parts.
/// This parameter is optional. </param>
/// <returns></returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is readonly, it cannot be modified</exception>
/// <exception cref="ArgumentNullException">If partUri parameter is null</exception>
/// <exception cref="ArgumentNullException">If contentType parameter is null</exception>
/// <exception cref="ArgumentException">If partUri parameter does not conform to the valid partUri syntax</exception>
/// <exception cref="ArgumentOutOfRangeException">If CompressionOption enumeration [compressionOption] does not have one of the valid values</exception>
/// <exception cref="InvalidOperationException">If a PackagePart with the given partUri already exists in the Package</exception>
PackagePart CreatePart(Uri partUri,
string contentType,
CompressionOption compressionOption);
/// <summary>
/// Returns a part that already exists in the package. If the part
/// Corresponding to the URI does not exist in the package then an exception is
/// thrown. The method calls the GetPartCore method which actually fetches the part.
/// </summary>
/// <param name="partUri"></param>
/// <returns></returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is write only, information cannot be retrieved from it</exception>
/// <exception cref="ArgumentNullException">If partUri parameter is null</exception>
/// <exception cref="ArgumentException">If partUri parameter does not conform to the valid partUri syntax</exception>
/// <exception cref="InvalidOperationException">If the requested part does not exists in the Package</exception>
PackagePart GetPart(Uri partUri);
/// <summary>
/// This is a convenient method to check whether a given part exists in the
/// package. This will have a default implementation that will try to retrieve
/// the part and then if successful, it will return true.
/// If the custom file format has an easier way to do this, they can override this method
/// to get this information in a more efficient way.
/// </summary>
/// <param name="partUri"></param>
/// <returns></returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is write only, information cannot be retrieved from it</exception>
/// <exception cref="ArgumentNullException">If partUri parameter is null</exception>
/// <exception cref="ArgumentException">If partUri parameter does not conform to the valid partUri syntax</exception>
bool PartExists(Uri partUri);
/// <summary>
/// This method will do all the house keeping required when a part is deleted
/// Then the DeletePartCore method will be called which will have the actual logic to
/// do the work specific to the underlying file format and will actually delete the
/// stream corresponding to this part. This method does not throw if the specified
/// part does not exist. This is in conformance with the FileInfo.Delete call.
/// </summary>
/// <param name="partUri"></param>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is readonly, it cannot be modified</exception>
/// <exception cref="ArgumentNullException">If partUri parameter is null</exception>
/// <exception cref="ArgumentException">If partUri parameter does not conform to the valid partUri syntax</exception>
void DeletePart(Uri partUri);
/// <summary>
/// This returns a collection of all the Parts within the package.
/// </summary>
/// <returns></returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is writeonly, no information can be retrieved from it</exception>
PackagePartCollection GetParts();
/// <summary>
/// Closes the package and all the underlying parts and relationships.
/// Calls the Dispose Method, since they have the same semantics
/// </summary>
void Close();
/// <summary>
/// Flushes the contents of the parts and the relationships to the package.
/// This method will call the FlushCore method which will do the actual flushing of contents.
/// </summary>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is readonly, it cannot be modified</exception>
void Flush();
/// <summary>
/// Creates a relationship at the Package level with the Target PackagePart specified as the Uri
/// </summary>
/// <param name="targetUri">Target's URI</param>
/// <param name="targetMode">Enumeration indicating the base uri for the target uri</param>
/// <param name="relationshipType">PackageRelationship type, having uri like syntax that is used to
/// uniquely identify the role of the relationship</param>
/// <returns></returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is readonly, it cannot be modified</exception>
/// <exception cref="ArgumentNullException">If parameter "targetUri" is null</exception>
/// <exception cref="ArgumentNullException">If parameter "relationshipType" is null</exception>
/// <exception cref="ArgumentOutOfRangeException">If parameter "targetMode" enumeration does not have a valid value</exception>
/// <exception cref="ArgumentException">If TargetMode is TargetMode.Internal and the targetUri is an absolute Uri </exception>
/// <exception cref="ArgumentException">If relationship is being targeted to a relationship part</exception>
PackageRelationship CreateRelationship(Uri targetUri, TargetMode targetMode, string relationshipType);
/// <summary>
/// Creates a relationship at the Package level with the Target PackagePart specified as the Uri
/// </summary>
/// <param name="targetUri">Target's URI</param>
/// <param name="targetMode">Enumeration indicating the base uri for the target uri</param>
/// <param name="relationshipType">PackageRelationship type, having uri like syntax that is used to
/// uniquely identify the role of the relationship</param>
/// <param name="id">String that conforms to the xsd:ID datatype. Unique across the source's
/// relationships. Null is OK (ID will be generated). An empty string is an invalid XML ID.</param>
/// <returns></returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is readonly, it cannot be modified</exception>
/// <exception cref="ArgumentNullException">If parameter "targetUri" is null</exception>
/// <exception cref="ArgumentNullException">If parameter "relationshipType" is null</exception>
/// <exception cref="ArgumentOutOfRangeException">If parameter "targetMode" enumeration does not have a valid value</exception>
/// <exception cref="ArgumentException">If TargetMode is TargetMode.Internal and the targetUri is an absolute Uri </exception>
/// <exception cref="ArgumentException">If relationship is being targeted to a relationship part</exception>
/// <exception cref="System.Xml.XmlException">If parameter "id" is not a valid Xsd Id</exception>
/// <exception cref="System.Xml.XmlException">If an id is provided in the method, and its not unique</exception>
PackageRelationship CreateRelationship(Uri targetUri, TargetMode targetMode, string relationshipType, string id);
/// <summary>
/// Deletes a relationship from the Package. This is done based on the
/// relationship's ID. The target PackagePart is not affected by this operation.
/// </summary>
/// <param name="id">The ID of the relationship to delete. An invalid ID will not
/// throw an exception, but nothing will be deleted.</param>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is readonly, it cannot be modified</exception>
/// <exception cref="ArgumentNullException">If parameter "id" is null</exception>
/// <exception cref="System.Xml.XmlException">If parameter "id" is not a valid Xsd Id</exception>
void DeleteRelationship(string id);
/// <summary>
/// Returns a collection of all the Relationships that are
/// owned by the package
/// </summary>
/// <returns></returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is write only, no information can be retrieved from it</exception>
PackageRelationshipCollection GetRelationships();
/// <summary>
/// Returns a collection of filtered Relationships that are
/// owned by the package
/// The filter string is compared with the type of the relationships
/// in a case sensitive and culture ignorant manner.
/// </summary>
/// <returns></returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is write only, no information can be retrieved from it</exception>
/// <exception cref="ArgumentNullException">If parameter "relationshipType" is null</exception>
/// <exception cref="ArgumentException">If parameter "relationshipType" is an empty string</exception>
PackageRelationshipCollection GetRelationshipsByType(string relationshipType);
/// <summary>
/// Retrieve a relationship per ID.
/// </summary>
/// <param name="id">The relationship ID.</param>
/// <returns>The relationship with ID 'id' or throw an exception if not found.</returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is write only, no information can be retrieved from it</exception>
/// <exception cref="ArgumentNullException">If parameter "id" is null</exception>
/// <exception cref="System.Xml.XmlException">If parameter "id" is not a valid Xsd Id</exception>
/// <exception cref="InvalidOperationException">If the requested relationship does not exist in the Package</exception>
PackageRelationship GetRelationship(string id);
/// <summary>
/// Returns whether there is a relationship with the specified ID.
/// </summary>
/// <param name="id">The relationship ID.</param>
/// <returns>true iff a relationship with ID 'id' is defined on this source.</returns>
/// <exception cref="ObjectDisposedException">If this Package object has been disposed</exception>
/// <exception cref="IOException">If the package is write only, no information can be retrieved from it</exception>
/// <exception cref="ArgumentNullException">If parameter "id" is null</exception>
/// <exception cref="System.Xml.XmlException">If parameter "id" is not a valid Xsd Id</exception>
bool RelationshipExists(string id);
}
internal class PackageAdapt : IPackage
{
public PackageAdapt(Package package)
{
Package = package;
}
private System.IO.Packaging.Package Package { get; }
/// <inheritdoc />
public FileAccess FileOpenAccess => Package.FileOpenAccess;
/// <inheritdoc />
public PackageProperties PackageProperties => Package.PackageProperties;
/// <inheritdoc />
public PackagePart CreatePart(Uri partUri, string contentType)
{
return Package.CreatePart(partUri, contentType);
}
/// <inheritdoc />
public PackagePart CreatePart(Uri partUri, string contentType, CompressionOption compressionOption)
{
return Package.CreatePart(partUri, contentType, compressionOption);
}
/// <inheritdoc />
public PackagePart GetPart(Uri partUri)
{
return Package.GetPart(partUri);
}
/// <inheritdoc />
public bool PartExists(Uri partUri)
{
return Package.PartExists(partUri);
}
/// <inheritdoc />
public void DeletePart(Uri partUri)
{
Package.DeletePart(partUri);
}
/// <inheritdoc />
public PackagePartCollection GetParts()
{
return Package.GetParts();
}
/// <inheritdoc />
public void Close()
{
Package.Close();
}
/// <inheritdoc />
public void Flush()
{
Package.Flush();
}
/// <inheritdoc />
public PackageRelationship CreateRelationship(Uri targetUri, TargetMode targetMode, string relationshipType)
{
return Package.CreateRelationship(targetUri, targetMode, relationshipType);
}
/// <inheritdoc />
public PackageRelationship CreateRelationship(Uri targetUri, TargetMode targetMode, string relationshipType, string id)
{
return Package.CreateRelationship(targetUri, targetMode, relationshipType, id);
}
/// <inheritdoc />
public void DeleteRelationship(string id)
{
Package.DeleteRelationship(id);
}
/// <inheritdoc />
public PackageRelationshipCollection GetRelationships()
{
return Package.GetRelationships();
}
/// <inheritdoc />
public PackageRelationshipCollection GetRelationshipsByType(string relationshipType)
{
return Package.GetRelationshipsByType(relationshipType);
}
/// <inheritdoc />
public PackageRelationship GetRelationship(string id)
{
return Package.GetRelationship(id);
}
/// <inheritdoc />
public bool RelationshipExists(string id)
{
return Package.RelationshipExists(id);
}
} |
Not sure what the |
@twsouthwick This is my link dotnet/runtime#26084 (comment) |
@twsouthwick The Package is already abstract but not all the API can override. And we can use the IPackage to define the custom Package that can handle the exception |
I think what you'll want to do is open a new issue in that repo with the request to add it. However, it may be better to make the methods that are needed to override virtual rather than a whole new interface, and doesn't really pertain to this issue itself (ie ZipPackage and friends are internal and that's where the actual logic for opening packages and the Uri creation occurs) |
Considering Excel opens such files with no warnings and even "Open XML SDK 2.5 Productivity Tool" reports no warnings, I would consider this issue critical. |
Yes, I agree this is critical. I'm trying to push an API into System.IO.Packaging to allow this. There isn't a good way for us to handle this as the URI parsing is deep within the packaging APIs. The reason the productivity tool works is it is probably targeting 4.0 (even when run on 4.8 it will be quirked with the old behavior). This would be seen when targeting .NET Framework 4.5+, or any .NET Core. Any apps you create that target .NET 4.0, while running on .NET 4.8, will also work. |
I have a prototype of a tool that can be used to sanitize these malformed URIs I'd like to add for the next minor version. I'm interested though in what are the scenarios you all have seen that cause this? I'm especially interested in any where the malformed uri needs to remain there. The examples I've seen are hyperlinks that can just be removed or changed to be appropriately formed Uris. |
So in this file, what do you want to do with the hyperlinks? They're invalid mailto links - would you want to update them to something valid, or just remove them? |
The easiest is to treat all hyperlinks as string. I am already doing a workaround. |
Sorry, please delete this file, it cannot be posted on a public place. |
We can't change things to use hyperlinks as strings internally as they use Uris in public. What workaround are you using? |
private void OpenDocument()
{
// TODO remove this workaround once the issue is fixed
// https://github.com/OfficeDev/Open-XML-SDK/issues/715
SpreadsheetDocument result = null;
try
{
result = SpreadsheetDocument.Open(this._fileStream, false);
}
catch (OpenXmlPackageException e)
{
if (e.ToString().Contains("Invalid Hyperlink", StringComparison.Ordinal))
{
var stream = new MemoryStream();
try
{
this._fileStream.Position = 0;
this._fileStream.CopyTo(stream);
this._fileStream.Dispose();
this._fileStream = stream;
this.FixInvalidUri(this._fileStream, brokenUri =>
{ return new Uri("http://broken-link/"); });
}
catch
{
stream.Dispose();
throw;
}
result = SpreadsheetDocument.Open(this._fileStream, false);
}
}
this._spreadsheetDocument = result;
}
private void FixInvalidUri(Stream fs, Func<string, Uri> invalidUriHandler)
{
XNamespace relNs = "http://schemas.openxmlformats.org/package/2006/relationships";
using (ZipArchive za = new ZipArchive(fs, ZipArchiveMode.Update, true))
{
foreach (var entry in za.Entries.ToList())
{
if (!entry.Name.EndsWith(".rels", StringComparison.Ordinal))
{
continue;
}
bool replaceEntry = false;
XDocument entryXDoc = null;
using (var entryStream = entry.Open())
{
try
{
entryXDoc = XDocument.Load(entryStream);
if (entryXDoc.Root != null && entryXDoc.Root.Name.Namespace == relNs)
{
var urisToCheck = entryXDoc
.Descendants(relNs + "Relationship")
.Where(r => r.Attribute("TargetMode") != null && (string)r.Attribute("TargetMode") == "External");
foreach (var rel in urisToCheck)
{
var target = (string)rel.Attribute("Target");
if (target != null)
{
try
{
Uri uri = new Uri(target);
}
catch (UriFormatException)
{
Uri newUri = invalidUriHandler(target);
rel.Attribute("Target").Value = newUri.ToString();
replaceEntry = true;
}
}
}
}
}
catch (XmlException)
{
continue;
}
}
if (replaceEntry)
{
var fullName = entry.FullName;
entry.Delete();
var newEntry = za.CreateEntry(fullName);
using (StreamWriter writer = new StreamWriter(newEntry.Open()))
using (XmlWriter xmlWriter = XmlWriter.Create(writer))
{
entryXDoc.WriteTo(xmlWriter);
}
}
}
}
} |
So your workaround is to rewrite the Uri? Do you rewrite them back later? |
No. It is better than a file that I cannot read at all. |
No to both? |
Not rewiring back as I don't need them for my purposes (someone else might need them though). |
Ok, so the sanitizer I'm looking at adding in would be a way to rewrite it, which is what you're already doing. |
For anyone interested, I've opened a PR to explore an API that performs functionality similar to what @abelykh0 discusses and linked to in stackoverflow |
OMG thank you for posting that @abelykh0 I did not want to write that. |
I'm going to reopen this to see if we can fix it better for a v3.0.0. I'm thinking we will go with @lindexi suggestion of creating an abstraction of package we can control more. I believe we can then handle things a bit better. |
@twsouthwick Thank you and I write the CompatiblePackage.cs which can ignore some error. And I publish our application with the CompatiblePackage.cs three months ago, and I find that more than half of the problems were fixed. |
cc @jahav |
@lindexi looks like you're implementing a new version of |
@twsouthwick This type fixes some problems, but its design has many flaws, which is why I haven't submitted its code here yet. |
See #1295 for the package abstraction being proposed |
I believe we can solve this in a better way similar to #807 (comment): (1) Override IPackageFeature to identify malformed relationships ** If the package is read only, we can copy it behind the scenes and operate on a copy of it so that we can manage these illegal uris ** This will be easy for File/Stream overloads.... not so easy for Package |
For those interested, I have support for this in #1322 following the pattern I described above. |
Description
The PPTX document that include an illegal uri in Relationship will make the
System.IO.Packaging.InternalRelationshipCollection.ProcessRelationshipAttributes
throw an exception toOpenXmlPart.Load
.And the
OpenXmlPart.Load
can not catch the exception and it will break thePresentationDocument.Open
.Information
Repro
Here is the hyperlink.pptx file : https://1drv.ms/p/s!AiKjiQqRWKThlv5zkY4HoRvvJ3Ppdg?e=3kfdNU
Observed
The PresentationDocument.Open throw the UriFormatException exception
Because the ppt\slides_rels\slide1.xml.rels contain this string
As you can see, the
Target
is not an uri.Expected
We can design an exception handle API, and we can handle some illegal document.
See #38 #274 #297 #298
And the #298 only add more information but can not tolerate errors.
And just as @twsouthwick says, we can not fix this in the OpenXML SDK project #297 (comment) , but I think we can tolerate some errors
The text was updated successfully, but these errors were encountered: