ODT: relative links #3524

Closed
HT2313 opened this issue Mar 22, 2017 · 9 comments

Comments

HT2313 commented Mar 22, 2017

Hi,
I'm using pandoc version 1.19.2.1.

When generating an ODT file that contains a relative link to some document, the link is resolved incorrectly.

Generating the ODT file

Consider the following directory structure:

test/
  |- attachments/
      |- test.txt
  |- links.xml
  |- links.html

For the following DocBook file (links.xml):

<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
  <simpara>
    <link xl:href="./attachments/test.txt">link to txt file</link>
  </simpara>
</article>

Or the following HTML file (links.html):

<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
  <body>
    <p>
      <a href="./attachments/test.txt">link to txt file</a>
    </p>
  </body>
</html>

The generated ODT file, which is created in the test directory, has the following content:

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" office:version="1.2">
  <office:font-face-decls>
    <style:font-face style:name="Courier New" style:font-family-generic="modern" style:font-pitch="fixed" svg:font-family="'Courier New'" />
  </office:font-face-decls>
  <office:automatic-styles>
  </office:automatic-styles>
  <office:body>
    <office:text>
      <text:p text:style-name="Text_20_body">
        <text:a xlink:type="simple" xlink:href="./attachments/test.txt" office:name="">
          <text:span text:style-name="Definition">link to txt file</text:span>
        </text:a>
      </text:p>
    </office:text>
  </office:body>
</office:document-content>

While this seems to be correct, the link is unusable when the generated file is opened with LibreOffice.
For the link to test.txt to work, LibreOffice (version 5.3.1.2) expects the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" office:version="1.2">
  <office:font-face-decls>
    <style:font-face style:name="Courier New" style:font-family-generic="modern" style:font-pitch="fixed" svg:font-family="'Courier New'" />
  </office:font-face-decls>
  <office:automatic-styles>
  </office:automatic-styles>
  <office:body>
    <office:text>
      <text:p text:style-name="Text_20_body">
        <text:a xlink:type="simple" xlink:href="../attachments/test.txt" office:name="">
          <text:span text:style-name="Definition">link to txt file</text:span>
        </text:a>
      </text:p>
    </office:text>
  </office:body>
</office:document-content>

So the file path has to be ../attachments/test.txt rather than ./attachments/test.txt.
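
To check which target actually ends up in a generated file, the `xlink:href` values can be read straight from `content.xml` inside the ODT package. The following is only a minimal sketch using the Python standard library; the helper name `odt_link_targets` is made up for illustration and is not part of pandoc or LibreOffice.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace-qualified names as they appear in the content.xml shown above.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
TEXT_A = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}a"


def odt_link_targets(odt):
    """Return the xlink:href of every text:a element in content.xml.

    `odt` may be a file path or a binary file-like object.
    (Hypothetical helper, for illustration only.)
    """
    with zipfile.ZipFile(odt) as package:
        root = ET.fromstring(package.read("content.xml"))
    return [link.get(XLINK_HREF) for link in root.iter(TEXT_A)]
```

Calling `odt_link_targets("test/links.odt")` on the file from the report would list the stored link targets, which makes it easy to compare pandoc's output with a file saved by LibreOffice.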

Generating from ODT file

Consider the following directory structure:

test/
  |- attachments/
      |- test.txt
  |- links.odt

Here links.odt is a file created with LibreOffice that contains a link to the test.txt file:

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:officeooo="http://openoffice.org/2009/office" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:drawooo="http://openoffice.org/2010/draw" xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" 
xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2">
  <office:scripts/>
  <office:font-face-decls>
    <style:font-face style:name="FreeSans1" svg:font-family="FreeSans" style:font-family-generic="swiss"/>
    <style:font-face style:name="Liberation Serif" svg:font-family="&apos;Liberation Serif&apos;" style:font-family-generic="roman" style:font-pitch="variable"/>
    <style:font-face style:name="Liberation Sans" svg:font-family="&apos;Liberation Sans&apos;" style:font-family-generic="swiss" style:font-pitch="variable"/>
    <style:font-face style:name="Droid Sans Fallback" svg:font-family="&apos;Droid Sans Fallback&apos;" style:font-family-generic="system" style:font-pitch="variable"/>
    <style:font-face style:name="FreeSans" svg:font-family="FreeSans" style:font-family-generic="system" style:font-pitch="variable"/>
  </office:font-face-decls>
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties officeooo:rsid="000d9397"/>
    </style:style>
    <style:style style:name="T2" style:family="text">
      <style:text-properties officeooo:rsid="000e0511"/>
    </style:style>
  </office:automatic-styles>
  <office:body>
    <office:text>
      <text:sequence-decls>
        <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
      </text:sequence-decls>
      <text:p text:style-name="Standard">
        <text:a xlink:type="simple" xlink:href="../attachments/test.txt" text:style-name="Internet_20_link" text:visited-style-name="Visited_20_Internet_20_Link">
          <text:span text:style-name="T1">Test</text:span>
        </text:a>
      </text:p>
    </office:text>
  </office:body>
</office:document-content>

The following is the content of the HTML file generated with the command:
pandoc -s -f odt -t html test/links.odt -o test/links.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<p><a href="../attachments/test.txt">Test</a></p>
</body>
</html>

The expected output is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<p><a href="./attachments/test.txt">Test</a></p>
</body>
</html>

aggsol commented Jul 3, 2018

I have broken relative links when converting from DocBook to ODT. Is this the same issue?

@rpuntaie

I'm using pandoc 2.2.3.2.

Here is an example that reproduces the problem on Windows:

    echo "`a <./a.odt#am>`_" | pandoc -f rst -t odt -o b.odt
    echo "in a" |pandoc -f rst -t odt -o a.odt

    start b.odt

It works when the first line is replaced with this:

    echo "`a <../a.odt#am>`_" |pandoc -f rst -t odt -o b.odt

This seems like a LibreOffice error, but they cannot correct it because there are already too many documents out there.

So I think pandoc should treat this special behavior of LibreOffice as the de facto ODT spec
and adjust the linked file path so that it works.

There is no problem with docx.

@maxnikulin

Does anyone know why the full path, including the file name, is treated as xml:base rather than just the directory, so that a ../ prefix is required for a link to a file in the same directory? Perhaps some clause in the OpenDocument format spec or a comment in the OpenOffice/LibreOffice bug tracker clarifies the reason. Could it be related to some kind of embedded or attached documents? Subdocuments of a master document and external OLE objects are referenced through ../ as well. Links to embedded OLE objects are created with just the anchor part (#...).

jgm (Owner) commented Oct 24, 2022

Related: https://bz.apache.org/ooo/show_bug.cgi?id=98211

Key point:

The original issue that was reported is that relative hyperlinks that reference
a local file are wrong. Actually, they are not wrong. They are correct. How
relative URIs behave is defined in the ODF specification. One may look at
section 2.7 in
http://www.oasis-open.org/committees/download.php/35090/OpenDocument-v1.2-part3-cd1.odt
or 17.5 of
http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.odt

Essentially, relative URIs are resolved as like they would if the ODF document
would not be a zip file, but a folder with the name of the zip file. Which
means, that an URI "./styles.xml" contained in the file "content.xml" which is
in the root of the zip file references the file "styles.xml" in the root of the
zip file. To reference a file outside the zip file, one has to add "../".
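
The "package as a folder" reading can be tried out with ordinary RFC 3986 resolution. The snippet below is a rough sketch using Python's `urllib.parse.urljoin`; the location `file:///test/links.odt` is an assumed placeholder for the document from the original report, and appending `/` to it encodes the folder interpretation described in the quote.

```python
from urllib.parse import urljoin

# Assumed location of the ODT file (placeholder, not from the report).
PACKAGE = "file:///test/links.odt"

# Treat the package as a folder: content.xml lives "inside" links.odt.
CONTENT_XML = urljoin(PACKAGE + "/", "content.xml")


def resolve(href):
    """Resolve a link found in content.xml against content.xml's own location."""
    return urljoin(CONTENT_XML, href)


print(resolve("./styles.xml"))             # file:///test/links.odt/styles.xml
print(resolve("./attachments/test.txt"))   # file:///test/links.odt/attachments/test.txt
print(resolve("../attachments/test.txt"))  # file:///test/attachments/test.txt
```

Only the `../` form leaves the package and reaches the `attachments` directory next to the document, which matches what LibreOffice writes.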


jgm commented Oct 24, 2022

Here is section 2.7 from the ODF spec:

Usage of IRIs Within Packages

Within the files contained in a package, relative IRIs may be used to
reference other files within the same package.

OpenDocument Package Consumers shall resolve relative IRIs that occur
within a file of a package as follows:

  • The file entry path is the file name of the file within the Zip file
    which contains the relative IRI, including its relative path.
  • The package base IRI is the base IRI which would be established for
    the package itself as defined in §5.1 of [RFC3986].

  • A file entry base IRI is constructed by interpreting the file entry
    path as a relative IRI, and by resolving this relative IRI to an
    absolute one as defined in §5.2 of [RFC3986] using the package base
    IRI as base URI.

  • Relative IRI references shall be resolved as defined in §5.2 of
    [RFC3986] using the file entry base IRI as base URI.

  • If a relative IRI

    • matches the rule for “relative-ref” defined in §4.2 of [RFC3986],
      and

    • matches the constrains for “relative-path” references defined in
      §4.2 of [RFC3986], and

    • if no path segments that are removed from the output buffer during
      the execution of the “Remove Dot Segments” step (§5.2.5 of
      [RFC3986]) have its origin in the package base IRI,

then the relative IRI shall interpreted as a package file entry
reference. A package file entry reference is a reference to a file
within the same package as the file containing the relative reference.
The relative path of this file within the ZIP file is determined by the
following procedure:

  • The package base IRI is removed from the absolute IRI to which the
    relative IRI has been resolved.

  • If the resulting relative IRI starts with a “/” character (U+002F
    SOLIDUS), then the “/” character is removed.

  • A fragment identifier, if it does exist, is removed.

  • The resulting relative IRI is interpreted as a file name within the
    package, that is, as the name of a file including its relative path
    within the Zip file.

  • If a fragment identifier has been removed in a previous step, it may
    be resolved as defined for the media type of the referenced file.

Note: File whose relative path starts with “META-INF/” are
considered to be part of the OpenDocument package rather than of the
content stored within the package. Therefore, different rules regarding
the resolution of relative IRIs may apply. In particular the base URI
for the resolution of relative IRIs may be the package base IRI rather
than the file entry base IRI.
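
As a rough illustration of the procedure quoted above, the sketch below classifies a reference as pointing inside or outside the package. It is a simplification, not a conforming implementation: the function name `classify` is invented, the package base is treated as a folder (as in the bug tracker comment), and the "no removed segment originates in the package base IRI" condition is approximated by a prefix check on the resolved IRI.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def classify(package_base, entry_path, ref):
    """Return ("package", path_in_zip, fragment) or ("external", absolute_iri, "").

    Hypothetical helper; a simplified reading of section 2.7.
    """
    parts = urlsplit(ref)
    # File entry base IRI: the entry path resolved against the package base.
    entry_base = urljoin(package_base + "/", entry_path)
    # Resolve the reference itself against the file entry base IRI.
    absolute = urljoin(entry_base, ref)

    # Only relative-path references can be package file entry references.
    is_relative_path = (
        not parts.scheme and not parts.netloc and not parts.path.startswith("/")
    )
    prefix = package_base + "/"
    # Approximation: a result still under the package base stayed inside it.
    if is_relative_path and absolute.startswith(prefix):
        # Remove the package base, the leading "/", and the fragment.
        path, fragment = urldefrag(absolute[len(prefix):])
        return ("package", path, fragment)
    return ("external", absolute, "")
```

With `package_base="file:///test/links.odt"` and `entry_path="content.xml"`, the reference `./styles.xml` is classified as a package entry, while `../attachments/test.txt` resolves to a file outside the package.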


jgm commented Oct 24, 2022

If this is all correct, then the fix is simple: we just need to add a ../ to all relative URIs in xlink:href attributes.
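
The idea can be sketched in a few lines. This is not pandoc's actual code (pandoc is written in Haskell); the function name `to_odt_href` is hypothetical, and the choices to drop a leading `./` and to leave links with an empty path (such as `#anchor`) untouched are assumptions made for the illustration.

```python
from urllib.parse import urlsplit


def to_odt_href(href):
    """Prefix relative links with "../" so they resolve outside the package.

    Hypothetical helper, for illustration only.
    """
    parts = urlsplit(href)
    # Leave absolute URIs, network-path references, absolute paths,
    # and links with an empty path (e.g. "#anchor") unchanged.
    if parts.scheme or parts.netloc or not parts.path or parts.path.startswith("/"):
        return href
    # Assumption: drop a redundant leading "./" before adding the prefix.
    if href.startswith("./"):
        href = href[2:]
    return "../" + href
```

For the examples in this thread, `to_odt_href("./attachments/test.txt")` gives `../attachments/test.txt` and `to_odt_href("./a.odt#am")` gives `../a.odt#am`.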

jgm closed this as completed in 9496ce8 Oct 24, 2022

jgm commented Oct 24, 2022

Nice to fix this five-year-old bug!

jgm added a commit that referenced this issue Oct 24, 2022
Revise commit 9496ce8
so it doesn't change image links. (These should have already been
adjusted.)

See #3524.
@maxnikulin

@jgm, thank you for the fix of the ODT writer.

Doesn't the reader need a similar change? Although the issue is tagged only as "writer", the original report describes a case of ODT to HTML conversion as well.


jgm commented Oct 25, 2022

Oh yes, then it probably does!

jgm added a commit that referenced this issue Oct 25, 2022
Don't alter the link if the path is empty.
jgm added a commit that referenced this issue Oct 25, 2022
ODT adds a `../` to relative links (see #3524); this needs to be
removed when converting from ODT.
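
The reader-side adjustment is the inverse operation. The sketch below only illustrates the idea and is not pandoc's actual implementation; the function name `from_odt_href` is hypothetical, and returning `./` when nothing is left after stripping is an assumption.

```python
from urllib.parse import urlsplit


def from_odt_href(href):
    """Strip the leading "../" that ODT uses to step out of the package.

    Hypothetical helper, for illustration only.
    """
    parts = urlsplit(href)
    # Absolute URIs and network-path references are left unchanged.
    if parts.scheme or parts.netloc:
        return href
    if href.startswith("../"):
        # Assumption: fall back to "./" if the link was exactly "../".
        return href[3:] or "./"
    return href
```

Applied to the LibreOffice-generated file from the original report, `from_odt_href("../attachments/test.txt")` gives `attachments/test.txt`, which resolves to the same file as the expected `./attachments/test.txt`.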