Accessibility issue: tables are not outputting <thead> and <th> tags #126

Open
Dan503 opened this issue Jul 25, 2017 · 10 comments

@Dan503

Dan503 commented Jul 25, 2017

In Microsoft Word, you can specify whether a table should contain a <thead> and first-column <th> tags through this toolbar:

Image of Word document toolbar

However, Mammoth ignores these settings when it processes a Word document. Instead, it strips out all table headings and outputs a table made up of plain table cells.

I used the settings shown in the image above for the following table.

Example table

This is the output I was expecting from Mammoth:

<table>
    <thead>
        <tr>
            <th>
                <p>Name</p>
            </th>
            <th>
                <p>Number</p>
            </th>
            <th>
                <p>Year</p>
            </th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <th>
                <p>Thing</p>
            </th>
            <td>
                <p>123</p>
            </td>
            <td>
                <p>2017</p>
            </td>
        </tr>
        <tr>
            <th>
                <p>Other thing</p>
            </th>
            <td>
                <p>458</p>
            </td>
            <td>
                <p>2016</p>
            </td>
        </tr>
    </tbody>
</table>

This is the markup I got though:

<table>
    <tbody>
        <tr>
            <td>
                <p>Name</p>
            </td>
            <td>
                <p>Number</p>
            </td>
            <td>
                <p>Year</p>
            </td>
        </tr>
        <tr>
            <td>
                <p>Thing</p>
            </td>
            <td>
                <p>123</p>
            </td>
            <td>
                <p>2017</p>
            </td>
        </tr>
        <tr>
            <td>
                <p>Other thing</p>
            </td>
            <td>
                <p>458</p>
            </td>
            <td>
                <p>2016</p>
            </td>
        </tr>
    </tbody>
</table>

Mammoth version: v1.4.2
OS: Windows 10
node.js version: 6.9.4

@mwilliamson
Owner

mwilliamson commented Jul 25, 2017 via email

@Dan503
Author

Dan503 commented Jul 25, 2017

Here is a minimal example Word document: mammoth-table-issue.docx

@mwilliamson
Owner

mwilliamson commented Jul 25, 2017

To support this, it looks like w:tbl/w:tblPr/w:tblLook/@w:firstRow and w:tbl/w:tblPr/w:tblLook/@w:firstColumn need to be read.

It's also worth noting that thead and th tags should be created if you mark rows as being repeated header rows.
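
As a rough illustration of the two attributes mentioned above, here is a minimal sketch that reads them from a w:tblLook element supplied as a string. The function name readTableLook is hypothetical and not part of Mammoth, and the regex-based lookup is an assumption made to keep the example self-contained; a real implementation would read the attributes from a parsed XML tree.

```javascript
// Hypothetical helper, not part of Mammoth's API.
// Reads the firstRow and firstColumn flags from a w:tblLook element
// given as an XML string.
function readTableLook(tblLookXml) {
    function readFlag(name) {
        var match = new RegExp('\\bw:' + name + '="([^"]*)"').exec(tblLookXml);
        // Assumption: on/off values appear as "1", "true" or "on".
        // A missing attribute is treated as off.
        return match !== null && ["1", "true", "on"].indexOf(match[1]) !== -1;
    }
    return {
        firstRow: readFlag("firstRow"),
        firstColumn: readFlag("firstColumn")
    };
}

// Example element of the kind Word writes inside w:tblPr.
var exampleTblLook = '<w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" ' +
    'w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/>';

console.log(readTableLook(exampleTblLook));
```

With both flags available, the reader could mark the first row and the first cell of each row as headers before the HTML conversion step.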

@Dan503
Author

Dan503 commented Aug 8, 2017

I'm just wondering, is this bug likely to be fixed by the 1st of September?

My company has a site going live in a few months and it depends on this bug being fixed for it to pass accessibility.

@mwilliamson
Owner

Adding support should be reasonably straightforward, but I'm not sure when I'll get time to work on this (since it's just a side-project). In other words, I wouldn't rely on it.

@Dan503
Author

Dan503 commented Aug 14, 2017

I'm planning on doing the fix myself as a pull request.

Can you help point me in the right direction so I know where to apply the fix?

@mwilliamson
Owner

There are two main places you'd need to look at. One is the code that parses the document in lib/docx/body-reader.js. The existing code that handles table headers is probably a good place to look for a rough idea of how to implement this. For header rows, you probably want to reuse the same property, i.e. isHeader on table rows, and add a property to handle header columns. You then need to update the conversion to HTML in lib/document-to-html.js. Header rows will already be handled by the existing code, but you'd need to add support for header columns.

Each module should be covered by tests. The test directory structure should mirror the directory structure of the code under test, so hopefully they're reasonably straightforward to navigate around. Again, looking for the existing support for table headers is probably a good place to start.
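
To make the conversion step described above more concrete, here is a minimal sketch of rendering a table with header rows and a header column. It uses a simplified stand-in for the document model, not Mammoth's internal representation: isHeader mirrors the row property mentioned above, while hasHeaderColumn, renderRow and renderTable are hypothetical names chosen for illustration.

```javascript
// Simplified stand-in for a document model; NOT Mammoth's internals.
// `hasHeaderColumn` is a hypothetical name for the new column flag.
function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderRow(row, hasHeaderColumn) {
    var cells = row.cells.map(function(text, index) {
        // Header rows use <th> throughout; otherwise only the first cell
        // does, and only when the table has a header column.
        var isHeaderCell = row.isHeader || (hasHeaderColumn && index === 0);
        var tag = isHeaderCell ? "th" : "td";
        return "<" + tag + "><p>" + escapeHtml(text) + "</p></" + tag + ">";
    });
    return "<tr>" + cells.join("") + "</tr>";
}

function renderTable(table) {
    // Only the leading run of header rows goes into <thead>.
    var headCount = 0;
    while (headCount < table.rows.length && table.rows[headCount].isHeader) {
        headCount++;
    }
    function renderSection(tag, rows) {
        if (rows.length === 0) {
            return "";
        }
        var body = rows.map(function(row) {
            return renderRow(row, table.hasHeaderColumn);
        }).join("");
        return "<" + tag + ">" + body + "</" + tag + ">";
    }
    return "<table>" +
        renderSection("thead", table.rows.slice(0, headCount)) +
        renderSection("tbody", table.rows.slice(headCount)) +
        "</table>";
}

// The table from the original report.
var exampleTable = {
    hasHeaderColumn: true,
    rows: [
        {isHeader: true, cells: ["Name", "Number", "Year"]},
        {isHeader: false, cells: ["Thing", "123", "2017"]},
        {isHeader: false, cells: ["Other thing", "458", "2016"]}
    ]
};

console.log(renderTable(exampleTable));
```

Apart from whitespace, this produces the markup that the original report expected: a thead containing only th cells, and a tbody in which each row starts with a th.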

Dan503 pushed a commit to Dan503/mammoth.js that referenced this issue Aug 15, 2017
This adds table heading support using the "Header Row" and "First Column" settings in Microsoft Word Table design tools.
@grantstead

@mwilliamson, I'm facing exactly the same issue in the Python implementation. How can I go about getting a fix for it there? Is the JS code similar enough that I could port it, or would the approach be different?

Thanks!
Grant

P.S.: Great library by the way! Thanks for implementing it.

@mwilliamson
Owner

The Python implementation is fairly similar to the JavaScript implementation, but it's worth noting that they (should!) have the same level of support for tables.

@ulfgj

ulfgj commented Nov 18, 2024

Was there a solution in the end for this?
