Support conversion of header rows in tables without th tag #83

Merged

Conversation

huuyafwww
Contributor

Hello!
As the title says, this PR adds support for converting header rows in tables that have no th tags.
The existing tests were written so that header rows were not converted when th tags were missing, so I would also like to know the intent behind that behavior.
If there are any problems with this fix, please let me know.

@huuyafwww
Contributor Author

This is what I have modified.

sample1

missing th, without tbody and thead.

<table>
  <tr>
    <td>key</td>
    <td>name</td>
  </tr>
  <tr>
    <td>hoge</td>
    <td>foo</td>
  </tr>
</table>

before

|  |  |
| --- | --- |
| key | name |
| hoge | foo |

after

| key | name |
| --- | --- |
| hoge | foo |

sample2

missing th, with tbody, without thead.

<table>
  <tbody>
    <tr>
      <td>key</td>
      <td>name</td>
    </tr>
    <tr>
      <td>hoge</td>
      <td>foo</td>
    </tr>
  </tbody>
</table>

before

|  |  |
| --- | --- |
| key | name |
| hoge | foo |

after

| key | name |
| --- | --- |
| hoge | foo |

sample3

missing th, with tbody and thead.

<table>
  <thead>
    <tr>
      <td>key</td>
      <td>name</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>hoge</td>
      <td>foo</td>
    </tr>
  </tbody>
</table>

before

| key | name |
| hoge | foo |

after

| key | name |
| --- | --- |
| hoge | foo |
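
The "after" output in all three samples follows one rule: the first row becomes the header, followed by a separator line. Below is a minimal sketch of that rule on plain lists of cell text. The `render_table` helper is hypothetical and written only for illustration; it is not markdownify's implementation and does no HTML parsing.

```python
def render_table(rows):
    """Render rows (lists of cell strings) as a Markdown table.

    Hypothetical helper: the first row is always treated as the header,
    mirroring the "after" output of the samples above.
    """
    if not rows:
        return ''
    header, *body = rows
    lines = [
        '| ' + ' | '.join(header) + ' |',
        # Separator line that marks the row above it as the header.
        '| ' + ' | '.join(['---'] * len(header)) + ' |',
    ]
    lines += ['| ' + ' | '.join(row) + ' |' for row in body]
    return '\n'.join(lines) + '\n'


print(render_table([['key', 'name'], ['hoge', 'foo']]))
```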

@SurajDonthi

@matthewwithanm This is a much-needed bug fix for Markdown tables. Please merge this PR asap.

@huuyafwww
Contributor Author

huuyafwww commented Apr 15, 2023

@SurajDonthi

Thanks for your support!
Markdown tables do need this change.
Since I needed the fix right away, I used the following code as a workaround, based on "Creating a Custom Converter" in README.md.

from markdownify import MarkdownConverter

class CustomMarkdownConverter(MarkdownConverter):

  def convert_tr(self, el, text, convert_as_inline):
    cells = el.find_all(['td', 'th'])
    # A row is a header row when every cell is a th, or when it is the
    # first row outside a tbody, or the first row of a tbody in a table
    # that has no thead.
    is_headrow = (
      all([cell.name == 'th' for cell in cells])
      or (not el.previous_sibling and not el.parent.name == 'tbody')
      or (not el.previous_sibling and el.parent.name == 'tbody' and len(el.parent.parent.find_all(['thead'])) < 1)
    )
    overline = ''
    underline = ''
    if is_headrow and not el.previous_sibling:
      # first row and is headline: print headline underline
      underline += '| ' + ' | '.join(['---'] * len(cells)) + ' |' + '\n'
    elif (not el.previous_sibling
      and (el.parent.name == 'table'
          or (el.parent.name == 'tbody'
            and not el.parent.previous_sibling))):
      # first row, not headline, and:
      # - the parent is table or
      # - the parent is tbody at the beginning of a table.
      # print empty headline above this row
      overline += '| ' + ' | '.join([''] * len(cells)) + ' |' + '\n'
      overline += '| ' + ' | '.join(['---'] * len(cells)) + ' |' + '\n'
    return overline + '|' + text + '\n' + underline


def convert(html, **options):
  return CustomMarkdownConverter(**options).convert(html)


html = '''
<table>
  <tr>
    <td>key</td>
    <td>name</td>
  </tr>
  <tr>
    <td>hoge</td>
    <td>foo</td>
  </tr>
</table>
'''

print(convert(
  html,
  # Options for markdownify go here as keyword arguments
))
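
The header-row condition in the workaround above can be read in isolation as a pure predicate. This is a sketch only: the function name and its parameters are hypothetical stand-ins for the values the converter reads from the Beautiful Soup element (`cell.name`, `el.previous_sibling`, `el.parent.name`, and whether the table contains a thead), so it can be checked without markdownify installed.

```python
def is_header_row(cell_tags, is_first_row, parent_tag, table_has_thead):
    """Hypothetical restatement of the is_headrow condition.

    cell_tags:       tag names of the row's cells, e.g. ['td', 'td']
    is_first_row:    True when the row has no previous sibling
    parent_tag:      'table', 'thead' or 'tbody'
    table_has_thead: True when the enclosing table contains a thead
    """
    all_th = all(tag == 'th' for tag in cell_tags)
    first_outside_tbody = is_first_row and parent_tag != 'tbody'
    first_in_tbody_without_thead = (
        is_first_row and parent_tag == 'tbody' and not table_has_thead
    )
    return all_th or first_outside_tbody or first_in_tbody_without_thead


# sample1: first row directly under table
print(is_header_row(['td', 'td'], True, 'table', False))
# sample2: first row of a tbody, table has no thead
print(is_header_row(['td', 'td'], True, 'tbody', False))
# sample3: first row of the tbody, but the table has a thead
print(is_header_row(['td', 'td'], True, 'tbody', True))
```

Under these assumptions the first two calls treat the row as a header and the third does not, which matches the "after" tables in sample1 through sample3.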

@AlexVonB AlexVonB merged commit e4df412 into matthewwithanm:develop Mar 26, 2024

3 participants