Add table 'rowspan' support #121

Open
ffolkes1911 opened this issue Apr 11, 2024 · 1 comment
Comments

@ffolkes1911

Had a quick look at the code, and it seems there's support for the 'colspan' attribute but not for 'rowspan'. Any plans to add support?

HTML example
<!DOCTYPE html>
<html>
<head>
<style>
table, th, td {
  border: 1px solid black;
}
</style>
</head>
<body>

<h1>The td rowspan attribute</h1>

<table>
  <tr>
    <th>Month</th>
    <th>Savings</th>
    <th>Savings for holiday!</th>
  </tr>
  <tr>
    <td>January</td>
    <td>$100</td>
    <td rowspan="2">$50</td>
  </tr>
  <tr>
    <td>February</td>
    <td>$80</td>
  </tr>
</table>

</body>
</html>
Parsed MD table
The td rowspan attribute
========================


| Month | Savings | Savings for holiday! |
| --- | --- | --- |
| January | $100 | $50 |
| February | $80 |
Desired MD output
The td rowspan attribute
========================


| Month | Savings | Savings for holiday! |
| --- | --- | --- |
| January | $100 | $50 |
| February | $80 | |
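
For illustration, the desired output above can be produced by reserving the grid positions that each spanned cell covers and leaving them blank. The following is a minimal sketch using only the Python standard library. It is not markdownify's code, and the names `CellCollector`, `expand_spans`, and `html_table_to_markdown` are hypothetical. It assumes a single, non-nested table whose first row is the header, and it does not escape `|` inside cell text.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect every <td>/<th> per <tr> as [text, rowspan, colspan]."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cells per <tr>
        self._cell = None  # cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            # A missing or valueless span attribute counts as 1.
            self._cell = ["", int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]
            self.rows[-1].append(self._cell)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._cell = None

    def handle_data(self, data):
        # Text outside a cell (e.g. <style> or <h1> content) is ignored.
        if self._cell is not None:
            self._cell[0] += data


def expand_spans(rows):
    """Return a rectangular grid; positions covered by a span are empty strings."""
    occupied = {}  # (row, col) -> text
    for r, cells in enumerate(rows):
        c = 0
        for text, rowspan, colspan in cells:
            # Skip columns already reserved by a rowspan from an earlier row.
            while (r, c) in occupied:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    # Only the top-left position keeps the text.
                    occupied[(r + dr, c + dc)] = text.strip() if dr == 0 and dc == 0 else ""
            c += colspan
    width = max((col for _, col in occupied), default=-1) + 1
    return [[occupied.get((r, c), "") for c in range(width)] for r in range(len(rows))]


def _line(cells):
    # Empty cells render as "| |" to match the desired output in this issue.
    return "|" + "".join(f" {cell} |" if cell else " |" for cell in cells)


def html_table_to_markdown(html):
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    grid = expand_spans(parser.rows)
    if not grid or not grid[0]:
        return ""
    header, *body = grid
    lines = [_line(header), _line(["---"] * len(header))]
    lines += [_line(row) for row in body]
    return "\n".join(lines)


example_html = """<table>
  <tr><th>Month</th><th>Savings</th><th>Savings for holiday!</th></tr>
  <tr><td>January</td><td>$100</td><td rowspan="2">$50</td></tr>
  <tr><td>February</td><td>$80</td></tr>
</table>"""

print(html_table_to_markdown(example_html))
# | Month | Savings | Savings for holiday! |
# | --- | --- | --- |
# | January | $100 | $50 |
# | February | $80 | |
```

Leaving the covered cells blank is a design choice that matches the desired output here; copying the spanned value into every covered cell would be an equally valid way to flatten a rowspan.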
@andrewDoing

I had this issue as well, and I was able to get the desired behavior with a customization.

Requires:

  • pandas
  • tabulate
  • html5lib
from io import StringIO

import pandas as pd
from bs4 import Tag
from markdownify import MarkdownConverter


class MyMarkdownConverter(MarkdownConverter):
    """A custom MarkdownConverter.

    This class is a subclass of the MarkdownConverter class from the markdownify library.
    It overrides the convert_th, convert_tr, convert_td, convert_thead, and convert_tbody methods
    to provide a No-Op for the <th>, <tr>, <td>, <thead>, and <tbody> tags, respectively.

    For <table> tags, it converts the table to a DataFrame and then converts the DataFrame to Markdown.
    This gives us the desired behavior of handling rowspan, which markdownify does not handle.
    """

    def convert_table(self, el: Tag, text, convert_as_inline):
        try:
            # Parse the table from the element's own HTML, not from `text`
            df = pd.read_html(StringIO(str(el)))[0]
        except Exception as e:
            print(f"Error converting table to DataFrame: {str(el)}")
            print(e)
            # No DataFrame to render, so fall back to the table's raw HTML
            return str(el)

        # Replace NaN with an empty string
        df = df.fillna("")

        # Convert DataFrame to Markdown (requires tabulate)
        return df.to_markdown(index=False)

    def convert_th(self, el: Tag, text, convert_as_inline):
        """No-Op for the <th> tag: return the HTML as is."""
        return str(el)

    def convert_tr(self, el: Tag, text, convert_as_inline):
        """No-Op for the <tr> tag: return the HTML as is."""
        return str(el)

    def convert_td(self, el: Tag, text, convert_as_inline):
        """No-Op for the <td> tag: return the HTML as is."""
        return str(el)

    def convert_thead(self, el: Tag, text, convert_as_inline):
        """No-Op for the <thead> tag: return the HTML as is."""
        return str(el)

    def convert_tbody(self, el: Tag, text, convert_as_inline):
        """No-Op for the <tbody> tag: return the HTML as is."""
        return str(el)
