Added convert_all func #163

Open: wants to merge 1 commit into base: develop

Divyesh06

Make it possible for custom markdown classes to apply a particular rule to all tags.

Usage

from markdownify import MarkdownConverter

class CustomMarkdownConverter(MarkdownConverter):
    def convert_all(self, el, text, convert_as_inline):
        # Do something with text
        return text

Example - Handling Text Align

import re

class CustomMarkdownConverter(MarkdownConverter):
    def convert_all(self, el, text, convert_as_inline):
        # Pull the text-align value out of the inline style, if there is one.
        # The regex tolerates whitespace around the colon.
        style = el.attrs.get('style', '')
        match = re.search(r'text-align\s*:\s*([^;]+)', style)
        alignment = match.group(1).strip() if match else ""

        if alignment:
            return f"[align={alignment}]{text}[/align]"
        return text

converter = CustomMarkdownConverter()
converter.convert('<p style="text-align: center"> Hello World </p>')
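The style lookup in the example can also be factored into a small helper that does not depend on the converter at all. The following is only a sketch under stated assumptions: `parse_inline_style` and `wrap_with_alignment` are hypothetical names that are not part of this PR or of the library, and the parser is deliberately naive (it splits on `;`, so values that themselves contain semicolons are not handled).

```python
def parse_inline_style(style):
    """Split an inline CSS style string into a {property: value} dict.

    Hypothetical helper for illustration; not part of the library.
    """
    declarations = {}
    for chunk in style.split(";"):
        # partition() splits on the first colon only, so values that
        # contain colons (for example URLs) stay intact.
        name, sep, value = chunk.partition(":")
        if sep and name.strip():
            declarations[name.strip().lower()] = value.strip()
    return declarations


def wrap_with_alignment(style, text):
    """Wrap text in the [align=...] syntax from the example above,
    but only when the style string sets text-align."""
    alignment = parse_inline_style(style or "").get("text-align")
    if alignment:
        return f"[align={alignment}]{text}[/align]"
    return text


print(wrap_with_alignment("color: red; text-align : center", "Hello World"))
```

Inside a `convert_all` override, the call would presumably look like `wrap_with_alignment(el.attrs.get('style'), text)`, which keeps the hook itself to a single line.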

@chrispy-snps
Collaborator

@Divyesh06 - your code implements convert_all() as a preprocessing function, such that the tag-specific conversion function is still called afterwards.

What is the use case you have in mind for this?

@Divyesh06
Author

Divyesh06 commented Jan 2, 2025

@chrispy-snps I listed text-align as the example. Say I want a custom Markdown syntax for aligning text. That is hard to do without the convert_all function, because I want alignment to work with all kinds of HTML tags (paragraphs, headings, links, etc.), so I would need to write a separate convert function for each of those tags.

With convert_all, I can make this work easily with all kinds of tags.

Of course, this is just an example. But I think it can be used for all kinds of custom markdown syntax that require some CSS style to be added.

@chrispy-snps
Collaborator

@Divyesh06 - at the point where your convert_all function is called, all the contents of the child elements are already rendered into a monolithic Markdown text string. Are you saying that you would write code that would go into this string and reformat it?

Can you give me a specific example of what would be passed as input to your formatting function, and what the function would return? You used the example of align - can you show an example of how this would work?

@Divyesh06
Author

@chrispy-snps

Yes, I already mentioned that in the usage. Simply put, it works just like the other convert functions, such as convert_a and convert_p. It just doesn't need a specific tag to activate, and it runs for all HTML tags.

Let me further explain the example here.

First, we create a custom Markdown converter class that extends the MarkdownConverter class. Here, we add a convert_all function with logic to detect the text-align CSS property:

import re

class CustomMarkdownConverter(MarkdownConverter):
    def convert_all(self, el, text, convert_as_inline):
        # Pull the text-align value out of the inline style, if there is one.
        # The regex tolerates whitespace around the colon.
        style = el.attrs.get('style', '')
        match = re.search(r'text-align\s*:\s*([^;]+)', style)
        alignment = match.group(1).strip() if match else ""

        if alignment:
            return f"[align={alignment}]{text}[/align]"
        return text

converter = CustomMarkdownConverter()

Now, if I pass a string that has different HTML tags with text-align, like this:

converter.convert("""
<p style="text-align: center"> Hello World </p>
<b style="text-align: center"> Hello World </b>
""")

The output I get will be

[align=center]Hello World[/align]

**[align=center] Hello World [/align]**

See how I didn't need to write separate logic for the p and b tags? That's thanks to convert_all.

Without convert_all, I would have to do something like -

class CustomMarkdownConverter(MarkdownConverter):
    def convert_p(self, el, text, convert_as_inline):
        ...  # Logic for converting text align for p tag

    def convert_b(self, el, text, convert_as_inline):
        ...  # Logic for converting text align for b tag

    def convert_h1(self, el, text, convert_as_inline):
        ...  # Logic for converting text align for h1 tag

    # And this goes on forever

@chrispy-snps
Collaborator

chrispy-snps commented Jan 3, 2025

Thanks @Divyesh06, I understand now.

Here are my thoughts:

  • I suggest including "preprocess" in the function name, as the tag-specific conversion function is still called.
  • We should probably implement a matching postprocessing function.
  • I would compute the function variables once and store them as class instance variables, as they only need to be derived from the configuration once.

Here is my suggestion for how the inner loop might look:

if not children_only:
    if self.preprocess_convert_fn:                                         # <----
        text = self.preprocess_convert_fn(node, text, convert_as_inline)   # <----

    convert_fn = getattr(self, 'convert_%s' % node.name, None)
    if convert_fn and self.should_convert_tag(node.name):
        text = convert_fn(node, text, convert_as_inline)

    if self.postprocess_convert_fn:                                        # <----
        text = self.postprocess_convert_fn(node, text, convert_as_inline)  # <----
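To make the call order concrete, here is a self-contained toy sketch of the suggestion above. It is not the library's implementation: `Node`, `ToyConverter`, `AlignConverter`, and the hook method names `preprocess_convert` and `postprocess_convert` are all hypothetical, and the `should_convert_tag` check from the snippet is omitted for brevity. Only the attribute names `preprocess_convert_fn` and `postprocess_convert_fn` come from the suggestion. The hooks are looked up once in `__init__`, as proposed in the third bullet.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parsed HTML element."""
    name: str
    attrs: dict = field(default_factory=dict)


class ToyConverter:
    """Toy dispatcher that mirrors the suggested inner loop."""

    def __init__(self):
        # Resolve the optional hooks once, at construction time,
        # instead of on every node.
        self.preprocess_convert_fn = getattr(self, "preprocess_convert", None)
        self.postprocess_convert_fn = getattr(self, "postprocess_convert", None)

    def process(self, node, text, convert_as_inline=False):
        if self.preprocess_convert_fn:
            text = self.preprocess_convert_fn(node, text, convert_as_inline)

        # The tag-specific function still runs after the preprocess hook.
        convert_fn = getattr(self, "convert_%s" % node.name, None)
        if convert_fn:
            text = convert_fn(node, text, convert_as_inline)

        if self.postprocess_convert_fn:
            text = self.postprocess_convert_fn(node, text, convert_as_inline)
        return text

    def convert_b(self, node, text, convert_as_inline):
        return "**%s**" % text


class AlignConverter(ToyConverter):
    def preprocess_convert(self, node, text, convert_as_inline):
        # Runs before convert_b, so the bold markers end up outside.
        if "text-align: center" in node.attrs.get("style", ""):
            return "[align=center]%s[/align]" % text
        return text

    def postprocess_convert(self, node, text, convert_as_inline):
        # Runs last, on whatever the tag-specific function produced.
        return text.strip()


bold = Node("b", {"style": "text-align: center"})
print(AlignConverter().process(bold, "Hello"))
```

Because the preprocess hook runs before `convert_b`, the alignment markers end up inside the bold markers, which matches the `<b>` output shown earlier in the thread.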

@Divyesh06
Author

Divyesh06 commented Jan 4, 2025

@chrispy-snps Makes sense. I will implement those changes.


2 participants