Whitespace tags stripped from plaintext values #142
thanks for filing! ...and ugh, looks like it's mf2py. i assume this isn't the expected result (v1.0.5):
mf2py.parse(doc="""
<article class="h-entry">
<div class="e-content p-name">foo bar<br />baz <br><br> baj</div>
</article>""", url='http://x/')
results in:
{
"items": [{
"type": ["h-entry"],
"properties": {
"content": [{
"html": "foo bar<br/>baz <br/><br/> baj",
"value": "foo barbaz baj"
}],
"name": ["foo barbaz baj"]
}
}
]
}
@kevinmarks @kartikprabhu any thoughts? |
btw @kevinmarks @kartikprabhu mf2py's release management might deserve a bit of love. https://pypi.org/project/mf2py/ just says Project Description UNKNOWN, and the last release tagged in github is 1.0.0 from oct 2015. :P https://github.com/microformats/mf2py/releases |
also for background, the post that inspired this is https://aaronparecki.com/2018/03/30/18/ , and php-mf2 correctly converts its |
my first repro (above) was mf2py 1.0.5, beautifulsoup4 4.4.1, html5lib 0.9999999. i tried just now on microformats/mf2py master tip, beautifulsoup4 4.6.0, and html5lib 1.0.1. same result. |
@snarfed The pypi releases are currently owned by @tommorris, so he would have to transfer ownership or something to do those releases. |
thanks for looking! aha, 2 and 8 on https://pin13.net/mf2/whitespace.html do indeed look like this bug. @kartikprabhu sounds like you're skeptical of fixing this in mf2py? or you just think it will be difficult? or you'd want to see it in the parsing spec first? @aaronpk, @tantek, @kevinmarks, thoughts? as for pypi, understood. ask @tommorris to transfer ownership! I'm sure he will, since he did for the repo. people generally install from pypi, not github, so we definitely need to be able to continue releasing there. |
@snarfed I am not skeptical of fixing this; there is a whitespace algorithm by @Zegnat, but TBH it looks like a lot of DOM tree parsing work. For pypi there is already an issue open microformats/mf2py#93 |
aha, got it. understood. thanks for the explanation and link, and props to @Zegnat for writing it. not sure where that leaves us, but let me know if you need anything else from me! |
@snarfed whitespace rules are now in experimental version https://github.com/kartikprabhu/mf2py/tree/experimental |
The example @snarfed used in #142 (comment)
"items": [
{
"type": [
"h-entry"
],
"properties": {
"content": [
{
"html": "foo bar<br/>baz <br/><br/> baj",
"value": "foo bar\nbaz\n\nbaj"
}
],
"name": [
"foo bar\nbaz\n\nbaj"
]
}
}
] |
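For illustration, here is a minimal sketch of how `<br>` tags could be turned into newlines during plaintext extraction, using only Python's standard library. It is not mf2py's code and not the full whitespace algorithm mentioned above; the names `BrAwareText` and `plaintext` are hypothetical, and the whitespace trimming around newlines is an assumption chosen to match the output shown in this comment.

```python
import re
from html.parser import HTMLParser


class BrAwareText(HTMLParser):
    """Collects text content, emitting a newline for every <br> tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for <br/> and <br />, because the default
        # handle_startendtag() delegates to handle_starttag().
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def plaintext(html):
    """Hypothetical helper: HTML fragment -> text with <br> as newlines."""
    parser = BrAwareText()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    text = "".join(parser.parts)
    # Assumption: drop spaces and tabs that sit directly next to a newline.
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    return text.strip()


print(repr(plaintext("foo bar<br />baz <br><br> baj")))
```

A real implementation would also need to handle block elements, `<pre>`, images and so on, which this sketch ignores.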
@kartikprabhu yay, agreed, my tests pass with that new code too. can't wait for a release! interestingly though, that branch fails a couple of my other tests. looks like implied name now includes
<body class="h-entry">
<div class="p-author h-card">
<a href="http://li/nk">my name</a>
<img class="u-photo" src="http://pic/ture" />
</div>
</body>
results in name
(btw long lived dev branches are scary, but that's a separate conversation. :P) |
@snarfed yes that is correct according to updated |
huh, ok. seems ugly, but understood! |
sadly this didn't make it into the recent mf2py 1.1.0 release. ah well. next one hopefully!
|
for snarfed/bridgy#828, #145, #142, etc.
fixes #142, fixes #145, fixes snarfed/bridgy#756, for snarfed/bridgy#828
details in #828, snarfed/granary#142, snarfed/granary@a989c3e, etc.
done! whitespace tags are now correctly converted in jsonfeed titles. example: https://granary.io/url?input=html&output=jsonfeed&url=https%3A%2F%2Faaronparecki.com%2F2018%2F07%2F19%2F26%2F note that granary still generates a jsonfeed title for that post, even though it's technically a note (?). |
I recently switched my website to include <br> tags instead of newlines for my notes. This means I now have HTML markup with <br> tags between lines of text.
It appears that when Granary is converting this to JSON Feed (haven't checked other formats), it is stripping the <br> tags completely instead of converting them to whitespace, so a note with Hello and World separated by a <br> tag would appear as HelloWorld in the JSON Feed.
This then causes a problem when comparing the text value to the HTML value: Granary thinks they are different, so it creates a title for the note. Then my posts appear smushed in Micro.blog.
I think Granary should recognize that a <br> tag is meaningful and replace it with a newline so that the plaintext conversion works properly.
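A rough sketch of the comparison described above, assuming the title decision comes down to whether the post's name differs from the plaintext of its content. The helpers `strip_tags`, `br_to_newline` and `needs_title` are hypothetical and are not Granary's actual code; the sample input `Hello<br>World` is likewise only an illustration.

```python
import re


def strip_tags(html):
    # Naive conversion: removes every tag, including <br>, with no replacement.
    return re.sub(r"<[^>]+>", "", html)


def br_to_newline(html):
    # Replace <br>, <br/> and <br /> with a newline, then drop other tags.
    text = re.sub(r"<br\s*/?>", "\n", html, flags=re.IGNORECASE)
    return re.sub(r"<[^>]+>", "", text)


def needs_title(name, content_text):
    # Hypothetical rule: emit a title only if the name is not simply the
    # content's plaintext, ignoring differences in whitespace.
    def normalize(s):
        return " ".join(s.split())

    return normalize(name) != normalize(content_text)


html = "Hello<br>World"
name = "Hello\nWorld"

print(needs_title(name, strip_tags(html)))     # tags stripped: texts differ
print(needs_title(name, br_to_newline(html)))  # <br> kept as newline: texts match
```

With the naive stripping the content becomes HelloWorld, which no longer matches the name, so a note would be treated as if it had a separate title.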