Improve JF2 references #20
Comments
Just had a look at the code underlying XRay and… wow. 1008 lines of code to parse and massage an incoming feed to generate the results you are seeing here. 🤯 This project started as a straightforward implementation of I’d like to say this is something that I can look to improve, but not at the cost of working on Indiekit, even more so given that including references is an option that’s disabled by default. Perhaps there’s a way of breaking this apart and making smaller, incremental improvements (the list of empty children with only Open to suggestions… maybe this is something to put to the IndieWeb community to see if anyone would like to contribute parsing improvements?
I agree that this is much more of a nice-to-have than some of the key Indiekit work. I imagine that aiming to normalize all messy content would be an impossible task. I'd need to look more at XRay and see if there is any set of agreed-upon parsing specs to come up with a list of target improvements. I'll probably keep iterating on the massaging logic I'm using with my Indiekit instance. I'd also love to include some Metaformats logic to parse meta tags on sites without mf2. So far the main things I've added are finding the main h-entry and author, but I'm sure I'll discover more as I reply to more sites with Indiekit.
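For anyone following along, "finding the main h-entry" can be sketched roughly as below. This is a minimal illustration that assumes the usual parsed mf2 shape (`items`, `type`, `properties`, `children`); `findMainEntry` is a hypothetical helper, not something that exists in mf2tojf2, Indiekit, or XRay, and a real implementation would likely need more rules than "first h-entry found".

```javascript
// Hypothetical helper: not part of mf2tojf2, Indiekit, or XRay.
// `mf2` is assumed to be a parsed microformats2 document: { items: [...] }.
function findMainEntry(mf2) {
  const queue = [...(mf2.items || [])];
  // Breadth-first walk, so an h-entry nearer the top level wins over a nested one.
  while (queue.length > 0) {
    const item = queue.shift();
    if (Array.isArray(item.type) && item.type.includes("h-entry")) {
      return item;
    }
    queue.push(...(item.children || []));
  }
  return undefined;
}

// Illustrative input: an h-card followed by an h-feed that wraps the entry.
const parsed = {
  items: [
    { type: ["h-card"], properties: { name: ["Site Owner"] } },
    {
      type: ["h-feed"],
      properties: {},
      children: [{ type: ["h-entry"], properties: { name: ["Hello"] } }],
    },
  ],
};

console.log(findMainEntry(parsed).properties.name[0]);
```

The walk never fetches anything, so it can run on whatever mf2 object the parser has already produced.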
I split out a couple of more specific tasks. However, if you think these should be handled externally, I can look at adding this kind of logic to a different library.
Looking at the authorship spec you linked to in #22, I spotted It hasn’t been updated for 7 years, and is unfortunately written in TypeScript, but maybe that could be used, or adapted for use here?
Before I push for or try to contribute features to this library, I want to check that mf2tojf2 would be the best place for some of this functionality. Ideally we'd avoid different libraries downloading and parsing pages multiple times, but some reworking of MF2 objects by different libraries could be composed. Like you mentioned earlier, I would like to see a little more collaboration and coordination of efforts around Node libraries, but I also wouldn't want to tie things to another package that no one has capacity to maintain.
Relevant Libraries
Microformat parsing features
Opportunities to reuse logic
Let me know what you think, but I'd love to see the type of functionality we're discussing pushed to other libraries and used more flexibly by the Node community.
mf2utiljs for cleaning up microformats
Turns out there is more on npm than I initially thought. I hadn't seen mf2utiljs before. Since it ports the widely used Python package, I think it has a lot of potential for being a really useful package. I would probably want to check with the maintainer to see if they are up for more community involvement. But assuming it is a reliable library, I could see a microformats-parser > mf2utiljs > mf2tojf2 combo working well.
Leave metaformats to initial parser
Something like Metaformats might be better as a feature of microformats-parser, since it would require fetching and parsing raw HTML to get meta tags. Implementing it in another library would duplicate that fetching work. I don't think that should be enabled by default, but microformats-parser already has a set of
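To illustrate the "parse once, then compose" idea from this comment, here is a rough sketch. Everything in it is hypothetical: `pipe`, `dropEmptyItems`, and `onlyEntries` are stand-in stages invented for the example, not functions from microformats-parser, mf2utiljs, or mf2tojf2. The point is only that each stage takes and returns an already-parsed mf2 object, so nothing needs to download or re-parse the page.

```javascript
// Hypothetical composition helper: runs each stage on the previous stage's output.
const pipe = (...stages) => (input) =>
  stages.reduce((value, stage) => stage(value), input);

// Hypothetical clean-up stage: drop items with no properties and no children.
function dropEmptyItems(mf2) {
  const hasContent = (item) =>
    Object.keys(item.properties || {}).length > 0 ||
    (item.children || []).length > 0;
  return { ...mf2, items: (mf2.items || []).filter(hasContent) };
}

// Hypothetical selection stage: keep only top-level h-entry items.
function onlyEntries(mf2) {
  const isEntry = (item) => (item.type || []).includes("h-entry");
  return { ...mf2, items: (mf2.items || []).filter(isEntry) };
}

// The page is parsed once (by whichever parser is in use); only the
// resulting object travels through the stages.
const parsedOnce = {
  items: [
    { type: ["h-entry"], properties: { name: ["Post"] } },
    { type: ["h-card"], properties: {} },
    { type: ["h-card"], properties: { name: ["Someone"] } },
  ],
};

const normalize = pipe(dropEmptyItems, onlyEntries);
const result = normalize(parsedOnce);
console.log(result.items.map((item) => item.type[0]));
```

Because each stage returns a new object rather than mutating its input, stages from different packages could be reordered or swapped without affecting one another.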
If you wanted to submit a PR that used mf2utiljs to clean up incoming Microformats for use in references, I think that would be really useful, and could potentially solve this issue! I wonder if it's a case of parsing the Microformats returned here with mf2utiljs: mf2tojf2/lib/fetch-references.js Line 37 in 3a0817d
From what I can tell, mf2utiljs was a one-off personal project. I haven't gotten any response about being open to community involvement. I think we might need to implement the authorship and main entry algorithms separately. I'm considering creating a library, but would prefer not to create a separate package if it can be avoided.
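As a rough idea of what a separate authorship step might look like, here is a deliberately partial sketch. It assumes the standard parsed mf2 shape and only looks at data already on the page: an embedded h-card in the entry's `author` property, or a same-page h-card whose `url` matches a string author. It does not fetch the author's page or check a parent h-feed, both of which the full authorship algorithm calls for. `resolveAuthor` is a hypothetical name.

```javascript
// Hypothetical helper: a partial take on authorship, using same-page data only.
function resolveAuthor(entry, mf2) {
  const [author] = (entry.properties && entry.properties.author) || [];

  // Case 1: the author property is an embedded h-card.
  if (author && typeof author === "object" && (author.type || []).includes("h-card")) {
    return author;
  }

  // Case 2: the author property is a plain string (a URL or a name).
  if (typeof author === "string") {
    // Prefer a top-level h-card on the same page whose url matches.
    const match = (mf2.items || []).find(
      (item) =>
        (item.type || []).includes("h-card") &&
        ((item.properties || {}).url || []).includes(author)
    );
    if (match) return match;

    // Otherwise wrap the string in a minimal h-card-shaped object.
    const isUrl = /^https?:\/\//.test(author);
    return {
      type: ["h-card"],
      properties: isUrl ? { url: [author] } : { name: [author] },
    };
  }

  // No author information on the entry itself.
  return undefined;
}

// Illustrative document: the entry names its author by URL only.
const page = {
  items: [
    { type: ["h-card"], properties: { name: ["Example Author"], url: ["https://example.com/"] } },
    { type: ["h-entry"], properties: { name: ["A post"], author: ["https://example.com/"] } },
  ],
};

console.log(resolveAuthor(page.items[1], page).properties.name[0]);
```

Returning an h-card-shaped object in every successful case means a later mf2-to-JF2 step can treat the author uniformly.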
If you’d like to contribute a PR to add them to this project, I think that could work. These algorithms do seem to fall into the category of converting mf2 to JF2. At some point I also think it would make sense to ask about moving this project to the @microformats organisation, much like the new Node Microformats parser was, meaning this project can live alongside that project and mf2tojf2.py.
Is your feature request related to a problem?
I've noticed that the data returned by references isn't as normalized as I'd like, leading to lots of extra properties and missing author properties. When I compare the output of https://xray.p3k.app/ to the references, XRay handles these pages cleanly, while the Indiekit output is less organized.
Example 1: https://jamesg.blog/2023/04/18/source-code-folder-names/
Example 2: https://aaronparecki.com/2023/04/24/8/lawyer
Describe the solution you’d like
I'd like the references to show a much simpler model, including the entry at the top level with author data included.
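To make the "simpler model" concrete, here is a rough sketch of the shape I mean. It assumes a parsed mf2 h-entry as input and flattens it in a JF2-like way: the `h-` prefix is dropped from the type, single-value arrays become plain values, and a nested h-card becomes a nested object. `toJf2` is a hypothetical, simplified function, not the mf2tojf2 implementation; for instance, it reduces `{ html, value }` content objects to their plain-text `value`.

```javascript
// Hypothetical, simplified mf2 -> JF2-like flattening (not mf2tojf2's code).
function toJf2(item) {
  const jf2 = { type: item.type[0].replace(/^h-/, "") };

  for (const [key, values] of Object.entries(item.properties || {})) {
    const converted = values.map((value) => {
      if (value && typeof value === "object") {
        // Nested microformat (for example an author h-card): recurse.
        if (Array.isArray(value.type)) return toJf2(value);
        // Objects such as { html, value }: keep the plain-text value (simplification).
        if ("value" in value) return value.value;
      }
      return value;
    });
    // Single values are unwrapped; multiple values stay as an array.
    jf2[key] = converted.length === 1 ? converted[0] : converted;
  }

  return jf2;
}

// Illustrative input, not taken from either example site.
const entry = {
  type: ["h-entry"],
  properties: {
    name: ["A post"],
    url: ["https://example.com/post"],
    category: ["indieweb", "microformats"],
    author: [
      {
        type: ["h-card"],
        properties: { name: ["Jane Doe"], url: ["https://example.com/"] },
        value: "Jane Doe",
      },
    ],
  },
};

console.log(JSON.stringify(toJf2(entry), null, 2));
```

The result has the entry at the top level with `author` as a nested `card` object carrying `name` and `url`, which is the kind of structure I'd like references to expose.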
I'm guessing the solution probably rests in the mf2tojf2 package.
X-Ray Output 1: https://jamesg.blog/2023/04/18/source-code-folder-names/
X-Ray Output 2: https://aaronparecki.com/2023/04/24/8/lawyer
Describe alternatives you’ve considered
I'm currently trying to normalize the input in my post template function but I think it would be helpful to the community to have shared logic.
Additional context
No response