feat: lemmy #56

Merged: 10 commits merged on Jul 1, 2023


Teqed (Contributor) commented Jun 29, 2023

Parses Lemmy /comment/ and /post/ URLs for their IDs. A comment ID is passed to getComment to obtain the parent post_id; getComments is then called with that post_id to find the URLs (the ap_id field) of all related comments.
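A minimal sketch of that lookup, assuming Lemmy's v3 HTTP API, where getComment is GET /api/v3/comment?id={id} and getComments is GET /api/v3/comment/list?post_id={post_id}, with the response shapes noted in the comments. The function names and the injected fetch_json callable are hypothetical illustrations, not the code in this PR.

```python
import re
from typing import Any, Callable, List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical pattern for Lemmy item URLs such as
# https://lemmy.world/comment/123 or https://lemmy.world/post/456
LEMMY_PATH_RE = re.compile(r"^/(comment|post)/(\d+)/?$")


def parse_lemmy_url(url: str) -> Optional[Tuple[str, str, int]]:
    """Return (server, kind, numeric id) for a Lemmy /comment/ or /post/ URL, else None."""
    parsed = urlparse(url)
    match = LEMMY_PATH_RE.match(parsed.path)
    if not parsed.netloc or match is None:
        return None
    return parsed.netloc, match.group(1), int(match.group(2))


def get_lemmy_context_urls(url: str, fetch_json: Callable[[str], Any]) -> List[str]:
    """Collect the ap_id of every comment under the post that `url` belongs to."""
    parsed = parse_lemmy_url(url)
    if parsed is None:
        return []
    server, kind, item_id = parsed
    if kind == "comment":
        # getComment: assumed to return {"comment_view": {"comment": {..., "post_id": N}}}
        data = fetch_json(f"https://{server}/api/v3/comment?id={item_id}")
        post_id = data["comment_view"]["comment"]["post_id"]
    else:
        # A /post/ URL already carries the post ID.
        post_id = item_id
    # getComments: assumed to return {"comments": [{"comment": {"ap_id": "..."}}, ...]}
    data = fetch_json(f"https://{server}/api/v3/comment/list?post_id={post_id}")
    return [view["comment"]["ap_id"] for view in data.get("comments", [])]
```

Pagination of comment/list (its page and limit parameters) is not handled here, so long threads may come back truncated.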

Teqed marked this pull request as draft June 29, 2023 04:25

nanos (Owner) commented Jun 29, 2023

Thanks for this @Teqed

Really great work, and really appreciate it!

Can't wait for this to be ready to merge. Let me know if you need help with anything!

Teqed (Contributor Author) commented Jun 29, 2023

@nanos Thank you for writing FediFetcher! 👍

Working: context of posts seen in the timeline.
In progress: backfilling user profiles. The returned error is "Extra data: line 1 column 4 (char 3)"; I will have to pick this back up later.

get_all_known_context_urls was returning None for the URL until I slightly refactored it in b7ef2be (#56). I am still not sure what was happening there.

Teqed force-pushed the feat/lemmy branch 2 times, most recently from 989ab44 to b7ef2be, June 29, 2023 07:14

nanos (Owner) commented Jun 29, 2023

> get_all_known_context_urls was returning None for the URL until I slightly refactored it b7ef2be (#56). I am still not sure what was happening here.

Yeah, I must admit that I never truly understood that part 😆 It's something I inherited from the original author and never bothered to simplify or rewrite.

Teqed (Contributor Author) commented Jun 30, 2023

Included are a few commits that help prevent FediFetcher from exiting ungracefully when it encounters unexpected types, missing properties, or unusual URLs. This is not a comprehensive robustness pass, just a few spots that were helpful to fix while writing this feature.
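The specific fixes in these commits aren't reproduced here. As a generic illustration of the pattern, a defensive lookup helper can return a default instead of raising when parsed JSON is missing a property or holds an unexpected type; get_nested is a hypothetical name, not a FediFetcher function.

```python
from typing import Any


def get_nested(data: Any, *keys: Any, default: Any = None) -> Any:
    """Follow dict keys / list indexes through parsed JSON.

    Returns `default` instead of raising when a key is missing, an index is
    out of range, or an intermediate value has an unexpected type (e.g. None).
    """
    current = data
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (
            isinstance(current, list)
            and isinstance(key, int)
            and -len(current) <= key < len(current)
        ):
            current = current[key]
        else:
            return default
    return current
```

For example, get_nested(response, "links", 0, "href") yields None rather than a KeyError, IndexError, or TypeError when a server returns an unexpected document.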

For future federation features, note that Pixelfed profile URLs don't use a subdirectory in their path, e.g. https://pixelfed.social/dansup rather than something like https://pixelfed.social/u/dansup. Because of how the current regex matches, it is likely to match any currently unmatched subdirectory instead of the user's actual name. A quick fix is to make sure Pixelfed profile matches are attempted last, though I'm sure a more sophisticated regex is possible. I've left a cautionary comment for the time being.
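A minimal sketch of the "attempt Pixelfed last" fix, assuming profile URLs are matched by an ordered list of regexes. The patterns and the match_profile name are hypothetical, not the regexes in FediFetcher; they only show why the subdirectory-free Pixelfed pattern has to come last.

```python
import re
from typing import Optional, Tuple

# Hypothetical profile-URL patterns, tried in order. The Pixelfed pattern has no
# fixed subdirectory, so it must come last: otherwise it would claim URLs such as
# https://lemmy.world/u/alice and report "u" as the username.
PROFILE_PATTERNS = [
    ("mastodon", re.compile(r"^https://(?P<server>[^/]+)/@(?P<username>[^/]+)")),
    ("lemmy", re.compile(r"^https://(?P<server>[^/]+)/u/(?P<username>[^/]+)")),
    ("pixelfed", re.compile(r"^https://(?P<server>[^/]+)/(?P<username>[^/]+)")),
]


def match_profile(url: str) -> Optional[Tuple[str, str, str]]:
    """Return (software, server, username) from the first pattern that matches."""
    for software, pattern in PROFILE_PATTERNS:
        match = pattern.match(url)
        if match:
            return software, match.group("server"), match.group("username")
    return None
```

Note that a Kbin /u/ profile URL would be classified as Lemmy by this sketch, since the paths look the same.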

For federation with Kbin instances, there is the minor issue that Kbin profile URLs look like Lemmy's (e.g. https://kbin.social/u/admin), so they would have to be told apart somehow. However, reading their API documentation does not reveal to me any way to fetch comments by username. You can search for posts by magazine, but AFAIK user profiles are not available as magazines. This may change, though, as they've said:

> This is a very early beta version, and a lot of features are currently broken or in active development, such as federation.

Finally, these commits include the remaining pieces needed to backfill user profiles, followed communities, and "posts" from Lemmy. Testing has been done via GitHub Actions against my Mastodon v4.1.2+glitch instance, which has content from the relevant instances.

Teqed marked this pull request as ready for review June 30, 2023 06:24

nanos (Owner) commented Jun 30, 2023

Thanks for your hard work @Teqed!

This is a fairly large PR, so I'm going to go through it with a fine-toothed comb over the weekend, but it looks solid at first glance.

> A quick fix is to make sure Pixelfed profile matches are attempted last

Personally, I think relying on a specific sequence is totally acceptable here.

I did also think about using the /.well-known/nodeinfo endpoint to determine the server software, but I'm not sure how widely it is implemented outside of Mastodon.

Teqed (Contributor Author) commented Jun 30, 2023

> Though I did think about using the /.well-known/nodeinfo endpoint to determine server software, but I'm not sure how widely implemented that is outside of mastodon either.

This is a good idea and you inspired me to do some quick research:

Making a request to https://{server}/.well-known/nodeinfo and then reading ["links"][0]["href"] from the JSON gets you:
https://{mastodon}/nodeinfo/2.0
https://{lemmy}/nodeinfo/2.0.json
https://{kbin}/nodeinfo/2.0
https://{pixelfed}/api/nodeinfo/2.0.json <-- Note: /api/ subdirectory
https://{pleroma}/nodeinfo/2.0.json
https://{peertube}/nodeinfo/2.0.json

Requesting that JSON document (NodeInfo 2.0 schema) gives you ["software"]["name"], which contains the name of the server software. This works with all six of the services listed above.
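A minimal sketch of that two-step lookup using only the standard library. The fetch is injectable so it can be tested offline; detect_software is a hypothetical name. One deviation from the comment above, labeled as an assumption: it prefers the link whose rel is the NodeInfo 2.0 schema URI and only falls back to ["links"][0], since servers may list several schema versions.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

NODEINFO_20_REL = "http://nodeinfo.diaspora.software/ns/schema/2.0"


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode the response body as JSON (no retries)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def detect_software(server: str, fetch: Callable[[str], Any] = fetch_json) -> Optional[str]:
    """Return NodeInfo's software.name for `server`, or None if it cannot be determined."""
    discovery = fetch(f"https://{server}/.well-known/nodeinfo")
    links = discovery.get("links") or []
    if not links:
        return None
    # Prefer the 2.0 schema link; otherwise use the first link, as in the comment.
    link = next((entry for entry in links if entry.get("rel") == NODEINFO_20_REL), links[0])
    nodeinfo = fetch(link["href"])
    return (nodeinfo.get("software") or {}).get("name")
```

Because the href comes from the discovery document, paths like Pixelfed's /api/nodeinfo/2.0.json need no special casing.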

This gives me some more ideas on how to go about choosing API endpoints based on NodeInfo instead of URL parsing. I imagine that if you went this route, it'd be preferable to keep a cache of already-identified APIs so that you don't repeatedly make the same request in the same run. I may submit another PR if I find this worthwhile to do.
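A minimal sketch of the per-run cache, assuming a detector function that maps a host name to a NodeInfo software name (or None). SoftwareCache, CONTEXT_ENDPOINTS, and context_url are hypothetical; the two endpoint paths are Mastodon's statuses context API and Lemmy's comment list API, but which endpoint to use per software is an illustrative choice.

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping from NodeInfo software.name to a context endpoint template.
# Only two entries are shown; other software would need its own handling.
CONTEXT_ENDPOINTS = {
    "mastodon": "https://{server}/api/v1/statuses/{id}/context",
    "lemmy": "https://{server}/api/v3/comment/list?post_id={id}",
}


class SoftwareCache:
    """Detect each server's software at most once per run."""

    def __init__(self, detect: Callable[[str], Optional[str]]):
        self._detect = detect
        self._known: Dict[str, Optional[str]] = {}

    def get(self, server: str) -> Optional[str]:
        if server not in self._known:
            try:
                self._known[server] = self._detect(server)
            except Exception:
                # Remember failures too, so an unreachable server is not retried this run.
                self._known[server] = None
        return self._known[server]


def context_url(cache: SoftwareCache, server: str, item_id: str) -> Optional[str]:
    """Pick an endpoint based on the cached software name, or None if unsupported."""
    template = CONTEXT_ENDPOINTS.get(cache.get(server) or "")
    return template.format(server=server, id=item_id) if template else None
```

functools.lru_cache on the detector would also avoid repeat requests, but it does not cache raised exceptions, so a failing server would be queried again on every call; the explicit dictionary records failures as well.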

nanos (Owner) left a comment

Solid work, thanks so much!

nanos merged commit f7d0150 into nanos:main Jul 1, 2023