Feature Request: add BDfR as a new extractor for archiving Reddit content #778

pirate · 2021-07-02T01:52:24Z

Discussed in #754

Originally posted by BlipRanger May 24, 2021
Just wanted to make a quick mention of BDfR as a cool project that might make a good starting point for the unrolling of Reddit comments/posts mentioned in the roadmap. They currently support grabbing a variety of media types from the post, as well as the comments/text in a separate (JSON) file. I've been working on an addon for it lately and I think it's a pretty great project with well-maintained code. If nothing else, they have really good examples of working with Reddit data, which could be useful! Just wanted to bring that to your attention!

I'd love to add BDfR as an extractor for Reddit content (and something similar for Twitter too #345) but am somewhat swamped with work and travel for the near future.

If you, @BlipRanger, or anyone else want to add it as an extractor (matching the style of our other extractors, e.g. archivebox/extractors/media.py is a great example to copy), I'd be happy to review PRs!
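To give a rough idea of the shape such an extractor could take, here is a minimal sketch. It is not ArchiveBox's actual extractor interface: the names `should_save_reddit`, `build_bdfr_cmd`, and `save_reddit` are hypothetical, and the sketch assumes BDfR is installed and provides an `archive` subcommand that accepts `--link` and `--format json` (check `bdfr archive --help` for your version before relying on it).

```python
import shutil
import subprocess
from pathlib import Path
from typing import List
from urllib.parse import urlparse

# Hypothetical host list for illustration; not taken from ArchiveBox or BDfR.
REDDIT_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com"}


def should_save_reddit(url: str) -> bool:
    """Return True when the URL points at a known Reddit host."""
    host = (urlparse(url).hostname or "").lower()
    return host in REDDIT_HOSTS


def build_bdfr_cmd(url: str, out_dir: Path) -> List[str]:
    """Build the BDfR command line for archiving one submission.

    Assumption: `bdfr archive DIR --link URL --format json` is supported
    by the installed BDfR version.
    """
    return ["bdfr", "archive", str(out_dir), "--link", url, "--format", "json"]


def save_reddit(url: str, out_dir: Path, timeout: int = 60) -> bool:
    """Run BDfR for a Reddit URL; return True only if the command succeeded."""
    # Skip non-Reddit URLs and environments where the bdfr binary is missing.
    if not should_save_reddit(url) or shutil.which("bdfr") is None:
        return False
    out_dir.mkdir(parents=True, exist_ok=True)
    try:
        result = subprocess.run(
            build_bdfr_cmd(url, out_dir),
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Keeping the command construction in its own function makes it easy to unit-test without BDfR installed; a real extractor would also need to report its output path and status in whatever form the other extractors use.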

We have some good instructions for contributing a new extractor and getting started with ArchiveBox development in general.


pirate · 2023-10-20T20:12:08Z

We use Mercury (recently renamed postlight) as an extractor already, and they're rapidly adding extractors on their side for many different kinds of sites, so we should get these improvements with no effort required on the archivebox side:

Reddit threads: "Add comments to Reddit extractor" (postlight/parser#746)
HN threads: "custom parser for HackerNews (news.ycombinator.com)" (postlight/parser#745)
Twitter threads: "support twitter" (postlight/parser#622)

rmelotte · 2024-06-24T15:30:55Z

It looks like the postlight project has no recent activity unfortunately (no PR reviews at least)...
Is there any plan to replace it with something else, or integrate the existing Reddit and HN PRs in a different way?

pirate changed the title ~~Feature Request: add BDfR as an an extractor for downloading Reddit content~~ Feature Request: add BDfR as a new extractor for archiving Reddit content Jul 2, 2021

pirate removed the type: enhancement label Oct 23, 2024
