Feature Request: add BDfR as a new extractor for archiving Reddit content #778

Open
pirate opened this issue Jul 2, 2021 Discussed in #754 · 2 comments
Labels
good first ticket, help wanted, size: medium, status: idea-phase (Work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet), touches: configuration, touches: dependencies/packaging (Issues or changes that add/remove/affect dependencies), touches: docs

Comments

pirate (Member) commented Jul 2, 2021

Discussed in #754

Originally posted by BlipRanger May 24, 2021
Just wanted to make a quick mention of BDfR as a cool project that might make for a good starting point for the unrolling of Reddit comments/posts as mentioned in the roadmap. They currently support grabbing a variety of media types from the post, as well as the comments/text in a separate (JSON) file. I've been working on an addon for it lately and I think it's a pretty great project with well-maintained code. If nothing else, they have really good examples of working with Reddit data which could be useful! Just wanted to bring that to your attention!

I'd love to add BDfR as an extractor for Reddit content (and something similar for Twitter too #345) but am somewhat swamped with work and travel for the near future.

If you @BlipRanger or anyone else wants to add it as an extractor (matching the style of our other extractors, e.g. archivebox/extractors/media.py is a great example to copy), I'd be happy to review PRs!
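As a rough starting point for anyone picking this up, here is a minimal standalone sketch of what the core of such an extractor might look like. It does not use ArchiveBox's actual extractor interface. The helper names (`should_save_reddit`, `build_bdfr_cmd`, `save_reddit`) are hypothetical, and the `clone` subcommand and `--link` flag are assumed from BDfR's README, so check them against the installed version.

```python
import subprocess
import sys
from pathlib import Path
from typing import List
from urllib.parse import urlparse

# Hypothetical names for illustration; none of these exist in ArchiveBox.
REDDIT_HOSTS = ("reddit.com", "redd.it")


def should_save_reddit(url: str) -> bool:
    """Return True if the URL's host is reddit.com, redd.it, or a subdomain of either."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in REDDIT_HOSTS)


def build_bdfr_cmd(url: str, out_dir: Path) -> List[str]:
    """Build the BDfR command line (assumed: `clone` subcommand and `--link` flag)."""
    return [sys.executable, "-m", "bdfr", "clone", str(out_dir), "--link", url]


def save_reddit(url: str, out_dir: Path, timeout: int = 60) -> bool:
    """Run BDfR for a single link; requires BDfR to be installed (`pip install bdfr`)."""
    if not should_save_reddit(url):
        return False
    out_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        build_bdfr_cmd(url, out_dir),
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

A real extractor would also need the config toggles, dependency checks, and result objects that the existing extractors use, so treat this only as an outline of the subprocess call.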

We have some good instructions for contributing a new extractor and getting started with ArchiveBox development in general:

pirate added the labels status: idea-phase, size: medium, touches: configuration, good first ticket, help wanted, touches: dependencies/packaging, touches: docs, and is: enhancement on Jul 2, 2021
pirate changed the title from "Feature Request: add BDfR as an an extractor for downloading Reddit content" to "Feature Request: add BDfR as a new extractor for archiving Reddit content" on Jul 2, 2021
pirate (Member, Author) commented Oct 20, 2023

We already use Mercury (recently renamed postlight) as an extractor, and they're rapidly adding extractors on their side for many different kinds of sites, so we should get these improvements with no effort required on the ArchiveBox side:

rmelotte commented

It looks like the postlight project has no recent activity unfortunately (no PR reviews at least)...
Is there any plan to replace it with something else, or integrate the existing Reddit and HN PRs in a different way?
