-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds an unauthenticated Bluesky archiver #160
Conversation
msramalho
commented
Dec 23, 2024
- avoids ATProto
- leaves a few TODOs in reusable download_from_url method
from .instagram_api_archiver import InstagramAPIArchiver | ||
from .bluesky_archiver import BlueskyArchiver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a general comment here (not relevant to this PR) but I don't like how we import all archivers in the init.py file. Ideally, we should only load dependencies (archivers, feeders etc.) that are enabled in the orchestration.
Reason: I came across a bug whereby an archiver was broken (snscrape was broken), and even though I hadn't enabled it, it was still being loaded and throwing an error. Imagine if an archiver breaks for whatever reason (or more likely, an archiver's dependencies break), it would be much easier to just disable the archiver than telling the user to find an __init__.py
file and 'change the python code'.
This is essentially how django works. You create a list of INSTALLED_APPS
and then the archiver tries to install them one by one. We already have this list of 'installed apps' (it's called steps in our orchestration). So it's just a matter of on startup, trying to load each of them. Then when they can't be loaded throwing an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, please open an issue for those changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, under 'backlog'
(Images are stored in media/images)
Use 'python -m unittest' from the project root to run
LGTM. Tested and works. One issue I found was the media wasn't getting downloaded for this url: https://bsky.app/profile/colborne.bsky.social/post/3lec2bqjc5s2y For some reason, bsky uses 2 different places to store images. Under
Looking at the bksy source, it seems that embed can be one of:
I've modified the code to look at both these paths for images. I also took the liberty of adding in some unit tests to test the 3 different types of media (single image, multiple image, video). The unit tests might want to be abstracted out to another PR. But it's a framework on how unit testing may work. One big question we need to discuss: do we want to the unit tests to make live requests to our online platforms? Advantages: allows us to test the platforms as well, and to make sure the APIs/webpages haven't broken or changed |
I agree with merging this unit tests changes even if we refactor how those are setup etc, thanks for adding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved.