Support for BlueSky? #4438
Comments
Heya, I have a few invite codes left over if @mikf wants one to implement this :) |
I've implemented some basics for this in an unrelated project. Chitose makes it relatively easy, but I'm not sure what the contribution guidelines are for new library dependencies. @mikf ? Logic is basically
    def login(self, instance="bsky.social"):
        rc = netrc.netrc()
        (BSKY_USER, _, BSKY_PASSWD) = rc.authenticators(instance)
        self.api = chitose.BskyAgent(service=f'https://{instance}')
        self.api.login(BSKY_USER, BSKY_PASSWD)
        logging.info(f"Logged into {instance} as {BSKY_USER}")

    def getPostMedia(self, json_obj) -> typing.Iterable[typing.Tuple[str, str]]:
        for image_def in json_obj.get('embed', {}).get('images', []):
            src_url = image_def['fullsize']
            name = posixpath.split(src_url)[-1].replace('@', '.')
            yield (name, src_url)

    def bskyGetThread(self, post_reference: PostReference) -> dict:
        thread_response = self.api.get_post_thread(uri=self.bskyTupleToUri(post_reference))
        thread_response = json.loads(thread_response)
        return thread_response

    def getSkeetJsonApi(self, post_reference: PostReference, reason=""):
        try:
            thread_response = self.bskyGetThread(post_reference)
            thread_response['thread']['post']['id'] = post_reference.post_id
            logging.info(f"Downloaded new {self.NOUN_POST} for {post_reference} ({reason})")
            # print(thread_response)
            json_obj = thread_response['thread']['post']
            return json_obj
        except urllib.error.HTTPError as e:  # type: ignore[attr-defined]
            logging.error(e.headers)
            logging.error(e.fp.read())
            raise e
        except Exception:
            raise
|
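The snippet above calls a `bskyTupleToUri` helper that is not shown. A minimal sketch of what such a helper might look like, assuming `PostReference` carries an account identifier and a post ID (the `actor` field name and the `NamedTuple` layout are hypothetical; only `post_id` appears in the original code):

```python
from typing import NamedTuple


class PostReference(NamedTuple):
    """Hypothetical stand-in for the PostReference type used above."""
    actor: str    # handle or DID of the posting account (assumed field name)
    post_id: str  # record key, i.e. the last path segment of a bsky.app post URL


# Collection (record type) under which Bluesky posts are stored.
POST_COLLECTION = "app.bsky.feed.post"


def bsky_tuple_to_uri(ref: PostReference) -> str:
    """Build an AT URI of the form at://<authority>/<collection>/<rkey>."""
    return f"at://{ref.actor}/{POST_COLLECTION}/{ref.post_id}"


ref = PostReference("did:plc:zyctzyihzisjnrdoiw75xvhm", "3khucm2ygso2z")
print(bsky_tuple_to_uri(ref))
```

A DID is probably the safer authority here, since handles can change while DIDs do not.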
Bluesky is now open to the public, FYI: https://www.pcmag.com/news/twitter-alternative-bluesky-makes-posts-publicly-viewable |
Not for every account tho, you can manually set if you want your posts to be publicly viewable or only for people who are logged in. |
BlueSky posts are always public. You can request for your profile to be hidden from the unauthenticated human-friendly web interface, but that doesn't make it private. It will always be readable via public API. |
I've added a bunch of |
Starting experimentation with the version 1.26.7 on bluesky with username, password and cookies.
Using this post as a test: https://bsky.app/profile/toomanyboners.bsky.social/post/3khucm2ygso2z
Picture downloaded with gallery-dl results in a 1470 x 1260 JPG.
Opening picture in browser with "Open in new tab" gives a picture of 1000 x 857 JPG: https://cdn.bsky.app/img/feed_thumbnail/plain/did:plc:zyctzyihzisjnrdoiw75xvhm/bafkreie534wdizaua3psm66jflig66hqc5itocukqk4ajsu7pk6ic23aii@jpeg
Clicking on picture to open it up in browser and "Open in new tab" gives a picture of 2000 x 1714 JPG: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:zyctzyihzisjnrdoiw75xvhm/bafkreie534wdizaua3psm66jflig66hqc5itocukqk4ajsu7pk6ic23aii@jpeg
Removing |
I just tested the same link, same parameters except for not providing cookies. Downloading the link the first time gave me the 2000 × 1714 JPG file. All subsequent downloads of the same link using the same exact settings gave me the 1470 × 1260 JPG file. |
Each image uploaded to bluesky has 3 different versions (at least, haven't found more at this point).
(from https://bsky.app/profile/mikf.bsky.social/post/3kkn2rkvdls2v) gallery-dl is currently downloading everything in edit: |
It's confusing and a bit annoying how Bluesky does this image resizing. You know it's bad when Twitter is more consistent with file sizes than this new alternative. So, if a 3000 x 3000 pic is uploaded, it'll always be downsized to 2000 x 2000 with no way to get the true original size, all after having to put up with image conversion and severe compression. |
and reduce default depth and parentHeight values
allow extracting 'user' metadata and make 'facets' extraction optional
Both https://bsky.app/search?q=QUERY and https://bsky.app/search/QUERY are recognized as search URLs, where QUERY gets forwarded unmodified as 'q' parameter for app.bsky.feed.searchPosts . User searches are not supported yet.
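A rough sketch of how the two search URL forms could be matched and turned into request parameters. The pattern and function name are illustrative assumptions, not gallery-dl's actual code; the captured text is passed through as-is, following the description above:

```python
import re

# Assumed pattern: accepts both /search?q=QUERY and /search/QUERY.
SEARCH_PATTERN = re.compile(r"^https?://bsky\.app/search(?:/|\?q=)(.+)")


def search_params(url):
    """Return parameters for app.bsky.feed.searchPosts, or None if no match."""
    match = SEARCH_PATTERN.match(url)
    if match is None:
        return None
    # The query is forwarded unmodified as the 'q' parameter.
    return {"q": match.group(1)}


print(search_params("https://bsky.app/search?q=QUERY"))
print(search_params("https://bsky.app/search/QUERY"))
```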
But I think this is something Bluesky does when uploading the images, no? I.e., I don’t think Bluesky stores the original image anywhere, only their (potentially downscaled) JPEG image.
Is there a reason for asking for/using login information at all? Better rate limits? (As @qub1750ul mentioned earlier, all Bluesky posts (incl. images) are always public, so there’s no need for logging in to access them.) |
Seems like it. Bluesky does not store the originally uploaded image.
Certain (private) feeds, like
You don't need to log in if all you want to do is download a user's media. |
I just updated to gain access to the Bluesky functionality and have a question. The Python script I run for a bot uses a separate downloader function. When I run my usual works-with-everything command on the example post above:
gallery-dl --get-urls --no-download --option search-endpoint=graphq1 https://bsky.app/profile/toomanyboners.bsky.social/post/3khucm2ygso2z
it outputs a blob URL rather than the actual image URL that shows up in a browser:
https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:zyctzyihzisjnrdoiw75xvhm/bafkreie534wdizaua3psm66jflig66hqc5itocukqk4ajsu7pk6ic23aii@jpeg
I do see similarities between the URLs, so a bit of URL rewriting could get me what I need to pass to my separate downloading function. I'm just wondering whether there are any command-line flags to use with --get-urls and --no-download that output the https://cdn.bsky.app/img/feed_fullsize/plain/did: URL instead of https://bsky.social/xrpc/com.atproto.sync.getBlob?did=did: ? |
@quentinwolf See #4438 (comment) There is currently no such option, but I'd think original resolution is better than the upscaled-to-2000px version. |
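For anyone who still wants the CDN form, the URL rewrite mentioned in the question can be sketched in a few lines. This is an assumption-laden example: the function name is made up, and the CDN layout is inferred only from the URLs quoted in this thread, so it may not hold for every image or instance:

```python
from urllib.parse import parse_qs, urlsplit

# Layout inferred from the cdn.bsky.app URLs quoted in this thread.
CDN_TEMPLATE = "https://cdn.bsky.app/img/{variant}/plain/{did}/{cid}@jpeg"


def blob_to_cdn_url(blob_url, variant="feed_fullsize"):
    """Rewrite a com.atproto.sync.getBlob URL into a cdn.bsky.app image URL."""
    parts = urlsplit(blob_url)
    if not parts.path.endswith("/com.atproto.sync.getBlob"):
        raise ValueError("not a getBlob URL: " + blob_url)
    query = parse_qs(parts.query)
    try:
        did = query["did"][0]
        cid = query["cid"][0]
    except KeyError as exc:
        raise ValueError("missing 'did' or 'cid' parameter") from exc
    return CDN_TEMPLATE.format(variant=variant, did=did, cid=cid)


print(blob_to_cdn_url(
    "https://bsky.social/xrpc/com.atproto.sync.getBlob"
    "?did=did:plc:zyctzyihzisjnrdoiw75xvhm"
    "&cid=bafkreie534wdizaua3psm66jflig66hqc5itocukqk4ajsu7pk6ic23aii"
))
```

Passing `variant="feed_thumbnail"` would give the smaller rendition instead. Either way the CDN file is a resized version, not the stored blob.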
The
|
Now that's not true, at least not for the bsky.app instance. https://bsky.social/xrpc/com.atproto.sync.getBlob?did=did:plc:cslxjqkeexku6elp5xowxkq7&cid=bafkreifhy4gmtrfp3ax7wx2l7ojabjabhcnxvieumend3iu3ghlpp4fuiq is not the same file I uploaded. Or is this URL somehow wrong, e.g. wrong CID? |
I think it’s true in that it’s the “complete original blob, as uploaded” by bsky.app to their storage backend, even if not by the user to bsky.app, hence also my earlier comment about Bluesky’s handling of uploaded images. I haven’t looked at what’s going on in the browser, but the JPEGifying and (potential) downscaling could even be happening browser-side (I know that there are JavaScript libraries that do this anyway), so the original‐original might never touch any bsky.app infrastructure at all. |
Even JPEG files that don't get downscaled are modified: |
Requesting for unique ID to be added, a string with numbers/letters unique to the account |
Bluesky's equivalent to Twitter's user IDs as unchanging, unique IDs are DIDs. Each user has a handle and a DID, and both can be used with gallery-dl.
A user's DID can be found at
It is also included in |
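A small sketch of telling the two identifier types apart in a profile URL. The pattern and function name are illustrative assumptions; the only rule relied on is that DIDs start with `did:` while handles are domain names:

```python
import re

# Assumed pattern for bsky.app profile and post URLs.
PROFILE_PATTERN = re.compile(r"^https?://bsky\.app/profile/([^/?#]+)")


def classify_actor(url):
    """Return ('did' | 'handle', identifier) for a bsky.app profile URL."""
    match = PROFILE_PATTERN.match(url)
    if match is None:
        raise ValueError("not a bsky.app profile URL: " + url)
    actor = match.group(1)
    kind = "did" if actor.startswith("did:") else "handle"
    return kind, actor


print(classify_actor("https://bsky.app/profile/mikf.bsky.social/post/3kkn2rkvdls2v"))
print(classify_actor("https://bsky.app/profile/did:plc:zyctzyihzisjnrdoiw75xvhm"))
```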
Doesn't work for archives because of the colon
|
Then replace |
That worked. I want to request the equivalent of Mastodon extractor's |
I don't think dots are allowed in the username. Can there be a version of |
Dots are allowed if you use a custom domain name. I know this for a fact because I've done so with an alt account of mine (NSFW, so I can't post the name here). EDIT: By this I mean subdomains; for example, "@sub.epiclper.com" would be a valid Bluesky username. |
In Bluesky/the AT protocol, usernames are domain names (or as the documentation says: |
Using the code ` |
Has video support been implemented yet? |
@legowerewolf since v1.27.5 (7d6520e) |
Ah, thank you. Winget wasn't showing the update for me for some reason. |
Any plans to support hashtags? |
I can't seem to |
Huh? Yeah, would surprise me if this worked. This whole The top-level domain name can be anything, and thus can't be automagically detected by gallery-dl. But I digress.. What's |
It is a valid redirect. Edit: with bsky, you can use domains as handles. Of course, I guess this wouldn't work for an account like |
I see what you mean. Okay, this is something that could easily be changed on gallery-dl's side, but it is not yet, obviously. But why would you even need this? Right now, you have this:
So, simply always prepend |
I only began to make that connection after my first comment. It really isn't needed, lol oops 🫠 |
While it doesn’t seem to be there yet, the idea for Bluesky is that it should be possible to host different AT protocol servers/sites (e.g., like Bluesky), so this might be needed eventually. (Not saying you should add it now, just a heads-up that it’s something that’s not entirely unlikely to happen in the future. :)) |
The site is still invite-only for now, but I’m willing to provide an invite code as soon as I get a new one (should be in ~5 days).