[Kemono] Invert download order of revisions #5334

Closed
taskhawk opened this issue Mar 15, 2024 · 4 comments

Comments

@taskhawk

Would it be possible to invert the download order of revisions for Kemono? Now that I have taken a look at revisions, I tried to organize the files in a specific way but couldn't due to the current download order where newest revisions get downloaded first.

I'm only interested in the files. I'm writing the file hash in the archive to prevent duplicates, and I'm using the revision_index in the filename to note the revision it came from like a version number. Something like this for simplicity:

"kemonoparty": {
    "archive": "~/kemono.sqlite3",
    "archive-format": "{subcategory}_{user}_{id}_{hash}",
    "directory": ["kemono", "{subcategory}", "{username}"],
    "filename": {
        "revision_index > 1": "{id}-{num:>02}{revision_index:?v//}.{extension}",
        ""                  : "{id}-{num:>02}.{extension}"
    },
    "skip": true,
    "revisions": "unique"
}

With the current download order the files get processed like this:

./kemono/fanbox/artist/555555-01v3.png
./kemono/fanbox/artist/555555-01v2.png
./kemono/fanbox/artist/555555-01.png

With this config, if all three files are actually the same, I end up with just one file named 555555-01v3.png, which is a bit weird because it really came from the first "version". On the other hand, if the revision download order were inverted, the files would be processed like this:

./kemono/fanbox/artist/555555-01.png
./kemono/fanbox/artist/555555-01v2.png
./kemono/fanbox/artist/555555-01v3.png

And if the files are the same, I should end up with 555555-01.png, which seems more appropriate. If the files are different, the result should be the same with either download order. The inverted order only matters for this duplicate-file scenario.
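
To make the scenario above concrete, here is a minimal Python sketch of how processing order interacts with hash-based skipping. It is not gallery-dl's code: the archive is modeled as a plain set of hashes (within a single post, the archive key from the config varies only by hash), and the names `filename` and `kept_files` are hypothetical helpers made up for this illustration.

```python
def filename(post_id, num, revision_index, extension):
    """Mimic the conditional 'filename' setting from the config above."""
    if revision_index > 1:
        # corresponds to "{id}-{num:>02}{revision_index:?v//}.{extension}"
        return f"{post_id}-{num:02d}v{revision_index}.{extension}"
    # corresponds to "{id}-{num:>02}.{extension}"
    return f"{post_id}-{num:02d}.{extension}"


def kept_files(revisions, oldest_first):
    """Return the filenames that survive hash-based skipping.

    revisions: list of (revision_index, file_hash) tuples for one file slot.
    oldest_first: process revision 1 first if True, newest first otherwise.
    """
    ordered = sorted(revisions, key=lambda r: r[0], reverse=not oldest_first)
    archive = set()  # simplified stand-in for the sqlite archive
    kept = []
    for revision_index, file_hash in ordered:
        if file_hash in archive:
            continue  # already archived, so this revision is skipped
        archive.add(file_hash)
        kept.append(filename("555555", 1, revision_index, "png"))
    return kept


identical = [(1, "aaa"), (2, "aaa"), (3, "aaa")]
print(kept_files(identical, oldest_first=False))  # ['555555-01v3.png']
print(kept_files(identical, oldest_first=True))   # ['555555-01.png']
```

When every revision has a different hash, both orders keep the same set of filenames; only the order in which they are written differs.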

@mikf
Owner

mikf commented Mar 15, 2024

Would a value representing the total number of revisions (revision_count?) also work, so that it would be possible to enumerate revisions in reverse? ({revision_count - revision_index + 1} in an f-string)

I could also just add an order-revisions option that reverses the revision list before it gets processed, but that might have unintended effects on other revision:… values. Not sure.

Maybe I'll just do both.
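
The reverse enumeration suggested above, `revision_count - revision_index + 1`, is plain arithmetic and can be checked in isolation. The sketch below assumes 1-based revision indices; `reversed_revision_index` is a hypothetical helper name, not part of gallery-dl.

```python
def reversed_revision_index(revision_index, revision_count):
    """Map a 1-based revision index to its position counted from the other end."""
    if not 1 <= revision_index <= revision_count:
        raise ValueError("revision_index must be between 1 and revision_count")
    return revision_count - revision_index + 1


# With three revisions, indices 1, 2, 3 map to 3, 2, 1.
print([reversed_revision_index(i, 3) for i in (1, 2, 3)])  # [3, 2, 1]
```

Applying the mapping twice returns the original index, so the same expression converts in both directions.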

@taskhawk
Author

> Would a value representing the total number of revisions (revision_count?) also work, so that it would be possible to enumerate revisions in reverse? ({revision_count - revision_index + 1} in an f-string)

Yes, that would work too. When I tried workarounds I actually thought of doing that but found out there was no revision_count. This is probably the easiest fix. It just needs to take into account the "revisions": "unique" option.

> I could also just add an order-revisions option that reverses the revision list before it gets processed, but that might have unintended effects on other revision:… values. Not sure.

Not entirely sure either, but I would think that as long as the default order remains the same as it is now, it shouldn't affect current configs.

@mikf
Owner

mikf commented Mar 15, 2024

Added revision_count metadata (1418c0c) and an order-revisions option (03a9ce9).

@taskhawk
Author

Thank you!

JackTildeD added a commit to JackTildeD/gallery-dl-forked that referenced this issue Apr 24, 2024
* save cookies to tempfile, then rename

avoids wiping the cookies file if the disk is full

* [deviantart:stash] fix 'index' metadata (mikf#5335)

* [deviantart:stash] recognize 'deviantart.com/stash/…' URLs

* [gofile] fix extraction

* [kemonoparty] add 'revision_count' metadata field (mikf#5334)

* [kemonoparty] add 'order-revisions' option (mikf#5334)

* Fix imagefap extractor

* [twitter] add 'birdwatch' metadata field (mikf#5317)

should probably get a better name,
but this is what it's called internally by Twitter

* [hiperdex] update URL patterns & fix 'manga' metadata (mikf#5340)

* [flickr] add 'contexts' option (mikf#5324)

* [tests] show full path for nested values

'user.name' instead of just 'name' when testing for
"user": { … , "name": "…", … }

* [bluesky] add 'instance' metadata field (mikf#4438)

* [vipergirls] add 'like' option (mikf#4166)

* [vipergirls] add 'domain' option (mikf#4166)

* [gelbooru] detect returned favorites order (mikf#5220)

* [gelbooru] add 'date_favorited' metadata field

* Update fapello.py

get the fullsize image instead of the resized one

* fapello.py Fullsize image

by removing ".md" and ".th" from the image URL, it will download fullsize images

* [formatter] fix local DST datetime offsets for ':O'

'O' would get the *current* local UTC offset and apply it to all
'datetime' objects it gets applied to.
This would result in a wrong offset if the current offset includes
DST and the target 'datetime' does not or vice-versa.

'O' now determines the correct local UTC offset while respecting DST for
each individual 'datetime'.

* [subscribestar] fix 'date' metadata

* [idolcomplex] support new pool URLs

* [idolcomplex] fix metadata extraction

- replace legacy 'id' values with alphanumeric ones, since the former are
  no longer available
- approximate 'vote_average', since the real value is no longer
  available
- fix 'vote_count'

* [bunkr] remove 'description' metadata

album descriptions are no longer available on album pages
and the previous code erroneously returned just '0'

* [deviantart] improve 'index' extraction for stash files (mikf#5335)

* [kemonoparty] fix exception for '/revision/' URLs

caused by 03a9ce9

* [steamgriddb] raise proper exception for deleted assets

* [tests] update extractor results

* [pornhub:gif] extract 'viewkey' and 'timestamp' metadata (mikf#4463)

mikf#4463 (comment)

* [tests] use 'datetime.timezone.utc' instead of 'datetime.UTC'

'datetime.UTC' was added in Python 3.11
and is not defined in older versions.

* [gelbooru] add 'order-posts' option for favorites (mikf#5220)

* [deviantart] handle CloudFront blocks in general (mikf#5363)

This was already done for non-OAuth requests (mikf#655)
but CF is now blocking OAuth API requests as well.

* release version 1.26.9

* [kemonoparty] fix KeyError for empty files (mikf#5368)

* [twitter] fix pattern for single tweet (mikf#5371)

- Add optional slash
- Update tests to include some non-standard tweet URLs

* [kemonoparty:favorite] support 'sort' and 'order' query params (mikf#5375)

* [kemonoparty] add 'announcements' option (mikf#5262)

mikf#5262 (comment)

* [wikimedia] suppress exception for entries without 'imageinfo' (mikf#5384)

* [docs] update defaults of 'sleep-request', 'browser', 'tls12'

* [docs] complete Authentication info in supportedsites.md

* [twitter] prevent crash when extracting 'birdwatch' metadata (mikf#5403)

* [workflows] build complete docs Pages only on gdl-org/docs

deploy only docs/oauth-redirect.html on mikf.github.io/gallery-dl

* [docs] document 'actions' (mikf#4543)

or at least attempt to

* store 'match' and 'groups' in Extractor objects

* [foolfuuka] improve 'board' pattern & support pages (mikf#5408)

* [reddit] support comment embeds (mikf#5366)

* [build] add minimal pyproject.toml

* [build] generate sdist and wheel packages using 'build' module

* [build] include only the latest CHANGELOG entries

The CHANGELOG is now at a size where it takes up roughly 50kB or 10% of
an sdist or wheel package.

* [oauth] use Extractor.request() for HTTP requests (mikf#5433)

Enables using proxies and general network options.

* [kemonoparty] fix crash on posts with missing datetime info (mikf#5422)

* restore LD_LIBRARY_PATH for PyInstaller builds (mikf#5421)

* remove 'contextlib' imports

* [pp:ugoira] log errors for general exceptions

* [twitter] match '/photo/' Tweet URLs (mikf#5443)

fixes regression introduced in 40c0553

* [pp:mtime] do not overwrite '_mtime' for None values (mikf#5439)

* [wikimedia] fix exception for files with empty 'metadata'

* [wikimedia] support wiki.gg wikis

* [pixiv:novel] add 'covers' option (mikf#5373)

* [tapas] add 'creator' extractor (mikf#5306)

* [twitter] implement 'relogin' option (mikf#5445)

* [docs] update docs/configuration links (mikf#5059, mikf#5369, mikf#5423)

* [docs] replace AnchorJS with custom script

use it in rendered .rst documents as well as in .md ones

* [text] catch general Exceptions

* compute tempfile path only once

* Add warnings flag

This commit adds a warnings flag

It can be combined with -q / --quiet to display warnings.
The intent is to provide a silent option that still surfaces
warning and error messages so that they are visible in logs.

* re-order verbose and warning options

* [gelbooru] improve pagination logic for meta tags (mikf#5478)

similar to 494acab

* [common] add Extractor.input() method

* [twitter] improve username & password login procedure (mikf#5445)

- handle more subtasks
- support 2FA
- support email verification codes

* [common] update Extractor.wait() message format

* [common] simplify 'status_code' check in Extractor.request()

* [common] add 'sleep-429' option (mikf#5160)

* [common] fix NameError in Extractor.request()

… when accessing 'code' after a requests exception was raised.

Caused by the changes in 566472f

* [common] show full URL in Extractor.request() error messages

* [hotleak] download files with 404 status code (mikf#5395)

* [pixiv] change 'sanity_level' debug message to a warning (mikf#5180)

* [twitter] handle missing 'expanded_url' fields (mikf#5463, mikf#5490)

* [tests] allow filtering extractor result tests by URL or comment

python test_results.py twitter:+/i/web/
python test_results.py twitter:~twitpic

* [exhentai] detect CAPTCHAs during login (mikf#5492)

* [output] extend 'output.colors' (mikf#2566)

allow specifying ANSI colors for all loglevels
(debug, info, warning, error)

* [output] enable colors by default

* add '--no-colors' command-line option

---------

Co-authored-by: Luc Ritchie <luc.ritchie@gmail.com>
Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
Co-authored-by: Herp <asdf@qwer.com>
Co-authored-by: wankio <31354933+wankio@users.noreply.github.com>
Co-authored-by: fireattack <human.peng@gmail.com>
Co-authored-by: Aidan Harris <me@aidanharr.is>