sync upstream #1

mo-han · 2020-05-26T01:22:37Z

No description provided.

'/extended_fetch' as well as Deviation webpages once again contain the Deviation UUIDs needed to grab Deviation info through the OAuth API, meaning cookies are no longer necessary to grab original files. The only case where cookies are still needed is scraps marked as "mature", since those entries are hidden from public users. (#655, #657, #660)

The webtoons extractor can extract a single episode or an entire comic (all episodes) from webtoons.com. The logic of the extractors should be trivial except for a couple of kludges:
- the `ageGatePass' cookie is always set to avoid a possible redirect that would stop extraction, especially in the comic extractor
- the image URLs returned by the episode extractor cannot be fetched directly; the `Referer:' HTTP header needs to be passed to fetch them

Closes #593.

Wrap all loggers used by job, extractor, downloader, and postprocessor objects into a (custom) LoggerAdapter that provides access to the underlying job, extractor, pathfmt, and kwdict objects and their properties. __init__() signatures for all downloader and postprocessor classes have been changed to take the current Job object as their first argument, instead of the current extractor or pathfmt. (#574, #575)
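
The wrapping idea can be illustrated with the standard library's logging.LoggerAdapter. This is only a minimal sketch, not the project's actual adapter: the `Job` class, its `extractor_name` attribute, and the `JobLoggerAdapter` name are hypothetical stand-ins.

```python
import logging


class Job:
    """Hypothetical stand-in for a job object (not the real class)."""

    def __init__(self, extractor_name):
        self.extractor_name = extractor_name


class JobLoggerAdapter(logging.LoggerAdapter):
    """Attach the current job to every record logged through this adapter."""

    def __init__(self, logger, job):
        super().__init__(logger, {})
        self.job = job

    def process(self, msg, kwargs):
        # Copy any caller-supplied 'extra' dict and add the job to it, so
        # handlers and formatters can read 'record.job' and its properties.
        extra = dict(kwargs.get("extra") or {})
        extra["job"] = self.job
        kwargs["extra"] = extra
        return msg, kwargs
```

A handler or filter attached to the underlying logger can then inspect `record.job.extractor_name`, for example to route messages per extractor.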

mo-han and others added 30 commits March 31, 2020 21:59

Add metadata to hentainexus: circle, event, title_conventional. (#661)

6f81cac

[hentainexus] reduce line length (flake8) & update test

fe96f99

change Travis badge URL to .com

a0111ed

[weibo] accept status URLs with non-numeric IDs (#664)

699036e

[piczel] fix extraction for single images

c034159

[deviantart] detect stash folders (fixes #659)

e2fc4ea

[deviantart] improve sta.sh extraction

5c27b25

Extract all sta.sh items in a single extractor run. Don't spawn a new StashExtractor for each individual sta.sh item to preserve the current requests.Session and its opened TCP connections.

[deviantart] handle "Request blocked" errors (#655)

ff7c0b7

- add a 2 second wait time between requests to deviantart.com
- catch 403 "Request blocked" errors and wait for 3 minutes until retrying
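
The two behaviours described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the extractor's real code: `ThrottledClient`, the `fetch` callable returning `(status_code, body)`, and the `retries` parameter are all hypothetical; only the 2 second interval and the 3 minute wait come from the commit message.

```python
import time

REQUEST_INTERVAL = 2.0   # seconds between requests (from the commit message)
BLOCKED_WAIT = 180.0     # 3 minutes after a 403 "Request blocked" response


class ThrottledClient:
    """Hypothetical wrapper around a 'fetch(url) -> (status, body)' callable."""

    def __init__(self, fetch, sleep=time.sleep, clock=time.monotonic):
        self.fetch = fetch
        self.sleep = sleep
        self.clock = clock
        self._last = None  # time of the previous request, if any

    def _throttle(self):
        # Sleep just long enough to keep REQUEST_INTERVAL between requests.
        if self._last is not None:
            remaining = REQUEST_INTERVAL - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()

    def request(self, url, retries=3):
        for _ in range(retries + 1):
            self._throttle()
            status, body = self.fetch(url)
            if status == 403 and "Request blocked" in body:
                self.sleep(BLOCKED_WAIT)  # back off, then try again
                continue
            return status, body
        raise RuntimeError("still blocked after {} retries".format(retries))
```

Injecting `sleep` and `clock` keeps the sketch testable without real delays or network access.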

[mastodon] add access tokens for mastodon.social and baraag.net

2587296

(closes #665)

[deviantart] apply HTTP request limits in more places

f9a590f

"Request blocked" can also happen on sta.sh and for *any* HTTP request directed at deviantart.com

[hiperdex] fix extraction

762c758

[oauth] use the new name for 'DeviantartAPI' (fixes #670)

5d7404a

improve Extractor.wait()

d02f7c1

- allow 'until' to be a datetime object
- do "time calculations" with UTC timestamps
- set a default 'reason'
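
A rough sketch of what these three points might look like together. The function names, the default reason text, and the choice to treat naive datetimes as UTC are assumptions made for illustration; this is not the actual Extractor.wait() implementation.

```python
import time
from datetime import datetime, timezone


def seconds_until(until, now=None):
    """Return the non-negative number of seconds from 'now' until 'until'.

    'until' may be a UTC timestamp (int/float) or a datetime object.
    """
    if isinstance(until, datetime):
        if until.tzinfo is None:
            # Assumption: naive datetime objects are interpreted as UTC.
            until = until.replace(tzinfo=timezone.utc)
        until = until.timestamp()
    if now is None:
        now = time.time()  # current time as a UTC timestamp
    return max(0.0, until - now)


def wait(until, reason="rate limit", sleep=time.sleep, now=None):
    """Sleep until 'until'; 'reason' has a default and is only informational."""
    delay = seconds_until(until, now)
    if delay:
        print("Waiting {:.0f}s ({})".format(delay, reason))
        sleep(delay)
    return delay
```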

[mastodon] handle rate limits

220c06b

[mastodon] use 'combine_dict()' to combine extractor info dicts

4ae8a25

[mastodon] update OAuth credentials for pawoo.net (#665)

88fca0a

add tests for Extractor.wait()

04bd047

add tests for "Extractors" in oauth.py (#670)

3b50c4f

[myportfolio] fix extraction of galleries without title

9e7dfc0

[aryion] add gallery and post extractors (#390, #673)

6143050

read config files from PyInstaller exe directory (closes #682)

300264f

ensure keys for mastodon instances are available during tests

406449b

Calls to config.clear() from other tests are removing the API credentials set when importing mastodon.py for the first time.

[aryion] include path in default directory format (#390)

96b78bc

[aryion] use generic download URLs (#390)

dc65f7d

i.e. /g4/data.php?id=…
- get filename & extension from Content-Disposition header
- handle all downloadable file types (docx, swf, etc)
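
One way to pull a filename and extension out of a Content-Disposition header using only the standard library is sketched below. The function name and return shape are hypothetical and the extractor may do this differently.

```python
import os.path
from email.message import Message


def filename_from_content_disposition(header):
    """Return (name, extension) parsed from a Content-Disposition value.

    Returns (None, None) if the header carries no filename parameter.
    """
    msg = Message()
    msg["Content-Disposition"] = header
    filename = msg.get_filename()
    if not filename:
        return None, None
    name, ext = os.path.splitext(filename)
    return name, ext[1:].lower()  # drop the leading dot
```

For a header value of `attachment; filename="story.docx"` this yields the name `story` and the extension `docx`, independent of what the generic download URL looks like.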

[downloader:http] don't overwrite existing '_mtime' fields

38bc643

[aryion] fix malformed 'last-modified' headers (#390)

6c531be

add optional 'utcoffset' argument to 'parse_datetime()'

a0f4c29

[aryion] adjust 'date' to UTC time

cf4cef3

[deviantart] fix JPEG quality replacement pattern

bae1e8e

'q_\d+' would sometimes also replace something in the 'token' query parameter, invalidating the URL.
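
The failure mode can be reproduced with a made-up URL. In the sketch below, the URL, the function names, and the target quality value are invented for illustration, and restricting the substitution to the part before '?' is just one possible remedy, not necessarily the one used in this commit.

```python
import re

# Hypothetical URL whose token happens to contain the characters "q_12".
URL = ("https://example.com/f/abc/v1/fill/w_800,h_600,q_75,strp/img.jpg"
       "?token=abq_12cd")


def set_quality_naive(url, quality=100):
    # Replaces every 'q_<digits>' match, including one inside the token.
    return re.sub(r"q_\d+", "q_{}".format(quality), url)


def set_quality(url, quality=100):
    # Only touch the path; leave the query string (and its token) intact.
    path, sep, query = url.partition("?")
    path = re.sub(r"q_\d+", "q_{}".format(quality), path)
    return path + sep + query
```

With the naive version the token changes from `abq_12cd` to `abq_100cd`, which is the kind of corruption the commit message describes.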

mikf and others added 29 commits May 12, 2020 20:17

[downloader:ytdl] change 'forward-cookies' default to 'false'

dba87ca

There are currently no situations where forwarding gallery-dl's cookies to youtube-dl is necessary, and it only causes problems when forcing youtube-dl for Twitter video downloads while logged in.

improve '--write-pages' (#737)

f8f95e6

- move code into its own function
- add enumeration index to filenames
- dump responses regardless of status code

reuse connection adapters from parent extractors

a1e739b

[downloader:ytdl] fix file extensions when merging into mkv

f8661c6

Fix typo: defaut → default. (#754)

b7ebf51

[imagechest] Add new extractor for ImageChest (#750)

7b5711e

* [imagechest] Add new extractor for ImageChest
* [imagechest] Fix flake8 compliance issues

fix/improve Cloudflare bypass code (#728, #757)

d17e962

- support changing values for 'k'
- use XML parser to get request parameters (some input fields are now embedded in an HTML comment)

reset filenames on empty file extensions (#733)

abbd8fb

[gelbooru] simplify and fix pool extraction

9b46359

use 'pool:<pool id>' as search tag to get pool posts

[sexcom] replace 404ed test

846d3a2

[imagechest] simplify code (#750)

ab11b1c

[webtoons] fixes and simplifications (#593, #761)

0378d07

- fix episode listings for french comics
- allow input URLs without explicit scheme
- add 'lang'/'language' metadata
- use str.format() instead of '+' to assemble URLs

Updated README to include additional Windows installation method (#763)

4df2cad

add 'text.ensure_http_scheme()'

6294e2c

add global WINDOWS bool

c878764

add global SENTINEL object

3201fe3

[danbooru] change default for 'ugoira' to 'false'

e19f665

Downloading the pre-rendered versions should be a better default than .zip files with individual frames.

readd 'session' to base downloader class (fixes #768)

34929f6

[webtoons] make archive_fmt unique (#779)

bcac31b

close #778

extend 'path-restrict' option

bc53302

Allow its value to be a JSON object / Python dict that specifies a mapping from invalid/unwanted input characters to specific output characters. For example {"/": "-", "*": "+"} will transform "foo / ***bar***" into "foo - +++bar+++" (closes #662, #755)
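
The mapping behaviour in the example above can be mimicked with str.translate. This is a minimal sketch assuming single-character keys; `build_path_restrict` is a hypothetical name and not the function gallery-dl uses.

```python
def build_path_restrict(mapping):
    """Return a function that replaces each key character with its value."""
    table = str.maketrans(mapping)  # keys must be single characters
    return lambda text: text.translate(table)


restrict = build_path_restrict({"/": "-", "*": "+"})
print(restrict("foo / ***bar***"))  # foo - +++bar+++
```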

add a few more examples to gallery-dl-example.conf

7003e61

- include 'igneous' and 'hath_perks' in Exhentai cookies
- add an example of how to write DeviantArt description to file
- add a 'path-restrict' mapping from invalid characters in Windows paths to Unicode alternatives (taken from #662)

[imgur] fix extraction of animated images without 'mp4' entry

b6cee3e

[imgur] treat 't/unmuted' URLs as galleries

6bcdb26

include redirects and headers in --write-pages dumps (#737)

a363da4

move dump_response() into a separate function (#737)

15c3d29

implement a 'path-replace' option (#662, #755)

ddc253c

write OAuth token to cache by default (#616)

dfcf2a2

mo-han merged commit f4f57db into mo-han:master May 26, 2020
