[Kemono] Re-download a post's text content when it is edited #3800
Comments
That won't work if you want it to be … I mean, it is going to do what you want it to - detecting changes in either content or embed[url]:
"archive": "~/gallery-dl/gallery-dl-kemono-postprocessor.sqlite",
"archive-prefix": "\fF {category}",
"archive-format": "_{service}_{user}_{id}_{hash_sha1(content)[:10]}", UPD: "archive-format": "_{service}_{user}_{id}_{hash_sha1(content + (embed and embed['url'] or ''))[:10]}", |
Also, I suggest saving the content as HTML files. Then you can concatenate all the HTML files into one file and open it with a browser. For example:

alias catahtml='cat *.html > "$TEMP/_temp-catahtml-result.html"; start "$TEMP/_temp-catahtml-result.html"; sleep 0; exit;'

Then use my userscript to parse the URLs from the HTML file. Although, if you will only parse them programmatically, just saving only the text value inside a …
Feel free to use.

"postprocessors": [
{
"name": "mtime",
"event": "post"
},
{
"name": "metadata",
"event": "post",
"mode": "custom",
"directory": "metadata",
"filename": "\fF [{category}] {user}—{id}—{title}—{hash_sha1(content + (embed['url'] if embed else ''))[:10]}.html",
"content-format": "\fT ~/gallery-dl/templates/kemono.html",
"archive": "~/gallery-dl/gallery-dl-kemono-postprocessor.sqlite",
"archive-prefix": "\fF {category}",
"archive-format": "_{service}_{user}_{id}_{hash_sha1(content + (embed['url'] if embed else ''))[:10]}",
"mtime": true
}
]
<div class="post" id="{id}" data-added="{added}" data-published="{published}" data-edited="{edited}">
<h4>
<a href="https://kemono.party/{service}/user/{user}/post/{id}">{title}</a><span class="id" style="color: gray;"><i> #{id}</i></span>
</h4>
<div class="content">{content}</div>
<div class="content embed" title="{embed[subject]:?//}">{embed[url]:?//}</div>
<br>
<div class="date"><i>{date:%Y.%m.%d %H:%M:%S}</i></div>
<hr>
</div>
BTW, is it possible to use f-strings in templates? The custom formatting syntax is too difficult to write and read.
Is it better to hash the text data? Aren't the two methods practically the same? Is there a scenario where storing the hash value of the text data beats storing the text data itself? Combining all of the HTML files into one is a cool idea, I'll do that too. Is there a difference between saving the archive file as sqlite, sqlite3 or db?
Storing a hash will reduce the DB size, and you can also use it in the filename as a postfix, see the example above. (That generates a unique filename for each post edit; otherwise the metadata file for a new edit would overwrite the old one.) To prevent collisions for a post (since you store both the hash and the post id) I think even 10 chars of the SHA-1 hash is enough.
In any case it will be a SQLite DB file, whichever extension you pick.
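If you want to peek at what actually ends up in the archive, something like this should do; it assumes gallery-dl's usual layout of a single archive table with one entry column (check with .schema in the sqlite3 shell if yours differs), and the key shown in the comment is made up:

import sqlite3
from pathlib import Path

# Path from the postprocessor config above; SQLite does not care about the extension.
db_path = Path("~/gallery-dl/gallery-dl-kemono-postprocessor.sqlite").expanduser()

con = sqlite3.connect(db_path)
for (entry,) in con.execute("SELECT entry FROM archive ORDER BY entry LIMIT 20"):
    print(entry)  # e.g. kemonoparty_patreon_12345_67890_1a2b3c4d5e (archive-prefix + archive-format)
con.close()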
Thank you. I'll change it to sqlite and use your postprocessor config; that's a neat way to reduce the size of the archive file. Does gallery-dl support xxhash64? The latter is much faster than SHA-1 and the value is much smaller. I'm assuming I can use SHA-1 for everything, for example, a Twitter name …
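For reference, the digest sizes compare like this (plain Python; xxhash here is the third-party pip package, not something gallery-dl ships, and once you truncate to 10 chars as above the size difference matters less anyway):

import hashlib
import xxhash  # third-party: pip install xxhash

text = "some post content"
sha1 = hashlib.sha1(text.encode("utf-8")).hexdigest()
xxh64 = xxhash.xxh64(text.encode("utf-8")).hexdigest()

print(len(sha1), sha1)    # 40 hex chars
print(len(xxh64), xxh64)  # 16 hex chars, and the hashing itself is faster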
Any text key.
#3679 with using … - it's the wrong use.
As a side note: you can use the … In 1.25.0 it overwrites the default set of functions, but in the next version it will add to them instead (#3773, a1ca240).
Thanks for the help everyone. I'm using AlttiRi's postprocessor and I changed the filename to …
Or is there any workaround? Something like …
Will your postprocessor still work if I remove …
Not at the moment, but this is easy enough to add. Would …
Ah, I forgot that inside an f-string there is extra syntax. In my example …
Thanks, that worked. This is what I've currently got; please let me know if I'm missing anything because I've made quite a few changes. Does the filter work? Also, @AlttiRi, do you have a …
You cannot use multiple …
I changed it to …
The blacklist syntax is documented here.
Yeah, it's fine.
I find this filter questionable, though. I would save every post and then just parse the links that you need. That's why my userscript has two search/filter inputs: the first one works similarly to your regex (it lists only known sites), the second one is for the reverse mode (it lists every link except the input values).
I forgot about those. Do you know of a list of file-sharing domains? I add any such domain that I see to the filter, and I know I'm missing a lot. I do actually save every post with another run; the reason for the filter is that imo it's faster, and mainly because some artists will split the link into different chunks, e.g. "https :// mega [dot] nz".
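A rough sketch of what normalizing such obfuscated links before matching could look like; the patterns are just examples and will miss plenty of variants:

import re

def normalize_links(text):
    # Collapse common obfuscations like "https :// mega [dot] nz"
    # back into plain URLs so a domain filter/regex can match them.
    text = re.sub(r"\s*[\[(]\s*dot\s*[\])]\s*", ".", text, flags=re.IGNORECASE)
    text = re.sub(r"(https?)\s*:\s*/\s*/\s*", r"\1://", text)
    return text

print(normalize_links("https :// mega [dot] nz/folder/abc"))
# -> https://mega.nz/folder/abc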
BTW, here is how it looks (a Twitter example). The resulting single HTML file (created with gallery-dl's postprocessor) for a tweet has a description as well as other meta information. Here is a screenshot of all the HTML files concatenated into one file, with the popup of my HrefTaker userscript opened to parse the links. To concat the HTML files I use a bash alias and function:

alias catahtml=fun_cat_html
function fun_cat_html {
    # concatenate every .html file in the current directory into one timestamped temp file
    current_date_time=$(date +"%Y.%m.%d-%H.%M.%S")
    cat *.html > "$TEMP/_temp-cat-html-result-$current_date_time.html"
    # open it in the default browser ("start" works in Git Bash on Windows)
    start "$TEMP/_temp-cat-html-result-$current_date_time.html"
    sleep 0
    exit
}

Just type catahtml. Just a cross post of this: AlttiRi/twitter-click-and-save#36 (comment)
I've added archive and archive-format to this postprocessor and I'm wondering if this will re-download a post's text content when content and/or embed[url] is edited? I'm especially not sure whether archive-format should be {content}|{embed[url]} or whether I should be using two postprocessors, for content and embed[url] respectively, in archive-format.