Other Danbooru-based sites? Specifically danbooru.2chan.jp #179

Open
Joebugg opened this issue May 5, 2024 · 4 comments

Comments

@Joebugg

Joebugg commented May 5, 2024

Is your feature request related to a problem? Please describe.
Just an unimplemented site.

Describe the solution you'd like
Ability to automatically save metadata and not just right-click to save functionality.

Describe alternatives you've considered
Adding the support myself, then submitting changes. Saving page with images (builtin browser function).

Additional context
I was doing an image search on Google and found a source of ancient images that appear nowhere else. I noticed that this Japanese Danbooru-based site wasn't supported. I'm not sure how popular it is, or whether it's worth supporting. I just figured you might want to know about it, and other users might be able to benefit.

http://jun.2chan.net/script/ seems to be source code of the site's PHP, if that is useful. No, wait, haha, this is from 2005. There's no way this site would still be functioning with code that old and not be getting hacked every day.

BTW: Github still has that annoying issue (testing-library/user-event#1075 and related) that you have to press tab to get to "submit" button and press enter instead of just left-clicking. Otherwise it times out.

@Simon-Tesla
Owner

Simon-Tesla commented May 5, 2024

Due to the way site plugins are implemented currently, each plugin needs to register as a handler for a specific URL or set of URLs (e621 is registered as the handler for e926 as well, for instance), so even if multiple sites use the same underlying booru software, each one would need to be registered individually.
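
As a rough illustration of the per-URL registration described above, here is a minimal TypeScript sketch. The `SitePlugin` shape, the `plugins` list, and `findPlugin` are hypothetical names made up for this example and are not Raccoony's actual plugin API; the hostnames are only there to mirror the e621/e926 case.

```typescript
// Hypothetical plugin shape; Raccoony's real plugin interface may differ.
interface SitePlugin {
  name: string;
  // Every hostname the plugin handles has to be listed explicitly.
  hostnames: string[];
}

// Illustrative registry: one plugin registered for two hostnames.
const plugins: SitePlugin[] = [
  { name: "e621", hostnames: ["e621.net", "e926.net"] },
];

// Return the plugin registered for a hostname, or undefined if there is none.
function findPlugin(hostname: string): SitePlugin | undefined {
  const host = hostname.toLowerCase();
  for (const plugin of plugins) {
    if (plugin.hostnames.indexOf(host) !== -1) {
      return plugin;
    }
  }
  return undefined;
}
```

With a lookup like this, a site running the same booru software under a different hostname gets no handler until someone adds that hostname to a plugin.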

It might make sense to see if there's a good way to allow Raccoony plugins to register as a more generic handler with some basic code to run that looks for some quick 'tell' in the DOM to recognize a given type of site so that it can scrape richer metadata off any site that uses that booru software, but it'd take a bit of doing.
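
A generic handler of that kind could look roughly like the sketch below. Everything in it is an assumption for illustration: `GenericHandler`, `genericHandlers`, and `detectHandler` are invented names, and the selectors used as the 'tell' are the element ids reported for danbooru.2chan.jp elsewhere in this thread, not a verified fingerprint of any booru software.

```typescript
// Minimal slice of the DOM API, so the sketch does not need a browser.
// A real Document satisfies this interface.
interface QueryRoot {
  querySelector(selector: string): unknown;
}

// Hypothetical generic handler: a name plus selectors that must all match.
interface GenericHandler {
  name: string;
  tells: string[];
}

const genericHandlers: GenericHandler[] = [
  // Assumed tell: the post view, tag list, and main image ids all exist.
  { name: "booru-like", tells: ["#post-view", "#tag_list", "img#image"] },
];

// Return the first handler whose tells are all present in the page.
function detectHandler(root: QueryRoot): GenericHandler | undefined {
  for (const handler of genericHandlers) {
    let allMatch = true;
    for (const selector of handler.tells) {
      if (root.querySelector(selector) == null) {
        allMatch = false;
        break;
      }
    }
    if (allMatch) {
      return handler;
    }
  }
  return undefined;
}
```

Requiring every selector to match keeps the check cheap while making false positives on unrelated sites less likely than a single-selector test would.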

@Joebugg
Author

Joebugg commented May 5, 2024

TBF, looking at that site closer, I think they just kept the code as simple as possible. So it might not be far from the actual PHP source they use. I could see them doing just security fixes, under the theory that complexity builds bugs. Also, the Rule of Lazy. ;)

So, I checked e621 and there's all of 2 posts that mention that site as a source, and 1 is ancient. This is definitely not a popular booru outside of Japan!
I doubt code written for recent boorus will even work correctly on that site. Ironically, it's too basic? It just has these links as endpoints (right term?):

https://danbooru.2chan.jp/index.php?page=post&s=view&id=<post ID>

html body div#content div#post-view div#right-col.content div div#note-container img#image
(For https://danbooru.2chan.jp/images/<load distribution hash>/<file data hash>.EXT)

https://danbooru.2chan.jp/index.php?page=post&s=view&id=<post ID>#

https://danbooru.2chan.jp/index.php?page=history&type=tag_history&id=<post ID>

html body div#content div#post-view div#right-col.content div div#note-container div#c<number>
(Annoyingly, the comments are inside the image container?)

html body div#content div#post-view div.sidebar div#tag_list  (Tags)
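
To tell the post-view URL apart from the tag-history URL listed above, a handler would need to read the query string. The following is a small sketch of that, assuming post IDs are purely numeric (the thread does not confirm this); `parsePostId` is a made-up helper name. It parses the query by hand so that it runs without any browser or Node-specific API.

```typescript
// Extract the post ID from a post-view URL, or return null for any other URL.
// Assumption: post IDs consist of digits only.
function parsePostId(url: string): number | null {
  const queryStart = url.indexOf("?");
  if (queryStart === -1) {
    return null;
  }
  // Ignore the fragment, so the variant ending in "#" is handled too.
  const hashAt = url.indexOf("#");
  const queryEnd = hashAt === -1 ? url.length : hashAt;
  const query = url.slice(queryStart + 1, queryEnd);

  // Collect key=value pairs from the query string.
  const params: { [key: string]: string } = {};
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    if (eq > 0) {
      params[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  }

  // Only page=post&s=view identifies a post page; history pages are rejected.
  if (params["page"] !== "post" || params["s"] !== "view") {
    return null;
  }
  const id = params["id"];
  return id !== undefined && /^\d+$/.test(id) ? parseInt(id, 10) : null;
}
```

For example, `parsePostId("https://danbooru.2chan.jp/index.php?page=post&s=view&id=123")` yields `123`, while the `page=history` URL yields `null`.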

Not sure if the notes/tag history is worth worrying about. "Posted on" is followed by upload date. This is not in the header, but the tags are. The artist name is just another tag and doesn't have a category like e621 uses. I could just list all the artist tags in a file and search if they're in the tags list but that seems like a bad idea. There is literally no list of artist tags. It's all just tossed in like beach sand.

XPaths:

//*[@id="image"]  (Normal link, sometimes 'original' is the same as viewed)
/html/body/div[3]/div/div[2]/div/div/a[2]  (If 'original' link exists it might not be the same as viewed)
//*[@id="c<number>"] (Comments)
//*[@id="tag_list"]  (Tags)

From looking at the source of the page, it looks like the relevant metadata to save would be the tags, the posted date, image URL/s, and comments? I guess we wouldn't get an artist tag. Unavoidable it seems unless they have a list of them somewhere on the site. :( The comment fields are c followed by 1-2 digits. c3 c4...c40 and so on. There's not actually that many comments for the entire site so not losing much!
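
The metadata listed above could be collected into a record along these lines. This is only a sketch: `PostMetadata`, `makeMetadata`, and `commentIds` are hypothetical names, the "Posted on" value is kept as raw text because its format has not been checked, and the comment-id pattern accepts any number of digits rather than just one or two.

```typescript
// Hypothetical record for the fields discussed above; there is no artist field
// because the site does not categorize tags.
interface PostMetadata {
  tags: string[];
  postedOn: string; // raw text following "Posted on"
  imageUrls: string[]; // viewed image, plus the 'original' link when it differs
  comments: string[];
}

// Comment containers have ids of the form "c" followed by digits (c3, c4, ... c40).
const COMMENT_ID = /^c\d+$/;

// Pick the comment ids out of a list of element ids, in numeric order.
function commentIds(ids: string[]): string[] {
  return ids
    .filter((id) => COMMENT_ID.test(id))
    .sort((a, b) => parseInt(a.slice(1), 10) - parseInt(b.slice(1), 10));
}

// Build the record, listing the 'original' URL only if it differs from the viewed one.
function makeMetadata(
  tags: string[],
  postedOn: string,
  viewedUrl: string,
  originalUrl: string | null,
  comments: string[],
): PostMetadata {
  const imageUrls = [viewedUrl];
  if (originalUrl !== null && originalUrl !== viewedUrl) {
    imageUrls.push(originalUrl);
  }
  return { tags, postedOn, imageUrls, comments };
}
```

Sorting numerically rather than alphabetically keeps `c40` after `c4`, which matters if the comments are saved in page order.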

@Joebugg
Author

Joebugg commented May 5, 2024

LOL, at this point I'd say it's simpler to just save the HTML to a text file. Yeesh, this one doesn't actually have much to parse.

@Simon-Tesla
Owner

Yeah, that's often the problem with a lot of these sorts of sites: there's not a ton of metadata to scrape in the first place. In terms of structured data Raccoony currently supports, it looks like the most you'd get out of this is the list of tags.
