Novels: how to handle them? #493

Open
MikeZeDev opened this issue Feb 5, 2024 · 0 comments

Labels
META Meta issues : Discussions , notes


MANGA/NOVEL detection
=======================

  • Fundamentally, a novel isn't much different from a manga: both have an ID, a title, and chapters.
  • The differences only appear at the FetchPages and FetchImages level (which obviously won't be called FetchImages for novels).
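A minimal TypeScript sketch of that idea, assuming a discriminated union is an acceptable way to represent it: chapters keep the same metadata, and only the result of the page-fetching step differs. `Chapter`, `ChapterContent`, `describeContent` and their fields are hypothetical names, not existing types in the project.

```typescript
// Hypothetical model: manga and novel chapters share the same metadata
// (ID, title); only the content produced by the "fetch pages" step differs.
interface Chapter {
  id: string;
  title: string;
}

type ChapterContent =
  | { kind: 'images'; imageUrls: string[] } // manga chapter: list of picture URLs
  | { kind: 'novel'; html: string };        // novel chapter: a block of HTML

// Downstream code branches on the discriminant instead of on the website.
function describeContent(chapter: Chapter, content: ChapterContent): string {
  switch (content.kind) {
    case 'images':
      return `${chapter.title}: ${content.imageUrls.length} picture(s)`;
    case 'novel':
      return `${chapter.title}: ${content.html.length} characters of HTML`;
  }
}
```

With this shape, listing titles and chapters stays identical for both kinds; only the code that consumes `ChapterContent` has to care.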

If a website is known to host only novels, we can use a generic decorator to extract the text.
The problem lies with Madara, Mangastream, HeanCMS and the like, where mangas and novels can appear in the same list and the difference only shows up when fetching the "pages".

  • For Madara, a particular CSS element indicates that the content is a block of HTML. For Mangastream, I've seen an isNovel JS variable, but we could probably use a proper CSS selector as well. In HeanCMS, the API returns the HTML content for a novel chapter, or a list of pictures for a manga chapter.

That is a fundamental problem: at which level are we able to tell "this is a novel" or "this is a manga"? It varies from one website to another, and many websites handle both as similar content until it's time to display "the pages".
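Since detection can only happen once the chapter content is fetched, one option is to classify the fetched payload itself. The sketch below assumes a HeanCMS-like API as described above (HTML for a novel chapter, a list of picture URLs for a manga chapter); the payload shape and the names `classifyChapterPayload` and `ChapterContent` are hypothetical, and the real API response would need to be checked. Madara and Mangastream would instead need a page check (the CSS element or the `isNovel` variable), which is not shown here.

```typescript
// Redeclared so this sketch is self-contained (hypothetical type).
type ChapterContent =
  | { kind: 'images'; imageUrls: string[] }
  | { kind: 'novel'; html: string };

// Hypothetical classifier, assuming the chapter payload is either an HTML
// string (novel) or an array of picture URLs (manga). `payload` stands in for
// whichever field of the real API response actually carries the content.
function classifyChapterPayload(payload: unknown): ChapterContent {
  if (typeof payload === 'string') {
    return { kind: 'novel', html: payload };
  }
  if (Array.isArray(payload) && payload.every((item) => typeof item === 'string')) {
    return { kind: 'images', imageUrls: payload };
  }
  // Anything else is unexpected; fail loudly rather than guess.
  throw new Error('Unrecognized chapter payload');
}
```

Because the decision is made per chapter, at fetch time, the manga list never needs to know up front whether an entry is a novel.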

What to do with content
=======================

So we have the HTML, whether from the page or from the API. Now what?

  • There is the bloat-removal step. I think by default we can remove script tags and the "onxxx" event-handler attributes. Beyond that, some bloat removal will have to be website-specific.

  • Some novel chapters come with pictures. Should we download them too? If so, should we rewrite the HTML to point at the downloaded image paths?

  • Should the user be able to choose between saving it as HTML or as a picture?

  • When saving as HTML, how do we handle theming? Should we ship themed HTML templates that the user can choose from? How do we handle previewing? Just previewing it as a picture is safer, I think.

  • Are we still using html2canvas to generate pictures from text?

  • More questions incoming.
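The default steps mentioned above (strip script tags and "onxxx" attributes, collect embedded pictures and point them at downloaded copies, optionally wrap the result in a themed template) could look roughly like this. It is only a sketch under stated assumptions: it uses regular expressions so it runs without a DOM, whereas a real implementation would more likely use DOMParser or a proper HTML parser, since regexes are easy to fool with unusual markup. All function names, the `Theme` fields and the local path scheme are hypothetical.

```typescript
// Default bloat removal (regex-based sketch; fragile on unusual markup).
function removeDefaultBloat(html: string): string {
  return html
    // Drop <script>...</script> blocks entirely.
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
    // Drop inline event-handler attributes such as onclick="..." or onload='...'.
    .replace(/\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, '');
}

// Hypothetical: collect <img src> URLs so the pictures can be downloaded with
// the text, and rewrite each src to the local path chosen by `localPathFor`.
// Only plain `src` is handled; lazy-loading attributes are left untouched.
function rewriteImageSources(
  html: string,
  localPathFor: (url: string, index: number) => string,
): { html: string; imageUrls: string[] } {
  const imageUrls: string[] = [];
  const rewritten = html.replace(
    /(<img\b[^>]*?\ssrc\s*=\s*)(["'])(.*?)\2/gi,
    (_match: string, prefix: string, quote: string, url: string) => {
      const local = localPathFor(url, imageUrls.length);
      imageUrls.push(url);
      return `${prefix}${quote}${local}${quote}`;
    },
  );
  return { html: rewritten, imageUrls };
}

// Hypothetical theme for the "save as HTML" option.
interface Theme {
  background: string;
  foreground: string;
  fontFamily: string;
}

const escapeHtml = (text: string): string =>
  text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');

// Wrap the cleaned chapter body in a minimal standalone, themed page.
function applyTheme(body: string, title: string, theme: Theme): string {
  const style = `body { background: ${theme.background}; color: ${theme.foreground}; font-family: ${theme.fontFamily}; }`;
  return `<!DOCTYPE html><html><head><meta charset="utf-8"><title>${escapeHtml(title)}</title>` +
    `<style>${style}</style></head><body>${body}</body></html>`;
}
```

Website-specific bloat would be extra cleanup steps on top of `removeDefaultBloat`. Rendering to a picture (e.g. with html2canvas) needs a browser environment and is not covered by this sketch.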
