An option to remove chapter title #1375
Comments
Can you share an example of a chapter this actually happens in?
Here you go. I have also linked the novel used below. Also, novels downloaded from royalroad and webnovel now contain p class= and div data-ejs attributes, respectively, filled with random text. Some time ago the downloads contained only the novel text. If possible, please correct them to show the novel text only.
For future reference, data-ejs attributes were removed from webnovel in PR #1363. These changes aren't currently in the live build, and some junk data does still persist. They should, however, be included in the build linked here: #1368 (comment). I'll check to see if something similar can be done for RR, but scrubbing classes isn't as cut & dry as removing entire attributes.
Removed random identifier generated className.
@Kiradien @xeolod
As dteviot said above, that is probably the best way. I played around with a config to do the same, and it could be a bit funky, especially due to author notes. I've pushed a PR for the cleanup code, however.
Tested it on webnovel; almost all the junk data is removed. One div data-ejs attribute still remains, but I removed it using a regex.
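For illustration only, a regex of the kind mentioned above might look like the sketch below. The helper name stripDataEjs and the sample markup are assumptions, not the regex actually used in this thread, and the pattern assumes the attribute value is quoted. Regex-based HTML cleanup is fragile, so treat this as a stopgap rather than a replacement for the parser fix.

```javascript
// Hypothetical helper: strips data-ejs="..." attributes from an HTML string.
// Assumes the attribute value is wrapped in single or double quotes.
function stripDataEjs(html) {
  return html.replace(/\sdata-ejs\s*=\s*("[^"]*"|'[^']*')/gi, "");
}

// Made-up sample markup, not taken from a real webnovel page.
const sample = '<div data-ejs="abc123" class="content"><p>Chapter text</p></div>';
console.log(stripDataEjs(sample));
```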
Test versions for Firefox and Chrome with Kiradien's Royal Road cleanup have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing.
Remove title text when story text starts with copy of title. See: dteviot/WebToEpub#1375
Try this script to remove duplicated title text.
let titleNode = dom.querySelector("h1")?.firstChild;
let titleText = titleNode?.data;
let filter = (node) => {
    return (node !== titleNode) && (node.data == titleText)
        ? NodeFilter.FILTER_ACCEPT
        : NodeFilter.FILTER_SKIP;
};
let walker = dom.createTreeWalker(
    dom.body,
    NodeFilter.SHOW_TEXT,
    filter
);
let node = walker.firstChild()?.parentNode;
if (node != null) {
    console.log(node.outerHTML);
    node.remove();
    return true;
}
return false;
Tested with:
For my notes: 24 minutes work
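The script above only matches a text node whose content is identical to the title. If the body copy differs in case or spacing, a looser comparison may be needed. The following is a minimal, DOM-free sketch of that idea; the helpers normalize and dropLeadingTitle are hypothetical names and are not part of WebToEpub.

```javascript
// Hypothetical helper, not part of WebToEpub:
// collapses runs of whitespace, trims, and lowercases the text.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

// Returns a copy of `paragraphs` without a leading entry that repeats `title`.
// Only the first paragraph is checked; later repeats are left alone.
function dropLeadingTitle(title, paragraphs) {
  if (paragraphs.length > 0 && normalize(paragraphs[0]) === normalize(title)) {
    return paragraphs.slice(1);
  }
  return paragraphs.slice();
}

// Made-up sample data for illustration.
const body = ["  chapter 1:  arrival ", "It was raining."];
console.log(dropLeadingTitle("Chapter 1: Arrival", body));
```

Checking only the first paragraph keeps the sketch from deleting a line later in the chapter that happens to match the title.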
Thanks, it's working.
@xeolod
Is your feature request related to a problem? Please describe.
When downloading from royalroad, many novels have their chapter title in their body too, so the downloaded chapters have two titles in them, one from its chapter title and another from its body text.
Describe the solution you'd like
An option to remove the chapter title, just like the existing remove author notes option.
Describe alternatives you've considered
I tried adding a manual parser, but I couldn't make it work properly, so I am requesting this option.
Additional context
This option (Remove chapter title) might work for all hosts, as it would nullify the chapter title only.