An option to remove chapter title #1375

xeolod · 2024-07-09T08:50:21Z

Is your feature request related to a problem? Please describe.
When downloading from Royal Road, many novels also include the chapter title in the chapter body, so each downloaded chapter ends up with two titles: one from the chapter title and one from the body text.

Describe the solution you'd like
An option to remove the chapter title, just like the existing option to remove author notes.

Describe alternatives you've considered
I tried adding a manual parser, but I couldn't make it work properly, so I am requesting this option.

Additional context
This option (Remove chapter title) might work for all hosts, as it would nullify the chapter title only.

Kiradien · 2024-07-09T19:13:30Z

Can you share an example of a chapter this actually happens in?

xeolod · 2024-07-10T15:58:16Z

Here you go. I had also linked the novel used below.

Also, novels downloaded from Royal Road and webnovel now contain p class= and div data-ejs markup, respectively, with random text. Some time ago, the downloads contained only the novel text. If possible, please fix this so that only the novel text is shown.

Kiradien · 2024-07-10T17:48:49Z

> Also, novels downloaded from Royal Road and webnovel now contain p class= and div data-ejs markup, respectively, with random text. Some time ago, the downloads contained only the novel text. If possible, please fix this so that only the novel text is shown.

For future reference, data-ejs attributes were removed from webnovel in PR #1363. These changes aren't currently in the live build, and some junk data does still persist. They should, however, be included in the build linked here: #1368 (comment)

I'll check to see if something similar can be done for RR, but scrubbing classes isn't as cut and dried as removing entire attributes.
Either way, I'll give both of these a shot; I have a few ideas for both of these issues...
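
To illustrate why scrubbing classes is harder than removing whole attributes, here is a minimal sketch. It is not the code from the PR. The names stripAttribute, scrubClassAttribute and GENERATED_CLASS are hypothetical, and the pattern used to recognise a "random" class name is an assumption, not Royal Road's actual format.

```javascript
// Hypothetical pattern: treats long, purely alphanumeric tokens as generated
// identifiers. The real Royal Road class format is an assumption here.
const GENERATED_CLASS = /^[A-Za-z0-9]{20,}$/;

// Removing an attribute is all-or-nothing: drop it whenever it is present.
// `attributes` is a plain object mapping attribute names to values.
function stripAttribute(attributes, name) {
    const { [name]: _removed, ...rest } = attributes;
    return rest;
}

// Scrubbing classes is selective: each class name has to be judged on its own,
// so the result is only as good as the isGenerated heuristic.
function scrubClassAttribute(classValue, isGenerated = (c) => GENERATED_CLASS.test(c)) {
    return classValue
        .split(/\s+/)
        .filter((c) => c.length > 0 && !isGenerated(c))
        .join(" ");
}
```

With this heuristic, a legitimate class that happens to be a long alphanumeric word would also be dropped, which is the kind of ambiguity that attribute removal does not have.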

dteviot · 2024-07-10T19:27:26Z

@Kiradien @xeolod
I'm going to suggest that the "double title removal" might be better done as a post-processing step using EpubEditor.
Logic might be something like:

Find the H1 header, then the text in it.
Search for any other text nodes with the same text.
If any found, delete their enclosing element.

Kiradien · 2024-07-10T20:34:50Z

As dteviot said above, that is probably the best way. I played around with a config to do the same, and it could be a bit funky, especially due to author notes. I have, however, pushed a PR for the cleanup code.

xeolod · 2024-07-11T05:59:33Z

> For future reference, data-ejs attributes were removed from webnovel in PR #1363. These changes aren't currently in the live build, and some junk data does still persist. They should, however, be included in the build linked here: #1368 (comment)

I tested it on webnovel, and almost all the junk data is removed. One div data-ejs attribute still exists, but I removed it using a regex.
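
The regex actually used isn't given in the thread. As a rough sketch of that kind of cleanup, assuming the attribute value is always quoted, something like the following could strip a leftover data-ejs attribute from chapter HTML. The names stripDataEjs and DATA_EJS_ATTRIBUTE are hypothetical.

```javascript
// Matches a data-ejs attribute with a double- or single-quoted value,
// together with the whitespace in front of it.
// Assumption: the value is always quoted; unquoted values are not handled.
const DATA_EJS_ATTRIBUTE = /\s+data-ejs\s*=\s*(?:"[^"]*"|'[^']*')/g;

// Returns the HTML string with every data-ejs attribute removed.
function stripDataEjs(html) {
    return html.replace(DATA_EJS_ATTRIBUTE, "");
}

console.log(stripDataEjs('<div data-ejs="abc123" class="x">Text</div>'));
// prints: <div class="x">Text</div>
```

Regex-based edits on HTML are fragile in general, so this is only reasonable as a one-off fix on already downloaded text.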

dteviot · 2024-07-13T02:46:35Z

Test versions for Firefox and Chrome with Kiradien's Royal Road cleanup have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing.

dteviot · 2024-07-19T08:32:32Z

@xeolod

Try this script to remove duplicated title text.

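// Assumes this runs as a function body where `dom` is the chapter document
// (the top-level `return` statements require that).
// Take the first child of the <h1>; this only works if it is a plain text node.
let titleNode = dom.querySelector("h1")?.firstChild;
let titleText = titleNode?.data;
if (titleText == null) {
    // No <h1>, or its first child has no text data: nothing to compare against.
    return false;
}
// Accept only text nodes, other than the title itself, whose text equals the title.
let filter = (node) => {
    return (node !== titleNode) && (node.data === titleText)
        ? NodeFilter.FILTER_ACCEPT
        : NodeFilter.FILTER_SKIP;
};

let walker = dom.createTreeWalker(
    dom.body,
    NodeFilter.SHOW_TEXT,
    filter
);
// firstChild() returns the first accepted text node under <body>, or null.
let node = walker.firstChild()?.parentNode;
if (node != null) {
    console.log(node.outerHTML);
    // Remove the element enclosing the duplicated title.
    node.remove();
    return true;
}
return false;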

Tested with:

https://www.royalroad.com/fiction/59948/desolate-fate, chapters 1, 2, 3

For my notes: 24 minutes work

xeolod · 2024-08-05T05:41:09Z

Thanks, it's working.

dteviot · 2024-08-23T08:33:52Z

@xeolod
Updated version (0.0.0.167) has been submitted to Firefox and Chrome stores.
Firefox version is available now.
The Chrome version might take anywhere from a few hours to 21 days to become available.

Kiradien added a commit to Kiradien/WebToEpub that referenced this issue Jul 10, 2024

dteviot#1375 RoyalRoad Cleanup

674c9ef

Removed random identifier generated className.

Kiradien added a commit that referenced this issue Jul 11, 2024

Merge pull request #1376 from Kiradien/RoyalRoad_class_cleanup

890e54b

#1375 RoyalRoad Cleanup

dteviot added a commit to dteviot/EpubEditor that referenced this issue Jul 19, 2024

Add script to remove duplicated title

5af7e03

Remove title text when story text starts with copy of title. See: dteviot/WebToEpub#1375

dteviot added the Status: In Progress label Jul 19, 2024

Kiradien mentioned this issue Aug 8, 2024

Close issues #1406

Closed

Kiradien added Status: Completed and removed Status: In Progress labels Aug 8, 2024

dteviot closed this as completed Aug 23, 2024

Leifman35 mentioned this issue Sep 29, 2024

Royalroad 'Double chapter title' or removing title option #1528

Closed

dteviot mentioned this issue Jan 6, 2025

Problem #1622

Closed
