Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change youtube/index.mjs to not confuse chatgpt or run out of tokens #205

Merged
merged 2 commits into from
Apr 14, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 10 additions & 3 deletions src/content-script/site-adapters/youtube/index.mjs
Original file line number Diff line number Diff line change
@@ -1,6 +1,11 @@
import { cropText } from '../../../utils'
import { config } from '../index.mjs'

function replaceHtmlEntities(htmlString) { // This function was written by ChatGPT and modified by me (iamsirsammy)
const doc = new DOMParser().parseFromString(htmlString.replace("&", "&"), 'text/html');
return doc.documentElement.innerText;
}

Comment on lines +4 to +8
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string here has been processed and does not contain HTML tag content. I don't think there is any difference here compared to directly using htmlString.replaceAll("&", "&"). Have I missed something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the tone I've been talking to customer support all day
This code is designed to remove all HTML Entities from code, that's what the DOMParser is for. For example, generally, ' is the html entity for an apostrophe ('). However, your code or YouTube subtitles or whatever has &#39; for an apostrophe (at least for some videos), meaning the & has to be replaced with &, then domparser has to be used to deal with the actual apostrophe. This should also apply for <> maybe ,/ and probably other characters, why domparser is used instead of many find and replaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(We could also probably remove amp replacement and just run replace HtmlEntities twice but that feels silly)

export default {
init: async (hostname, userConfig, getInput, mountComponent) => {
try {
Expand Down Expand Up @@ -41,12 +46,14 @@ export default {
let subtitleContent = ''
while (subtitleData.indexOf('">') !== -1) {
subtitleData = subtitleData.substring(subtitleData.indexOf('">') + 2)
subtitleContent += subtitleData.substring(0, subtitleData.indexOf('<')) + ','
subtitleContent += subtitleData.substring(0, subtitleData.indexOf('<')) + ' '
}

subtitleContent = replaceHtmlEntities(subtitleContent)

return cropText(
`Provide a brief summary of the video using concise language and incorporating the video title.` +
`The video title is:"${title}".The subtitle content is as follows:\n${subtitleContent}`,
`Provide a brief summary of the following video using concise language, still including all the important details, and incorporating the video title.` +
`The video title is "${title}". The subtitle content is as follows:\n${subtitleContent}`,
)
} catch (e) {
console.log(e)
Expand Down