It accepts any HTML text (Markdown text also works) and analyses the following dimensions:
- readTime - number of minutes required to read the text
- keywords - phrases of 1, 2 or 3 words that repeat in the text
- vulgarityIndex - the text is scanned for vulgar English words and an index is calculated that says how vulgar the story is
- nudityIndex - the images are analysed to determine whether they contain adult content
- images - images parsed from the text into a separate array (ordered by occurrence in the text!)
- language - recognised language of the text; it needs to work well for English, Japanese, Spanish and German
- plain - plain version of the text without HTML tags and images, which could, for example, be sent out in an email
- textImageRatio
- compressed - compressed version of the plain text
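As a rough illustration of the `images` and `plain` dimensions, the sketch below pulls image URLs out of the markup in order of occurrence and strips the tags. The function names `extractImageRefs` and `toPlainText` are hypothetical, and the regex-based handling is an assumption made for brevity; this is not necessarily how the package does it, and a real HTML parser copes better with malformed markup.

```typescript
// Sketch only: names and regex-based parsing are assumptions, not the package's code.

interface ImageRef {
  url: string;
}

// Collect the src of every <img> tag, in the order the tags occur in the text.
function extractImageRefs(html: string): ImageRef[] {
  const images: ImageRef[] = [];
  const imgTag = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']*)["'][^>]*>/gi;
  let match: RegExpExecArray | null = imgTag.exec(html);
  while (match !== null) {
    images.push({ url: match[1] });
    match = imgTag.exec(html);
  }
  return images;
}

// Replace every tag (images included) with a space, then collapse whitespace.
function toPlainText(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}
```

For the input `<p>Hi</p><img src="a.png">` this gives the plain text `Hi` and a single image entry with the url `a.png`. Because tags are replaced by spaces, a word split by inline markup (for example `wor<b>ld</b>`) comes out as two words, which is one reason to prefer a proper parser.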
Input: any HTML text.
Output:
{
  readTime: number, // minutes
  keywords: {
    1: string[],
    2: string[],
    3: string[]
  },
  compressed: string,
  nudityIndex: number, // between 0 and 1
  vulgarityIndex: number, // between 0 and 1
  images: { url: string }[],
  language: string, // language code such as "en" or "de"
  textImageRatio: number,
  plain: string
}
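To make the `keywords` part of this shape more concrete, here is a minimal sketch that fills the `1`, `2` and `3` buckets with phrases of that many words. It assumes a phrase counts as a keyword once it occurs at least twice, and it uses an ASCII-only tokenizer, so it would need a Unicode-aware one for Japanese, Spanish or German. The names `tokenizeWords`, `repeatedPhrases` and `keywordsSketch` are hypothetical.

```typescript
// Sketch only: the threshold (two occurrences) and the tokenizer are assumptions.

interface KeywordBuckets {
  1: string[];
  2: string[];
  3: string[];
}

// Lowercase, then split on anything that is not an ASCII letter, digit or apostrophe.
function tokenizeWords(plain: string): string[] {
  return plain
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter((word) => word.length > 0);
}

// Count every run of `size` consecutive words and keep the ones seen at least twice,
// most frequent first (ties broken alphabetically).
function repeatedPhrases(words: string[], size: number): string[] {
  const counts: { [phrase: string]: number } = Object.create(null);
  for (let i = 0; i + size <= words.length; i++) {
    const phrase = words.slice(i, i + size).join(" ");
    counts[phrase] = (counts[phrase] || 0) + 1;
  }
  return Object.keys(counts)
    .filter((phrase) => counts[phrase] >= 2)
    .sort((a, b) => counts[b] - counts[a] || a.localeCompare(b));
}

function keywordsSketch(plain: string): KeywordBuckets {
  const words = tokenizeWords(plain);
  return {
    1: repeatedPhrases(words, 1),
    2: repeatedPhrases(words, 2),
    3: repeatedPhrases(words, 3),
  };
}
```

No stop-word filtering is applied here, so common words such as "the" will show up in the one-word bucket; a real implementation would probably filter them.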
interface TextAnalyzer {
  getReadTime: () => Text;
  getPlainText: () => Text;
  extractImages: () => Images;
  analyzeLang: () => Lang;
  extractKeywords: (noOfWordsInKeyword: number) => Keywords;
  analyze: () => TextAnalysis; // get complete analysis
}
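The sketch below shows one way an object with this kind of interface could be put together, with `analyze` simply aggregating the individual getters. It covers only a reduced subset of the methods. Since `Text`, `Images`, `Lang`, `Keywords` and `TextAnalysis` are not defined here, primitives stand in for them. The factory name `createAnalyzerSketch` and the reading speed of 200 words per minute are assumptions, not part of the package.

```typescript
// Sketch only: a reduced, hypothetical variant of the interface with concrete types.

interface AnalyzerSketch {
  getPlainText: () => string;
  getReadTime: () => number;
  analyze: () => { plain: string; readTime: number };
}

// Assumed average reading speed; the package may use a different value.
const ASSUMED_WORDS_PER_MINUTE = 200;

function createAnalyzerSketch(html: string): AnalyzerSketch {
  // The plain text is computed on first use and reused by the other methods.
  let cachedPlain: string | null = null;

  const getPlainText = (): string => {
    if (cachedPlain === null) {
      cachedPlain = html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
    }
    return cachedPlain;
  };

  // Whole minutes, rounded up; empty text yields 0.
  const getReadTime = (): number => {
    const plain = getPlainText();
    const wordCount = plain === "" ? 0 : plain.split(" ").length;
    return Math.ceil(wordCount / ASSUMED_WORDS_PER_MINUTE);
  };

  // The complete analysis is just the individual results gathered in one object.
  const analyze = () => ({ plain: getPlainText(), readTime: getReadTime() });

  return { getPlainText, getReadTime, analyze };
}
```

Keeping the input in a closure means each method can be called on its own, while `analyze` avoids stripping the tags more than once.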
npm i ath-text-processing-package
This script will build the component:
npm run build
This script will build and run the application:
npm run start
MIT