See "The alignment problem" in the docs for more background on the problem this module set out to address.
Originally developed as a Node version of Python's stt-align by Chris Baume, BBC R&D.
git clone git@github.com:bbc/stt-align-node.git
cd stt-align-node
npm install
npm install @bbc/stt-align-node
Other than realigning STT results with accurate text, this module can also be used to perform related operations in the same domain, such as benchmarking STT.
| Function | Description | Type |
|---|---|---|
| alignSTT | Realign STT json with accurate text by transposing words from accurate text to timecodes of STT. | json |
| diffsList | Return a diff json of STT vs accurate text. | json |
| diffsListAsHtml | Return a diff of STT vs accurate text as HTML. | html |
| diffsCount | Return a diff of STT vs accurate text as HTML. | json |
| calculateWordDuration | Return a diff of STT vs accurate text as HTML. | Number |
See the README in the example-usage folder, as well as the code examples, for more.
A Node version of stt-align by Chris Baume, BBC R&D.
A pseudo code overview of alignSTT:

- Input and output, as described in the example usage:
  - Accurate base text transcription, as a string.
  - Array of word objects from the STT service transcription.
- Align words:
  - Normalize words by removing capitalization and punctuation and converting numbers to letters.
  - Generate an array of words from the base text and an array of words from the STT transcript.
  - Get opcodes using difflib to compare the two arrays. For equal matches, add the matched segment of STT word objects to the results array at the base text index position.
  - Then iterate over the results array to replace the text of the STT word objects with words from the base text.
- Interpolate missing words:
  - Calculate missing timecodes.
  - First optimization: use neighboring words to do a first pass at setting missing start and end times when present.
  - Then interpolate the remaining missing word timings using the interpolation library everpolate.
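The steps above can be sketched in a few lines of plain JavaScript. This is only an illustration of the idea, not the module's implementation. All function names (`normalize`, `matchWords`, `fillMissingTimes`, `alignSketch`) are hypothetical, and the word object shape `{ text, start, end }` is an assumption. A simple longest-common-subsequence match stands in for difflib's opcodes, an even split of the gap between neighbouring words stands in for everpolate, and number-to-letter conversion is left out.

```javascript
// Hypothetical sketch of align-then-interpolate; not the module's actual code.

// Lowercase and strip punctuation so "Day." and "day" compare equal.
// (Converting numbers to letters is omitted here.)
function normalize(word) {
  return word.toLowerCase().replace(/[^\p{L}\p{N}']/gu, '');
}

// Longest-common-subsequence match between two arrays of normalized words.
// Returns [baseIndex, sttIndex] pairs; a stand-in for difflib's "equal" opcodes.
function matchWords(base, stt) {
  const n = base.length;
  const m = stt.length;
  // table[i][j] holds the LCS length of base[i..] and stt[j..].
  const table = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      table[i][j] = base[i] === stt[j]
        ? table[i + 1][j + 1] + 1
        : Math.max(table[i + 1][j], table[i][j + 1]);
    }
  }
  // Walk the table from the start, collecting the matched index pairs.
  const pairs = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (base[i] === stt[j]) {
      pairs.push([i, j]);
      i++;
      j++;
    } else if (table[i + 1][j] >= table[i][j + 1]) {
      i++;
    } else {
      j++;
    }
  }
  return pairs;
}

// Give each run of untimed words an equal share of the gap between the
// previous timed word's end and the next timed word's start.
// Assumption: a leading run starts at 0; a trailing run gets zero duration.
function fillMissingTimes(words) {
  const out = words.map(w => ({ ...w }));
  let i = 0;
  while (i < out.length) {
    if (out[i].start !== undefined) {
      i++;
      continue;
    }
    let j = i;
    while (j < out.length && out[j].start === undefined) j++;
    const prevEnd = i > 0 ? out[i - 1].end : 0;
    const nextStart = j < out.length ? out[j].start : prevEnd;
    const step = (nextStart - prevEnd) / (j - i);
    for (let k = i; k < j; k++) {
      out[k].start = prevEnd + step * (k - i);
      out[k].end = prevEnd + step * (k - i + 1);
    }
    i = j;
  }
  return out;
}

// Transpose base text words onto STT timings, then fill in the gaps.
function alignSketch(sttWords, baseText) {
  const baseWords = baseText.split(/\s+/).filter(Boolean);
  const pairs = matchWords(
    baseWords.map(normalize),
    sttWords.map(w => normalize(w.text))
  );
  const result = baseWords.map(text => ({ text }));
  for (const [b, s] of pairs) {
    result[b].start = sttWords[s].start;
    result[b].end = sttWords[s].end;
  }
  return fillMissingTimes(result);
}

// Example: the STT transcript missed the word "lovely".
const sttWords = [
  { text: 'there', start: 0, end: 1 },
  { text: 'is', start: 1, end: 2 },
  { text: 'a', start: 2, end: 2.5 },
  { text: 'day', start: 4, end: 5 }
];
console.log(alignSketch(sttWords, 'There is a lovely day.'));
```

In this example the output keeps the spelling and punctuation of the base text, and the unmatched word "lovely" is placed in the gap between "a" and "day".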
- node 10
- npm 6.1.0
npm run build
Bundles the code with react into a ./build folder.
npm run build:demo
The demo is in the docs folder.
Publish the demo to GitHub Pages
npm run deploy:ghpages
npm run test:watch
- add more tests
Deploy to npm
npm run publish:public