- Record with QuickTime
- Record with Zoom
- Split the MP4 video
- Add captions using AWS transcribe service
- Improve automated transcription with custom vocabulary
- Manually review and fix the transcription
- Generate a transcript with video references
- Wrap in HTML with some Javascript controls
- Batching it all up
- Review locally before deploying
- Deploy to a web server
- Deploy to Google Docs
- Browser Compatibility
Use Command-Shift-5 and make a Quicktime recording.
For an alternative, see Record with Zoom, below.
Note that recording the Retina display makes a giant resolution file. Find a way to reduce it to something reasonable.
Here are some speed comparisons:
ffmpeg conversion of a 5:54 long fullscreen recording (3360 x 2114 resolution, 60 fps):
frame= 432 fps= 60 q=31.0 size= 1042kB time=00:00:07.12 bitrate=1198.7kbits/s speed=0.993x
The same video after downscaling the Quicktime and reducing the frame rate from 60 to 15 fps with ffmpeg (Quicktime Export as 720p; 1152 x 738 resolution, 15 fps):
frame= 1525 fps=252 q=27.0 size= 5888kB time=00:01:42.57 bitrate= 470.3kbits/s dup=0 drop=4355 speed= 17x
Using ffmpeg to downscale the QuickTime works OK too (and can be batched). It runs at about 2.5x.
ffmpeg -i resolution.mov -vcodec h264 -acodec mp2 -r 15 -vf "scale=qhd" resolution-scaled-qhd.mp4
Some size comparisons of various scaled resolutions:
(env) screencast$ ls -l 02/01jsonapi*.mp4
-rw-r--r-- 1 ac45 1749373939 43916456 Jul 6 14:44 02/01jsonapi-15fps.mp4
-rw-r--r--@ 1 ac45 1749373939 24769616 Jul 6 14:16 02/01jsonapi-720-15fps.mp4
-rw-r--r--@ 1 ac45 1749373939 32668772 Jul 6 14:13 02/01jsonapi-720.mp4
-rw-r--r--@ 1 ac45 1749373939 21752453 Jul 6 14:51 02/01jsonapi-scaled-840-528.mp4
-rw-r--r-- 1 ac45 1749373939 21492952 Jul 6 15:05 02/01jsonapi-scaled-hd480.mp4
-rw-r--r--@ 1 ac45 1749373939 25663549 Jul 6 14:58 02/01jsonapi-scaled-hd720.mp4
-rw-r--r-- 1 ac45 1749373939 22618104 Jul 6 15:08 02/01jsonapi-scaled-qhd.mp4
-rw-r--r-- 1 ac45 1749373939 73275598 Jul 6 11:47 02/01jsonapi.mp4
Within the screen recording you can trim off the beginning and end. Use iMovie or something else to edit further.
Next, convert the MOV video to MP4 so it can be viewed in a browser. You can use ffmpeg for this.
$ brew install ffmpeg
$ ffmpeg -i test.mov -vcodec h264 -acodec mp2 -r 15 test.mp4
A perhaps better alternative to recording with QuickTime, although one that comes at a cost, is to record a Zoom meeting with yourself. Unlike with QuickTime recordings that can only record the full screeen or a portion of it, Zoom can record the full screen or a specific application window. When recording, select Record on this Computer. At the end of the meeting, the recording will be delivered as an MP4 so you don't have to transcode with with ffmpeg.
The Zoom session may contain multiple topics. You can split the zoom_0.mp4 into multiple files with tools like avidemux:
$ brew cask install avidemux
In Finder, go to Applications and open Avidemux, then:
- File/Open zoom_0.mp4
- Click on "A" to set the start point of the clip.
- Scroll forward to the end and click on "B".
- Select "MP4 Muxer" as the Output Format.
- Select "Save" and give it a name (e.g. zoom_1.mp4).
Create a WEBVTT file and just add the text track <track>
.
See aws-transcribe.py which does a few things:
usage: aws-transcribe.py [options] infile outfile
positional arguments:
infile local file or s3 url
outfile output VTT file
optional arguments:
-h, --help show this help message and exit
--srt SRT output SRT file
--profile PROFILE AWS config profile [default alan:CTO]
--job JOB existing AWS transcribe job
--saved_job SAVED_JOB
use saved AWS transcribe job output
--job_delete delete transcription job
--vocabulary VOCABULARY
custom vocabulary to use [default: jsonapi]
--s3bucket S3BUCKET s3 bucket [default: n2ygk]
--s3_delete delete s3 file
- Copies the .mp4 file to S3. If the file is already there, provide an s3:// URL that references it.
- Submits a transcribe job. If the job has already been submitted, provide the
--job
option. - Waits for the job to complete.
- Pulls the transcription data from AWS.
- Merges the individual words and punctionation into phrases.
- Outputs a .VTT file.
- Deletes the transcripion job results unless
--no_delete
option is used.
The transcription will likely have some systemic errors for things like jargon, specific acronynms and so on. You can upload a custom vocabulary to AWS and apply it to the transcription.
(env) screencast$ aws s3 cp jsonapi-vocab.txt s3://n2ygk --profile alan:CTO
(env) screencast$ aws transcribe create-vocabulary --vocabulary-name jsonapi --language-code en-US --vocabulary-file-uri s3://n2ygk/jsonapi-vocab.txt --profile alan:CTO
{
"VocabularyName": "jsonapi",
"LanguageCode": "en-US",
"VocabularyState": "PENDING"
}
...
(env) screencast$ aws transcribe list-vocabularies --profile alan:CTO
{
"Vocabularies": [
{
"VocabularyName": "jsonapi",
"LanguageCode": "en-US",
"VocabularyState": "READY"
}
]
}
Here's an example of some vocabulary improvments:
4,5c4,5
< 00:00:00.700 --> 00:00:04.970
< This is an introduction at rest ful ap eyes rests
---
> 00:00:00.700 --> 00:00:05.670
> This is an introduction at RESTful APIs rests APIs air
8,9c8,9
< 00:00:04.970 --> 00:00:07.870
< AP Eyes were frequently used for micro service is and
---
> 00:00:05.670 --> 00:00:07.510
> frequently used for microservices.
12,13c12,13
< 00:00:07.870 --> 00:00:10.140
< we'll talk a little bit more about that as we ...
---
291,297c283,285
< 73
< 00:03:43.980 --> 00:03:44.570
< hate. Oh,
<
< 74
< 00:03:44.570 --> 00:03:45.790
< yes, kind of thing,
---
> 71
> 00:03:43.980 --> 00:03:45.790
> HATEOAS kind of thing,
The transcription will have some errors. Edit the .vtt
file to fix them.
aws-transcribe
will not overwrite an existing .vtt file.
Some further manual improvements:
240,241c240,241
< 00:03:12.070 --> 00:03:15.300
< which will get into later called JSON API I dot
---
> 00:03:12.070 --> 00:03:15.580
> -- which we will get into later -- called JSONAPI.org
vtt2transcript.py reads the (corrected) .vtt
file and generates a hyperlinked
transcript where each phrase from the captioning causes the video to jump to the associated time.
Here's an example:
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" type="text/css" href="transcript.css">
<script type = "text/javascript" src="videocontrols.js"></script>
<title>s3test</title>
</head>
<body>
<div style="text-align:center">
<button onclick="makeSmall()">Small</button>
<button onclick="makeBig()">Big</button>
<br><br>
<video id="video1" width="50%" controls preload="metadata">
<source src="s3test.mp4" type="video/mp4">
<track label="English" kind="captions" srclang="en" src="s3test.vtt" default>
Your browser does not support HTML5 video.
</video>
<br>
<button onclick="slowerSpeed()">Slow down</button>
<button onclick="normalSpeed()">Speed 1.0</button>
<button onclick="fasterSpeed()">Speed up</button>
<button onclick="captionsOn()">Captions ON</button>
<button onclick="captionsOff()">Captions Off</button>
</div>
<script>
setVideo("video1");
</script>
<p>Transcript</p>
<div w3-include-html="s3test-transcript.html"></div>
<script>includeHTML()</script>
</body>
</html>
See batchtranscribe.sh and batchvtt2transcript.sh which convert and transcribe the Quicktime videos, and then, after cleaning up the VTT transcripts, generates hyperlinked transcripts.
NOTE: That when you try to test locally by just opening the file, the subtitles will not work in Chrome but will work in other browsers like Firefox. However, the transcript file refuses to display.
So, run a simple http server locally:
(env) screencast$ npm install http-server -g
/usr/local/bin/http-server -> /usr/local/lib/node_modules/http-server/bin/http-server
/usr/local/bin/hs -> /usr/local/lib/node_modules/http-server/bin/http-server
+ http-server@0.11.1
added 3 packages from 6 contributors and updated 10 packages in 0.905s
(env) screencast$ http-server
Starting up http-server, serving ./
Available on:
http://127.0.0.1:8080
http://160.39.178.127:8080
Hit CTRL-C to stop the server
Just copy the files to a web server:
(env) screencast$ scp videocontrols.js 02/01jsonapi.mp4 02/01jsonapi.vtt 02/01jsonapi.html 02/01jsonapi-transcript.html alan@cunix:public_html/
(env) screencast$ ssh alan@cunix chmod a+r public_html/01*
Google Docs magically handles MP4 files and you can even add captions:
- Google only supports SRT transcript files.
aws-transcribe
has a--srt FILE
option to generate this at the same time as the VTT is generated. - Upload the MP4 video file to google docs.
- Click on it after waiting a while -- it will tell you it's still processing it.
- Click on the vertical
. . .
menu and select "manage caption tracks" and upload your SRT file.
TODO: So far this only works fully with Chrome.
Firefox: Fails to render the transcript text.
Safari: Fails to play sound.