Automatically redact PII from audio and video files in minutes with Python.
Companion repo for the blog post "Automatically redact PII from audio and video with Python".
Example output (set to redact names and phone numbers):
Good afternoon, MGK design. Hi. I'm looking to have plans drawn up for an addition in my house. Okay, let me have one of our architects return your call. May I have your name, please? My name is ####. ####. And your last name? My last name is #####. Would you spell that for me, please? # # # # # #. Okay, and your telephone number? Area code? ###-###-#### that's ###-###-#### yes, ma'am. Is there a good time to reach you? That's my cell, so he could catch me anytime on that. Okay, great. I'll have him return your call as soon as possible. Great. Thank you very much. You're welcome. Bye.
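Output like the above depends on which PII categories are enabled for redaction. As a minimal sketch (not necessarily what `main.py` in this repo does), a request that redacts names and phone numbers could look roughly like this with the AssemblyAI Python SDK. The function name `redact_call` is hypothetical, and the sketch assumes the SDK is installed and `ASSEMBLYAI_API_KEY` is set as described in the steps below.

```python
def redact_call(audio_url):
    """Transcribe audio_url with names and phone numbers redacted.

    Hypothetical helper for illustration; the repo's main.py may differ.
    Returns (redacted_text, redacted_audio_url).
    """
    # Imported lazily so this module loads even before `pip install assemblyai`.
    import assemblyai as aai

    # Assumption: the SDK picks up ASSEMBLYAI_API_KEY from the environment.
    config = aai.TranscriptionConfig().set_redact_pii(
        policies=[
            aai.PIIRedactionPolicy.person_name,
            aai.PIIRedactionPolicy.phone_number,
        ],
        redact_audio=True,  # also request a redacted copy of the audio
        substitution=aai.PIISubstitutionPolicy.hash,  # replace PII with '#'
    )

    transcript = aai.Transcriber().transcribe(audio_url, config)
    return transcript.text, transcript.get_redacted_audio_url()
```

Calling `redact_call` with the URL of an audio file would be expected to return the redacted transcript text and a link to the redacted audio. Each call makes a network request that counts against your AssemblyAI account.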
- Install the AssemblyAI Python SDK
pip install assemblyai
- Set your AssemblyAI API key as an environment variable (you can get a key here)
# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>
# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
- Run main.py to print a redacted transcript and a URL for the redacted version of a hard-coded audio file
python main.py
- (Optional) Install termcolor, then run compare.py to print a comparison of the unredacted and redacted transcripts
pip install termcolor
python compare.py
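To give a feel for what a transcript comparison involves, here is a minimal sketch of one possible approach. It is not the repo's `compare.py`: it uses only the standard library, swapping in `difflib` for the word alignment and raw ANSI escape codes in place of `termcolor`. The function name `compare_transcripts` and the sample sentences are made up for illustration.

```python
import difflib

# ANSI escape codes, used here in place of termcolor to stay stdlib-only.
RED = "\033[31m"
GREEN = "\033[32m"
RESET = "\033[0m"


def compare_transcripts(original, redacted):
    """Return (original_marked, redacted_marked) with differing words colored.

    Words that differ are wrapped in red in the original and green in the
    redacted text; matching words are left untouched.
    """
    a, b = original.split(), redacted.split()
    out_a, out_b = [], []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out_a.extend(a[i1:i2])
            out_b.extend(b[j1:j2])
        else:
            # replace / delete / insert: highlight whatever is on each side
            out_a.extend(f"{RED}{word}{RESET}" for word in a[i1:i2])
            out_b.extend(f"{GREEN}{word}{RESET}" for word in b[j1:j2])
    return " ".join(out_a), " ".join(out_b)


if __name__ == "__main__":
    # Made-up sample text, not taken from the real audio file.
    before = "My name is Alex. And your last name?"
    after = "My name is ####. And your last name?"
    for line in compare_transcripts(before, after):
        print(line)
```

Aligning on whole words keeps the highlighting readable even when a redacted span differs in length from the text it replaces.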