Opensubtitles_dataset

Downloads and parses the subtitle dataset from opensubtitles.org.

Usage

python3 parse_opensubtitle_xml.py

The command above downloads a zip archive containing the English OpenSubtitles corpus and extracts the text from all the XML files, removing the metadata.
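
The extraction step could look roughly like the sketch below. It is not the code in parse_opensubtitle_xml.py. It assumes each XML file wraps every subtitle line in an `<s>` element (optionally split into `<w>` tokens) and that metadata sits outside those elements; the function names `extract_sentences` and `extract_from_zip` are hypothetical. The download itself is left out, so the sketch works on an archive that is already on disk or in memory.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def extract_sentences(xml_bytes):
    """Return one plain-text line per <s> element, ignoring everything else.

    Assumption: subtitle text lives inside <s> elements; tags such as
    <meta> or <time> carry only metadata and contribute no text.
    """
    root = ET.fromstring(xml_bytes)
    sentences = []
    for s in root.iter("s"):
        # itertext() walks the element and its descendants, so this works
        # for both tokenized (<w>...</w>) and untokenized sentences.
        # split()/join collapses any run of whitespace into one space.
        text = " ".join(" ".join(s.itertext()).split())
        if text:
            sentences.append(text)
    return sentences


def extract_from_zip(zip_file):
    """Map each .xml member of the archive to its list of sentences."""
    result = {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                result[name] = extract_sentences(archive.read(name))
    return result


if __name__ == "__main__":
    # Build a tiny in-memory archive instead of downloading the corpus.
    sample = (
        b'<document id="1"><meta><source>demo</source></meta>'
        b'<s id="1"><time id="T1S" value="00:00:01,000"/>'
        b'<w id="1.1">Hello</w> <w id="1.2">there</w></s>'
        b'<s id="2">How are you?</s></document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("en/sample.xml", sample)
    buf.seek(0)
    for name, lines in extract_from_zip(buf).items():
        print(name)
        for line in lines:
            print("  " + line)
```

Reading members straight from the archive avoids unpacking the whole corpus to disk, which may matter given how many XML files it contains.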