This project converts LaTeX documents into plain text you might read aloud. Use this program alongside espeak
, festival
, or gtts-cli
to convert a LaTeX document to speech without your figures, environments, and commands polluting the audio with gibberish.
As an example, consider the following test document (ripped from NYU’s Introduction to LaTeX):
% This is a simple sample document. For more complicated documents take a look in the exercise tab. Note that everything that comes after a % symbol is treated as comment and ignored when the code is compiled.
\documentclass{article} % \documentclass{} is the first command in any LaTeX code. It is used to define what kind of document you are creating such as an article or a book, and begins the document preamble
\usepackage{amsmath} % \usepackage is a command that allows you to add functionality to your LaTeX code
\title{Simple Sample} % Sets article title
\author{My Name} % Sets authors name
\date{\today} % Sets date for date compiled
% The preamble ends with the command \begin{document}
\begin{document} % All begin commands must be paired with an end command somewhere
\maketitle % creates title using information in preamble (title, author, date)
\section{Hello World!} % creates a section
\textbf{Hello World!} Today I am learning \LaTeX. %notice how the command will end at the first non-alphabet charecter such as the . after \LaTeX
\LaTeX{} is a great program for writing math. I can write in line math such as $a^2+b^2=c^2$ %$ tells LaTexX to compile as math
. I can also give equations their own space:
\begin{equation} % Creates an equation environment and is compiled as math
\gamma^2+\theta^2=\omega^2
\end{equation}
If I do not leave any blank lines \LaTeX{} will continue this text without making it into a new paragraph. Notice how there was no indentation in the text after equation (1).
Also notice how even though I hit enter after that sentence and here $\downarrow$
\LaTeX{} formats the sentence without any break. Also look how it doesn't matter how many spaces I put between my words.
For a new paragraph I can leave a blank space in my code.
\end{document} % This is the end of the document
Reading even this simple document with a text-to-speech program will waste your time reading latex commands like \begin{equation}
, all your comments, and preamble, possibly even character by character. The output from TexToSpeech
is:
Section Hello World!. Hello World! Today I am learning LaTeX. LaTeX is a great program for writing math. I can write in line math such as blah . I can also give equations their own space: blah If I do not leave any blank lines LaTeX will continue this text without making it into a new paragraph. Notice how there was no indentation in the text after equation (1). Also notice how even though I hit enter after that sentence and here blah LaTeX formats the sentence without any break. Also look how it doesn't matter how many spaces I put between my words. For a new paragraph I can leave a blank space in my code.
Try reading this with the command: cat test.tex | textospeech | festival --tts
. Notice that mathematics expressions convert to the text “blah”; this is how I usually proofread documents with mathematics. Later, I intend to convert mathematics to speech using Speech Rule Engine, to proofread mathematics as well as prose.
Building this program requires a working copy of cabal
and ghc
version 9.2.4. Execute cabal build
to build the program and cabal install
to install the program to ~/.cabal/bin
.
This is a work-in-progress! Eventually, I hope to allow you to:
- Toggle conversion of mathematics to text,
- Configure the list of skipped commands and environments,
- Configure the handling of citations,
- More to come…
- LaTeX parsing is notoriously hard. The library I use, HaTeX, does a reasonable job but sometimes falls short. At the moment, I handle
\section*
commands specially because they do not parse correctly. I know of other commands that don’t work correctly. I will try to submit upstream parser issues as they arrive.