Skip to content

Convert LaTeX to plain text for consumption by text-to-speech programs

Notifications You must be signed in to change notification settings

lispandfound/TexToSpeech

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 

Repository files navigation

TexToSpeech

This project converts LaTeX documents into plain text you might read aloud. Use this program alongside espeak, festival, or gtts-cli to convert a LaTeX document to speech without your figures, environments, and commands polluting the audio with gibberish.

As an example, consider the following test document (ripped from NYU’s Introduction to LaTeX):

% This is a simple sample document.  For more complicated documents take a look in the exercise tab. Note that everything that comes after a % symbol is treated as comment and ignored when the code is compiled.

\documentclass{article} % \documentclass{} is the first command in any LaTeX code.  It is used to define what kind of document you are creating such as an article or a book, and begins the document preamble

\usepackage{amsmath} % \usepackage is a command that allows you to add functionality to your LaTeX code

\title{Simple Sample} % Sets article title
\author{My Name} % Sets authors name
\date{\today} % Sets date for date compiled

% The preamble ends with the command \begin{document}
\begin{document} % All begin commands must be paired with an end command somewhere
    \maketitle % creates title using information in preamble (title, author, date)

    \section{Hello World!} % creates a section

    \textbf{Hello World!} Today I am learning \LaTeX. %notice how the command will end at the first non-alphabet charecter such as the . after \LaTeX
     \LaTeX{} is a great program for writing math. I can write in line math such as $a^2+b^2=c^2$ %$ tells LaTexX to compile as math
     . I can also give equations their own space:
    \begin{equation} % Creates an equation environment and is compiled as math
    \gamma^2+\theta^2=\omega^2
    \end{equation}
    If I do not leave any blank lines \LaTeX{} will continue  this text without making it into a new paragraph.  Notice how there was no indentation in the text after equation (1).
    Also notice how even though I hit enter after that sentence and here $\downarrow$
     \LaTeX{} formats the sentence without any break.  Also   look  how      it   doesn't     matter          how    many  spaces     I put     between       my    words.

    For a new paragraph I can leave a blank space in my code.

\end{document} % This is the end of the document

Reading even this simple document with a text-to-speech program will waste your time reading latex commands like \begin{equation}, all your comments, and preamble, possibly even character by character. The output from TexToSpeech is:

Section Hello World!.
Hello World! Today I am learning LaTeX. LaTeX is a great program for writing math. I can write in line math such as blah . I can also give equations their own space:
blah
If I do not leave any blank lines LaTeX will continue this text without making it into a new paragraph. Notice how there was no indentation in the text after equation (1).
Also notice how even though I hit enter after that sentence and here blah
LaTeX formats the sentence without any break. Also look how it doesn't matter how many spaces I put between my words.

For a new paragraph I can leave a blank space in my code.

Try reading this with the command: cat test.tex | textospeech | festival --tts. Notice that mathematics expressions convert to the text “blah”; this is how I usually proofread documents with mathematics. Later, I intend to convert mathematics to speech using Speech Rule Engine, to proofread mathematics as well as prose.

Building and Installing

Building this program requires a working copy of cabal and ghc version 9.2.4. Execute cabal build to build the program and cabal install to install the program to ~/.cabal/bin.

Command Line Interface

This is a work-in-progress! Eventually, I hope to allow you to:

  • Toggle conversion of mathematics to text,
  • Configure the list of skipped commands and environments,
  • Configure the handling of citations,
  • More to come…

Limitations

  • LaTeX parsing is notoriously hard. The library I use, HaTeX, does a reasonable job but sometimes falls short. At the moment, I handle \section* commands specially because they do not parse correctly. I know of other commands that don’t work correctly. I will try to submit upstream parser issues as they arrive.

About

Convert LaTeX to plain text for consumption by text-to-speech programs

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published