-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathFAQ.txt
530 lines (400 loc) · 24.5 KB
/
FAQ.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
----------------------------------------------------------------------------
Frequent Asked Questions (FAQ) on programming by voice
Version of 7/1/2017
Latest version available at <http://vocola.net/programming-by-voice-FAQ.html>
Created by [Mark](mailto:mdl@alum.mit.edu) with help from the voice coders mailing list
Contents
1. Can I program by voice?
1.1. Can I program primarily or entirely by voice?
1.2. Who programs entirely or mostly by voice?
1.3. What if I just want to type and click less?
1.4. What if my main language isn't English?
1.5. I am a disabled college student who cannot type.
Can I become a programmer?
2. How can programming by voice possibly work?
2.1. Isn't it hard to enter code that contains a lot of
punctuation like 'input := open("foo.txt");'?
2.2. How do programmers enter unpronounceable "words" like
strncpy, cosh, or ostreambuf_iterator?
2.3. What are some tasks that programming by voice is bad at?
2.4. What are some tasks that programming by voice is faster for?
3. Why is it so hard to get programming by voice working?
3.1. How long does it take to start programming by voice?
3.2. Why is programming by voice hard to learn / build systems
for?
3.3. Why are there no ready-made systems for programming by voice?
3.4. Are some languages/editors/IDEs/domains easier than others?
4. How can I get started programming by voice?
4.1. What's the best speech recognizer for programming by voice?
4.2. What's the best hardware for programming by voice?
4.3. What's the best software for creating voice commands?
4.4. Are there any demos I can watch?
4.5. Where can I ask questions?
4.6. Are there any other helpful resources?
5. Disclaimers and licensing
----------------------------------------------------------------------------
1. Can I program by voice?
1.1. Can I program primarily or entirely by voice? That is, using
speech recognition rather than a keyboard or mouse?
Yes. Many people do this including people paralyzed from the neck
down. Larry Allen, of pcspeak.com, estimates that there are 100
full-time programmers doing essentially all programming tasks by
voice with a further 1,000 to 10,000 programmers doing some
programming tasks by voice.
1.2. Who programs entirely or mostly by voice?
Today mostly people with a disability that rules out using a
keyboard -- e.g., carpal tunnel, wrist tendinitis, and other forms
of repetitive strain injury (RSI) -- and people that have recovered
from such disabilities. Uninjured people are put off by the effort
required to get programming by voice working even though it is
faster for many tasks.
1.3. What if I just want to type and click less?
Although this FAQ focuses on how to avoid using your hands at all
or more than a small amount per day, you can use simplified
versions of these techniques to type and click a lot less.
To start, you can dictate emails, documentation, and other English
text; browse the web via voice commands; and use voice commands to
click the mouse. Clicking the mouse can also be done by using a
foot switch or "auto click", where software automatically clicks
the mouse if it remains in one place long enough.
1.4. What if my main language isn't English?
Dragon NaturallySpeaking supports Dutch, French, German, Italian,
Spanish, and Japanese whereas traditional and simplified Chinese
are supported by Windows Speech Recognition. If you speak one of
these languages, you can dictate to and control your computer in
that language. That said, English is used in most of the relevant
forums and third-party documentation, so it helps to understand at
least some English.
1.5. I am a disabled college student who cannot type. Can I become a
programmer?
Yes. Master dictating and the basic built-in voice commands of
your speech recognition program, preferably first. You may need a
scribe or other assistance for your initial programming classes.
Later, self-paced classes or classes with group projects may be
good choices.
If you struggle in your initial classes, you must decide whether
you struggle because you cannot use a keyboard or because
programming is not for you. Not everyone is cut out to be a
programmer -- being a successful programmer requires an unusual set
of talents, and many bright people have failed at programming.
----------------------------------------------------------------------------
2. How can programming by voice possibly work?
2.1. Isn't it hard to enter code that contains a lot of punctuation like
'input := open("foo.txt");'?
To handle punctuation, programmers can add special vocabulary words
(e.g., "colon equals" for ":=", "left paren" for "(" with no
preceding space), use code templates, or write editor code that
translates more English-like language into actual code. Even with
these methods, it may take somewhat longer to enter code than
English. This isn't a big deal, however, because programmers, in
practice, don't spend much time entering new code.
2.2. How do programmers enter unpronounceable "words" like strncpy,
cosh, or ostreambuf_iterator?
When these words occur frequently, programmers just create new
vocabulary words with English spoken names (e.g., "D fun" for defun
and "short square root" for sqrt). Unfortunately, this method does
not work for rare words because no one can remember many rarely
used names. Programmers instead use more complicated methods for
these words, which are usually program symbols:
New unpronounceable symbols are usually either spelled (e.g.,
"spell c o s h") or produced by entering then modifying an existing
symbol. Existing symbols can be entered by several methods:
Programmers can copy a visible symbol (e.g., insert the first
symbol on line 10), or they can repeat a recent symbol by choosing
from a numbered list of recently entered symbols. Alternatively,
they can specify an existing symbol by any English words that it
abbreviates (e.g., "output stream buffer iterator" for
ostreambuf_iterator). Finally, programmers can use IDE completion
by partially spelling out symbols or choosing completions from a
list.
By comparison, symbols made from pronounceable words like
camel-case symbols (e.g., RelayClientDisconnected) are easy to
enter. A simple voice command can join the component English words
or editor code can do so after the words are dictated.
2.3. What are some tasks that programming by voice is bad at?
People find selecting or positioning unlabeled, graphical elements
-- think creating PowerPoint diagrams or using GUI creators -- by
voice alone frustratingly slow. They find it easy, however, to
select elements that have been labeled (e.g., Mouseless Browsing's
labeled HTML links).
When elements cannot be labeled, people often supplement speech
recognition with mouse replacements like head or eye trackers.
Touchscreens or pen-based tablets may also work for some people if
not used too much per day.
2.4. What are some tasks that programming by voice is faster for?
Most people speak much faster than they can type, so anything
involving writing down English (e.g., email, documentation,
comments) is going to be faster by voice. The average typist types
40 words per minute (WPM) but could dictate at over 150 WPM with
some practice -- almost 4 times faster.
Routine tasks can also be faster because you can use more shortcuts
with voice. Voice allows for more usable shortcuts because people
remember named shortcuts far better than keystroke combinations.
Thousands of voice shortcuts are common. Many people, for example,
make a separate voice shortcut to move to each frequently used
directory and source file.
----------------------------------------------------------------------------
3. Why is it so hard to get programming by voice working?
3.1. How long does it take to start programming by voice?
Peoples' experience varies, but getting a basic system working
seems to take a small number of months. For example, Tavis Rudd
estimates it took him three months to get productive again. A more
complete system can take a year.
3.2. Why is programming by voice hard to learn / build systems for?
First, a specialized language must be created and learned. Editing
text (programming code or otherwise) involves many small operations
that need concise descriptions to be efficient. Consider by
analogy how a waitress communicates with a short order cook: she
doesn't use "natural language" like "I need two grilled cheese
sandwiches, one with fries"; rather, she says something concise
like "two cheese one fry". A sample editing command might be "go
14 leap semicolon erase", meaning go to the start of the visible
line whose line number's last two digits are 14, move the cursor
forward to the next semicolon, then erase 1 character (namely, the
semicolon).
Second, a large number of varied commands needs to be created.
Editing and code dictation commands alone are not enough; commands
are also needed for browsing code, manipulating files, using
version control, compiling, debugging, reviewing code, emailing,
and browsing the web.
Third, unless the applications the programmer wants to use are
especially keyboard friendly, they will need to be extended to
support voice control. Extending can be done by using a extension
language (e.g., elisp), by writing a plug-in, or by modifying the
source code. Extension is often needed so you can easily verbally
pick an item out of a list -- the method of "hold down an arrow key
until you reach the item you want" is far too slow when done by
voice. Better methods include visually labeling items so the
programmer can say the desired item's label and picking items by
saying words they contain. The later method may require extracting
the names of the items from the application at runtime if the set
of items varies.
3.3. Why are there no ready-made systems for programming by voice?
We believe no company has attempted to build one of these because
the market is too small: There are many fewer disabled programmers
than lawyers and doctors who could benefit from speech recognition.
Moreover, programmers use hundreds, if not thousands, of
incompatible languages, editors, and IDEs. This means any one
programming-by-voice system can capture only a fraction of the
disabled programmer market. Even the larger market for helping the
disabled in general control their computers by voice attracts
little commercial attention. These market realities are why speech
recognition vendors emphasize ease of learning and natural language
commands (e.g., "move down 2 lines") rather than ease of use (e.g.,
"down 2").
Open source, by contrast, is mostly driven by programmers with an
itch--in this case, disabled programmers trying to program by
voice. Unfortunately, disabled programmers are usually too busy
trying to do their day jobs and get/keep a programming-by-voice
system working for themselves to have much time to contribute to
open source. They do not have time to convert a system that only
works for them to a portable, reusable system. Because of these
constraints and the fragmentation of programming environments, most
of the open source to date has been around tools for building voice
commands.
An additional complication is that programming-by-voice systems
are most effective when they match the cognitive style and naming
conventions of their users (e.g., it's harder to remember command
names chosen by other people); this further fragments the
community's efforts.
There is one quasi-exception to no open source system being
available, namely VoiceCode. VoiceCode is an open source system
for Emacs that supports many popular programming languages.
Unfortunately, it appears to be living-dead software, slowly
rotting, and without any maintainers. As of July 2017, it does not
work with the 2012+ versions of the associated speech recognizer.
3.4. Are some languages/editors/IDEs/domains easier than others to
program by voice with?
Yes. For languages, ones like Java and Ruby that by convention use
names made out of English words are easier than languages like C++
that encourage unpronounceable names. For editors and IDEs, ease of
extensibility makes things easier. For domains, command-line-only
programs are easier to create by voice than GUIs or web applications
because graphic elements need not be manipulated.
Gnuemacs stands out as especially easy to use for programming by
voice: it provides excellent extensibility via elisp as well as
packages for manipulating files, using version control, compiling
source code, and the like. Moreover, all of this functionality can
be used without a mouse.
----------------------------------------------------------------------------
4. How can I get started programming by voice?
4.1. What's the best speech recognizer for programming by voice?
There are only two recognizers usable today for serious programming
by voice. Best is Dragon NaturallySpeaking (Dragon) -- recently
renamed to Dragon Professional Individual -- by Nuance. Second
best is Windows Speech Recognition (WSR), which is included in
Microsoft's recent operating systems. Although WSR is free unlike
Dragon, more than one voice coder has given up on it due to its
reduced accuracy, particularly with noise words.
Both of these recognizers run under only Windows. They may,
however, be used under Linux or Mac OS by running them in a virtual
machine. They can also be used with Linux or Macs by using a
Windows PC as a "smart terminal": the programmer sits at the
Windows PC and SSH's into one or more Linux/Mac OS boxes and pops
up xterms, Emacs, and the like via X11 on the Windows PC. These
approaches forgo some functionality: Dragon and WSR provide
built-in commands for controlling applications written using the
standard Windows graphical toolkit and text controls; Dragon also
provides natural language commands for Microsoft Office and
Internet Explorer. None of this functionality is available for
other operating systems.
DragonDictate for the Mac, also made by Nuance, is good for
dictation but its command and control functionality is way behind
that of Dragon. Siri and similar programs likewise do well at
dictation but poorly at commands.
Dragon comes in several versions; you want Dragon Professional
Individual 14 or the slightly older Dragon NaturallySpeaking
premium 13. The most recent version, DPI 15, should be avoided as
it has a bug that makes it unusable with many of the best
programming-by-voice tools (including NatLink, Vocola, and
DragonFly). There doesn't appear to be a compelling reason to
choose the more expensive DPI 14 over DNS 13. The older DNS 12
premium/professional actually works better for programming by
voice, but is no longer available.
4.2. What's the best hardware for programming by voice?
Although Dragon NaturallySpeaking works okay with most recent PCs,
it works best with a fast processor (e.g., Intel core i7, 3+ GHz)
and lots of RAM (8 GB or more). If you will be working in a noisy
area, you will want to invest in a high quality noise-canceling
microphone. There are many styles of microphones with different
people preferring different styles, so we'll just refer you here to
some vendor guides:
+ [KnowBrainer microphone comparison guide][KB]
+ [Speech Recognition Solutions guide][SR]
[KB]: http://www.knowbrainer.com/core/pages/miccompare.cfm?
[SR]: http://www.speechrecsolutions.com/microphone_selection_guide.htm
4.3. What's the best software for creating voice commands?
Vocola and/or DragonFly are your best choice. Both are open source
and work with both Dragon and Windows Speech Recognition (WSR).
Vocola provides a concise language for writing voice commands; here
are four example commands:
Copy That = {Ctrl+c};
Copy to WordPad = {Ctrl+a}{Ctrl+c} AppBringUp(WordPad);
1..40 (Left | Right | Up | Down) = {$2_$1};
Sort by (Date=e | Sender=n | Subject=s) = {Alt+v}o $1;
Vocola supports user-defined functions, optional command parts,
arbitrary text parts, multiple commands in a single utterance, and
easy-to-write extensions written in Python (Dragon version) or .Net
(WSR version).
DragonFly, by contrast, is a Python programming framework for
creating voice commands. Dragonfly's commands are less concise
than Vocola's but can do pretty much anything possible with the
speech recognition engines. Vocola's standard functionality
probably suffices for 95-99% of the voice commands a programmer
wants, so we recommend starting with Vocola and only later using
DragonFly or custom Vocola extensions for the commands that need
more power.
4.4. Are there any demos I can watch?
Here are some demos that you may find inspiring. The demonstrated
systems vary in maturity and effectiveness. None shows all the
most effective known techniques. For example, Tavis Rudd doesn't
demonstrate the use of line numbers and pauses more than necessary.
Also, some of the speakers are intentionally speaking slowly to
make their demos easier to follow or are using now-decade-old
hardware and software, which cannot keep up in real time.
+ An excellent 2013 demo given by Tavis Rudd at PyCon, entitled
["Using Python to Code by Voice"][TR]. Tavis says that in
practice he uses line numbers heavily and goes much faster, in
part by using shortcuts he didn't show for fear of angering the
demo gods.
+ A follow-up demo on [VimSpeak][VS] by Ashley Feniello; note that
he does not appear to have speech recognition set up optimally,
and thus has a much higher misrecognition rate than expected.
+ The original [VoiceCode demo][VC] (the audio is missing in the
beginning)
+ Nils Klarlund's demo of his [ShortTalk system][ST], which
demonstrates power of stringing together many small editing
commands
+ Jason Veldicott's [demo][JV], which uses approximate string
matching for dictating existing symbols
[TR]: https://www.youtube.com/watch?v=8SkdfdXWYaI
[VS]: https://www.youtube.com/watch?v=TEBMlXRjhZY
[VC]: http://www.youtube.com/watch?v=A7f9Iik3q58
[ST]: https://www.youtube.com/watch?v=pXOBKlJ5Oms
[JV]: http://www.youtube.com/watch?v=BIJBl5llcQ4
4.5. Where can I ask questions?
For general speech recognition questions (e.g., "What's a good
microphone?" or "How do I select cells in Excel?"), the best
current forums are at [KnowBrainer][KB]. The Dragon
NaturallySpeaking Speech Recognition [forum][NS] is particularly
good. Similar forums in German can be found [here][G].
For programming-specific questions, the voice-coders [mailing
list][VC] is probably your best bet.
[KB]: https://www.knowbrainer.com/forums/forum/index.cfm
[NS]: https://www.knowbrainer.com/forums/forum/categories.cfm?catid=4
[G]: http://dragon-spracherkennung.forumprofi.de/
[VC]: https://groups.yahoo.com/neo/groups/VoiceCoder/info
4.6. Are there any other helpful resources for building a programming by
voice system?
Here are some other useful links:
+ The [Vocola website](http://vocola.net/)
+ The [DragonFly website](https://github.com/t4ngo/dragonfly)
+ James' [blog](http://handsfreecoding.org/) on using DragonFly to
program by voice. David Conway's [series of articles][ED]
introducing voice programming is also good.
+ Mark's [sample editing commands][SE] in Vocola for Win32Pad that
demonstrate combining multiple concise commands into a single
utterance, using line numbers mod 100, moving by punctuation, and
operating on line ranges specified by line number. Mark uses
clever tricks to provide this functionality for an underpowered
editor; likely, some variant of these tricks can be used with
many programming editors.
+ The [NatLink website][NA], the unofficial extension of Dragon
that allows creating voice commands using a kludgy Python
interface. This extension is used by both Vocola and DragonFly
to provide smoother ways of creating commands. A paper,
["Implementation and Acceptance of NatLink, a Python-Based Macro
System for Dragon NaturallySpeaking"][PA], and a PowerPoint,
["NatLink: A Python Macro System for Dragon
NaturallySpeaking"][PP], are available.
+ [Unimacro][UN], a collection of NatLink grammars by the current
maintainer of NatLink
+ The [VoiceCode project][VC] website such as it is and the
academic paper, ["VoiceCode: an Innovative Speech Interface for
Programming-by-Voice"][VP], describing the VoiceCode system
+ The [Mouseless Browsing][MB] extension for Firefox, which labels
HTML links with numbers and allows them to be selected by typing
those numbers; this allows links to be [chosen by voice][MM].
The new [Click by Voice][CV] extension provides equivalent
functionality for Chrome.
+ [Freesr Speech Recognition][FS] and [SpeechStart+][SS],
third-party utilities that among other things label widgets,
windows, and taskbar items and icons for manipulation by provided
voice commands. SpeechStart+ also allows restarting Dragon
NaturallySpeaking by voice when it gets stuck, which is handy for
completely hands-free users.
+ [NaturalPoint's SmartNav 4][SN], a reliable and accurate,
hands-free mouse alternative that allows complete control of a
computer by naturally moving the head.
+ [AutoHotkey][AN], a free, open-source keyboard-macro-creation and
automation program. The AutoHotkey community is a useful source
of expertise on automating Windows applications; AutoHotkey
macros can be invoked from voice macros using `SendSystemKeys` or
`SendInput`.
[ED]: http://explosionduck.com/wp/introduction-to-voice-programming-part-one-dns-natlink/
[SE]: http://vocola.net/unofficial/commands_for_Win32Pad.html
[NA]: http://sourceforge.net/projects/natlink/
[PA]: http://qh.antenna.nl/unimacro/implementationandacceptanceofnatlink.pdf
[PP]: http://www.synapseadaptive.com/joel/natlinktalk.ppt>
[UN]: http://qh.antenna.nl/unimacro/
[VC]: http://sourceforge.net/projects/voicecode/
[VP]: https://dl.acm.org/citation.cfm?id=1125502
[MB]: https://addons.mozilla.org/en-us/firefox/addon/mouseless-browsing/
[CV]: https://chrome.google.com/webstore/detail/click-by-voice/dleiijbbjajmfcaiiiadgjpgfjmfdfen
[MM]: https://www.knowbrainer.com/forums/forum/messageview.cfm?catid=25&threadid=20139
[FS]: http://freesr.org/
[SS]: http://www.pcbyvoice.com/shop/pcbyvoice-speechstart-plus/
[SN]: http://www.naturalpoint.com/smartnav/
[AN]: http://ahkscript.org/
----------------------------------------------------------------------------
5. Disclaimers and licensing
All contributors' employers will no doubt disown any statements
herein. We are not speaking for anyone but ourselves.
Every effort has been made to produce an accurate and useful
document, but the information herein is completely without
warranty. If you find any errors or otherwise wish to contribute,
please contact [Mark](mailto:mdl@alum.mit.edu).
This document may be freely distributed as long as it is not
modified. An [ASCII version][ASCII] is also available.
[ASCII]: http://vocola.net/programming-by-voice-FAQ.txt
----------------------------------------------------------------------------