Remove old CTC decoder (Fixes #1675) #1696

Merged Nov 12, 2018 (13 commits). Showing changes from 7 commits.
.compute (3 changes: 1 addition & 2 deletions)
@@ -21,5 +21,4 @@ python3 -u DeepSpeech.py \
 --display_step 0 \
 --validation_step 1 \
 --checkpoint_dir "../keep" \
---summary_dir "../keep/summaries" \
---decoder_library_path "../tmp/native_client/libctc_decoder_with_kenlm.so"
+--summary_dir "../keep/summaries"
.gitattributes (1 change: 0 additions & 1 deletion)
@@ -1,4 +1,3 @@
 *.binary filter=lfs diff=lfs merge=lfs -crlf
 data/lm/trie filter=lfs diff=lfs merge=lfs -crlf
 data/lm/vocab.txt filter=lfs diff=lfs merge=lfs -text
-data/lm/trie.ctcdecode filter=lfs diff=lfs merge=lfs -text
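
(The `filter=lfs diff=lfs merge=lfs` attributes mark these paths as Git LFS-tracked, with `-text`/`-crlf` disabling text normalization. With the old decoder removed, the `trie.ctcdecode` entry is no longer needed; `data/lm/trie`, modified below, presumably now stores the trie in the new decoder's format.)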
.install (6 changes: 6 additions & 0 deletions)
@@ -7,4 +7,10 @@ pip install tensorflow-gpu==1.12.0rc2

 python3 util/taskcluster.py --arch gpu --target ../tmp/native_client

+# Install ds_ctcdecoder package from TaskCluster
+VERSION=$(python -c 'import pkg_resources; print(pkg_resources.safe_version(open("VERSION").read()))')
+PYVER=$(python -c 'import sys; print("cp{0}{1}-cp{0}{1}m".format(sys.version_info.major, sys.version_info.minor))')
+python3 util/taskcluster.py --arch cpu --target ../tmp --artifact "ds_ctcdecoder-${VERSION}-${PYVER}-manylinux1_x86_64.whl"
+pip install ../tmp/ds_ctcdecoder-*.whl
+
 mkdir -p ../keep/summaries
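
For reference, on CPython 3.6 the `PYVER` expression above evaluates to `cp36-cp36m` (interpreter tag plus the pymalloc ABI suffix), so the requested artifact would be `ds_ctcdecoder-${VERSION}-cp36-cp36m-manylinux1_x86_64.whl`.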
DeepSpeech.py (1,393 changes: 132 additions & 1,261 deletions)
Large diffs are not rendered by default.

Dockerfile (9 changes: 4 additions & 5 deletions)
@@ -165,9 +165,6 @@ RUN ./configure
 # passing LD_LIBRARY_PATH is required because Bazel doesn't pick it up from the environment


-# Build LM Prefix Decoder, CPU only - no need for CUDA flag
-RUN bazel build -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx //native_client:libctc_decoder_with_kenlm.so --verbose_failures --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
-
 # Build DeepSpeech
 RUN bazel build --config=monolithic --config=cuda -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_trie --verbose_failures --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}

@@ -184,8 +181,7 @@ RUN bazel build --config=monolithic --config=cuda -c opt --copt=-O3 --copt="-D_G
 # RUN pip install /tmp/tensorflow_pkg/*.whl

 # Copy built libs to /DeepSpeech/native_client
-RUN cp /tensorflow/bazel-bin/native_client/libctc_decoder_with_kenlm.so /DeepSpeech/native_client/ \
-    && cp /tensorflow/bazel-bin/native_client/generate_trie /DeepSpeech/native_client/ \
+RUN cp /tensorflow/bazel-bin/native_client/generate_trie /DeepSpeech/native_client/ \
     && cp /tensorflow/bazel-bin/native_client/libdeepspeech.so /DeepSpeech/native_client/

 # Install TensorFlow
@@ -200,6 +196,9 @@ RUN make deepspeech
 WORKDIR /DeepSpeech/native_client/python
 RUN make bindings
 RUN pip install dist/deepspeech*
+WORKDIR /DeepSpeech/native_client/ctcdecode
+RUN make
+RUN pip install dist/*.whl


 # << END Build and bind
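
Here `make` in `native_client/ctcdecode` is expected to build the `ds_ctcdecoder` wheel into `dist/`, mirroring the prebuilt-wheel install added to `.install` above; pip then installs it inside the image.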
bin/run-tc-ldc93s1_checkpoint.sh (1 change: 0 additions & 1 deletion)
@@ -19,7 +19,6 @@ python -u DeepSpeech.py --noshow_progressbar \
 --n_hidden 494 --epoch -1 --random_seed 4567 --default_stddev 0.046875 \
 --max_to_keep 1 --checkpoint_dir '/tmp/ckpt' \
 --learning_rate 0.001 --dropout_rate 0.05 \
---decoder_library_path '/tmp/ds/libctc_decoder_with_kenlm.so' \
 --lm_binary_path 'data/smoke_test/vocab.pruned.lm' \
 --lm_trie_path 'data/smoke_test/vocab.trie' | tee /tmp/resume.log
bin/run-tc-ldc93s1_new.sh (1 change: 0 additions & 1 deletion)
@@ -20,6 +20,5 @@ python -u DeepSpeech.py \
 --default_stddev 0.046875 --max_to_keep 1 \
 --checkpoint_dir '/tmp/ckpt' \
 --learning_rate 0.001 --dropout_rate 0.05 --export_dir '/tmp/train' \
---decoder_library_path '/tmp/ds/libctc_decoder_with_kenlm.so' \
 --lm_binary_path 'data/smoke_test/vocab.pruned.lm' \
 --lm_trie_path 'data/smoke_test/vocab.trie' \
bin/run-tc-ldc93s1_singleshotinference.sh (2 changes: 0 additions & 2 deletions)
@@ -17,7 +17,6 @@ python -u DeepSpeech.py \
 --n_hidden 494 --epoch 1 --random_seed 4567 --default_stddev 0.046875 \
 --max_to_keep 1 --checkpoint_dir '/tmp/ckpt' --checkpoint_secs 0 \
 --learning_rate 0.001 --dropout_rate 0.05 \
---decoder_library_path '/tmp/ds/libctc_decoder_with_kenlm.so' \
 --lm_binary_path 'data/smoke_test/vocab.pruned.lm' \
 --lm_trie_path 'data/smoke_test/vocab.trie'

@@ -28,7 +27,6 @@ python -u DeepSpeech.py \
 --n_hidden 494 --epoch 1 --random_seed 4567 --default_stddev 0.046875 \
 --max_to_keep 1 --checkpoint_dir '/tmp/ckpt' --checkpoint_secs 0 \
 --learning_rate 0.001 --dropout_rate 0.05 \
---decoder_library_path '/tmp/ds/libctc_decoder_with_kenlm.so' \
 --lm_binary_path 'data/smoke_test/vocab.pruned.lm' \
 --lm_trie_path 'data/smoke_test/vocab.trie' \
 --one_shot_infer 'data/smoke_test/LDC93S1.wav'
bin/run-tc-ldc93s1_tflite.sh (1 change: 0 additions & 1 deletion)
@@ -14,7 +14,6 @@ python -u DeepSpeech.py \
 --n_hidden 494 \
 --checkpoint_dir '/tmp/ckpt' \
 --export_dir '/tmp/train' \
---decoder_library_path '/tmp/ds/libctc_decoder_with_kenlm.so' \
 --lm_binary_path 'data/smoke_test/vocab.pruned.lm' \
 --lm_trie_path 'data/smoke_test/vocab.trie' \
 --notrain --notest \
data/lm/trie (4 changes: 2 additions & 2 deletions)
Git LFS file not shown.

data/lm/trie.ctcdecode (3 changes: 0 additions & 3 deletions)
This file was deleted.

data/lm/vocab.txt (3 changes: 0 additions & 3 deletions)
This file was deleted.

data/smoke_test/vocab.trie
Binary file modified; contents not shown.

data/smoke_test/vocab.trie.ctcdecode
Binary file removed; contents not shown.
evaluate.py (70 changes: 43 additions & 27 deletions)
@@ -15,8 +15,10 @@
 from attrdict import AttrDict
 from collections import namedtuple
 from ds_ctcdecoder import ctc_beam_search_decoder_batch, Scorer
-from DeepSpeech import initialize_globals, create_flags, log_debug, log_info, log_warn, log_error, create_inference_graph
-from multiprocessing import Pool
+from util.flags import create_flags
+from util.coordinator import C, initialize_globals
+from util.logging import log_debug, log_info, log_warn, log_error
+from multiprocessing import Pool, cpu_count
 from six.moves import zip, range
 from util.audio import audiofile_to_input_vector
 from util.text import Alphabet, ctc_label_dense_to_sparse, wer, levenshtein

@@ -86,31 +88,11 @@ def calculate_report(labels, decodings, distances, losses):
     return samples_wer, samples


-def main(_):
-    initialize_globals()
-
-    if not FLAGS.test_files:
-        log_error('You need to specify what files to use for evaluation via '
-                  'the --test_files flag.')
-        exit(1)
-
-    global alphabet
-    alphabet = Alphabet(FLAGS.alphabet_config_path)
-
-    scorer = Scorer(FLAGS.lm_weight, FLAGS.valid_word_count_weight,
+def evaluate(test_data, inference_graph, alphabet):
+    scorer = Scorer(FLAGS.lm_alpha, FLAGS.lm_beta,
                     FLAGS.lm_binary_path, FLAGS.lm_trie_path,
-                    alphabet)
+                    C.alphabet)

-    # sort examples by length, improves packing of batches and timesteps
-    test_data = preprocess(
-        FLAGS.test_files.split(','),
-        FLAGS.test_batch_size,
-        alphabet=alphabet,
-        numcep=N_FEATURES,
-        numcontext=N_CONTEXT,
-        hdf5_cache_path=FLAGS.hdf5_test_set).sort_values(
-        by="features_len",
-        ascending=False)

     def create_windows(features):
         num_strides = len(features) - (N_CONTEXT * 2)

@@ -130,7 +112,7 @@ def create_windows(features):
     test_data['features'] = test_data['features'].apply(create_windows)

     with tf.Session() as session:
-        inputs, outputs, layers = create_inference_graph(batch_size=FLAGS.test_batch_size, n_steps=-1)
+        inputs, outputs, layers = inference_graph

         # Transpose to batch major for decoder
         transposed = tf.transpose(outputs['outputs'], [1, 0, 2])

@@ -192,7 +174,10 @@ def create_windows(features):
                                   widget=progressbar.AdaptiveETA)

         # Get number of accessible CPU cores for this process
-        num_processes = len(os.sched_getaffinity(0))
+        try:
+            num_processes = cpu_count()
+        except NotImplementedError:
+            num_processes = 1

         # Second pass, decode logits and compute WER and edit distance metrics
         for logits, batch in bar(zip(logitses, split_data(test_data, FLAGS.test_batch_size))):

@@ -221,7 +206,38 @@ def create_windows(features):
             print(' - res: "%s"' % sample.res)
             print('-' * 80)

+    return samples
+
+
+def main(_):
+    initialize_globals()
+
+    if not FLAGS.test_files:
+        log_error('You need to specify what files to use for evaluation via '
+                  'the --test_files flag.')
+        exit(1)
+
+    global alphabet
+    alphabet = Alphabet(FLAGS.alphabet_config_path)
+
+    # sort examples by length, improves packing of batches and timesteps
+    test_data = preprocess(
+        FLAGS.test_files.split(','),
+        FLAGS.test_batch_size,
+        alphabet=alphabet,
+        numcep=N_FEATURES,
+        numcontext=N_CONTEXT,
+        hdf5_cache_path=FLAGS.hdf5_test_set).sort_values(
+        by="features_len",
+        ascending=False)
+
+    from DeepSpeech import create_inference_graph
+    graph = create_inference_graph(batch_size=FLAGS.test_batch_size, n_steps=-1)
+
+    samples = evaluate(test_data, graph, alphabet)
+
+    if FLAGS.test_output_file:
+        # Save decoded tuples as JSON, converting NumPy floats to Python floats
+        json.dump(samples, open(FLAGS.test_output_file, 'w'), default=lambda x: float(x))
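
A few notes on the new code above. `os.sched_getaffinity` exists only on Linux, while `multiprocessing.cpu_count()` is portable but may itself raise `NotImplementedError` on unusual platforms, hence the fallback to a single process. The `default=lambda x: float(x)` hook in `json.dump` is needed because NumPy float values are not JSON-serializable as-is. The local `from DeepSpeech import create_inference_graph` inside `main()` is presumably deferred to avoid a circular import. Below is a minimal sketch of how the pieces introduced by this PR fit together: it builds the external scorer exactly as `evaluate()` does, then decodes one batch with the `ds_ctcdecoder` package. The helper name `decode_batch` and the keyword arguments passed to `ctc_beam_search_decoder_batch` are illustrative assumptions, not a verbatim copy of the package's API.

```python
# Illustrative sketch only; names flagged in comments are assumptions.
from multiprocessing import cpu_count

from ds_ctcdecoder import ctc_beam_search_decoder_batch, Scorer


def decode_batch(logits, seq_lengths, alphabet, flags):  # hypothetical helper
    # External KenLM scorer, constructed as in evaluate() above
    scorer = Scorer(flags.lm_alpha, flags.lm_beta,
                    flags.lm_binary_path, flags.lm_trie_path,
                    alphabet)
    try:
        num_processes = cpu_count()  # portable, unlike os.sched_getaffinity
    except NotImplementedError:
        num_processes = 1
    # Keyword names (beam_size, num_processes, scorer) and flags.beam_width
    # are assumptions for illustration.
    return ctc_beam_search_decoder_batch(logits, seq_lengths, alphabet,
                                         beam_size=flags.beam_width,
                                         num_processes=num_processes,
                                         scorer=scorer)
```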
native_client/BUILD (21 changes: 1 addition & 20 deletions)
@@ -12,11 +12,10 @@ genrule(

 KENLM_SOURCES = glob(["kenlm/lm/*.cc", "kenlm/util/*.cc", "kenlm/util/double-conversion/*.cc",
                       "kenlm/lm/*.hh", "kenlm/util/*.hh", "kenlm/util/double-conversion/*.h"],
-                     exclude = ["kenlm/*/*test.cc", "kenlm/*/*main.cc"]) + glob(["boost_locale/**/*.hpp"])
+                     exclude = ["kenlm/*/*test.cc", "kenlm/*/*main.cc"])

 KENLM_INCLUDES = [
     "kenlm",
-    "boost_locale"
 ]

 DECODER_SOURCES = glob([

@@ -102,24 +101,6 @@ tf_cc_shared_object(
     defines = ["KENLM_MAX_ORDER=6"],
 )

-tf_cc_shared_object(
-    name = "libctc_decoder_with_kenlm.so",
-    srcs = [
-        "beam_search.cc",
-        "beam_search.h",
-        "alphabet.h",
-        "trie_node.h"
-    ] +
-    KENLM_SOURCES,
-    includes = KENLM_INCLUDES,
-    copts = ["-std=c++11"],
-    defines = ["KENLM_MAX_ORDER=6"],
-    deps = ["//tensorflow/core:framework_headers_lib",
-            "//tensorflow/core/util/ctc",
-            "//third_party/eigen3",
-    ],
-)

 cc_binary(
     name = "generate_trie",
     srcs = [
native_client/README.md (5 changes: 1 addition & 4 deletions)
@@ -19,8 +19,6 @@ This will download and extract `native_client.tar.xz` which includes the deepspe
 If you want the CUDA capable version of the binaries, use `--arch gpu`. Note that for now we don't publish CUDA-capable macOS binaries.

-If you're looking to train a model, you now have a `libctc_decoder_with_kenlm.so` file that you can pass to the `--decoder_library_path` parameter of `DeepSpeech.py`.
-
 ## Required Dependencies

 Running inference might require some runtime dependencies to be already installed on your system. Those should be the same, whatever the bindings you are using:

@@ -77,10 +75,9 @@ Before building the DeepSpeech client libraries, you will need to prepare your e
 Preferably, checkout the version of tensorflow which is currently supported by DeepSpeech (see requirements.txt), and use the bazel version recommended by TensorFlow for that version.
 Then, follow the [instructions](https://www.tensorflow.org/install/install_sources) on the TensorFlow site for your platform, up to the end of 'Configure the installation'.

-After that, you can build the Tensorflow and DeepSpeech libraries using the following commands. Please note that the flags for `libctc_decoder_with_kenlm.so` differs a little bit.
+After that, you can build the TensorFlow and DeepSpeech libraries using the following command.

 ```
-bazel build -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" //native_client:libctc_decoder_with_kenlm.so
 bazel build --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_trie
 ```