Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ctcdecode in native client (Fixes #1668) #1679

Merged
merged 14 commits into from
Oct 30, 2018
Merged

Use ctcdecode in native client (Fixes #1668) #1679

merged 14 commits into from
Oct 30, 2018

Conversation

reuben
Copy link
Contributor

@reuben reuben commented Oct 24, 2018

This PR switches the native client code (and clients that use it) to use ctcdecode instead of the TensorFlow decoder. It does not change any of the training code, nor does it remove the TensorFlow decoder. That will be done as part of #1675. With this PR, there's a disconnect between training code and clients, and DeepSpeech.py WER reports for example won't match the results of our clients. On the other hand, it does let us get the benefits of ctcdecode out to users more quickly.

I recommend reviewing commit by commit. They all build on top of each other and the code builds and works at any given commit in this branch. Here's a breakdown:

  1. c34fc5b - Just imports the code and adds a README.mozilla pointing to the version imported
  2. 0002d0f - Adds the new code to the BUILD file and also cleans it up a bit
  3. e72f079 - Makes a few modifications to the Alphabet class needed for using it in ctcdecode
  4. 770d742 - Modifies ctcdecode to use our Alphabet class, and adds support for saving and loading the trie to/from disk
  5. 3cc9b37 - Switches the native client to use ctcdecode. Also adapts generate_trie.cpp and trie_load.cc to work with the new setup.
  6. b29d0ab - The new decoder can build the trie from the language model, so make the trie flag optional in clients.

I'm not asking for review yet because I want to see how this does on the tests.

@reuben reuben self-assigned this Oct 24, 2018
@reuben reuben force-pushed the ctcdecode branch 6 times, most recently from d42530f to ca35e2c Compare October 25, 2018 18:52
@reuben
Copy link
Contributor Author

reuben commented Oct 25, 2018

All tests are green, modulo some infra issues.

native_client/alphabet.h Outdated Show resolved Hide resolved
native_client/client.cc Outdated Show resolved Hide resolved
@@ -52,6 +52,8 @@ tf_cc_shared_object(
linkopts = select({
"//tensorflow:darwin": [],
"//conditions:default": [
"-ldl",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't we need it as well on macOS ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works fine on my machine, and it passes tests. I don't know why :)

native_client/BUILD Outdated Show resolved Hide resolved
Copy link
Collaborator

@lissyx lissyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good, but I'm wondering regarding improving future-proof-ability

@kdavis-mozilla
Copy link
Contributor

@star3858 and @ybg7955 A PR is not the place for such comments. Please in future do not add non-review comments to a PR.

Copy link
Contributor

@kdavis-mozilla kdavis-mozilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few little changes for this PR.

However, for the final PR we should remove some temporary code such as the

label_to_str_.push_back("*");

and data/smoke_test/vocab.trie.ctcdecode pointing to `v0.2.0-prod-ctcdecode~ and the like.

native_client/alphabet.h Outdated Show resolved Hide resolved
native_client/alphabet.h Show resolved Hide resolved
}

private:
size_t size_;
std::unordered_map<unsigned int, std::string> label_to_str_;
unsigned int space_label_;
std::vector<std::string> label_to_str_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why switch from std::unordered_map to std::vector? Just curious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ctcdecode code uses a vector<string> as the alphabet representation so I initially was just passing Alphabet's vector to it, but eventually I made it use the Alphabet class directly instead. But std::unordered_map is unnecessary here as a vector is sufficient and faster/leaner.

@@ -17,29 +18,29 @@ class Alphabet {
Alphabet(const char *config_file) {
std::ifstream in(config_file, std::ios::in);
unsigned int label = 0;
space_label_ = -2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates a strange dependency between Alphabet and ctc_beam_search_decoder, both know about this magic number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where ctc_beam_search_decoder knows about this magic number? It's supposed to not match any real label if there's no space label in the alphabet. Although now I realize that there's actually a subtle bug here, since space_label_ is unsigned, this is assigning a missing space label to UINT_MAX-1, which could break for an alphabet that's exactly UINT_MAX long :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it I don't think this is a big deal as languages without a space label will always use character-based language models.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that "languages without a space label will always use character-based language models"? For example Thai has an alphabet but words can also can be written without spaces. I think Javanese is the same way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -2 constant would be a problem for a language that uses a word-based language model and has an alphabet that is 2^32-2 characters long. AFAIK, the CJK languages are the only ones that could possibly get close to that many characters, and they would all probably use character-level LMs. Unicode defines close to 100,000 CJK ideograms, so even then we still have a lot of margin. I can add a separate "has_space" flag to remove the in-band signaling here, but I don't think it would be a problem anytime soon.

native_client/ctcdecode/scorer.h Outdated Show resolved Hide resolved
@@ -320,55 +320,24 @@ ModelState::infer(const float* aMfcc, unsigned int n_frames, vector<float>& logi
char*
ModelState::decode(vector<float>& logits)
{
const int top_paths = 1;
const int cutoff_top_n = 40;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this magic number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the just the default value used in ctcdecode.

native_client/client.cc Outdated Show resolved Hide resolved
native_client/BUILD Show resolved Hide resolved
@ryanleary
Copy link

Hey -- maintainer of parlance/ctcdecode here. Is there a way that we can structure this so that future improvements either in the Mozilla fork or in the parlance fork can be easily shared?

@reuben
Copy link
Contributor Author

reuben commented Oct 29, 2018

Hi @ryanleary, thanks for your work on ctcdecode!

Would you be interested in merging the support for caching the FST in disk? That's the main change I had to make, the rest is simple adaptations of the function signatures.

I definitely intend to upstream any bug fixes or improvements we make to the decoder.

@ryanleary
Copy link

Sure, I'm happy to add any new features/improvements, as well as expose additional bindings as it makes sense.

@reuben
Copy link
Contributor Author

reuben commented Oct 30, 2018

All comments should be addressed. I decided to add the functionality to build the trie at runtime on the native client only to limit the exposure.

Copy link
Contributor

@kdavis-mozilla kdavis-mozilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -48,6 +50,7 @@ bool ProcessArgs(int argc, char** argv)
{"lm", required_argument, nullptr, 'l'},
{"trie", required_argument, nullptr, 'r'},
{"audio", required_argument, nullptr, 'w'},
{"run_very_slowly_without_trie_I_really_know_what_Im_doing", no_argument, nullptr, 999},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😃

"-Wl,-Bsymbolic",
"-Wl,-Bsymbolic-functions",
"-Wl,-export-dynamic",
"-l:libstdc++.a",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still lacks some trivial factorization ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I did the review commit by commit, and this is fixed by a newer one, so disregard that comment (github flagged it as "outdated")

@@ -45,10 +45,10 @@ Scorer::Scorer(double alpha,

Scorer::~Scorer() {
if (language_model_ != nullptr) {
delete static_cast<lm::base::Model*>(language_model_);
delete language_model_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make sure there's no memory leak?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean by using std::unique_ptr?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example :)

Copy link
Collaborator

@lissyx lissyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, if we can fix any memory issue before landing it's perfect, otherwise we can check and fix that later

@reuben reuben merged commit c3860f6 into master Oct 30, 2018
@reuben reuben deleted the ctcdecode branch October 30, 2018 13:49
@lissyx
Copy link
Collaborator

lissyx commented Oct 31, 2018

@ryanleary Hello Ryan, sorry to hijack this PR, but it looks like we have some contributor willing to spend time building stuff for Windows, and currently hitting some roadblocks on ctcdecode side: https://discourse.mozilla.org/t/compiling-for-windows/32939/18

@lock
Copy link

lock bot commented Jan 2, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jan 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants