Poor separation of concerns in fasttext design #2312
Labels:
difficulty hard (Hard issue: required deep gensim understanding & high python/cython skills)
fasttext (Issues related to the FastText model)
The architecture consists of several classes:
The separation of concerns between the classes is poor. For example, the FastTextTrainables neural network knows far too much about the implementation details of FastTextKeyedVectors embeddings. Here is a concrete example (full code here):
The above code is part of FastTextTrainables, but it writes to attributes of FastTextKeyedVectors. It knows which attributes FastTextKeyedVectors has and how they relate to one another.
Ideally, such code should live in the FastTextKeyedVectors class. In practice, this may not be that simple, because some of the code involved may be common to both classes. Identifying these areas (concerns), splitting them apart, and assigning each to the class that owns it would improve the fasttext design significantly.
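To make the proposed direction concrete, here is a minimal sketch of what moving the initialization into the embeddings class could look like. It is not gensim's actual code: the class names `SketchKeyedVectors` and `SketchTrainables`, the methods `init_vectors` and `prepare_weights`, and the attribute names and initialization scheme are all assumptions chosen for illustration.

```python
import numpy as np


class SketchKeyedVectors:
    """Hypothetical stand-in for an embeddings class (not gensim's API)."""

    def __init__(self, vector_size, vocab_size, bucket):
        self.vector_size = vector_size
        self.vocab_size = vocab_size
        self.bucket = bucket  # assumed: number of hash buckets for ngrams
        self.vectors_vocab = None
        self.vectors_ngrams = None

    def init_vectors(self, seed=0):
        """Allocate and randomly initialize both matrices.

        The shapes of the matrices and how they relate to each other are
        details that only this class needs to know.
        """
        rng = np.random.default_rng(seed)
        bound = 1.0 / self.vector_size
        self.vectors_vocab = rng.uniform(
            -bound, bound, (self.vocab_size, self.vector_size)
        ).astype(np.float32)
        self.vectors_ngrams = rng.uniform(
            -bound, bound, (self.bucket, self.vector_size)
        ).astype(np.float32)


class SketchTrainables:
    """Hypothetical stand-in for the training-side class."""

    def prepare_weights(self, wv, seed=0):
        # Delegate to the embeddings object instead of assigning to its
        # attributes directly from the outside.
        wv.init_vectors(seed=seed)


wv = SketchKeyedVectors(vector_size=4, vocab_size=3, bucket=5)
SketchTrainables().prepare_weights(wv, seed=1)
print(wv.vectors_vocab.shape, wv.vectors_ngrams.shape)
```

With this split, the training-side class only needs to know that the embeddings object can initialize itself; any change to how the matrices are laid out stays inside the embeddings class.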