This notebook shows how to use the torchtext and PyTorch libraries to retrieve a dataset and build a simple RNN model for text classification.
It is based on the TREC-6 dataset, which consists of 5,952 questions written in English, each classified into one of the following categories according to the type of answer it expects:
- HUM: Human
- DESC: Description
- ABBR: Abbreviation
- LOC: Location
- NUM: Number
- ENTY: Entity
Try improving the performance of the model by:
- Adding more complexity (more RNN layers, other layer types)
- Adding regularisation (L1, L2, dropout)
- Making the model a bidirectional RNN
- Using pretrained embeddings such as word2vec or GloVe. Note that you can load them with nn.Embedding.from_pretrained(...)
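
As a starting point for these experiments, here is a minimal sketch that combines several of the suggestions: stacked RNN layers, dropout, a bidirectional RNN, pretrained embeddings, and L2 regularisation through the optimiser's `weight_decay`. The class name `BiRNNClassifier`, the layer sizes, and the hyperparameters are all hypothetical, and the random tensor only stands in for real word2vec or GloVe vectors. It is not the model built in this notebook, and it ignores padding for simplicity.

```python
import torch
import torch.nn as nn


class BiRNNClassifier(nn.Module):
    """Illustrative classifier; names and sizes are assumptions."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes,
                 num_layers=2, dropout=0.3, pretrained_vectors=None):
        super().__init__()
        if pretrained_vectors is not None:
            # Initialise the embedding from a (vocab_size, embed_dim) tensor;
            # freeze=False lets the vectors be fine-tuned during training.
            self.embedding = nn.Embedding.from_pretrained(
                pretrained_vectors, freeze=False)
        else:
            self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Stacked, bidirectional RNN; the built-in dropout is applied
        # between layers, so it only has an effect when num_layers > 1.
        self.rnn = nn.RNN(embed_dim, hidden_dim, num_layers=num_layers,
                          bidirectional=True, batch_first=True,
                          dropout=dropout if num_layers > 1 else 0.0)
        self.dropout = nn.Dropout(dropout)
        # Forward and backward final states are concatenated.
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.dropout(self.embedding(token_ids))
        _, hidden = self.rnn(embedded)
        # hidden has shape (num_layers * 2, batch, hidden_dim); the last
        # two entries are the top layer's forward and backward states.
        final = torch.cat([hidden[-2], hidden[-1]], dim=1)
        return self.fc(self.dropout(final))


# Random stand-in for real pretrained vectors (100 words, 50 dimensions).
vectors = torch.randn(100, 50)
model = BiRNNClassifier(vocab_size=100, embed_dim=50, hidden_dim=64,
                        num_classes=6, pretrained_vectors=vectors)

# weight_decay adds an L2 penalty on the parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

batch = torch.randint(0, 100, (4, 12))  # 4 questions, 12 token ids each
logits = model(batch)
print(logits.shape)  # torch.Size([4, 6])
```

Swapping `nn.RNN` for `nn.LSTM` or `nn.GRU` is another way to add capacity, though `nn.LSTM` returns a `(hidden, cell)` tuple, so the unpacking in `forward` would need adjusting.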