A place to store learning notes on language models.
- TF-IDF Vectorization + KMeans Drawing + Cosine Similarity Filtering ✅
- TF-IDF Vectorization + MinHash + Locality-Sensitive Hashing
- Sentence-BERT Embedding + KMeans + SemDeDup
- Dataset & Data Preprocessing ✅
- IWSLT
- Tokenizer ✅
- BPE Algorithm
- .pkl Saving & Loading
- Model Config ✅
- Standard Transformer Structure
- Training ✅
- Frame
- Bash Script
- Inference ✅
- Dataset & Data Preprocessing ✅
- BookCorpus
- Tokenizer ✅
- BPE Algorithm
- WordPiece Algorithm ⚫
- Multiprocess Acceleration ⚫
- Model Config ✅
- Base BERT
- Training ⚫
- Frame
- Bash Script
- Logger
- Inference ⚫
- Fine-Tuning ⚫
- Classification
- Dataset & Data Preprocessing
- BookCorpus
- Tokenizer
- BPE Algorithm
- WordPiece Algorithm
- Unigram Algorithm
- Model Config
- GPT2
- Training
- Frame
- Bash Script
- Logger
- Inference
- Fine-Tuning
- ChatBot
- Summarization
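
As a quick reference for the "BPE Algorithm" item that appears under each Tokenizer entry, here is a minimal sketch of how BPE merges can be learned and applied. It is not the code from these notes: the helper names `learn_bpe`, `merge_pair`, and `encode` are hypothetical, it works on characters rather than bytes, it has no end-of-word marker, and ties between equally frequent pairs go to the pair seen first.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (illustrative only)."""
    # Each word starts as a tuple of single characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge to every word in the corpus.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize one word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["low", "low", "lower", "lowest"]
    merges = learn_bpe(corpus, num_merges=2)
    print(merges)
    print(encode("lowest", merges))
```

The learned merge list is a plain Python list of tuples, so it can be saved and reloaded with `pickle`, which matches the ".pkl Saving & Loading" item above.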