Pull requests: karpathy/minbpe
Pull requests list
add lexicographic ordering for breaking ties to make the tokenizer deterministic
#90 opened Sep 21, 2024 by dapopov-st
Optimal algorithm for _encode_chunk(): 20% faster encoding, with 0.5% better COMPRESSION
#84 opened Jun 17, 2024 by Majdoddin
Deduplication of text chunks with frequency count, training and encoding 5x speedup
#82 opened Jun 8, 2024 by Majdoddin
calling len(ids) in merge() function only once to increase performance
#76 opened May 14, 2024 by crpatil1901
Update regex.py to correctly parse scripts with combining marks
#71 opened May 5, 2024 by ajaykg
Updated decode() method in GPT4Tokenizer so that it handles special t…
#63 opened Apr 7, 2024 by Vakarva
updated self.vocab initialization and reuse self._build_vocab()
#53 opened Mar 2, 2024 by muerghq
Update lecture.md based on video tutorial content from 08:15 through 28:23
#42 opened Feb 23, 2024 by astaff
Use pyproject.toml, pdm and ruff for improved reproducibility and cleaner code
#40 opened Feb 22, 2024 by nizhib