Encoding large single text files is not working #59

Open
0TT0mation opened this issue Aug 23, 2020 · 1 comment

Comments


0TT0mation commented Aug 23, 2020

I'm trying to encode a 1.7 GB .txt file for training purposes. After starting the encode process from cmd, I could see in Task Manager that resources were being used, but after ~30 minutes everything went back to idle while the console output never moved past reading files 0%. As far as I can tell the GPU is working too, since cudart64_101.dll is loaded.

System spec:
GTX 970
i5-8400
8 GB RAM + NVMe SSD

Please help; scraping this much data was hard.

Later edit:
A second attempt eventually produced this error:

Traceback (most recent call last):
  File "encode.py", line 31, in <module>
    main()
  File "encode.py", line 25, in main
    chunks = load_dataset(enc, args.in_text, args.combine, encoding=args.encoding)
  File "C:\_stash\openAI\gpt-2\src\load_dataset.py", line 35, in load_dataset
    tokens = np.stack(enc.encode(raw_text))
  File "C:\_stash\openAI\gpt-2\src\encoder.py", line 100, in encode
    bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' '))
MemoryError

Later later edit:
Encoding the folder containing the individual text files, without merging them into a single file, worked fine.
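For context on why the per-file workaround succeeds: the traceback fails at tokens = np.stack(enc.encode(raw_text)), i.e. the entire 1.7 GB file is read and BPE-encoded as a single string, so the whole token list has to fit in memory at once. Splitting the corpus into a folder of smaller files keeps each enc.encode call bounded. Below is a minimal sketch of such a splitter; the script name, chunk size, and output naming are illustrative assumptions, not part of this repository.

    # split_corpus.py -- hypothetical helper, not part of this repo.
    # Splits one huge UTF-8 text file into many smaller files so that
    # encode.py can be pointed at the output folder instead of one big file.
    import os
    import sys

    def split_file(src_path, out_dir, max_bytes=50 * 1024 * 1024):
        """Write ~max_bytes-sized pieces of src_path into out_dir, breaking on line boundaries."""
        os.makedirs(out_dir, exist_ok=True)
        part, size, lines = 0, 0, []
        with open(src_path, encoding="utf-8") as f:  # assumes the corpus is UTF-8
            for line in f:
                lines.append(line)
                size += len(line.encode("utf-8"))
                if size >= max_bytes:
                    _flush(out_dir, part, lines)
                    part, size, lines = part + 1, 0, []
        if lines:
            _flush(out_dir, part, lines)

    def _flush(out_dir, part, lines):
        # Each piece gets a zero-padded name so the files sort in order.
        with open(os.path.join(out_dir, f"part_{part:05d}.txt"), "w", encoding="utf-8") as out:
            out.writelines(lines)

    if __name__ == "__main__":
        # Usage: python split_corpus.py corpus.txt corpus_parts/
        split_file(sys.argv[1], sys.argv[2])

The resulting folder can then be passed to encode.py in place of the single merged file (adjusting the arguments to this repo's actual CLI), which matches the behaviour reported above where per-file encoding worked fine.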


amacfie commented Aug 24, 2020

#43
