Encoding large single text files is not working #59

Open
0TT0mation opened this issue Aug 23, 2020 · 1 comment

Comments


0TT0mation commented Aug 23, 2020

I'm trying to encode a 1.7 GB .txt file for training purposes. After starting the encode process from cmd, I could see in Task Manager that resources were being used, but after ~30 minutes everything went back to idle while the console output never moved past reading files 0%. As far as I can tell the GPU is working too, since cudart64_101.dll is loaded.

System spec:
GTX 970
i5-8400
8 GB RAM + NVMe SSD

Please help; scraping this much data was hard.

Later edit:
A second attempt eventually produced this error:

Traceback (most recent call last):
  File "encode.py", line 31, in <module>
    main()
  File "encode.py", line 25, in main
    chunks = load_dataset(enc, args.in_text, args.combine, encoding=args.encoding)
  File "C:\_stash\openAI\gpt-2\src\load_dataset.py", line 35, in load_dataset
    tokens = np.stack(enc.encode(raw_text))
  File "C:\_stash\openAI\gpt-2\src\encoder.py", line 100, in encode
    bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' '))
MemoryError

Later later edit:
Encoding the folder containing the individual text files, without merging them into a single file, worked fine.
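For context on why the per-file workaround succeeds: the traceback fails at tokens = np.stack(enc.encode(raw_text)), i.e. the entire 1.7 GB file is read and BPE-encoded as a single string, so the whole token list has to fit in memory at once. Splitting the corpus into a folder of smaller files keeps each enc.encode call bounded. Below is a minimal sketch of such a splitter; the script name, chunk size, and output naming are illustrative assumptions, not part of this repository.

    # split_corpus.py -- hypothetical helper, not part of this repo.
    # Splits one huge UTF-8 text file into many smaller files so that
    # encode.py can be pointed at the output folder instead of one big file.
    import os
    import sys

    def split_file(src_path, out_dir, max_bytes=50 * 1024 * 1024):
        """Write ~max_bytes-sized pieces of src_path into out_dir, breaking on line boundaries."""
        os.makedirs(out_dir, exist_ok=True)
        part, size, lines = 0, 0, []
        with open(src_path, encoding="utf-8") as f:  # assumes the corpus is UTF-8
            for line in f:
                lines.append(line)
                size += len(line.encode("utf-8"))
                if size >= max_bytes:
                    _flush(out_dir, part, lines)
                    part, size, lines = part + 1, 0, []
        if lines:
            _flush(out_dir, part, lines)

    def _flush(out_dir, part, lines):
        # Each piece gets a zero-padded name so the files sort in order.
        with open(os.path.join(out_dir, f"part_{part:05d}.txt"), "w", encoding="utf-8") as out:
            out.writelines(lines)

    if __name__ == "__main__":
        # Usage: python split_corpus.py corpus.txt corpus_parts/
        split_file(sys.argv[1], sys.argv[2])

The resulting folder can then be passed to encode.py in place of the single merged file (adjusting the arguments to this repo's actual CLI), which matches the behaviour reported above where per-file encoding worked fine.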


amacfie commented Aug 24, 2020

#43
