Chinese_Chatbot_Torch

requirements: PyTorch, Jieba

The corpus is built from QQ chat logs, which must first be exported to txt format.
What each module does is explained fairly clearly in the code comments.

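!!! QQ chats are highly fragmented and each message is short, so it is best to pick chat logs that stay on a single topic within a short time span (filtering by hand works best). !!!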

Pipeline order

    format chat log --> preprocess --> generate corpus --> train model --> chat test;

Run the scripts in this order

    chatlog-->preprocess-->train;

chatlog:

Generates a csv file from the QQ chat log, which serves as the source material for the later steps.
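
A minimal sketch of this step, assuming the exported txt uses a layout where each message starts with a header line of the form `YYYY-MM-DD HH:MM:SS nickname(id)` followed by the message text. The header pattern, the function names `parse_chat_log` and `write_csv`, and the csv column names are assumptions made for illustration; chat_log.py itself may parse the export differently.

```python
import csv
import io
import re

# Assumed header layout: "2019-01-01 12:00:01 Alice(10001)"
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}) (.+)$")


def parse_chat_log(text):
    """Return a list of (time, sender, message) tuples from an exported log."""
    rows, current = [], None
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            # A new header closes the previous message, if it had any text.
            if current and current[2]:
                rows.append(tuple(current))
            current = [match.group(1), match.group(2), ""]
        elif current is not None and line:
            # Multi-line messages are joined with a space.
            current[2] = (current[2] + " " + line).strip()
    if current and current[2]:
        rows.append(tuple(current))
    return rows


def write_csv(rows, handle):
    """Write the parsed rows with a header row (column names are hypothetical)."""
    writer = csv.writer(handle)
    writer.writerow(["time", "sender", "message"])
    writer.writerows(rows)


sample = """2019-01-01 12:00:01 Alice(10001)
今天吃什么

2019-01-01 12:00:09 Bob(10002)
火锅
"""
rows = parse_chat_log(sample)
buffer = io.StringIO()
write_csv(rows, buffer)
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""` and `encoding="utf-8"` so the csv module controls line endings.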

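preprocess: starting from that csv, generates

the dictionaries index2voc and voc2index that the later steps need;
index sentence pairs encoded with the voc2index dictionary. The sentence-pair file has two columns: input on the left, target on the right.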
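
The two outputs above can be sketched as follows. This is an illustration under stated assumptions, not the code in preprocess.py: the reserved tokens, the helper names `build_vocab`, `encode` and `make_pairs`, and the choice to pair each message with the one that follows it are all hypothetical. Jieba is used for word segmentation when it is installed; otherwise the sketch falls back to one token per character so it still runs.

```python
try:
    import jieba  # word segmentation, listed in the requirements

    def tokenize(sentence):
        return jieba.lcut(sentence)
except ImportError:
    # Fallback for this sketch only: treat every character as a token.
    def tokenize(sentence):
        return list(sentence)

# Reserved indices are an assumption of this sketch.
PAD, SOS, EOS, UNK = 0, 1, 2, 3


def build_vocab(sentences):
    """Build the word-to-index and index-to-word dictionaries."""
    voc2index = {"<pad>": PAD, "<sos>": SOS, "<eos>": EOS, "<unk>": UNK}
    for sentence in sentences:
        for word in tokenize(sentence):
            if word not in voc2index:
                voc2index[word] = len(voc2index)
    index2voc = {index: word for word, index in voc2index.items()}
    return voc2index, index2voc


def encode(sentence, voc2index):
    """Turn a sentence into a list of indices, terminated by EOS."""
    return [voc2index.get(word, UNK) for word in tokenize(sentence)] + [EOS]


def make_pairs(messages, voc2index):
    """Pair each message (input) with the message that follows it (target)."""
    return [
        (encode(first, voc2index), encode(second, voc2index))
        for first, second in zip(messages, messages[1:])
    ]


messages = ["今天吃什么", "火锅", "好的"]
voc2index, index2voc = build_vocab(messages)
pairs = make_pairs(messages, voc2index)
```

Each element of `pairs` corresponds to one row of the sentence-pair file: the first list is the input, the second is the target.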

corpus_gen:

Builds one batch from the input sentence pairs; each batch is the material for a single training iteration.
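
A rough sketch of batch construction, written in plain Python so it runs without PyTorch. It assumes padding index 0, a time-major layout `(max_len, batch_size)`, and sorting by input length (the order `pack_padded_sequence` expects by default); the names `pad_columns` and `make_batch` are hypothetical and corpus_gen.py may organise this differently.

```python
import random
from itertools import zip_longest

PAD = 0  # assumed padding index


def pad_columns(sequences, fill=PAD):
    """Pad to the longest sequence and transpose to (max_len, batch_size)."""
    return [list(step) for step in zip_longest(*sequences, fillvalue=fill)]


def make_batch(pairs, batch_size, rng=random):
    """Sample index pairs and return padded inputs, lengths, targets and mask."""
    batch = rng.sample(pairs, batch_size)
    # Longest input first.
    batch.sort(key=lambda pair: len(pair[0]), reverse=True)
    inputs = [pair[0] for pair in batch]
    targets = [pair[1] for pair in batch]
    lengths = [len(sequence) for sequence in inputs]
    padded_targets = pad_columns(targets)
    # The mask marks real target tokens so padding can be left out of the loss.
    mask = [[token != PAD for token in step] for step in padded_targets]
    return pad_columns(inputs), lengths, padded_targets, mask


pairs = [
    ([4, 5, 2], [6, 2]),
    ([7, 2], [8, 9, 10, 2]),
    ([11, 12, 13, 14, 2], [15, 2]),
]
inputs, lengths, targets, mask = make_batch(pairs, batch_size=3, rng=random.Random(0))
```

In a PyTorch training loop the returned lists would typically be wrapped with `torch.LongTensor` (and `torch.BoolTensor` for the mask) before being passed to the model.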

rnn_model:

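The neural network used for training. The model is RNN + Attention: an RNN serves as the bidirectional encoder and as the decoder, and Attention is part of the decoding step.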
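
To illustrate the role Attention plays during decoding, here is a small sketch of one attention step. The README does not say which scoring function rnn_model.py uses, so dot-product scoring is an assumption, and plain Python lists stand in for the PyTorch tensors the real model would use. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def dot_attention(decoder_state, encoder_outputs):
    """Return (weights, context) for a single decoder step.

    decoder_state:   current decoder hidden vector
    encoder_outputs: one vector per input position
    """
    # Score every encoder position against the decoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, output))
        for output in encoder_outputs
    ]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of encoder outputs.
    size = len(decoder_state)
    context = [
        sum(weight * output[i] for weight, output in zip(weights, encoder_outputs))
        for i in range(size)
    ]
    return weights, context


encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = dot_attention(decoder_state, encoder_outputs)
```

In an encoder-decoder model of this kind, the context vector is usually combined with the decoder state before the output word is predicted, so each decoding step can focus on different input positions.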

train:

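Training and testing. Call trainBegin to train and chatBegin to try chatting. Replies during chat are decoded with a greedy algorithm, whose parameters can be tuned in GreedySearchDecoder. By default the model parameters are saved every 1000 iterations and no saved parameters are loaded at startup (training starts from scratch). The dictionaries are loaded by default to speed things up, since regenerating them is slow.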
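
Greedy decoding can be sketched as below: at every step the highest-scoring token is taken and fed back in as the next input. This is not the GreedySearchDecoder class from train.py; the `step` callback, the `max_length` parameter, the token indices, and the `toy_step` stand-in for a trained decoder are all assumptions made so the example runs on its own.

```python
SOS, EOS = 1, 2  # assumed start-of-sentence and end-of-sentence indices


def greedy_decode(step, max_length=10):
    """Decode one reply by always picking the highest-scoring token.

    step(prev_token, state) must return (scores, new_state), where scores
    holds one score per vocabulary index.
    """
    token, state, output = SOS, None, []
    for _ in range(max_length):
        scores, state = step(token, state)
        # Greedy choice: index of the largest score.
        token = max(range(len(scores)), key=scores.__getitem__)
        if token == EOS:
            break
        output.append(token)
    return output


def toy_step(prev_token, state):
    """Hypothetical stand-in for a trained decoder: replays a fixed script."""
    script = [4, 5, 3, EOS]
    position = 0 if state is None else state
    scores = [0.0] * 6
    scores[script[min(position, len(script) - 1)]] = 1.0
    return scores, position + 1


reply = greedy_decode(toy_step)
short_reply = greedy_decode(toy_step, max_length=2)
```

The resulting indices would then be mapped back to words through index2voc. Greedy search is fast but commits to each token as soon as it is chosen, so a cap such as `max_length` keeps replies bounded when EOS is never produced.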

