Chatscrape

There are quite a few APIs that allow fine-tuning on large language models like GPT-3. You may want to have a go training your own model on WhatsApp chats. Generally, this is a bit awkward since fine tuning typically can only be done on data structured in a prompt-response fashion. This snippet of code helps you do this!

Turn the output

[09/11/2020, 12:14:48] A: Text 1
[09/11/2020, 12:15:04] A: Text 2
[09/11/2020, 12:41:26] B: Text 3
[09/11/2020, 12:41:26] A: Text 4

Into a csv with prompt and completion columns

,prompt,completion
0,Text 1. Text 2, Text 3 
1, Text 4, NaN

Fine tuning can then be performed easily on something like openai's data preparation api.

openai tools fine_tunes.prepare_data -f output.csv

https://beta.openai.com/docs/guides/fine-tuning

Running

Git clone this repo

git clone https://github.com/afiqhatta/chat_scrape.git

Set up a virtual environment, and then run the python script with the following command line arguments.

virtualenv env
pip install -r requirements.txt 
source env/bin/activate 
python wa_parser.py <EXPORTED_TEXT> <PROMPTER> <RESPONDER>

Examples

Export and download the WhatsApp chat of your choice to a txt file. The start of your WhatsApp export probably looks like this, in chat.txt.

[07/11/2020, 11:09:28] Alice Smith: Are you ok?
[07/11/2020, 11:09:32] Bob Jones: Hey there, yes I am!

To parse this, use

python wa_parser.py chat.txt 'Alice Smith' 'Bob Jones'

Deploying this to Google App Run

The original intention of creating this module was to create a service where people can parse chats through an api. Currently, this is run in production on a Google compute engine instance. To deploy this to google cloud, after creating the project, you can run

gcloud app deploy

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.idea		.idea
family_chats		family_chats
fine_tuning		fine_tuning
gmail_api		gmail_api
google_cloud_upload		google_cloud_upload
parsing		parsing
processing		processing
routes		routes
static/js		static/js
templates		templates
tests		tests
.gcloudignore		.gcloudignore
README.md		README.md
app.yaml		app.yaml
display_configs.py		display_configs.py
endpoints_configs.py		endpoints_configs.py
firebase_initialisation.py		firebase_initialisation.py
main_app.py		main_app.py
requirements.txt		requirements.txt
session_config.py		session_config.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Chatscrape

Running

Examples

Deploying this to Google App Run

About

Releases 1

Packages

Languages

casualPhysics/chat_scrape

Folders and files

Latest commit

History

Repository files navigation

Chatscrape

Running

Examples

Deploying this to Google App Run

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages