Text classification tool for Classificationbox.
- Prepare teaching data
- Run Classificationbox
- Teach and test
Create a directory structure that sorts the files into classes, with one folder per class, named after that class:
    /teaching-items
        /class1
            class1example1.txt
            class1example2.txt
            class1example3.txt
        /class2
            class2example1.txt
            class2example2.txt
            class2example3.txt
        /class3
            class3example1.txt
            class3example2.txt
            class3example3.txt
The files can contain text of any length, with one file per example.
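The layout above maps each folder name to a class and each file to one example. The Python below is a minimal sketch of reading that layout, not the textclass tool's own code (which is not shown here); `load_teaching_items` is a hypothetical helper, and it assumes every top-level subdirectory is a class and every regular file inside it is an example.

```python
import os


def load_teaching_items(root):
    """Return (class_name, file_path) pairs, one per example file.

    Hypothetical helper: each immediate subdirectory of root is treated
    as a class, and each regular file inside it as one example.
    """
    items = []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files at the top level
        for file_name in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, file_name)
            if os.path.isfile(path):
                items.append((class_name, path))
    return items
```

For example, `load_teaching_items("./teaching-items")` would yield nine pairs for the tree above, three per class.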
In a terminal, run:

    docker run -p 8080:8080 -e "MB_KEY=$MB_KEY" machinebox/classificationbox

- Get yourself an MB_KEY from https://machinebox.io/account and set it as the MB_KEY environment variable; the command above passes it into the container.
Use the textclass tool to teach and test the model:

    textclass -teachratio 0.8 -src ./teaching-items

The tool will post a random 80% of the files (-teachratio 0.8) to Classificationbox for teaching, and use the remaining items to test the model.
You will be prompted a few times as the tool goes through its various stages. The tool will:
- Create a new model
- Use a percentage of the data to teach the model
- Use the remaining items to validate the model
- Display the results, including the model's accuracy as a percentage
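The accuracy percentage is the share of test items whose predicted class matches the class folder they came from. The Python below is a minimal sketch of that calculation. It assumes you already have, for each test item, the expected class and the class predicted by Classificationbox; `accuracy_percent` is a hypothetical helper, not part of the tool.

```python
def accuracy_percent(expected, predicted):
    """Percentage of test items whose predicted class equals the expected class."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must be the same length")
    if not expected:
        raise ValueError("no test items")
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return 100.0 * correct / len(expected)
```

For example, if 3 of 4 test items are predicted correctly, the accuracy is 75%.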