Trunklucator is a python module for data scientists and ML practitioners for quick creating annotation projects and testing your ideas. It acts like a python's native input() function, but support displaying rich content and advance interaction with the user (using a web browser). Trunklucator lets you easily plug interaction with a human to your model prototype.
from trunklucator import WebUI
with WebUI() as webui: # start http server in background
for item in data:
y = webui.ask(item) #<- wait for user action on web page
print(y)
For full examples see examples/quickstart
directory
Task | Screenshot | Example code |
---|---|---|
binary classification | For images - examples/quickstart/binary_class_image.py For text - examples/quickstart/binary_class_text.py |
|
multiclass classification | examples/quickstart/multi_class_text.py | |
multilabel classification | examples/quickstart/multi_label_text.py | |
Named Entity Recognition (NER) | examples/quickstart/ner_text.py | |
HTML page annotation | examples/quickstart/ner_html.py |
Trunklucator is the best when you need to represent complex data like image, formatted text, video or sound to the user and ask the user to label/annotate this data. After a user's action, you immediately are able to use this data in your pipeline. Trunklucator works well together with active learning (see example https://github.com/Dumbris/pytorch_active_learning/active_learning_basics.py).
pip install trunklucator
You can use environmet variable to change default parameters
- HOST - bind to host (default 127.0.0.1)
- PORT - use port number (default 8086)
- DATA_DIR - directory will be available through HTTP by path /data
- FRONTEND_DIR - directory path to your custom frontend
PORT=8080 python3 main.py
Also, you can use similar parameters in code then instanciate trunklucator.WebUI class.
with WebUI(host='0.0.0.0', port=8080, data_dir='./data', frontend_dir='./myfront')
For instance of WebUI class:
.ask(data, meta(optional))
- by calling this method you will stop the execution of your code until the user action in a web browser..update(data)
- asynchronously publish information to the frontend part.
- clone github repo
- cd to examples/
- run python3 filename.py, open browser on http://localhost:8086
Trunklucator contains two parts: python module which runs a small HTTP server in the background thread and frontend - it could be any javascript single page application that supports simple protocol for fetching task data.
These parts interact with each other using HTTP or WebSocket. You don't need to change the python part it's ready to use abstraction.
You can select which frontend part to use by setting frontend_dir
WebUI init parameter or using environment variable FRONTEND_DIR
You can set path to your custom frontend directory or use predefined names for frontends integrated into python package.
In the current version there are two frontend integrated:
- Default
WebUI(frontend_dir='html_field')
designed like hackable part, you can adjust it for your specific data format. The default implementation is able to load arbitrary HTML text. UI controls can be configured in python code. WebUI(frontend_dir='label_studio')
- advanced frontend with a support a lot of data types. For more information check the official site https://labelstud.io/ and example/quickstart/ner_text.py
To customize default frontend part:
- clone github repo
- Make a copy of trunklucator/frontend/html_field directory. Implementation is simple and doesn't use tools like npm, webpack, etc.
- You can edit it on disk and after refreshing web page, you will see the results. Use FRONTEND_DIR environment variable to setup path to your custom frontend.