Python framework for artificial text detection: NLP approaches to compare natural text against text generated by neural networks.
The project description is available at:
We use [Poetry](https://python-poetry.org/) as an enhanced dependency resolver. To install the project with it, run:

```bash
make poetry-download
poetry install --no-dev
```
To create datasets for the downstream classification, you first need to collect them. There are two ways to do this:

- Via Data Version Control (DVC). Get in touch with @msaidov to get access to the private Google Drive;
- Via dataset generation. One dataset of 20,000 samples took about 30 minutes to process with an MT model on a V100 GPU.
To use DVC, install it with Google Drive support:

```bash
poetry add "dvc[gdrive]"
```

Then run `dvc pull`. It will download the preprocessed translation datasets from Google Drive.
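If you prefer to access the pulled data programmatically, DVC also exposes a Python API. Below is a minimal sketch; the dataset path is hypothetical, so substitute an actual file tracked in this repository:

```python
# Minimal sketch: reading a DVC-tracked file from Python.
# The path below is hypothetical -- replace it with an actual
# dataset file tracked in this repository.
import dvc.api

with dvc.api.open("data/tatoeba_translations.csv") as f:
    print(f.readline())  # peek at the first line of the dataset
```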
To generate translations before the artificial text detection pipeline, install the `detection` module from the cloned repo or PyPI (TODO):

```bash
pip install -e .
```
Then run the generation script:

```bash
python detection/data/generate.py --dataset_name='tatoeba' --size=20000 --device='cuda:0'
```
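The internals of `generate.py` are not shown here; the sketch below only illustrates the idea of the generation step, assuming Hugging Face `datasets` for Tatoeba and a Marian MT checkpoint (`Helsinki-NLP/opus-mt-ru-en`). The actual model, language pair, and preprocessing in the script may differ.

```python
# Illustrative sketch of the translation-generation step, NOT the actual
# generate.py implementation. Assumes Hugging Face `datasets` and a Marian
# MT checkpoint; the repo's real model and config may differ.
from datasets import load_dataset
from transformers import MarianMTModel, MarianTokenizer

device = "cuda:0"  # matches the README command; use "cpu" without a GPU
tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-ru-en")
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-ru-en").to(device)

dataset = load_dataset("tatoeba", lang1="en", lang2="ru", split="train[:8]")
sources = [pair["ru"] for pair in dataset["translation"]]
natural = [pair["en"] for pair in dataset["translation"]]

# Machine-translate the source side; the outputs are the "artificial" texts
# to be contrasted with the natural references.
batch = tokenizer(sources, return_tensors="pt", padding=True, truncation=True).to(device)
generated = model.generate(**batch)
artificial = tokenizer.batch_decode(generated, skip_special_tokens=True)

for nat, art in zip(natural, artificial):
    print(f"natural:    {nat}\nartificial: {art}\n")
```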
To run the artificial text detection classifier, execute the pipeline:

```bash
python detection/old.py
```
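For orientation only, here is a toy baseline for the detection task itself: a binary classifier separating natural from machine-generated text. This sketch uses made-up example sentences and a TF-IDF + logistic regression model; it is not the pipeline implemented in `detection/old.py`.

```python
# Toy baseline for artificial text detection: binary classification of
# natural vs. machine-generated text. This illustrates the task only and
# is not the pipeline implemented in detection/old.py.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

natural = ["I went home late last night.", "She reads every evening."]
generated = ["I went to home lately at night.", "She is reading at each evening."]
texts = natural + generated
labels = [0] * len(natural) + [1] * len(generated)  # 1 = machine-generated

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels
)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```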