SenTree is a module that converts natural language sentences into binary tree implied sequences. It is based on and leverages the Structural-Probe: https://github.com/john-hewitt/structural-probes
The module can be integrated into existing Transformer-based LLMs by using the binary tree implied sequence as the input and output of the decoder. It also supports the reverse conversion, from binary tree implied sequences to natural language sentences, so that decoder output can be turned back into readable sentences.
- It is recommended to create a virtual environment for this project. Only Python 3 is supported.
- Download and install the Structural-Probe project; see "Installing & Getting Started" of Structural-Probe.
- Configure your system environment variables so that Python can import the modules.
- Clone this repository.
- Edit the configuration file `probe.yaml` of this project, especially the absolute path of `depth_params_path`.
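
Because a wrong `depth_params_path` is an easy mistake to make, a small check can be run before starting the demo. The following is only a sketch under assumptions: the helper names `find_key` and `check_depth_params_path` are hypothetical and not part of this project, and since the exact nesting of keys in `probe.yaml` is not described here, the code searches the parsed configuration recursively instead of assuming a layout.

```python
import os
from typing import Any, List, Optional


def find_key(node: Any, key: str) -> Optional[Any]:
    """Return the first value stored under `key` anywhere in a nested dict/list."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def check_depth_params_path(config: dict) -> List[str]:
    """Hypothetical helper: list the problems found with depth_params_path."""
    path = find_key(config, "depth_params_path")
    if path is None:
        return ["depth_params_path is missing"]
    path = str(path)
    if not os.path.isabs(path):
        return [f"depth_params_path is not absolute: {path}"]
    if not os.path.isfile(path):
        return [f"depth_params_path does not point to a file: {path}"]
    return []
```

The `config` argument is the already parsed YAML mapping, for example the result of `yaml.safe_load(open("probe.yaml"))` if PyYAML is installed. An empty list means no problem was detected.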
Run `demo.py` and `sentree_util.py` to get started.
`sentree_util.py` converts between sentences and binary tree implied sequences, in either direction, according to the option and input you provide.
It is a handy tool when integrating SenTree into existing systems.
The module is intended to convert raw sentences into binary tree implied sequences for the decoder. The training process of an autoregressive model stays the same as before. The generation process changes: instead of appending one word after another, the most recently generated word may be inserted at any position in the sentence that is still being generated.
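
To make the insertion style of generation concrete, here is a toy sketch. It is not SenTree's actual encoding: the `(word, position)` step format, the function names, and the hand-written depth values are all assumptions made for illustration. In the real module the word order is derived from the Structural-Probe, and the sequence format is defined by SenTree itself.

```python
import bisect
from typing import List, Sequence, Tuple

Step = Tuple[str, int]  # (word, index at which it is inserted into the partial sentence)


def to_insertion_steps(words: Sequence[str], depths: Sequence[float]) -> List[Step]:
    """Emit words from shallowest to deepest, recording where each one is inserted.

    `depths` stands in for per-word syntactic depths (an assumption for this toy).
    """
    order = sorted(range(len(words)), key=lambda i: (depths[i], i))
    placed: List[int] = []  # original indices already emitted, kept sorted
    steps: List[Step] = []
    for idx in order:
        pos = bisect.bisect_left(placed, idx)  # slot among the words emitted so far
        placed.insert(pos, idx)
        steps.append((words[idx], pos))
    return steps


def from_insertion_steps(steps: Sequence[Step]) -> List[str]:
    """Rebuild the sentence by replaying the insertions one generation step at a time."""
    sentence: List[str] = []
    for word, pos in steps:
        sentence.insert(pos, word)
    return sentence


if __name__ == "__main__":
    words = ["the", "cat", "sat", "down"]
    depths = [2, 1, 0, 1]  # made-up values: "sat" is treated as the root
    steps = to_insertion_steps(words, depths)
    print(steps)
    print(" ".join(from_insertion_steps(steps)))
```

Each step corresponds to one generation step of the decoder: the partial sentence grows from `sat` to `cat sat`, then `cat sat down`, and finally `the cat sat down`, rather than being written strictly left to right.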
The SenTree module should be integrated into autoregressive models as illustrated below:
The binary tree implied sequences that SenTree outputs for a given input sentence are not fixed. They vary with the model weights and with the auto-encoding model behind the Structural-Probe. This variability is a key feature: it can be used to search for suitable ways of expression from the perspectives of both the decoder and the SenTree system.