We present LANCE (Log stAtemeNt reCommEnder), a DL-based approach for supporting the task of log statement generation and injection in the context of Java. LANCE is built on the recently proposed Text-To-Text Transfer Transformer (T5) architecture.
-
How to train a new SentencePiece Model
Before training the T5 small model, namely the core of LANCE, it is important to also train a new tokenizer (SentencePiece model) to accommodate the expanded vocabulary introduced by the Java programming language. To do so, we used the raw pre-training instances (Java corpus) plus English sentences from the well-known C4 dataset.
Pythonic way

```shell
pip install sentencepiece==0.1.96
```

```python
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    '--input=all_sp.txt --model_prefix=LOG_SP --vocab_size=32000 '
    '--bos_id=-1 --eos_id=1 --unk_id=2 --pad_id=0 '
    '--shuffle_input_sentence=true --character_coverage=1.0 '
    '--user_defined_symbols=<LOG_STMT>'
)
```
We also provide our trained tokenizer under the following path: https://github.com/antonio-mastropaolo/LANCE/tree/main/Code
-
To set up a new GCS bucket for training and fine-tuning a T5 model, please follow the original guide provided by Google: https://cloud.google.com/storage/docs/quickstart-console
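Alternatively, the bucket can be created from the command line with `gsutil` (part of the Google Cloud SDK). The bucket name and region below are placeholders, not values used in the study:

```shell
# Create a regional bucket (replace name and region with your own values).
gsutil mb -l us-central1 gs://your-lance-bucket/

# Copy the tokenizer training corpus into the bucket.
gsutil cp all_sp.txt gs://your-lance-bucket/data/
```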
-
The datasets for pre-training, fine-tuning, validating, and testing LANCE can be found at this link: https://drive.google.com/drive/folders/1D12y-CIJTYLxMeSmGQjxEXjTEzQImgaH?usp=sharing
-
To pre-train and then fine-tune LANCE, please use the following:
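The repository's own scripts are the reference for the exact training invocation. Purely as an illustrative sketch of how fine-tuning typically looks with the `t5` library's `MtfModel` API: the bucket paths, TPU name, task name, and step count below are placeholders, not LANCE's actual configuration, and the snippet assumes the task has already been registered via `t5.data.TaskRegistry`.

```python
import t5.models

# Placeholder paths/names -- substitute your own bucket, TPU, and task.
MODEL_DIR = 'gs://your-lance-bucket/model'
PRETRAINED_DIR = 'gs://your-lance-bucket/pretrained'

model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu='your-tpu-name',
    tpu_topology='2x2',
    model_parallelism=1,
    batch_size=128,
    sequence_length={'inputs': 512, 'targets': 512},
    learning_rate_schedule=0.003,
    save_checkpoints_steps=5000,
    keep_checkpoint_max=None,
    iterations_per_loop=100,
)

# Fine-tune from the pre-trained checkpoint on the (hypothetical) task name.
model.finetune(
    mixture_or_task_name='log_injection_task',
    pretrained_model_dir=PRETRAINED_DIR,
    finetune_steps=100000,
)
```

This is a configuration sketch: it requires a Cloud TPU and a GCS bucket, so it cannot be run as-is.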
-
Under Miscellaneous, you can find the additional scripts used for the data analysis, as well as the exact hyper-parameter configuration we employed in the study.