Book-based Fictional Character Profiling Workflow

This repository contains the source code for a literary 📚 character personality profiling workflow that 🔥 relies solely on book content 🔥, as described in the paper Personality Profiling for Literary Character Dialogue Agents with Human Level Attributes (pre-print), which has been accepted to the Long Paper track at LOD-2024.

Update 26/09/2024: The 📹 video on YouTube presenting the paper's concepts is out 🥳

Workflow

This repository implements a workflow for processing literary novels.

Task: Studies propose the novel Character Comments Annotation problem, which refers to quotation annotation [paper].

This workflow relies on external text processing components: (1) NER, (2) automatic dialogue annotation. See the Dependencies section for details.

Datasets of character conversations are a byproduct of this data flow. Each dataset consists of dialogues whose utterances are annotated with their speakers.

Personality Profiling Model

We adopt an adjective-pair lexicon (FCP-lexicon) as the source for the spectrum-based character profiling model. We provide an API for collecting information about literary characters and composing their personalities in the form of output matrices.

Each row of the matrix represents a character, while the columns correspond to personality traits. There are two types of output personalities (see figure below): (left) individual and (right) inter-dependent (embeddings based on a personality factorization model).
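As a rough illustration of the spectrum idea (not this repository's actual API), the sketch below scores a character on each adjective pair by comparing how often the two poles occur among the words associated with that character. The pair list, the `spectrum_profile` and `profile_matrix` names, and the toy input are all assumptions made for this example; the real FCP-lexicon and the factorization-based embeddings are not reproduced here.

```python
from collections import Counter

# Hypothetical adjective pairs (low pole, high pole).
# These stand in for the FCP-lexicon, whose real content is not shown here.
PAIRS = [("cowardly", "brave"), ("cruel", "kind")]


def spectrum_profile(words, pairs=PAIRS):
    """Return one score per pair in [-1, 1].

    -1 means only the low pole was observed, +1 only the high pole,
    and 0 means the evidence is balanced or absent.
    """
    counts = Counter(w.lower() for w in words)
    profile = []
    for low, high in pairs:
        lo, hi = counts[low], counts[high]
        total = lo + hi
        profile.append(0.0 if total == 0 else (hi - lo) / total)
    return profile


def profile_matrix(char_words, pairs=PAIRS):
    """Build a (characters x traits) matrix; rows follow sorted character names."""
    names = sorted(char_words)
    return names, [spectrum_profile(char_words[n], pairs) for n in names]


# Toy input: words assumed to have been collected for each character.
char_words = {
    "Alice": ["brave", "kind", "brave", "cruel"],
    "Bob": ["cowardly", "quiet"],
}
names, matrix = profile_matrix(char_words)
```

Here each row of `matrix` is one character and each column is one adjective pair, which mirrors the individual (left) type of output described above.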

Applications

Updated 04/07/2024: The complete list of applications can be found at https://github.com/nicolay-r/book-persona-retriever/tree/complete-edition

e_pairs -- response generation and response prediction for the given dialogue pairs aka CONV-turns
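To make the notion of a dialogue pair concrete, here is a minimal sketch of turning a speaker-annotated dialogue into (query, response) pairs. The `Utterance` class, the `dialogue_pairs` function, and the rule of skipping consecutive utterances by the same speaker are assumptions for illustration only; the scripts in e_pairs may organise this differently.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # name of the annotated speaker
    text: str     # the utterance itself


def dialogue_pairs(dialogue):
    """Collect (query_text, response_text, responder) triples.

    A pair is formed from two consecutive utterances only when the
    speaker changes (an assumption made for this sketch).
    """
    pairs = []
    for prev, cur in zip(dialogue, dialogue[1:]):
        if prev.speaker != cur.speaker:
            pairs.append((prev.text, cur.text, cur.speaker))
    return pairs


# Toy dialogue with invented speakers and text.
dialogue = [
    Utterance("Anna", "Hello, Ben."),
    Utterance("Ben", "Good morning, Anna."),
    Utterance("Ben", "Shall we walk?"),
]
pairs = dialogue_pairs(dialogue)
```

Keeping the responder's name in each triple makes it possible to group responses by character later, for example when selecting the most frequently appearing speakers.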

Limitations

The present implementation has the following limitations:

NER -- because the e_pairs application focuses on LDC construction, we adopt pre-annotated speakers with their name variations (Coreference Resolution). Addressing this limitation requires adding the related support here.

Datasets

LDC
1. LDC-400

`LDC`

Literature Dialogue Collection (LDC) is a processed collection of 13K books from Project Gutenberg. As the source of these books, we used the following list from the following studies. Due to the license specifics of the Project Gutenberg content, distributing the complete edition of LDC is prohibited. Therefore, this project shares the downloading scripts together with a series of scripts in the e_pairs directory aimed at LDC construction.

This resource can be constructed automatically using the following steps:

1. Download all the necessary books 📚 and resources (downloading takes ~3.5 hours ☕).
2. Execute the scripts from the e_pairs directory.

We further cleaned the dataset of dialogue pairs among the 400 most frequently appearing characters, which results in the LDC-400 dataset.

`LDR-400`

This dataset is for the Response Prediction problem.

We use the ParlAI framework to conduct experiments. To embed the extracted data into ParlAI, we use the related data formatter.

Link for ParlAI agents / task: [parlai-agents]

Collection-type	Format	train	valid	test
NO-HLA	ParlAI	Train w/o HLA	Valid w/o HLA	Not Applicable
HLA-spectrum	ParlAI	Train with HLA	Valid with HLA	Five speakers: [1] [2] [3] [4] [5]
Human Evaluation	Text	--	--	Five speakers: [1] [2] [3] [4] [5]

Candidates count: 20
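For readers unfamiliar with response prediction, the sketch below shows one way a record with 20 candidates could be assembled and serialized, assuming ParlAI's tab-separated text format (`text`, `labels`, `label_candidates`, `episode_done`). The function names, the random distractor sampling, and the toy response pool are hypothetical; the repository's own data formatter may select candidates and encode HLA differently.

```python
import random


def make_candidates(gold, pool, k=20, seed=0):
    """Return k shuffled candidates: the gold response plus k-1 distractors.

    Distractors are sampled uniformly from the pool (an assumption
    made for this sketch, not necessarily the strategy used here).
    """
    rng = random.Random(seed)
    distractors = rng.sample([r for r in pool if r != gold], k - 1)
    candidates = distractors + [gold]
    rng.shuffle(candidates)
    return candidates


def to_parlai_line(query, gold, candidates):
    """Serialize one single-turn episode as a tab-separated line."""
    fields = [
        "text:" + query,
        "labels:" + gold,
        "label_candidates:" + "|".join(candidates),
        "episode_done:True",
    ]
    return "\t".join(fields)


# Toy pool of unique responses standing in for real book utterances.
pool = [f"response {i}" for i in range(100)]
cands = make_candidates("response 7", pool)
line = to_parlai_line("How do you do?", "response 7", cands)
```

A fixed seed keeps the candidate sets reproducible between runs, which matters when comparing models on the same validation split.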

Test Speakers:

Mr. Summerlee from The Lost World by Conan Doyle
Sergeant Cuff from The Moonstone by Wilkie Collins
Mr. MacWilliams from Soldiers of Fortune by Richard Harding Davis
Arthur Donnithorne from Adam Bede by George Eliot
Lord Duke from The Three Musketeers by Alexandre Dumas père

NOTE: Please use the nicolay-r/parlai_bookchar_task repository to embed the task into ParlAI. All the resources listed here are downloaded automatically once the task is embedded into the ParlAI framework.

Experiments

Dependencies

NER:
- CEB-framework -- pre-annotated and grouped speakers from Project Gutenberg. [paper]
Dialogue utterance extraction from literary novels:
- gutenberg-dialog -- automatic dialogue annotation algorithm [paper]

Organizations

This work has been accomplished as a part of my Research Fellow position at Newcastle University.

References

You can cite this work as follows:

@inproceedings{rusnachenko2024personality,
  title     = {Personality Profiling for Literary Character Dialogue Agents with Human Level Attributes},
  author    = {Rusnachenko, Nicolay and Liang, Huizhi},
  booktitle = {Proceedings of the 10th International Conference on Machine Learning, Optimization, and Data Science (LOD)},
  year      = {2024},
  month     = sep,
  days      = {22--25},
  address   = {Castiglione della Pescaia (Grosseto), Tuscany, Italy},
  publisher = {Springer}
}
