Our project uses pre-trained models from spaCy and the topic-modelling algorithm BERTopic to create and visualize associations between characters and topics, either in a single book from Project Gutenberg or across the complete corpus. In the app you can pick a book to display every character that spaCy's named-entity recognition model found in it. Once the characters are displayed, you can pick one to see every topic associated with that character. The topics were computed beforehand with BERTopic.
When you are done filtering, you can reset your selection by clicking the red reset button, or simply pick a different book. In addition to the book's characters and each character's topics, counters next to the columns show the total number of books loaded into the app, the number of characters found in the selected book, and the number of topics in that book (derived from the character-topic association).
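The selection flow described above can be pictured as a nested mapping from books to characters to topics. The following is a minimal sketch, not the app's actual code: the `library` dictionary, its sample entries, and the helper names are all hypothetical, and the real app reads this data from its database.

```python
# Hypothetical in-memory stand-in for the data the app reads from its database.
# Book titles, character names, and topic labels are made up for illustration.
library = {
    "Book A": {
        "Alice": {"travel", "family"},
        "Bob": {"family", "war"},
    },
    "Book B": {
        "Carol": {"sea"},
    },
}


def characters_in(book):
    """Return the characters found in a book, sorted by name."""
    return sorted(library[book])


def topics_for(book, character):
    """Return the topics associated with one character of a book."""
    return sorted(library[book][character])


def counters(book):
    """Return (books loaded, characters in book, topics in book).

    The topic count is derived from the character-topic association:
    it is the number of distinct topics across all characters of the book.
    """
    topics = set().union(*library[book].values())
    return len(library), len(library[book]), len(topics)


if __name__ == "__main__":
    print(characters_in("Book A"))
    print(topics_for("Book A", "Bob"))
    print(counters("Book A"))
```

Note that a topic shared by two characters is counted once for the book, which is why the topic counter can be smaller than the sum of the per-character topic lists.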
First, clone the GitHub repository into the desired directory:
git clone https://github.com/schanikk/Gutenberg-Graph-Network.git
Next, you need to have Docker installed. If you don't have it yet, see https://www.docker.com/products/docker-desktop/
Once Docker is installed, change into the web application directory and start Docker Compose:
cd Gutenberg-Graph-Network/gutenbergApp
docker-compose up
The next step is to open a bash shell inside the Django web app container. First look up the container ID, then start bash inside that container, following the steps below. The container ID we need belongs to the image gutenbergapp-web; in this example it is 5f794608962e.
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5f794608962e gutenbergapp-web "python manage.py ru…" 4 days ago Up 20 seconds 0.0.0.0:8000->8000/tcp gutenbergapp-web-1
14c8f8a5fa52 postgres "docker-entrypoint.s…" 4 days ago Up 21 seconds 5432/tcp gutenbergapp-db-1
docker exec -t -i 5f794608962e bash
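Looking up the container ID by hand can also be scripted. The sketch below is one possible approach, assuming `docker` is on the `PATH` and the image is named `gutenbergapp-web` as in the listing above; the helper names `find_container_id` and `current_container_id` are illustrative and not part of the project.

```python
import subprocess


def find_container_id(ps_output, image="gutenbergapp-web"):
    """Return the ID of the first container whose image matches, else None.

    Expects one "<id> <image>" pair per line, as printed by
    `docker ps --format "{{.ID}} {{.Image}}"`.
    """
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == image:
            return parts[0]
    return None


def current_container_id(image="gutenbergapp-web"):
    """Ask Docker for the running containers and pick the one for `image`."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.ID}} {{.Image}}"],
        capture_output=True, text=True, check=True,
    )
    return find_container_id(result.stdout, image)


if __name__ == "__main__":
    # Sample output matching the listing above; no Docker needed for this part.
    sample = "5f794608962e gutenbergapp-web\n14c8f8a5fa52 postgres\n"
    print(find_container_id(sample))
```

The returned ID can then be passed to `docker exec -t -i <id> bash` as shown above.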
First, check whether there are any migrations to create by running the following command. Normally there should be none, but it is worth checking.
python manage.py makemigrations
System check identified some issues:
WARNINGS:
?: (staticfiles.W004) The directory '/var/www/static/' in the STATICFILES_DIRS setting does not exist.
No changes detected
The warning can be ignored; the important part is the last line, "No changes detected". Even though no changes are detected, the existing migrations still need to be applied after running the web application for the first time, because Django tracks changes to the database and its models in migration files, which are also stored in the repository. Apply them with the migrate command:
python manage.py migrate
WARNINGS:
?: (staticfiles.W004) The directory '/var/www/static/' in the STATICFILES_DIRS setting does not exist.
Operations to perform:
Apply all migrations: admin, auth, contenttypes, sessions, webDB
Running migrations:
No migrations to apply.
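Again, the warning can be ignored: it only indicates that the directory listed in STATICFILES_DIRS does not exist, and Django still finds the static files. The output above comes from a database that was already migrated; the first time you run this command you will see several migrations being applied.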
Next, we need to initialize the database with some data so the application can visualize the relationships between characters and topics. We do this with Django's built-in loaddata command, which uses fixtures to load data into the database. More about fixtures can be found in the Django documentation (https://docs.djangoproject.com/en/4.1/howto/initial-data/). The order in which the data is loaded matters, because there are several relations between the tables. The correct order is:
- books
- character
- topics
- sentences
- sent2char
NOTE: The final fixtures for 81 books are too big for the GitHub repository, so we uploaded them to an external service (Mafiasi). Download them from there, extract the ZIP archive, and place the files in a directory named BigFixtures inside fixtures/.
Fixtures Link: https://cloud.mafiasi.de/s/Sgb72AGeaBB3gWM
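After extracting the archive, it can help to confirm that each file looks like a Django JSON fixture before loading it. The check below is a rough sketch: it relies only on the general fixture layout (a JSON list of objects with `model` and `fields` keys), and the model label `webDB.book` and the field `title` in the sample are hypothetical, not taken from the project's schema.

```python
import json


def looks_like_fixture(text):
    """Return True if `text` is a JSON list of Django-style fixture records."""
    try:
        records = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(records, list):
        return False
    return all(
        isinstance(rec, dict) and "model" in rec and "fields" in rec
        for rec in records
    )


if __name__ == "__main__":
    # Hypothetical record; the real model labels and fields may differ.
    sample = json.dumps(
        [{"model": "webDB.book", "pk": 1, "fields": {"title": "Example"}}]
    )
    print(looks_like_fixture(sample))
    print(looks_like_fixture('{"not": "a list"}'))
```

For very large fixture files, reading the whole file into memory this way may be slow, so the check is best treated as optional.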
python manage.py loaddata webDB/fixtures/BigFixtures/bookFixturesSmall.json
python manage.py loaddata webDB/fixtures/BigFixtures/characterFixturesSmall.json
python manage.py loaddata webDB/fixtures/BigFixtures/topicsFixturesSmall.json
python manage.py loaddata webDB/fixtures/BigFixtures/sentencesFixturesSmall.json
python manage.py loaddata webDB/fixtures/BigFixtures/sent2charFixturesSmall.json
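The five commands above can also be generated in the required order by a small script. This sketch only reuses the file names from the commands above; `FIXTURE_ORDER`, `loaddata_commands`, and `load_all` are illustrative names, and `load_all` assumes it is run inside the container from the directory that contains `manage.py`.

```python
import subprocess

# Load order matters because later tables reference earlier ones.
FIXTURE_ORDER = [
    "bookFixturesSmall.json",
    "characterFixturesSmall.json",
    "topicsFixturesSmall.json",
    "sentencesFixturesSmall.json",
    "sent2charFixturesSmall.json",
]


def loaddata_commands(fixture_dir="webDB/fixtures/BigFixtures"):
    """Return one `manage.py loaddata` command per fixture, in load order."""
    return [
        ["python", "manage.py", "loaddata", f"{fixture_dir}/{name}"]
        for name in FIXTURE_ORDER
    ]


def load_all(fixture_dir="webDB/fixtures/BigFixtures"):
    """Run the commands one by one and stop at the first failure."""
    for command in loaddata_commands(fixture_dir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    for command in loaddata_commands():
        print(" ".join(command))
```

Stopping at the first failure is deliberate: loading a later fixture after an earlier one failed would likely hit missing references.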
Now the application is ready to use! You can access it at localhost:8000/gutenberg. Have fun picking a book and filtering!
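To confirm the server is answering before opening a browser, a small check like the following can be used. It is a sketch assuming the address `http://localhost:8000/gutenberg` from above; `is_reachable` is an illustrative helper, and any HTTP response (even an error status) is treated as reachable.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar problems.
        return False


if __name__ == "__main__":
    print(is_reachable("http://localhost:8000/gutenberg"))
```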
- FrontEnd: User interface, makes API calls
  - HTML, CSS, Vanilla JS, Bootstrap
- BackEnd:
  - Logic: Django --> Provides the API for the FrontEnd, queries the database on each API call
  - Database: PostgreSQL --> Provides the data once queried
- Containerization: Packages everything into one entity
  - Docker
- Preprocessing: Applying Named Entity Recognition (NER) and topic modeling
  - spaCy (NER)
  - BERTopic (topic modeling)
- Data: Data displayed
  - Gutenberg Corpus books
Links | Description |
---|---|
Python Documentation | Python Docs |
Project Gutenberg | Actual Gutenberg Project |
Natural Language Toolkit | Useful data |
The Standardized Project Gutenberg Corpus | Used for Data Extraction |
spaCy | Models for Named Entity Recognition (NER) |
BERTopic | Model for Topic Modelling |
GitHub Projectboard | Our Projectboard in GitHub |
Example project on Gutenberg | Different implementation of the project (not ours) |