Gutenberg-Graph-Network

About the Project

Our project uses pretrained models from spaCy and the topic modelling algorithm BERTopic to create and visualize associations between characters and topics, either in a single book from Project Gutenberg or across the complete corpus. Inside the app you can pick a book and display every character in it that was found by spaCy's entity recognition (ENT) model. Once the characters are displayed, you can pick one to see every topic associated with that character. The topics were computed beforehand by BERTopic.

When you are done filtering, you can reset your selection by clicking the red reset button, or simply pick a different book. In addition to the book's characters and each character's topics, a counter next to the columns shows the total number of books loaded into the app, the number of characters found in the selected book, and the number of topics in that book (derived from the character-topic association).
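The character-topic association and the two per-book counters can be pictured with a small sketch. It is only an illustration: the record layout (`topic`, `characters`), the helper name `build_character_topics`, and the sample data are assumptions made for this example and are not taken from the app's code.

```python
from collections import defaultdict


def build_character_topics(sentences):
    """Map each character to the set of topics of the sentences that mention it."""
    char_topics = defaultdict(set)
    for sentence in sentences:
        for character in sentence["characters"]:
            char_topics[character].add(sentence["topic"])
    return dict(char_topics)


# Hypothetical sample data for one book (not real model output).
sentences = [
    {"topic": "sea voyage", "characters": ["Alice", "Bob"]},
    {"topic": "treasure", "characters": ["Bob"]},
    {"topic": "sea voyage", "characters": ["Carol"]},
]

char_topics = build_character_topics(sentences)

# Counters as described above: characters in the book, and topics
# reachable through at least one character.
num_characters = len(char_topics)
num_topics = len(set().union(*char_topics.values()))
print(num_characters, num_topics)
```

Picking a character in the app would then correspond to looking up one key, for example `char_topics["Bob"]`.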

Starting screen

Picked a book only

Picked a book and a character

How To Run The App

First, clone the GitHub repository into the desired directory:

git clone https://github.com/schanikk/Gutenberg-Graph-Network.git

Next, you need Docker. If it is not installed yet, see https://www.docker.com/products/docker-desktop/

Once Docker is installed, change into the web application directory and start docker-compose:

cd Gutenberg-Graph-Network/gutenbergApp
docker-compose up

The next step is to open a bash shell inside the Django web app container. First look up the container ID with docker ps, then start bash inside that container, following the steps below. The container ID we need belongs to the image gutenbergapp-web; in this example it is 5f794608962e.

docker ps

CONTAINER ID   IMAGE              COMMAND                  CREATED      STATUS          PORTS                    NAMES
5f794608962e   gutenbergapp-web   "python manage.py ru…"   4 days ago   Up 20 seconds   0.0.0.0:8000->8000/tcp   gutenbergapp-web-1
14c8f8a5fa52   postgres           "docker-entrypoint.s…"   4 days ago   Up 21 seconds   5432/tcp                 gutenbergapp-db-1

docker exec -t -i 5f794608962e bash
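Looking up the container ID by hand can also be scripted. The following is a minimal sketch, assuming the `docker` CLI is on the PATH and the image is named `gutenbergapp-web` as in the listing above; the helper names `find_container_id` and `running_container_id` are hypothetical.

```python
import subprocess


def find_container_id(ps_output, image):
    """Return the ID of the first container whose image matches, else None."""
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == image:
            return parts[0]
    return None


def running_container_id(image="gutenbergapp-web"):
    """Ask Docker for the running containers and pick the one for `image`."""
    # With this --format template, docker ps prints one "<id> <image>" pair per line.
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.ID}} {{.Image}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_container_id(result.stdout, image)


# Offline example using the two containers from the listing above.
sample = "5f794608962e gutenbergapp-web\n14c8f8a5fa52 postgres\n"
print(find_container_id(sample, "gutenbergapp-web"))
```

The ID returned by `running_container_id()` can then be used in place of 5f794608962e in the docker exec command.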

First we check whether there are any migrations to create by running the following command. Normally there are none, but we check anyway to be sure.

python manage.py makemigrations

System check identified some issues:

WARNINGS:
?: (staticfiles.W004) The directory '/var/www/static/' in the STATICFILES_DIRS setting does not exist.
No changes detected

The warning can be ignored; the important part is the last line, "No changes detected". Even though no changes are detected, there are still migrations to apply after starting the web application for the first time, because Django tracks changes to the database models in migration files, which are stored in the repository. We apply them with the migrate command.

python manage.py migrate

WARNINGS:
?: (staticfiles.W004) The directory '/var/www/static/' in the STATICFILES_DIRS setting does not exist.
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions, webDB
Running migrations:
  No migrations to apply.

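Again, the warning can be ignored: it only says that the directory listed in STATICFILES_DIRS does not exist, and the static files are still picked up by Django. The output above is from a database that was already migrated; when you run this command for the first time you will see several migrations applied instead.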

Next we need to initialize the database with some data so the application can visualize the relationships between characters and topics. We do this with Django's built-in loaddata command, which uses fixtures to load data into the database. More about fixtures can be found in the Django documentation (https://docs.djangoproject.com/en/4.1/howto/initial-data/). The order in which the data is loaded matters, because there are several relations between the tables. The correct order is:

books
character
topics
sentences
sent2char

NOTE: The final fixtures for 81 books are too big for the GitHub repository, so we uploaded them to an external service (Mafiasi). Download them from there, extract the ZIP, and place the files in a directory named BigFixtures inside webDB/fixtures/.

Fixtures Link: https://cloud.mafiasi.de/s/Sgb72AGeaBB3gWM
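Unpacking the downloaded archive into the expected directory can be sketched as follows. This assumes the JSON files sit at the top level of the ZIP (if they are inside a subfolder they would need to be moved afterwards); the helper name `extract_fixtures` and the archive file name in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_fixtures(zip_path, fixtures_dir="webDB/fixtures"):
    """Unpack the archive into <fixtures_dir>/BigFixtures and list the JSON files found."""
    target = Path(fixtures_dir) / "BigFixtures"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Return the fixture file names so the result can be checked before loading.
    return sorted(path.name for path in target.rglob("*.json"))
```

Run from the gutenbergApp directory, a call such as `extract_fixtures("BigFixtures.zip")` should list the five fixture files used by the loaddata commands.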

python manage.py loaddata webDB/fixtures/BigFixtures/bookFixturesSmall.json
python manage.py loaddata webDB/fixtures/BigFixtures/characterFixturesSmall.json
python manage.py loaddata webDB/fixtures/BigFixtures/topicsFixturesSmall.json
python manage.py loaddata webDB/fixtures/BigFixtures/sentencesFixturesSmall.json
python manage.py loaddata webDB/fixtures/BigFixtures/sent2charFixturesSmall.json
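Because the load order matters, the five commands above can also be generated from a single ordered list. This is a minimal sketch meant to be run inside the web container from the directory that contains manage.py; the names `FIXTURES`, `loaddata_commands`, and `load_all` are made up for this example.

```python
import subprocess

FIXTURE_DIR = "webDB/fixtures/BigFixtures"

# Same order as the commands above: referenced tables first,
# tables that point to them afterwards.
FIXTURES = [
    "bookFixturesSmall.json",
    "characterFixturesSmall.json",
    "topicsFixturesSmall.json",
    "sentencesFixturesSmall.json",
    "sent2charFixturesSmall.json",
]


def loaddata_commands(fixture_dir=FIXTURE_DIR):
    """Build one `manage.py loaddata` command per fixture, in load order."""
    return [
        ["python", "manage.py", "loaddata", f"{fixture_dir}/{name}"]
        for name in FIXTURES
    ]


def load_all(fixture_dir=FIXTURE_DIR):
    """Run the commands one by one and stop at the first failure."""
    for command in loaddata_commands(fixture_dir):
        subprocess.run(command, check=True)


# Print the commands instead of running them, so the order can be inspected first.
for command in loaddata_commands():
    print(" ".join(command))
```

Calling `load_all()` runs the commands; stopping at the first failure avoids loading a table before the tables it references.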

Now the application is ready to use! You can access it at localhost:8000/gutenberg. Have fun picking a book and filtering!

Technologies/Data

FrontEnd: User interface, makes API calls
- HTML, CSS, Vanilla JS, Bootstrap
BackEnd:
- Logic: Django --> Provides the API for the FrontEnd, queries the database on each API call
- Database: PostgreSQL --> Provides the data once queried
Containerization: Packages everything into one entity
- Docker
Preprocessing: Applying Named Entity Recognition (NER) and Topic Modeling
- spaCy (NER)
- BERTopic (Topic Modeling)
Data: Data displayed
- Gutenberg Corpus Books

Architecture Overview

Useful links

Link	Description
Python Documentation	Python Docs
Project Gutenberg	Actual Gutenberg Project
Natural Language Toolkit	Useful data
The Standardized Project Gutenberg Corpus	Used for Data Extraction
spaCy	Models for Named Entity Recognition (NER)
BERTopic	Model for Topic Modelling
GitHub Projectboard	Our project board on GitHub
Example Gutenberg project	Different implementation of the project (not ours)
