German Poetry Analysis with Deep Learning

This Natural Language Processing (NLP) project analyzes and classifies German poetry with deep learning. The task is to predict the century in which a poem was written from its text, using neural network classifiers.

🎯 Project Overview

The project implements and compares deep learning approaches for predicting the historical period (century) in which a German poem was written. It combines conventional NLP preprocessing (tokenization, stop-word removal, lemmatization) and Word2Vec embeddings with neural network classifiers, with the aim of capturing how German literary style changed over time.

📊 Dataset

The project uses a curated dataset of German poems (`data/de_poems.parquet`) containing:

- Title: Poem titles
- Text: Full poem content
- Author: Poet information
- Creation: Year of creation (converted to centuries for classification)
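
The year-to-century conversion is mentioned above but its exact rule is not shown. A minimal sketch, assuming the conventional definition in which the years 1701 to 1800 form the 18th century; `year_to_century` is a hypothetical helper, not a function from the notebooks:

```python
def year_to_century(year: int) -> int:
    """Map a calendar year (CE) to its century, e.g. 1749 -> 18."""
    if year < 1:
        raise ValueError("expected a positive year (CE)")
    # Years 1701..1800 belong to the 18th century, so shift by one before dividing.
    return (year - 1) // 100 + 1


# Hypothetical creation years, for illustration only.
years = [1749, 1800, 1801, 1923]
centuries = [year_to_century(y) for y in years]
print(centuries)  # [18, 18, 19, 20]
```

The resulting integers can serve directly as class labels once they are re-indexed to start at zero.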

🏗️ Architecture

The project implements and compares two main neural network architectures:

1. Feedforward Neural Network (`models/feedforward_nn/`)

- Multi-layer perceptron (MLP) architecture
- Word2Vec embeddings for text representation
- Dense layers for classification
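
How the Word2Vec vectors are fed to the MLP is not described here. One common approach, shown as a sketch under that assumption, is to average a poem's word vectors into a single fixed-size vector and pass it through dense layers. The class name `PoemMLP`, the layer sizes, and the number of century classes are illustrative and not taken from the notebook:

```python
import torch
from torch import nn


class PoemMLP(nn.Module):
    """Dense classifier over one mean-pooled embedding per poem (hypothetical)."""

    def __init__(self, embed_dim: int = 100, hidden_dim: int = 128, num_classes: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, num_classes),  # raw logits, one per century class
        )

    def forward(self, doc_vectors: torch.Tensor) -> torch.Tensor:
        return self.net(doc_vectors)


# Random stand-in for Word2Vec output: 4 poems, 12 tokens each, 100-dim vectors.
token_vectors = torch.randn(4, 12, 100)
doc_vectors = token_vectors.mean(dim=1)  # average over tokens -> shape (4, 100)

model = PoemMLP()
logits = model(doc_vectors)
print(logits.shape)
```

Mean pooling discards word order, which is the main difference from the recurrent model.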

2. Recurrent Neural Network (`models/recurrent_nn/`)

- Processes each poem as a sequence of tokens
- LSTM/GRU-based architecture
- Intended to capture word-order patterns within a poem
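
A minimal sketch of an LSTM-based classifier in PyTorch, assuming each poem arrives as a padded sequence of word vectors. The class name `PoemLSTM`, the dimensions, and the number of classes are illustrative and not taken from the notebook:

```python
import torch
from torch import nn


class PoemLSTM(nn.Module):
    """LSTM classifier over a sequence of word vectors (hypothetical)."""

    def __init__(self, embed_dim: int = 100, hidden_dim: int = 64, num_classes: int = 5):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_vectors: torch.Tensor) -> torch.Tensor:
        # h_n has shape (num_layers, batch, hidden_dim); use the last layer's state.
        _, (h_n, _) = self.lstm(token_vectors)
        return self.classifier(h_n[-1])


# Random stand-in batch: 4 poems, 12 tokens each, 100-dim vectors.
batch = torch.randn(4, 12, 100)
logits = PoemLSTM()(batch)
print(logits.shape)
```

Swapping in `nn.GRU` requires a small change in `forward`, because a GRU returns its hidden state as a single tensor rather than a `(hidden, cell)` tuple.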

🛠️ Technology Stack

Core Dependencies

- Deep Learning: PyTorch ≥1.10.0
- NLP Processing: spaCy ≥3.2.0, NLTK ≥3.6.0, Gensim ≥4.1.2
- Data Science: Pandas ≥1.3.0, NumPy ≥1.20.0, Scikit-learn ≥1.0.0
- Visualization: Matplotlib ≥3.5.0, Seaborn ≥0.11.0

Advanced Features

- Hyperparameter Optimization: Optuna ≥2.10.0
- Experiment Tracking: MLflow ≥1.23.0

📁 Project Structure

Project-NLP/
├── data/
│   └── de_poems.parquet         # German poetry dataset
├── models/
│   ├── feedforward_nn/
│   │   └── w2v.ipynb            # MLP implementation
│   └── recurrent_nn/
│       └── w2v.ipynb            # RNN implementation
├── README.md                     
└── requirements.txt             # Python dependencies

🚀 Getting Started

Prerequisites

- Python 3.8+
- CUDA-compatible GPU (recommended for training)

Installation

Clone the repository

```
git clone <repository-url>
cd Project-NLP
```

Create a virtual environment

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies

```
pip install -r requirements.txt
```

Download German language model for spaCy

```
python -m spacy download de_core_news_sm
```

Usage

1. Data Preprocessing
   - The notebooks tokenize the text automatically using spaCy
   - German stop words are removed and lemmatization is applied
2. Word2Vec Training
   - Custom Word2Vec models are trained on the poetry corpus
   - Vector dimensions: 100-500
   - Window size: 5-20
3. Model Training
   - Open the respective Jupyter notebooks in `models/`
   - Run the cells in order to train and evaluate each model

📈 Performance Metrics

The models are evaluated with the following metrics:

- Accuracy: Overall classification performance
- Precision/Recall: Per-class performance
- F1-Score: Harmonic mean of precision and recall
- ROC Curves: Model discrimination ability
- Confusion Matrix: Detailed error analysis
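
For a multi-class task such as century prediction, precision, recall, and F1 are computed per class from the confusion matrix. A minimal sketch in plain Python with made-up labels; in practice scikit-learn, which is already a dependency, provides equivalent functions:

```python
from collections import Counter

# Made-up true and predicted century labels, for illustration only.
y_true = [18, 18, 19, 19, 19, 20]
y_pred = [18, 19, 19, 19, 20, 20]

# Confusion counts keyed by (true label, predicted label).
confusion = Counter(zip(y_true, y_pred))
labels = sorted(set(y_true) | set(y_pred))

accuracy = sum(confusion[(c, c)] for c in labels) / len(y_true)

per_class = {}
for c in labels:
    tp = confusion[(c, c)]
    predicted = sum(confusion[(t, c)] for t in labels)  # column sum
    actual = sum(confusion[(c, p)] for p in labels)     # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    per_class[c] = (precision, recall, f1)

print(f"accuracy = {accuracy:.3f}")
for c, (p, r, f) in per_class.items():
    print(f"century {c}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Per-class figures matter here because poems are unlikely to be evenly distributed across centuries, so accuracy alone can hide weak performance on sparsely represented periods.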

ErwinGoneMad/Project-NLP
