114 changes: 86 additions & 28 deletions README.md
@@ -1,56 +1,114 @@
# CodeVibing

CodeVibing is a hybrid workspace that pairs a visual gallery of AI-generated React components with a research-grade Latin bibliography toolkit. The project combines a shareable Next.js playground for creative coding with a Python pipeline for constructing a master catalogue of Latin works (1450–1900).

## Features

- 🎨 Visual gallery of AI-generated projects
- 💻 Live React playground
- 🌟 Easy project sharing
- 📱 Responsive design
- 🎥 Auto-generated previews

```mermaid
flowchart TD
    subgraph Frontend Gallery
        A[Next.js App Router]
        B[Shared UI Components]
        C[Data Seeds]
        A --> B
        A --> C
    end

    subgraph Latin Corpus Toolkit
        R[Raw Catalogue CSVs]
        N[Normalization Utilities]
        M[Master Bibliography Builder]
        T[Translation Matcher]
        P[Priority Scorer]
        O[latin_master_1450_1900.csv]
        R --> N --> M --> T --> P --> O
    end

    B -->|Showcase| Gallery[Live Gallery Experience]
    O -->|Insights| Gallery
```

## Getting Started

Clone the repository:
```bash
git clone https://github.com/JDerekLomas/codevibing.git
cd codevibing
```
## Repository Structure

```
codevibing/
├── src/                 # Next.js application source
│   ├── app/             # Next.js app directory
│   ├── components/      # Shared components
│   ├── lib/             # Utilities and shared code
│   └── data/            # Initial seed data
├── public/              # Static assets for the gallery
├── latin_corpus/        # Python toolkit for the Latin master bibliography
├── notebooks/           # Prototyping notebooks for dataset exploration
├── package.json         # Frontend dependencies
└── requirements.txt     # Python dependencies for the toolkit
```

## Frontend Quick Start

1. **Install dependencies**
```bash
npm install
```

2. **Configure environment variables**
```bash
cp .env.example .env.local
# Edit .env.local and add any required API keys
```

3. **Run the development server**
```bash
npm run dev
```

Visit <http://localhost:3000> to explore the gallery.

## Latin Corpus Toolkit Overview

The toolkit in `latin_corpus/` assembles catalogue exports, flags digitization and translation coverage, and scores works for follow-up research.

### Prerequisites

```bash
cd latin_corpus
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

### Workflow

1. Place catalogue exports (USTC, VD16/17/18, ESTC, etc.) and translation series CSVs in `latin_corpus/data/raw/`.
2. Run the end-to-end builder:
```bash
python -m latin_corpus.main
```
3. Inspect the generated master table at `latin_corpus/data/processed/latin_master_1450_1900.csv` (a quick pandas check is sketched below).
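If you prefer to sanity-check the output from Python rather than a spreadsheet, a minimal peek with pandas looks like this; the exact columns depend on your source catalogues, so nothing here assumes specific column names:

```python
import pandas as pd

# Load the generated master table (path from step 3 above).
master = pd.read_csv("latin_corpus/data/processed/latin_master_1450_1900.csv")

print(master.shape)             # number of rows and columns
print(master.columns.tolist())  # whatever columns the pipeline emitted
print(master.head())            # first few records
```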

See [latin_corpus/README.md](latin_corpus/README.md) for detailed customization options, column mappings, and troubleshooting tips.

## Publishing Your Own Copy to GitHub

If you started from a local folder and want to push it to a new GitHub repository, follow these steps:

1. Create an empty repository at <https://github.com/new>.
2. Run the following commands from your project directory (replace the URL with your repo):
```bash
git init
git remote add origin https://github.com/<your-username>/codevibing.git
git add .
git commit -m "Initial commit"
git branch -M main
git push -u origin main
```
3. Verify the remote:
```bash
git remote -v
```
4. Clone elsewhere when needed:
```bash
git clone https://github.com/<your-username>/codevibing.git
```

## Contributing

We welcome improvements! Please read [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.

## License

This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.
81 changes: 81 additions & 0 deletions latin_corpus/README.md
@@ -0,0 +1,81 @@
# Latin Corpus Toolkit

This toolkit assembles disparate catalogue exports into a unified master bibliography of Latin works published between roughly 1450 and 1900. It normalizes metadata, flags digitized editions and modern translations, and assigns a configurable research priority score.

## Pipeline at a Glance

```mermaid
flowchart LR
R[Raw Catalogue CSVs\nUSTC / VD16-18 / ESTC / etc.] --> N[normalize.py\nAuthor & title cleanup]
N --> M[merge.py\nBuild master bibliography]
M --> T[translation_match.py\nMatch modern translations]
T --> P[priority.py\nScore & tag works]
P --> O[data/processed/latin_master_1450_1900.csv]
```

Each stage uses pandas DataFrames and can be customized through configuration dictionaries and helper functions.
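For readers who want to customize a single stage, the exported helpers can be chained by hand. The sketch below uses only the names exported from `latin_corpus/__init__.py`; the argument lists for the translation and priority steps are assumptions, so check each module's docstring before relying on them:

```python
from latin_corpus import (
    add_priority_scores,
    add_translation_flags,
    build_master_bibliography,
    build_translation_index,
)

# merge.py: normalize and merge the raw catalogue exports into one DataFrame.
master_df = build_master_bibliography()

# translation_match.py: build an index of modern translations, then flag matches.
# (Passing the index into add_translation_flags is an assumption about the API.)
translation_index = build_translation_index()
master_df = add_translation_flags(master_df, translation_index)

# priority.py: attach the configurable research priority score.
master_df = add_priority_scores(master_df)
```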

## Directory Layout

```
latin_corpus/
├── data/
│   ├── raw/           # Drop catalogue & translation CSV/TSV exports here
│   └── processed/     # Generated outputs (e.g., latin_master_1450_1900.csv)
├── latin_corpus/      # Python package with the normalization/merge pipeline
├── notebooks/         # Optional Jupyter notebooks for exploration
└── requirements.txt   # Toolkit-specific dependencies
```

## Quick Start

1. **Create a virtual environment and install dependencies**
```bash
cd latin_corpus
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

2. **Stage your source data**
* Copy catalogue exports (USTC, VD16/VD17/VD18, ESTC, national catalogues, etc.) into `data/raw/`.
* Add translation spreadsheets (Loeb, I Tatti, Brill, or custom lists) to the same folder.

3. **Run the end-to-end build**
```bash
python -m latin_corpus.main
```
The script prints progress summaries and writes `data/processed/latin_master_1450_1900.csv` (a programmatic alternative is sketched just below).
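The same build can also be kicked off from Python through the exported `run_pipeline` helper. This is a sketch that assumes `run_pipeline` takes no required arguments and uses the default `data/` locations:

```python
from latin_corpus import run_pipeline

# Equivalent to `python -m latin_corpus.main`, assuming default input/output paths.
run_pipeline()
```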

## Configuring Inputs

* **Column mappings:** The loader functions in `io_utils.py` accept optional dictionaries for renaming columns when catalogue exports use different headings.
* **Language filtering:** `merge.py` includes a `LANGUAGE_ALLOWED` configuration block—add or remove variants as needed (e.g., `"lat"`, `"Latin"`).
* **Translation files:** Adjust the `TRANSLATION_SERIES` list near the top of `latin_corpus/main.py` if your filenames differ or you want to add additional translation datasets.
* **Fuzzy matching:** `translation_match.py` exposes `MATCHING_CONFIG` for enabling/disabling fuzzy title similarity and tuning thresholds (a toy similarity example follows this list).
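As a rough illustration of what a fuzzy title-similarity threshold acts on, here is a toy scorer built on the standard library's `difflib`. The actual matcher in `translation_match.py` may use a different metric entirely; this only makes the threshold idea concrete:

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1] between two titles (illustrative only)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# A truncated catalogue title still scores well against the fuller form.
print(title_similarity("De rerum natura", "De rerum natura libri sex"))  # ≈ 0.75
```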

## Inspecting Results

You can explore the master bibliography interactively using the provided notebook:

```bash
jupyter notebook notebooks/build_master_example.ipynb
```

Within the notebook, import and call:

```python
from latin_corpus.merge import build_master_bibliography
master_df = build_master_bibliography()
master_df.head()
```
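Typical follow-up queries filter for untranslated, high-priority works. The column names in this sketch (`has_translation`, `priority_score`, `author`, `title`) are placeholders rather than the pipeline's actual output schema, so adjust them to whatever the master CSV contains:

```python
# Placeholder column names — substitute the columns the pipeline actually emits.
untranslated = master_df[~master_df["has_translation"]]
top_candidates = untranslated.sort_values("priority_score", ascending=False)
print(top_candidates[["author", "title", "priority_score"]].head(20))
```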

## Troubleshooting

* Install pandas and related dependencies if you see a `MissingDependencyError` from `_compat.py`.
* Verify filenames and encodings for any CSV/TSV that fails to load; the loaders accept both UTF-8 and Latin-1.
* Delete or move old outputs in `data/processed/` if you want to regenerate the master CSV from scratch.

## Contributing

Pull requests and issue reports are welcome. Please follow the repository-wide [CONTRIBUTING.md](../CONTRIBUTING.md) guidelines when proposing changes.
Empty file added latin_corpus/data/raw/.gitkeep
32 changes: 32 additions & 0 deletions latin_corpus/latin_corpus/__init__.py
@@ -0,0 +1,32 @@
"""Utility package for constructing a Latin bibliography master table."""

from __future__ import annotations

from importlib import import_module
from typing import Any

__all__ = [
"add_priority_scores",
"add_translation_flags",
"build_master_bibliography",
"build_translation_index",
"run_pipeline",
]


_MODULE_MAP = {
"run_pipeline": (".main", "run_pipeline"),
"build_master_bibliography": (".merge", "build_master_bibliography"),
"add_priority_scores": (".priority", "add_priority_scores"),
"add_translation_flags": (".translation_match", "add_translation_flags"),
"build_translation_index": (".translation_match", "build_translation_index"),
}


def __getattr__(name: str) -> Any: # pragma: no cover - dynamic import glue
try:
module_name, attr = _MODULE_MAP[name]
except KeyError as exc:
raise AttributeError(f"module 'latin_corpus' has no attribute {name!r}") from exc
module = import_module(module_name, package=__name__)
return getattr(module, attr)
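A short usage note on the lazy-loading shim above: attribute access on the package triggers `__getattr__`, which imports the relevant submodule only when first needed, so `import latin_corpus` stays cheap until a helper is actually used. A sketch (running the builder still requires the raw CSVs to be staged):

```python
import latin_corpus

# Importing the package does not yet pull in the pandas-heavy submodules.
build = latin_corpus.build_master_bibliography  # triggers the import of .merge
master_df = build()  # assumes catalogue exports are already in data/raw/
```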
33 changes: 33 additions & 0 deletions latin_corpus/latin_corpus/_compat.py
@@ -0,0 +1,33 @@
"""Compatibility helpers for optional runtime dependencies."""

from __future__ import annotations

from importlib import import_module
from types import ModuleType


class MissingDependencyError(RuntimeError):
"""Raised when a required optional dependency is unavailable."""


def require_pandas() -> ModuleType:
"""Return the :mod:`pandas` module or raise a helpful error message.

The toolkit leans heavily on pandas for all tabular operations. When the
dependency is not installed, importing modules that rely on pandas results
in an opaque ``ModuleNotFoundError``. Centralising the import behind this
helper lets us surface an actionable instruction for users instead.
"""

try:
return import_module("pandas")
except ModuleNotFoundError as exc: # pragma: no cover - import-time guard
raise MissingDependencyError(
"pandas is required for the latin_corpus toolkit. Install the "
"dependencies via 'pip install -r requirements.txt' before running "
"the pipeline."
) from exc


__all__ = ["MissingDependencyError", "require_pandas"]
