EPUB Translator

Open in OOMOL Studio

English | 中文

Translate EPUB books using Large Language Models while preserving the original text. The translated content is displayed side-by-side with the original, creating bilingual books perfect for language learning and cross-reference reading.

Translation Effect

Features

  • Bilingual Output: Preserves original text alongside translations for easy comparison
  • LLM-Powered: Leverages large language models for high-quality, context-aware translations
  • Format Preservation: Maintains EPUB structure, styles, images, and formatting
  • Complete Translation: Translates chapter content, table of contents, and metadata
  • Progress Tracking: Monitor translation progress with built-in callbacks
  • Flexible LLM Support: Works with any OpenAI-compatible API endpoint
  • Caching: Built-in result caching so progress can be recovered when a translation run fails

Installation

pip install epub-translator

Requirements: Python 3.11, 3.12, or 3.13
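
After installation, a quick import check confirms that the package exposes the public API used throughout this README:

# Smoke test: these are the names the examples below rely on
from epub_translator import LLM, translate, language, FillFailedEvent

print("epub-translator is ready")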

Quick Start

Using OOMOL Studio (Recommended)

The easiest way to use EPUB Translator is through OOMOL Studio, which provides a visual interface:

Watch the Tutorial

Using Python API

from pathlib import Path
from epub_translator import LLM, translate, language

# Initialize LLM with your API credentials
llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="o200k_base",
)

# Translate EPUB file using language constants
translate(
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language=language.ENGLISH,
    llm=llm,
)

With Progress Tracking

from tqdm import tqdm

with tqdm(total=100, desc="Translating", unit="%") as pbar:

    def on_progress(progress: float):
        # progress is reported as a fraction from 0.0 to 1.0;
        # pbar.n already holds the percentage displayed so far
        pbar.update(progress * 100 - pbar.n)

    translate(
        source_path=Path("source.epub"),
        target_path=Path("translated.epub"),
        target_language="English",
        llm=llm,
        on_progress=on_progress,
    )

API Reference

LLM Class

Initialize the LLM client for translation:

LLM(
    key: str,                          # API key
    url: str,                          # API endpoint URL
    model: str,                        # Model name (e.g., "gpt-4")
    token_encoding: str,               # Token encoding (e.g., "o200k_base")
    cache_path: PathLike | None = None,           # Cache directory path
    timeout: float | None = None,                  # Request timeout in seconds
    top_p: float | tuple[float, float] | None = None,        # Nucleus sampling parameter
    temperature: float | tuple[float, float] | None = None,  # Sampling temperature
    retry_times: int = 5,                         # Number of retries on failure
    retry_interval_seconds: float = 6.0,          # Interval between retries
    log_dir_path: PathLike | None = None,         # Log directory path
)
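
For example, a client with a local cache, an explicit timeout, and a gentler retry policy could be configured like this (the values are illustrative, not recommendations):

llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="o200k_base",
    cache_path="./translation_cache",  # reuse cached results across runs
    timeout=120.0,                     # give up on a request after two minutes
    retry_times=3,
    retry_interval_seconds=10.0,
)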

translate Function

Translate an EPUB file:

translate(
    source_path: PathLike | str,       # Source EPUB file path
    target_path: PathLike | str,       # Output EPUB file path
    target_language: str,              # Target language (e.g., "English", "Chinese")
    user_prompt: str | None = None,    # Custom translation instructions
    max_retries: int = 5,              # Maximum retries for failed translations
    max_group_tokens: int = 1200,      # Maximum tokens per translation group
    llm: LLM | None = None,            # Single LLM instance for both translation and filling
    translation_llm: LLM | None = None,  # LLM instance for translation (overrides llm)
    fill_llm: LLM | None = None,       # LLM instance for XML filling (overrides llm)
    on_progress: Callable[[float], None] | None = None,  # Progress callback (0.0-1.0)
    on_fill_failed: Callable[[FillFailedEvent], None] | None = None,  # Error callback
)

Note: Either llm or both translation_llm and fill_llm must be provided. Using separate LLMs allows for task-specific optimization.

Language Constants

EPUB Translator provides predefined language constants for convenience. You can use these constants instead of writing language names as strings:

from epub_translator import language

# Usage example:
translate(
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language=language.ENGLISH,
    llm=llm,
)

# You can also use custom language strings:
translate(
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language="Icelandic",  # For languages not in the constants
    llm=llm,
)
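
To see which constants your installed version provides, you can inspect the language module directly (a quick sketch that assumes the constants are plain upper-case module attributes):

from epub_translator import language

# List every predefined language constant
for name in dir(language):
    if name.isupper():
        print(name, "=", getattr(language, name))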

Error Handling with on_fill_failed

Monitor and handle translation errors using the on_fill_failed callback:

from epub_translator import FillFailedEvent

def handle_fill_error(event: FillFailedEvent):
    print(f"Translation error (attempt {event.retried_count}):")
    print(f"  {event.error_message}")
    if event.over_maximum_retries:
        print("  Maximum retries exceeded!")

translate(
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language=language.ENGLISH,
    llm=llm,
    on_fill_failed=handle_fill_error,
)

The FillFailedEvent contains:

  • error_message: str - Description of the error
  • retried_count: int - Current retry attempt number
  • over_maximum_retries: bool - Whether the maximum number of retries has been exceeded
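
A common pattern is to collect failed fills during the run and report them afterwards; the sketch below builds on the callback above and uses only the fields listed here:

failures: list[FillFailedEvent] = []

def collect_fill_errors(event: FillFailedEvent):
    failures.append(event)
    if event.over_maximum_retries:
        print(f"Giving up after {event.retried_count} attempts: {event.error_message}")

translate(
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language=language.ENGLISH,
    llm=llm,
    on_fill_failed=collect_fill_errors,
)

print(f"{len(failures)} fill error(s) reported during translation")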

Dual-LLM Architecture

Use separate LLM instances for translation and for XML structure filling, each tuned with its own parameters:

# Create two LLM instances with different temperatures
translation_llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="o200k_base",
    temperature=0.8,  # Higher temperature for creative translation
)

fill_llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="o200k_base",
    temperature=0.3,  # Lower temperature for structure preservation
)

translate(
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language=language.ENGLISH,
    translation_llm=translation_llm,
    fill_llm=fill_llm,
)

Configuration Examples

OpenAI

llm = LLM(
    key="sk-...",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="o200k_base",
)

Azure OpenAI

llm = LLM(
    key="your-azure-key",
    url="https://your-resource.openai.azure.com/openai/deployments/your-deployment",
    model="gpt-4",
    token_encoding="o200k_base",
)

Other OpenAI-Compatible Services

Any service with an OpenAI-compatible API can be used:

llm = LLM(
    key="your-api-key",
    url="https://your-service.com/v1",
    model="your-model",
    token_encoding="o200k_base",  # Match your model's encoding
)
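
To keep credentials out of source files, the API key can be read from an environment variable (the variable name below is just an example):

import os

llm = LLM(
    key=os.environ["EPUB_TRANSLATOR_API_KEY"],  # hypothetical variable name
    url="https://your-service.com/v1",
    model="your-model",
    token_encoding="o200k_base",
)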

Use Cases

  • Language Learning: Read books in their original language with side-by-side translations
  • Academic Research: Access foreign literature with bilingual references
  • Content Localization: Prepare books for international audiences
  • Cross-Cultural Reading: Enjoy literature while understanding cultural nuances

Advanced Features

Custom Translation Prompts

Provide specific translation instructions:

translate(
    source_path=Path("source.epub"),
    target_path=Path("translated.epub"),
    target_language="English",
    llm=llm,
    user_prompt="Use formal language and preserve technical terminology",
)

Caching for Progress Recovery

Enable caching to resume translation progress after failures:

llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="o200k_base",
    cache_path="./translation_cache",  # Translations are cached here
)
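
Because translated chunks are cached under cache_path, a failed run can be retried with the same cache directory and previously completed work is reused (a minimal sketch, assuming the cache persists between calls; replace the broad except with the library's specific exceptions if you know them):

for attempt in range(3):
    try:
        translate(
            source_path=Path("source.epub"),
            target_path=Path("translated.epub"),
            target_language=language.ENGLISH,
            llm=llm,  # the cache-enabled LLM configured above
        )
        break
    except Exception as error:
        print(f"Attempt {attempt + 1} failed: {error}")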

Related Projects

PDF Craft

PDF Craft converts PDF files into EPUB and other formats, with a focus on scanned books. Combine PDF Craft with EPUB Translator to convert and translate scanned PDF books into bilingual EPUB format.

Workflow: Scanned PDF → [PDF Craft] → EPUB → [EPUB Translator] → Bilingual EPUB

For a complete tutorial, watch: Convert scanned PDF books to EPUB format and translate them into bilingual books

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support
