[Discussion] Book representation #146

Closed
azerupi opened this issue Jun 28, 2016 · 8 comments
Labels
A-Internal-representation Area: Internal Representation C-enhancement Category: Enhancement or feature request M-Discussion Meta: Discussion S-Experiment Status: Experiment

Comments

@azerupi
Contributor

azerupi commented Jun 28, 2016

Currently, the internal representation is focused on a single language. We have one struct, MDBook, containing all the information and the chapters for the book.

To support multiple languages we have to store multiple books with different metadata. There should be one default language, the rest are assumed to be translations.

I propose that MDBook holds a hashmap with language codes like "en" or "fr" as keys and Books as values.

pub struct MDBook<'a> {
    books: HashMap<&'a str, Book>,
    // Other fields omitted 
}         

The Book structure would be defined as follows:

pub struct Book {
    metadata: BookMetadata,

    preface: Vec<Chapter>,
    chapters: Vec<Chapter>,
    appendix: Vec<Chapter>,
}

Chapter would be

pub struct Chapter {
    title: String,
    file: path::PathBuf,

    sub_chapters: Vec<Chapter>,
}

And BookMetadata would be

pub struct BookMetadata {
    pub title: String,
    pub description: String,

    pub language: Language,

    authors: Vec<Author>,
    translators: Vec<Author>,
}

And finally Author and Language would be

pub struct Author {
    name: String,
    email: Option<String>,
}

pub struct Language {
    name: String,
    code: String,
}

This is what PR #147 (not merged) implements.

It allows multiple languages to be represented, each with its own book structure, authors, translators, etc.

Open questions

  • Would there be a better way to represent the books?
  • How do we represent the default language?
    Store the key (language code) for the default language in MDBook?
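
One way to picture the second open question is the sketch below. It is not what PR #147 does: it assumes owned `String` keys instead of `&'a str`, a trimmed-down `Book`, and two hypothetical helpers (`add_book`, `book_or_default`). The default language is stored as a key into the map, and lookups for a missing translation fall back to it.

```rust
use std::collections::HashMap;

/// Placeholder for the `Book` struct proposed above; fields trimmed for brevity.
#[derive(Debug, Default, PartialEq)]
pub struct Book {
    pub title: String,
}

/// Hypothetical variant of `MDBook` that records which language is the default.
#[derive(Debug, Default)]
pub struct MDBook {
    books: HashMap<String, Book>,
    /// Key into `books`; `None` until the first book is added.
    default_language: Option<String>,
}

impl MDBook {
    /// Adds a book; the first language added becomes the default (an assumption).
    pub fn add_book(&mut self, code: &str, book: Book) {
        if self.default_language.is_none() {
            self.default_language = Some(code.to_string());
        }
        self.books.insert(code.to_string(), book);
    }

    /// Returns the book for `code`, falling back to the default language.
    pub fn book_or_default(&self, code: &str) -> Option<&Book> {
        self.books.get(code).or_else(|| {
            self.default_language
                .as_deref()
                .and_then(|default| self.books.get(default))
        })
    }
}
```

Storing only the key keeps a single source of truth for each `Book`; the cost is that the key can dangle if the default book is ever removed from the map.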
@azerupi azerupi added this to the Support multi-language books milestone Jun 28, 2016
@azerupi azerupi added the M-Discussion Meta: Discussion label Aug 12, 2016
@azerupi azerupi changed the title new book struct [Discussion] Book representation Aug 12, 2016
@gambhiro
Contributor

gambhiro commented Nov 2, 2016

This is a good start!

Based on books I have produced in the past, I'd recommend extending this a bit to recognize other common features of a book's structure.

See the structure of the Table of Contents in the EPUB and PDF of these books for example, to get an idea of some more challenging structures. I'd like to be able to reproduce these with mdbook (someday?).

In Book, I'd recommend different key names:

pub struct Book {
    metadata: BookMetadata,

    frontmatter: Vec<Chapter>,
    mainmatter: Vec<Chapter>,
    backmatter: Vec<Chapter>,
}

The frontmatter is typically a handful of chapters (Preface, Introduction, Editor's Note), which are not numbered when rendering.

The backmatter also often consists of more than one unnumbered chapter (Glossary, Appendix, Copyright). Sometimes there is more than one Appendix chapter, but in that case I think it is enough to number them manually in the title, passing "Appendix A", "Appendix B".
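
As a rough illustration of the numbering rule described here, the sketch below numbers only the mainmatter and passes frontmatter and backmatter titles through untouched. The trimmed `Chapter` and `Book` types and the `toc_lines` function are assumptions made for the example, not part of mdBook.

```rust
/// Trimmed-down chapter; only the title matters for this sketch.
pub struct Chapter {
    pub title: String,
}

/// Book split into the three parts suggested above.
pub struct Book {
    pub frontmatter: Vec<Chapter>,
    pub mainmatter: Vec<Chapter>,
    pub backmatter: Vec<Chapter>,
}

/// Builds table-of-contents lines; only `mainmatter` chapters receive a number.
pub fn toc_lines(book: &Book) -> Vec<String> {
    let front = book.frontmatter.iter().map(|c| c.title.clone());
    let main = book
        .mainmatter
        .iter()
        .enumerate()
        .map(|(i, c)| format!("{}. {}", i + 1, c.title));
    let back = book.backmatter.iter().map(|c| c.title.clone());
    front.chain(main).chain(back).collect()
}
```

An appendix titled "Appendix A" by hand comes out unchanged, which matches the suggestion to number appendices manually in the title.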

For Chapter I'd recommend some fields:

pub struct Chapter {
    title: String,
    file: path::PathBuf,
    author: Author,
    description: String,
    index: i32,
    class: String,

    sub_chapters: Vec<Chapter>,
}

Chapter should be suitable for rendering:

  • section divisions (pages for Part One, Part Two)
  • regular chapters
  • untitled pages (illustration pages, stand-alone quotes between regular chapters)

author and description are useful when generating an HTML toc page, or when using a page template which anticipates putting these in the chapter heading. In an anthology the chapters have different authors.

index is useful to access in the page template, so that you can print "Chapter 19" in the ebook HTML.

class is useful to pass on when you have chapters that look different, and so you need to wrap the chapter's HTML with a CSS class, either <body class="{{ chapter_class }}"> or a wrapper <div>.

Chapter should be able to name and number itself (apart from the title) according to common styles:

"Chapter 19", "Chapter XIX" or "Chapter Nineteen" are what I had to use before. So that you can open a chapter with:

Chapter XIX

The Miskatonic Expedition

I suppose this might be a method on Chapter, but I've put fields to store this setting in BookMetadata.
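
A minimal sketch of how such a label could be produced, assuming a numbering-style enum with the three styles named above and a hypothetical `chapter_label` function. The word form only covers 0 to 99 and falls back to digits beyond that; none of this is existing mdBook API.

```rust
/// The three numbering styles mentioned above.
#[derive(Clone, Copy, Debug)]
pub enum NumberFormat {
    Arabic,
    Roman,
    Word,
}

/// Converts to a Roman numeral; 0 yields an empty string.
fn roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Spells out 0..=99 in English; larger values fall back to digits.
fn word(n: u32) -> String {
    const ONES: [&str; 20] = [
        "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight",
        "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen",
        "Sixteen", "Seventeen", "Eighteen", "Nineteen",
    ];
    const TENS: [&str; 10] = [
        "", "", "Twenty", "Thirty", "Forty", "Fifty", "Sixty", "Seventy",
        "Eighty", "Ninety",
    ];
    match n {
        0..=19 => ONES[n as usize].to_string(),
        20..=99 if n % 10 == 0 => TENS[(n / 10) as usize].to_string(),
        20..=99 => format!("{}-{}", TENS[(n / 10) as usize], ONES[(n % 10) as usize]),
        _ => n.to_string(),
    }
}

/// Builds a heading such as "Chapter XIX" from a section name and an index.
pub fn chapter_label(section_name: &str, index: u32, format: NumberFormat) -> String {
    let number = match format {
        NumberFormat::Arabic => index.to_string(),
        NumberFormat::Roman => roman(index),
        NumberFormat::Word => word(index),
    };
    format!("{} {}", section_name, number)
}
```

Passing the section name as a parameter lets the same function produce "Part Two" or "Section 3" if the names come from a per-book setting.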

The BookMetadata is cool, multiple authors already! But it should also carry more of the information that normally goes into the copyright page and EPUB metadata.

BookMetadata should be suitable for rendering:

  • title page (book title, author and publisher's logo and link)
  • copyright page
  • metadata files for ebooks

I know what's below looks like overkill at first, but it really helps to anticipate managing the details. Ideally these can be loaded from book.toml.

Say, when one has to update an ISBN number, one really doesn't want to track down manually (or w/ grep) how many files have to be updated, just update book.toml, run mdbook build and be done.

This is just to indicate what information has to be handled in a book. I didn't give much thought to what is pub or not, it seems to me everything could be.

pub struct BookMetadata {
    pub title: String,
    pub subtitle: String,
    pub description: String,
    pub publisher: Publisher,

    pub language: Language,

    pub paperback: Paperback,
    pub ebook: Ebook,

    authors: Vec<Author>,
    translators: Vec<Author>,

    /// Chapter numbering scheme
    number_format: NumberFormat,
    /// Part, Chapter, Section
    section_names: Vec<String>,
}

pub enum NumberFormat {
  /// 19
  Arabic,
  /// XIX
  Roman,
  /// Nineteen
  Word,
}

pub struct Publisher {
  /// name of the publisher
  name: String,
  /// link to the publisher's site
  url: String,
  /// path to publisher's logo image
  logo_src: path::PathBuf,
}

pub struct Paperback {
  /// paperbacks and ebooks have separate isbn numbers
  isbn: String,
  /// Edition line
  edition: String,
  /// date of publication
  published_on: Timespec,
}

pub struct Ebook {
  isbn: String,
  /// ebooks need a unique identifier
  uuid: Uuid,
  /// v1.0 or any arbitrary identifier for the human editor
  version: String,
  published_on: Timespec,
  /// e.g. "POETRY / European / General"
  subject: String,
  /// book's or publisher's url
  source: String,
}

Ebook subjects come from the BISAC Subject Headings List.

@azerupi
Contributor Author

azerupi commented Nov 2, 2016

Thanks for your post, it's very helpful and contains interesting ideas I would like to explore further! :)

In Book, I'd recommend different key names:

  • preface → frontmatter
  • chapters → mainmatter
  • appendix → backmatter

I like that, it's more generic and corresponds to the LaTeX naming.

author and description are useful when generating an HTML toc page, or when using a page template which anticipates putting these in the chapter heading. In an anthology the chapters have different authors.

Having an author per chapter is interesting, but I have a hard time figuring out how / where all this metadata will be provided by the user.

  • I would like to avoid creating "mdBook specific" extensions in the markdown files.
  • The configuration file could hold all this information, but then you are essentially repeating the SUMMARY file...
  • Instead we could allow this extra information to be provided in the SUMMARY file in some way. If we do, does it still make sense to stick to a subset of markdown or use another format?

There is one important design goal I would like to adhere to: mdBook should be as easy to use as possible when you don't require more "advanced" features. By that, I mean that the option to have one author per chapter should not complicate the simple case where a book has only one author.

index

Index would be the field containing the chapter number? How would you handle nested chapters? (e.g. 3.2.4)

Regarding the BookMetadata part, could you expand a little on Paperback and Ebook? What purpose would this metadata serve?

@gambhiro
Contributor

gambhiro commented Nov 2, 2016

Absolutely agree with the as-simple-as-possible principle. I'd love to see the minimum effort be:

mdbook build jabberwocky.md

And it would build an HTML page, epub and mobi, and a PDF out of that (if LaTeX is installed).

But then there are the books that have all the detail and effort of a full-on publication. I find that these need a lot of custom input and tweaks for each book, but nonetheless they are finite and possible to anticipate.

Author and other chapter-specific metadata is easiest to provide either in the SUMMARY file, or as a YAML header in the Markdown chapters. People coming from the static-site generators would recognise (or even anticipate) that pattern. Hence I'm recommending YAML for the summary as well in #176.

Paperback and Ebook are for keeping separate ISBN numbers and other publishing info for LaTeX output (paperback publication) and for static HTML or EPUB / MOBI output (ebook publication).

So these structs would provide containers for whatever fields are useful for these specific targets. Ebooks, for example, need a unique identifier in the metadata such as a UUID, and the standardised subject strings identify their categorisation.

@azerupi
Contributor Author

azerupi commented Nov 2, 2016

Paperback and Ebook are for keeping separate ISBN numbers and other publishing info for LaTeX output (paperback publication) and for static HTML or EPUB / MOBI output (ebook publication).

So these structs would provide containers for whatever fields are useful for these specific targets. Ebooks, for example, need a unique identifier in the metadata such as a UUID, and the standardised subject strings identify their categorisation.

I see! I forgot to write down some of my thoughts about how the renderers (and their settings) should work. I've updated #149 to detail this a little more. I think it should clarify why I think Ebook and Paperback should not be added here.

@azerupi azerupi modified the milestones: 0.1.0, Support multi-language books Nov 3, 2016
@gambhiro
Contributor

Looking back, it seems this will be an API design concern, and what I was asking really is to be able to store different variables for the ebook and paperback (i.e. LaTeX), and access them in the page templates (maybe that will be through the renderer?).

For an ebook, the renderer would generate new ebook files every time.

I found that with LaTeX this is not practicable. The best approach was to generate the LaTeX once and then work on those files directly for the publication, without calling the LaTeX generator again. These files would be committed. So generating the LaTeX is more similar to getting a book template started (creating a file and folder structure for the user to edit).

@Evrey

Evrey commented Nov 30, 2016

index is useful to access in the page template, so that you can print "Chapter 19" in the ebook HTML.

How would a simple integer index handle sub-chapters? Or should this index be relative to the next parent chapter?

Having an author per chapter is interesting, but I have a hard time figuring out how / where all this metadata will be provided by the user.

That's something for the YAML/TOML/Whatever Summary Issue, I think. With such a thing you can easily add metadata to chapters and define rules on how to derive default values, e.g. by inspecting parent chapters.
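
To make the "derive default values by inspecting parent chapters" idea concrete, here is a small sketch. It assumes the per-chapter author is optional (`Option<Author>`, unlike the `author: Author` field suggested earlier) and introduces a hypothetical `resolve_authors` function; a chapter without an author inherits from its parent, and top-level chapters inherit from the book.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Author {
    pub name: String,
}

pub struct Chapter {
    pub title: String,
    /// `None` means "inherit from the parent chapter or the book".
    pub author: Option<Author>,
    pub sub_chapters: Vec<Chapter>,
}

/// Collects `(title, author)` pairs in document order, filling in
/// missing authors from `inherited`.
pub fn resolve_authors<'a>(
    chapters: &'a [Chapter],
    inherited: &'a Author,
    out: &mut Vec<(&'a str, &'a Author)>,
) {
    for chapter in chapters {
        let author = chapter.author.as_ref().unwrap_or(inherited);
        out.push((chapter.title.as_str(), author));
        // Sub-chapters inherit from this chapter's resolved author.
        resolve_authors(&chapter.sub_chapters, author, out);
    }
}
```

With this shape a single-author book never has to mention authors at chapter level, which keeps the simple case simple.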

@gambhiro
Contributor

@Evrey I think what I had in mind with the chapter index is that it is the chapter item's index + 1 in the Vec<Chapter>, and the full chapter / sub-chapter / sub-sub-chapter number can be constructed by taking this attribute from each level.
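
A minimal sketch of that construction, assuming a trimmed `Chapter` with only a title and sub-chapters and a hypothetical `section_numbers` function: each level contributes its position in the `Vec<Chapter>` plus one, and the parts are joined with dots.

```rust
pub struct Chapter {
    pub title: String,
    pub sub_chapters: Vec<Chapter>,
}

/// Walks the chapter tree and returns `(number, title)` pairs in document order.
pub fn section_numbers(chapters: &[Chapter]) -> Vec<(String, String)> {
    fn walk(chapters: &[Chapter], prefix: &mut Vec<usize>, out: &mut Vec<(String, String)>) {
        for (position, chapter) in chapters.iter().enumerate() {
            // The index at this level is the position in the Vec, plus one.
            prefix.push(position + 1);
            let number = prefix
                .iter()
                .map(|n| n.to_string())
                .collect::<Vec<_>>()
                .join(".");
            out.push((number, chapter.title.clone()));
            walk(&chapter.sub_chapters, prefix, out);
            prefix.pop();
        }
    }

    let mut out = Vec::new();
    walk(chapters, &mut Vec::new(), &mut out);
    out
}
```

Because the number is derived from positions during the walk, nothing needs to be stored on `Chapter` itself; a stored `index` field would only be a cache of this value.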

Anyway, it's a bit speculative until the overall renderer and representation is settled.

@gambhiro gambhiro mentioned this issue Dec 20, 2016
@azerupi azerupi added A-Internal-representation Area: Internal Representation S-Experiment Status: Experiment C-enhancement Category: Enhancement or feature request and removed Type: To Do labels May 16, 2017
@azerupi azerupi removed this from the 0.1.0 milestone May 18, 2017
@Michael-F-Bryan
Contributor

I believe a lot of the issues here around metadata were solved when the configuration system was updated to be more flexible (#457), so it's not difficult to add your own [metadata] table to book.toml which holds common book metadata. Each renderer can then access the parsed Config struct as part of the alternate renderers PR (#507).

Regarding the underlying structure used to represent a book, I'd prefer to keep that as backend/language agnostic as possible. For example, it doesn't really make sense to embed ISBN metadata as part of the Book struct when it would only be used by one backend.

To support multiple languages we have to store multiple books with different metadata. There should be one default language, the rest are assumed to be translations.

I propose that MDBook holds a hashmap with language codes like "en" or "fr" as keys and Books as values.

Regarding multi-language support, I think by far the easiest way would be to just have a different book for each language under the same repository (e.g. have a mdbook-en, mdbook-es, mdbook-fr). This is the solution that most people would use anyway, so I don't think it's necessary to make translations and multi-lingual books an integral part of mdbook.


4 participants