Extract text more carefully in mdbook-xgettext #318

@mgeisler

Right now, we simply split the text on \n\n+, but this leads to a number of problems:

  • We split code blocks into different messages when there are one or more blank lines in the middle of the block.
  • We extract bullet point lists as a single message.
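Both problems can be reproduced with a small standalone sketch. The function name `split_on_blank_lines` is made up for illustration and is not the actual mdbook-xgettext code; it only approximates splitting on `\n\n+` by treating each run of empty lines as a message boundary.

```rust
/// Hypothetical stand-in for the current behavior: every run of empty
/// lines ends the current message (roughly what splitting on `\n\n+` does).
fn split_on_blank_lines(text: &str) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for line in text.lines() {
        if line.is_empty() {
            // A blank line closes the message collected so far, if any.
            if !current.is_empty() {
                messages.push(current.join("\n"));
                current.clear();
            }
        } else {
            current.push(line);
        }
    }
    // Emit the trailing message when the text does not end with a blank line.
    if !current.is_empty() {
        messages.push(current.join("\n"));
    }
    messages
}
```

With this approach, a fenced code block that contains one blank line comes back as two messages, while a two-item bullet list comes back as one.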

In general, it would be awesome if we could:

  • Make the extracted messages independent of the precise formatting of the Markdown text. In particular, a hard-wrapped paragraph should be extracted without the line breaks.
  • Remove formatting such as # from headers and * from bullet points.
  • Extract code blocks as a single message.

So Markdown like

# This is a heading

A _little_
paragraph.

```rust,editable
fn main() {
    println!("Hello world!");
}
```

* First
* Second

should result in these messages

  • This is a heading (heading type is stripped)
  • A _little_ paragraph. (softwrapped lines are unfolded)
  • fn main() {\n    println!("Hello world!");\n} (info string is stripped)
  • First (bullet point extracted individually)
  • Second
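A rough sketch of the desired behavior, using only the standard library, might look like the following. The names `extract_messages` and `flush_paragraph` are hypothetical, and the line-based matching is a simplification: it only understands `#` headings, single-line `*`/`-` bullets, backtick-fenced code blocks and plain paragraphs. A real implementation would more likely be built on a proper Markdown parser.

```rust
/// Hypothetical helper: extract one message per heading, paragraph,
/// bullet item and fenced code block. Not a full Markdown parser.
fn extract_messages(markdown: &str) -> Vec<String> {
    let mut messages = Vec::new();
    let mut paragraph: Vec<&str> = Vec::new();
    // `Some(lines)` while we are inside a fenced code block.
    let mut code: Option<Vec<&str>> = None;

    for line in markdown.lines() {
        // Inside a fence: keep lines verbatim (blank ones too) until it closes.
        if let Some(mut block) = code.take() {
            if line.trim_start().starts_with("```") {
                messages.push(block.join("\n"));
            } else {
                block.push(line);
                code = Some(block);
            }
            continue;
        }

        let trimmed = line.trim();
        if trimmed.starts_with("```") {
            // Opening fence: the info string (e.g. `rust,editable`) is dropped.
            flush_paragraph(&mut paragraph, &mut messages);
            code = Some(Vec::new());
        } else if trimmed.is_empty() {
            flush_paragraph(&mut paragraph, &mut messages);
        } else if trimmed.starts_with('#') {
            // Heading: strip the leading `#` markers.
            flush_paragraph(&mut paragraph, &mut messages);
            messages.push(trimmed.trim_start_matches('#').trim().to_string());
        } else if let Some(item) = trimmed
            .strip_prefix("* ")
            .or_else(|| trimmed.strip_prefix("- "))
        {
            // Bullet point: each item becomes its own message.
            flush_paragraph(&mut paragraph, &mut messages);
            messages.push(item.trim().to_string());
        } else {
            // Paragraph text: collected and unfolded when the paragraph ends.
            paragraph.push(trimmed);
        }
    }
    flush_paragraph(&mut paragraph, &mut messages);
    // An unterminated fence still yields whatever code was collected.
    if let Some(block) = code {
        messages.push(block.join("\n"));
    }
    messages
}

/// Join the collected paragraph lines with single spaces and emit them.
fn flush_paragraph(paragraph: &mut Vec<&str>, messages: &mut Vec<String>) {
    if !paragraph.is_empty() {
        messages.push(paragraph.join(" "));
        paragraph.clear();
    }
}
```

Applied to the Markdown example above, this sketch yields the five messages listed. Multi-line bullet items, nested lists and inline formatting are deliberately left out.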

You could imagine doing something nice with links too: foo [bar](https://example.net) baz could be stored as foo [bar] baz. This might be a poor idea, though: it means that the translator cannot change the destination URL.
