Extract text more carefully in `mdbook-xgettext` #318

Right now, we simply split the text on `\n\n+`, but this leads to a number of problems:

* We split code blocks into different messages when there are one or more blank lines in the middle of the block.
* We extract bullet point lists as a single message.
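
To make the two problems concrete, here is a minimal sketch of the blank-line splitting described above. It is not the actual `mdbook-xgettext` code: the function name `split_on_blank_lines` is hypothetical, and it only approximates the `\n\n+` regex using the standard library.

```rust
/// Naive extraction: split the text wherever a blank line occurs.
/// This approximates splitting on the regex `\n\n+` without extra crates.
fn split_on_blank_lines(text: &str) -> Vec<String> {
    text.split("\n\n")
        .map(|chunk| chunk.trim_matches('\n'))
        .filter(|chunk| !chunk.is_empty())
        .map(str::to_string)
        .collect()
}

fn main() {
    let markdown = [
        "```rust",
        "fn main() {",
        "    let x = 1;",
        "",
        "    println!(\"{x}\");",
        "}",
        "```",
        "",
        "* First",
        "* Second",
    ]
    .join("\n");

    // The code block is cut in two at its blank line, and the whole
    // bullet list comes out as one message.
    for message in split_on_blank_lines(&markdown) {
        println!("{message:?}");
    }
}
```

With this input the sketch yields three messages: two halves of the code block and one message containing both bullet points.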

In general, it would be awesome if we could

* Make the extracted messages independent of the precise formatting of the Markdown text. In particular, a hard-wrapped paragraph should be extracted without its line breaks.
* Remove formatting such as `#` from headers and `*` from bullet points.
* Extract code blocks as a single message.

So Markdown like

~~~markdown
# This is a heading

A _little_
paragraph.

```rust,editable
fn main() {
    println!("Hello world!");
}
```

* First
* Second
~~~

should result in these messages

* `This is a heading` (heading type is stripped)
* `A _little_ paragraph.` (soft line breaks are unfolded)
* `fn main() {\n    println!("Hello world!");\n}` (info string is stripped)
* `First` (bullet point extracted individually)
* `Second`
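
The extraction above could be sketched roughly as follows. This is a line-based toy that only handles the constructs in the example (ATX headings, paragraphs, backtick-fenced code blocks, and `*` bullets). A real implementation would presumably sit on top of a proper Markdown parser. The names `extract_messages`, `heading_text`, and `flush_paragraph` are hypothetical.

```rust
/// Return the heading text if `line` is an ATX heading such as `## Title`.
fn heading_text(line: &str) -> Option<&str> {
    let rest = line.trim_start_matches('#');
    if rest.len() < line.len() && rest.starts_with(' ') {
        Some(rest.trim())
    } else {
        None
    }
}

/// Emit the pending paragraph lines as one message, joined with spaces.
fn flush_paragraph(paragraph: &mut Vec<&str>, messages: &mut Vec<String>) {
    if !paragraph.is_empty() {
        messages.push(paragraph.join(" "));
        paragraph.clear();
    }
}

/// Extract messages from a small subset of Markdown (assumption: only the
/// constructs used in the example above need to be supported).
fn extract_messages(markdown: &str) -> Vec<String> {
    let mut messages = Vec::new();
    let mut paragraph: Vec<&str> = Vec::new();
    let mut code: Option<Vec<&str>> = None;

    for line in markdown.lines() {
        // Inside a code block: keep lines verbatim until the closing fence.
        if let Some(mut code_lines) = code.take() {
            if line.trim_start().starts_with("```") {
                messages.push(code_lines.join("\n"));
            } else {
                code_lines.push(line);
                code = Some(code_lines);
            }
            continue;
        }

        let trimmed = line.trim();
        if trimmed.starts_with("```") {
            // Opening fence: the info string is dropped.
            flush_paragraph(&mut paragraph, &mut messages);
            code = Some(Vec::new());
        } else if trimmed.is_empty() {
            flush_paragraph(&mut paragraph, &mut messages);
        } else if let Some(heading) = heading_text(trimmed) {
            flush_paragraph(&mut paragraph, &mut messages);
            messages.push(heading.to_string());
        } else if let Some(item) = trimmed.strip_prefix("* ") {
            flush_paragraph(&mut paragraph, &mut messages);
            messages.push(item.trim().to_string());
        } else {
            // Part of a paragraph: the line break is unfolded on flush.
            paragraph.push(trimmed);
        }
    }

    flush_paragraph(&mut paragraph, &mut messages);
    if let Some(code_lines) = code {
        // Unterminated code block: keep what was collected.
        messages.push(code_lines.join("\n"));
    }
    messages
}

fn main() {
    let markdown = [
        "# This is a heading",
        "",
        "A _little_",
        "paragraph.",
        "",
        "```rust,editable",
        "fn main() {",
        "    println!(\"Hello world!\");",
        "}",
        "```",
        "",
        "* First",
        "* Second",
    ]
    .join("\n");

    for message in extract_messages(&markdown) {
        println!("{message:?}");
    }
}
```

The sketch ignores nested lists, multi-line list items, `~~~` fences, and many other Markdown features, so it only illustrates the intended output.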

You could imagine doing something nice with links too: `foo [bar](https://example.net) baz` could be stored as `foo [bar] baz`. This might be a poor idea, though: it means that the translator cannot change the destination URL.
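
As a rough illustration of the link idea, the sketch below rewrites inline links `[text](url)` to `[text]`. The function name `strip_link_destinations` is hypothetical, and returning the removed URLs alongside the message is an assumption of this sketch, not something the proposal specifies. It does no real Markdown parsing: it only looks for `](` followed by a closing `)`.

```rust
/// Replace inline links `[text](url)` with `[text]`.
/// Returns the rewritten message and the removed URLs, in order.
fn strip_link_destinations(message: &str) -> (String, Vec<String>) {
    let mut output = String::new();
    let mut urls = Vec::new();
    let mut rest = message;

    while let Some(close) = rest.find("](") {
        let after = &rest[close + 2..];
        match after.find(')') {
            Some(end) => {
                // Keep everything up to and including the `]`.
                output.push_str(&rest[..close + 1]);
                urls.push(after[..end].to_string());
                rest = &after[end + 1..];
            }
            // No closing parenthesis: leave the remainder untouched.
            None => break,
        }
    }

    output.push_str(rest);
    (output, urls)
}

fn main() {
    let (message, urls) = strip_link_destinations("foo [bar](https://example.net) baz");
    println!("{message:?} {urls:?}");
}
```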
