Closed
Labels: good first issue (Good for newcomers)
Description
Right now, we simply split the text on `\n\n+`, but this leads to a number of problems:
- We split code blocks into different messages when there are one or more blank lines in the middle of the block.
- We extract an entire bullet-point list as a single message.
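To make both problems concrete, here is a minimal sketch that approximates splitting on `\n\n+` with a plain line loop from the Rust standard library. `naive_split` is a hypothetical name, and the actual code may well use a regex instead; only the splitting rule is meant to match.

```rust
/// Mimics splitting on the regex `\n\n+`: every run of blank lines ends a message.
fn naive_split(text: &str) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current = String::new();
    for line in text.lines() {
        if line.is_empty() {
            // A blank line ends the message, even inside a fenced code block.
            if !current.is_empty() {
                messages.push(std::mem::take(&mut current));
            }
        } else {
            if !current.is_empty() {
                current.push('\n');
            }
            current.push_str(line);
        }
    }
    if !current.is_empty() {
        messages.push(current);
    }
    messages
}

fn main() {
    // A code block with a blank line in the middle, followed by a two-item list.
    let md = "```rust\nfn main() {\n    let x = 1;\n\n    println!(\"{x}\");\n}\n```\n\n* First\n* Second\n";
    for (i, msg) in naive_split(md).iter().enumerate() {
        println!("message {i}: {msg:?}");
    }
}
```

This yields three messages: the code block is cut in two at the blank line, and both bullet points end up in the same message.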
In general, it would be awesome if we could:
- Make the extracted messages independent of the precise formatting of the Markdown text. In particular, a hard-wrapped paragraph should be extracted without the line breaks.
- Remove formatting such as `#` from headers and `*` from bullet points.
- Extract code blocks as a single message.
So Markdown like

````markdown
# This is a heading

A _little_
paragraph.

```rust,editable
fn main() {
    println!("Hello world!");
}
```

* First
* Second
````
should result in these messages:
- `This is a heading` (the `#` heading marker is stripped)
- `A _little_ paragraph.` (soft line breaks are unfolded)
- `fn main() {\n    println!("Hello world!");\n}` (the `rust,editable` info string is stripped)
- `First` (each bullet point is extracted individually)
- `Second`
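A minimal, line-based sketch of the desired behavior follows. It is not a CommonMark parser: it assumes only ATX headings, fenced code blocks (```` ``` ```` or `~~~`), `*`/`-`/`+` bullets, and plain paragraphs, and it ignores nested lists, block quotes, setext headings, indented code blocks, and HTML. A real implementation would more likely walk the events produced by a CommonMark parser such as the `pulldown-cmark` crate. `extract_messages`, `strip_heading`, `strip_bullet`, and `flush` are hypothetical names.

```rust
/// Strips an ATX heading marker (`#` to `######` followed by a space).
fn strip_heading(line: &str) -> Option<&str> {
    let hashes = line.chars().take_while(|&c| c == '#').count();
    let rest = &line[hashes..];
    if (1..=6).contains(&hashes) && rest.starts_with(' ') {
        Some(rest.trim())
    } else {
        None
    }
}

/// Strips a `*`, `-` or `+` bullet marker followed by a space.
fn strip_bullet(line: &str) -> Option<&str> {
    for marker in ["* ", "- ", "+ "] {
        if let Some(rest) = line.strip_prefix(marker) {
            return Some(rest.trim_start());
        }
    }
    None
}

/// Emits the collected lines as one message, joined with single spaces.
fn flush(current: &mut Vec<&str>, messages: &mut Vec<String>) {
    if !current.is_empty() {
        messages.push(current.join(" "));
        current.clear();
    }
}

/// Splits Markdown into translatable messages, one per block.
fn extract_messages(text: &str) -> Vec<String> {
    let mut messages = Vec::new();
    // Lines of the paragraph or list item currently being collected.
    let mut current: Vec<&str> = Vec::new();
    // The opening fence (``` or ~~~) while inside a fenced code block.
    let mut fence: Option<&str> = None;
    let mut code: Vec<&str> = Vec::new();

    for line in text.lines() {
        if let Some(open) = fence {
            if line.trim().starts_with(open) {
                // Closing fence: the whole block, blank lines included, is one message.
                messages.push(code.join("\n"));
                code.clear();
                fence = None;
            } else {
                // Keep code lines verbatim so indentation survives.
                code.push(line);
            }
            continue;
        }

        let trimmed = line.trim();
        if trimmed.starts_with("```") || trimmed.starts_with("~~~") {
            flush(&mut current, &mut messages);
            // Remember the fence; the info string after it is dropped.
            fence = Some(&trimmed[..3]);
        } else if trimmed.is_empty() {
            flush(&mut current, &mut messages);
        } else if let Some(heading) = strip_heading(trimmed) {
            flush(&mut current, &mut messages);
            if !heading.is_empty() {
                messages.push(heading.to_string());
            }
        } else if let Some(item) = strip_bullet(trimmed) {
            // Each bullet point starts a new message.
            flush(&mut current, &mut messages);
            current.push(item);
        } else {
            // Hard-wrapped paragraph lines are collected, then joined with spaces.
            current.push(trimmed);
        }
    }
    flush(&mut current, &mut messages);
    if !code.is_empty() {
        // An unterminated code block still yields one message.
        messages.push(code.join("\n"));
    }
    messages
}

fn main() {
    let markdown = [
        "# This is a heading",
        "",
        "A _little_",
        "paragraph.",
        "",
        "```rust,editable",
        "fn main() {",
        "    println!(\"Hello world!\");",
        "}",
        "```",
        "",
        "* First",
        "* Second",
    ]
    .join("\n");
    for message in extract_messages(&markdown) {
        println!("{message:?}");
    }
}
```

Joining paragraph lines with a single space is what makes the message independent of how the source was wrapped, while code lines are joined with `\n` and kept verbatim because whitespace matters there.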
You could imagine doing something nice with links too: `foo [bar](https://example.net) baz` could be stored as `foo [bar] baz`. This might be a poor idea, though: it means that the translator cannot change the destination URL.
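If links were handled this way, extraction would have to separate link text from destinations. Here is a minimal sketch under simplifying assumptions: only inline links of the form `[text](destination)` are recognized, with no nested brackets, no parentheses inside the URL, no link titles, and no reference-style links. `strip_link_destinations` is a hypothetical helper; returning the removed URLs is an assumption about what a tool would need in order to rebuild the translated text.

```rust
/// Rewrites inline links `[text](dest)` to `[text]` and returns the removed destinations.
fn strip_link_destinations(text: &str) -> (String, Vec<String>) {
    let mut out = String::new();
    let mut urls = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find('[') {
        let candidate = &rest[start..];
        // `close` is the offset of `](` and `end` the offset of `)` within `candidate`.
        let link = candidate.find("](").and_then(|close| {
            candidate[close + 2..].find(')').map(|end| (close, close + 2 + end))
        });
        match link {
            Some((close, end)) => {
                out.push_str(&rest[..start]);
                // Keep `[link text]`, drop `(destination)`.
                out.push_str(&candidate[..=close]);
                urls.push(candidate[close + 2..end].to_string());
                rest = &candidate[end + 1..];
            }
            None => break,
        }
    }
    out.push_str(rest);
    (out, urls)
}

fn main() {
    let (message, urls) = strip_link_destinations("foo [bar](https://example.net) baz");
    println!("{message}"); // foo [bar] baz
    println!("{urls:?}"); // ["https://example.net"]
}
```

Keeping the destinations out of the message makes the trade-off above concrete: if the tool re-inserts the stored URLs after translation, the translator only ever sees `[bar]` and has no way to point the link somewhere else.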
jiyongp