Support for Asciidoc underlined (two line) titles #187

Open
pjanx opened this issue Sep 24, 2018 · 7 comments
@pjanx
Contributor

pjanx commented Sep 24, 2018

First of all, thank you for taking it upon yourself to create an independent implementation with more of a proper parser. I've already learnt how hard it is and stopped pursuing it because of incompetence.

This isn't mentioned in LIMITATIONS.adoc, but the following syntax, which is perfectly valid in Asciidoc but deprecated in Asciidoctor, is not supported:

Title of document
=================

Heading
-------
Subheading
~~~~~~~~~~
Subsubheading
^^^^^^^^^^^^^
Subsubsubheading
++++++++++++++++

Asciidoc requires the underline length to be within +/- 2 characters of the title length; Asciidoctor allows only +/- 1 character.
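
The length rule stated above can be sketched as a small check. This is only an illustration: `withinTolerance` is a hypothetical helper, not part of libasciidoc or either AsciiDoc implementation, and it counts runes, which is an approximation for text containing combining or wide characters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// withinTolerance reports whether the underline length differs from the
// title length by at most tol characters. Hypothetical helper; lengths
// are counted in runes, not graphemes.
func withinTolerance(title, underline string, tol int) bool {
	diff := utf8.RuneCountInString(title) - utf8.RuneCountInString(underline)
	if diff < 0 {
		diff = -diff
	}
	return diff <= tol
}

func main() {
	// The underline is two characters longer than the title.
	title, underline := "Heading", "---------"
	fmt.Println(withinTolerance(title, underline, 2)) // tolerance of 2, as described for Asciidoc
	fmt.Println(withinTolerance(title, underline, 1)) // tolerance of 1, as described for Asciidoctor
}
```

Under these assumptions, the same pair of lines is accepted with a tolerance of 2 and rejected with a tolerance of 1.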

@xcoulon
Member

xcoulon commented Sep 24, 2018

Thanks for the feedback, @pjanx ;) The plan for now was to support headings using the = prefix character(s), but I guess I could also support this variant in the future, too.

@pjanx
Contributor Author

pjanx commented Oct 6, 2018

In the meantime, I have written a preprocessor that hacks this support in, and also handles PEG parse failures. It is intended to be used in Gogs, Gitea and such.

package main

import (
	"bytes"
	"context"
	"encoding/xml"
	"io"
	"io/ioutil"
	"os"
	"strings"
	"unicode"
	"unicode/utf8"

	"github.com/bytesparadise/libasciidoc"
	"github.com/bytesparadise/libasciidoc/pkg/renderer"
)

// isTitle returns the title level if the lines seem to form a title,
// zero otherwise. Input lines may include trailing newlines.
func isTitle(line1, line2 []byte) int {
	// This is a very naïve method, we should target graphemes (thus at least
	// NFC normalize the lines first) and account for wide characters.
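	// Compare lengths without the line terminators, so that a final line
	// lacking a newline is measured the same way as any other line.
	diff := utf8.RuneCount(bytes.TrimRight(line1, "\r\n")) -
		utf8.RuneCount(bytes.TrimRight(line2, "\r\n"))
	if len(line2) < 2 || diff < -1 || diff > 1 {
		return 0
	}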

	// "Don't be fooled by back-to-back delimited blocks."
	// Still gets fooled by other things, though.
	if bytes.IndexFunc(line1, func(r rune) bool {
		return unicode.IsLetter(r) || unicode.IsNumber(r)
	}) < 0 {
		return 0
	}

	// The underline must be homogeneous (a single repeated byte).
	for _, r := range bytes.TrimRight(line2, "\r\n") {
		if r != line2[0] {
			return 0
		}
	}
	return 1 + strings.IndexByte("=-~^+", line2[0])
}

func writeLine(w *io.PipeWriter, cur, next []byte) []byte {
	if level := isTitle(cur, next); level > 0 {
		w.Write(append(bytes.Repeat([]byte{'='}, level), ' '))
		next = nil
	}
	w.Write(cur)
	return next
}

// ConvertTitles converts AsciiDoc two-line (underlined) titles to single-line.
func ConvertTitles(w *io.PipeWriter, input []byte) {
	var last []byte
	for _, cur := range bytes.SplitAfter(input, []byte{'\n'}) {
		last = writeLine(w, last, cur)
	}
	writeLine(w, last, nil)
}

func main() {
	input, err := ioutil.ReadAll(os.Stdin)
	if err != nil {
		panic(err)
	}

	pr, pw := io.Pipe()
	go func() {
		defer pw.Close()
		ConvertTitles(pw, input)
	}()

	// io.Copy(os.Stdout, pr)
	// return

	_, err = libasciidoc.ConvertToHTML(context.Background(), pr, os.Stdout,
		renderer.IncludeHeaderFooter(true))
	if err == nil {
		return
	}

	// Fallback: output all the text sanitized for direct inclusion.
	os.Stdout.WriteString("<pre>")
	for _, line := range bytes.Split(input, []byte{'\n'}) {
		xml.EscapeText(os.Stdout, line)
		os.Stdout.WriteString("\n")
	}
	os.Stdout.WriteString("</pre>")
}
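
For trying the title conversion on its own, without libasciidoc or a pipe, the same idea can be sketched on plain strings. This is a simplified illustration and not the preprocessor above: `titleLevel` and `convertTitles` are hypothetical names, the +/- 1 tolerance and the `=-~^+` level order are taken from the code above, and lengths are counted in runes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// titleLevel returns 1..5 if underline looks like a two-line title
// underline for title, and 0 otherwise. Hypothetical helper.
func titleLevel(title, underline string) int {
	if utf8.RuneCountInString(underline) < 2 {
		return 0
	}
	level := 1 + strings.IndexByte("=-~^+", underline[0])
	if level == 0 || strings.Trim(underline, underline[:1]) != "" {
		return 0 // unknown character or mixed underline
	}
	diff := utf8.RuneCountInString(title) - utf8.RuneCountInString(underline)
	if diff < -1 || diff > 1 {
		return 0
	}
	// Require a letter or digit so that a delimiter line is not taken for a title.
	if strings.IndexFunc(title, func(r rune) bool {
		return unicode.IsLetter(r) || unicode.IsNumber(r)
	}) < 0 {
		return 0
	}
	return level
}

// convertTitles rewrites two-line titles as single-line ("= Title") ones.
func convertTitles(doc string) string {
	lines := strings.Split(doc, "\n")
	var out []string
	for i := 0; i < len(lines); i++ {
		if i+1 < len(lines) {
			if level := titleLevel(lines[i], lines[i+1]); level > 0 {
				out = append(out, strings.Repeat("=", level)+" "+lines[i])
				i++ // skip the underline
				continue
			}
		}
		out = append(out, lines[i])
	}
	return strings.Join(out, "\n")
}

func main() {
	fmt.Print(convertTitles("Heading\n-------\nSubheading\n~~~~~~~~~~\n"))
}
```

Like the preprocessor, this sketch is still fooled by text that merely resembles a title, for example inside a literal block.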

@xcoulon xcoulon added this to the backlog milestone Oct 13, 2018
@xcoulon
Member

xcoulon commented Oct 15, 2018

Hello @pjanx. Back on this issue: I'm not sure how to deal with this request in the grammar yet, but out of curiosity and depending on your workflow, could you use the syntax that is already supported by libasciidoc instead? (I'm referring to the = , == , etc. prefix on the section title).

And good point about the missing mention in the LIMITATIONS.adoc file; I'll add that, too (well, until I find a way to resolve it).

@pjanx
Contributor Author

pjanx commented Oct 15, 2018

Hi. I've already written the preprocessor above, which works for me. I just wanted to have nice READMEs again, without Ruby or Python on the machine, since I moved my repositories off of GitHub. That has mostly been achieved now, except for the other LIMITATIONS. I enjoy the two-line syntax.

@xcoulon
Member

xcoulon commented Oct 16, 2018

OK, thanks for your feedback, @pjanx. For now, my main concern is being able to parse the underline, which must have the same length as the title (give or take one or two characters). So I'll keep this issue in the backlog until I have a good solution, if that's OK with you ;)

@mojavelinux

mojavelinux commented Oct 26, 2018

My advice is to not support two-line section titles (setext headings). If/when AsciiDoc gets a spec, this will very likely be dropped. The main reason is that the symbols don't give any indication of the nesting level, so even someone experienced with AsciiDoc like myself can never remember what levels they represent. More important, they conflict with delimited blocks in AsciiDoc, so it makes the language harder / more ambiguous to parse, both for tools and humans.

My advice is to stick with atx headings.
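
The conflict with delimited blocks can be illustrated with a small sketch. It assumes the underline characters used earlier in this thread (`=-~^+`) and three common AsciiDoc block delimiters (`----` for listing, `====` for example, `++++` for passthrough blocks); `isRun` is a hypothetical helper, and the minimum lengths are simplifications rather than any parser's actual rules.

```go
package main

import (
	"fmt"
	"strings"
)

// isRun reports whether line consists of at least minLen repetitions of a
// single character drawn from chars. Hypothetical helper for illustration.
func isRun(line, chars string, minLen int) bool {
	if len(line) < minLen || !strings.ContainsRune(chars, rune(line[0])) {
		return false
	}
	return strings.Trim(line, line[:1]) == ""
}

func main() {
	for _, line := range []string{"----", "====", "++++", "~~~~", "^^^^"} {
		underline := isRun(line, "=-~^+", 2) // candidate two-line title underline
		delimiter := isRun(line, "-=+", 4)   // candidate block delimiter
		fmt.Printf("%s underline=%t delimiter=%t\n", line, underline, delimiter)
	}
}
```

Three of the five underline styles are indistinguishable from a block delimiter when the line is looked at in isolation, so a parser has to consult the preceding line and its length to decide, which is the ambiguity described above.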

@xcoulon
Member

xcoulon commented Oct 26, 2018

Thanks for the feedback, @mojavelinux, and happy to see you here ;)
Yes, my main reason for not supporting two-line section titles was the parsing, but I also agree with you that the symbol does not readily convey the section level.

moorereason added a commit to moorereason/libasciidoc that referenced this issue Oct 26, 2018
Also, use literal blocks instead of code blocks in examples.

Closes bytesparadise#187
moorereason added a commit to moorereason/libasciidoc that referenced this issue Oct 27, 2018
Also, use literal blocks instead of code blocks in examples.

Updates bytesparadise#187
xcoulon pushed a commit that referenced this issue Oct 30, 2018
Also, use literal blocks instead of code blocks in examples.

Updates #187