diff --git a/_posts/2024-09-30-metanorma-error-log.adoc b/_posts/2024-09-30-metanorma-error-log.adoc
deleted file mode 100644
index 1e759037..00000000
--- a/_posts/2024-09-30-metanorma-error-log.adoc
+++ /dev/null
@@ -1,117 +0,0 @@
----
-layout: post
-title: "Metanorma error logs"
-date: 2024-09-30
-categories: documentation
-
-authors:
-  - name: Nick Nicholas
-    email: nick.nicholas@ribose.com
-    social_links:
-      - https://github.com/opoudjis
-
-excerpt: >-
-  Metanorma error logs are captured to file, and can be used to find problems with standards documents.
----
-
-Users of Metanorma will be familiar with the flurry of complaints that the overhelpfully pedantic
-Metanorma floods the console window with, whenever it compiles a document. Observant users of Metanorma
-may even have noticed that there are fewer types of error flooding the console window than there used to be:
-Metanorma grammar errors are no longer being shown on screen.
-
-Metanorma is indeed on the pedantic side when it comes to error reporting, and it is obsessive about
-identifying discrepancies with SDOs' house style, that SDOs themselves are not always as concerned about.
-The error log can be useful for document authors, and in order to make it useful, Metanorma has been saving
-a copy of the error log to disk. Including those Metanorma grammar errors. We describe here how to navigate
-the error log for a document. You will also find this functionality documented on
-link:/author/topics/output/validation/[Validation].
-
-Metanorma logs errors as belonging to four levels of severity, from 0 to 3. Severity 0 errors are fatal errors,
-which crash Metanorma: Metanorma determines that it is not possible to compile a document untl that error is removed.
-Severity 3 errors are minor, information-only warnings. You must address Severity 0 errors,
-to get documents to compile at all, and you should also address Severity 1 errors.
-
-TIP: Issues that cause
-Severity 0 errors are outlined in the documentation; e.g. duplicate anchors, non-existent images, or
-malformed index markup.
-
-Metanorma also classifies errors into categories, and displays errors of the same category together;
-that lets you go through the same class of error as a group. Metanorma defines the following classes of error,
-which are documented in the link above:
-
-* Style: these are potential violations of the house style of the SDO, and should be at least looked at.
-* Metanorma XML Syntax: these issues can usually be ignored, as the XML syntax check is quite strict,
-and they are demoted to Severity 2
-* Relaton: Issues with the reference requested from the Relaton bibliographic database (e.g. the requested
-document identifier does not exist)
-* Issues with Metanorma Asciidoc markup:
-** Anchors
-** AsciiDoc Input
-** Bibliography: syntax
-** Crossreferences
-** Document Attributes
-** Images
-** Include
-** Maths
-** Requirements
-** Table
-** Terms
-
-If you want to exclude errors from the error log from a given severity up, use the document attribute
-`:log-filter-severity:`. So `:log-filter-severity: 2` instructs Metanorma to leave out from the log errors
-of Severity 2 or 3.
-
-If you want to exclude errors from the error log for certain error classes, use the document attribute
-`:log-filter-category:`, with a comma-delimited list of classes. So
-`:log-filter-category: Crossreferences,Document Attributes,Metanorma XML Syntax` instructs Metanorma
-to omit from the log errors of the classes Crossreferences, Document Attributes, and Metanorma XML Syntax.
-
-
-The error log for a document `{document-name}.adoc` is generated with the filename `{document-name}.err.html`,
-and you can open it in any browser. When you do, the first information you will find is a list of
-error classes, with the count of errors in each class, by severity:
-
-____
-*Style*: Severity 2: **9** errors
-
-*Metanorma XML Syntax*: Severity 2: **7** errors
-____
-
-Each error class is hyperlinked to the listing of errors for that class, so that you can quickly navigate to that
-part of the error log.
-
-The listing of errors under each class is styled by Severity:
-
-* Severity 0 is boldface against a pink background
-* Severity 1 is boldface
-* Severity 2 is in normal type
-* Severity 3 is italicised and grey
-
-Each error is presented in a table, with enough information for users to make sense of the error reported:
-
-____
-|===
-| Line | ID | Message | Context | Severity
-
-| 000064 | *_a172c0b4-4751-941d-b6c5-344768eb7b1b* | Formula +_a172c0b4-4751-941d-b6c5-344768eb7b1b+ has not been cross-referenced within document a|
-....
-
- dot Theta ne ddot Theta
-
-....
-| 2
-|===
-____
-
-Line:: The line number of the source Asciidoctor where the issue has occurred. This information is only rarely
-recoverable from error reporting.
-
-ID:: The identifier of the location in the Asciidoctor where the issue has occurred. Depending on when the issue
-is identified, this could be the clause number, the GUID or anchor of the paragraph or block, or the line number
-of the Metanorma XML file. If the error is aligned with an anchor, a hyperlink is given to the corresponding
-anchor in the generated HTML file, so that users can see where the error has happened in the generated output.
-
-Message:: The error message describing the error.
-
-Context:: Where applicable, two lines of Metanorma XML surrounding the error location; this helps narrow down
-the location of the error, if the Line or ID are not enough to do so.
diff --git a/_posts/2024-09-30-persistent-error-log.adoc b/_posts/2024-09-30-persistent-error-log.adoc
new file mode 100644
index 00000000..9d28e541
--- /dev/null
+++ b/_posts/2024-09-30-persistent-error-log.adoc
@@ -0,0 +1,220 @@
+---
+layout: post
+title: "Persistent error logs for more effective document engineering"
+date: 2024-09-30
+categories: documentation
+
+authors:
+  - name: Nick Nicholas
+    email: nick.nicholas@ribose.com
+    social_links:
+      - https://github.com/opoudjis
+
+excerpt: >-
+  Metanorma now provides persistent error logs for detailed debugging
+  post-compilation. This post explains how to navigate and utilize these logs
+  effectively.
+---
+
+== Introduction
+
+Metanorma users often encounter numerous error messages in the console log
+during document compilation. These messages can be overwhelming, but they are
+important for maintaining document quality.
+
+Metanorma now saves a copy of the error log to disk. This copy includes the
+Metanorma grammar errors, which are no longer shown in the console. The log
+therefore persists after compilation, for detailed debugging.
+
+This post explains how to navigate and use these error logs effectively.
+
+Further details on how to use error logs can be found in the
+link:/author/topics/output/validation/[Validation] documentation.
+
+== Error severity levels
+
+Metanorma classifies errors into four severity levels, from 0 (most severe) to
+3 (least severe):
+
+Severity 0:: Fatal errors. Metanorma aborts, because the document cannot be
+compiled until the error is removed.
+
+Severity 1:: Critical errors that should be addressed.
+
+Severity 2:: Warnings that should be reviewed, but can often be ignored.
+
+Severity 3:: Minor, information-only messages.
+
+TIP: Severity 0 errors, such as duplicate anchors, non-existent images, or
+malformed index markup, must be resolved for the document to compile at all.
+
+== Error categories
+
+Errors are grouped into categories. The log lists errors of the same category
+together, so that you can work through one class of error at a time.
+
+The main categories are the following.
+
+`Style`::
+Potential violations of the house style of the standards-developing
+organization (SDO). These should at least be reviewed.
+
+`Metanorma XML Syntax`::
+Issues from the XML syntax check. Because this check is quite strict, these
+issues are demoted to Severity 2 and can usually be ignored.
+
+`Relaton`::
+Issues with references requested from the Relaton bibliographic database, such
+as a requested document identifier that does not exist.
+
+Metanorma AsciiDoc markup::
+Issues with the Metanorma AsciiDoc markup itself, reported under the following
+categories:
+
+`Anchors`:::
+Issues related to the use of anchors within the document, such as duplicate or missing anchors.
+
+`AsciiDoc Input`:::
+Errors originating from the AsciiDoc source input, including syntax errors and invalid constructs.
+
+`Bibliography`:::
+Syntax problems in the bibliography, such as incorrect citation formats or missing references.
+
+`Crossreferences`:::
+Errors in cross-referencing within the document, including broken links or incorrect reference targets.
+
+`Document Attributes`:::
+Issues with document attributes, such as missing or incorrectly defined attributes.
+
+`Images`:::
+Problems related to image inclusion, such as missing image files or incorrect image paths.
+
+`Include`:::
+Errors with included files, such as missing files or incorrect paths.
+
+`Maths`:::
+Issues with mathematical expressions, including syntax errors in math markup.
+
+`Requirements`:::
+Problems related to requirements, such as missing or incorrectly formatted requirements.
+
+`Table`:::
+Errors in table formatting, including incorrect table syntax or missing table elements.
+
+`Terms`:::
+Issues with the terms section, such as missing definitions or incorrect term formatting.
+
+
+== Filtering errors
+
+=== General
+
+Metanorma provides two mechanisms to filter errors out of the log. Both make it
+easier to focus on the most critical issues.
+
+=== Filtering by severity
+
+The first mechanism is filtering by severity. The `:log-filter-severity:`
+attribute takes a severity level. It omits from the log all errors of that
+level and of any less severe (higher-numbered) level. This lets users
+concentrate on the more severe issues that require immediate attention.
+
+.Filtering by severity
+[example]
+====
+Setting the following document attribute omits Severity 2 and 3 errors from
+the log.
+
+[source,asciidoc]
+----
+:log-filter-severity: 2
+----
+====
+
+=== Filtering by category
+
+The second mechanism is filtering by error category. The
+`:log-filter-category:` attribute takes a comma-separated list of categories,
+and omits errors in those categories from the log.
+
+Excluding categories that are not relevant to the task at hand lets users focus
+on the error types that matter to them.
+
+.Filtering by category
+[example]
+====
+Setting the following document attribute omits errors in the
+"Crossreferences", "Document Attributes", and "Metanorma XML Syntax" categories
+from the log.
+
+[source,asciidoc]
+----
+:log-filter-category: Crossreferences,Document Attributes,Metanorma XML Syntax
+----
+====
+
+
+== Error log format
+
+The error log for a document `{document-name}.adoc` is saved as
+`{document-name}.err.html`, and can be opened in any browser.
+
+The log starts with a summary of the error categories. For each category, it
+gives the number of errors by severity.
+
+.Summary at the top of `{document-name}.err.html`
+[example]
+____
+*Style*: Severity 2: **9** errors
+
+*Metanorma XML Syntax*: Severity 2: **7** errors
+____
+
+Each category in the summary links to the detailed listing of errors in that
+category, for quick navigation.
+
+Errors are styled by severity for easy identification:
+
+Severity 0:: Boldface on a pink background
+Severity 1:: Boldface
+Severity 2:: Normal type
+Severity 3:: Italicized and grey
+
+
+== Error details
+
+Each error is presented in a table with the following columns:
+
+Line::
+The line number in the source AsciiDoc where the issue occurred. This
+information can only rarely be recovered during error reporting.
+
+ID::
+The identifier of the location where the issue occurred. Depending on when the
+issue is identified, this could be a clause number, the GUID or anchor of a
+paragraph or block, or a line number in the Metanorma XML file. If the error is
+associated with an anchor, the ID links to that anchor in the generated HTML
+output, so that you can see where the error occurs.
+
+Message::
+A description of the error.
+
+Context::
+Where applicable, two lines of Metanorma XML surrounding the error location.
+These help narrow down the location when the Line or ID is not enough.
+
+Severity::
+The severity level of the error, from 0 to 3.
+
+A sample row of the error table looks like this:
+
+.Sample of error log details in `{document-name}.err.html`
+____
+|===
+| Line | ID | Message | Context | Severity
+
+| 000064 | *_a172c0b4-4751-941d-b6c5-344768eb7b1b* | Formula +_a172c0b4-4751-941d-b6c5-344768eb7b1b+ has not been cross-referenced within document a|
+....
+
+ dot Theta ne ddot Theta
+
+....
+
+| 2
+|===
+____
+
+
+== Conclusion
+
+Metanorma error logs now record each issue with its severity level, its
+category, and as much location information as can be recovered. This includes
+the source line number where available, an ID linked to the HTML output, and
+the surrounding XML. Together, these help users locate errors and understand
+their context.
+
+By using these logs, users can identify and resolve issues efficiently. The
+result is documents that adhere to the required standards and guidelines.
diff --git a/_posts/2024-10-05-max-data-uri-size.adoc b/_posts/2024-10-05-max-data-uri-size.adoc
index 51ef858f..368a53d5 100644
--- a/_posts/2024-10-05-max-data-uri-size.adoc
+++ b/_posts/2024-10-05-max-data-uri-size.adoc
@@ -1,6 +1,6 @@
 ---
 layout: post
-title: "Maximum Data URI size"
+title: "Building and distributing a single combined Metanorma artifact using Data URIs"
 date: 2024-10-05
 categories: documentation
 
@@ -11,48 +11,180 @@ authors:
       - https://github.com/opoudjis
 
 excerpt: >-
-  Metanorma images are by default encoded within the generated XML file as Data URIs. In order to prevent processing
-  problems, they are also by default constrained to 10 MB in size.
+  This post describes how Metanorma uses Data URIs for media files and
+  document attachments to create a single, unified XML document for easy
+  distribution, and when Data URI encoding needs to be disabled.
 ---
 
-Images, audio files, and video files are by default encoded in Metanorma as https://en.wikipedia.org/wiki/Data_URI_scheme[inline Data URIs]:
-rather than referencing an external file for the image, the documents generated by Metanorma (including the XML file
-that it takes as its starting point) represent the image inside of the file, as a (very long) URI.
-The same is done (though as a an XML element rather than a URI) with the potentially even longer representation
-of file attachments, which Alex Dyuzhev recently wrote about in link:/_posts/2024-08-20-pdf-attachments/[PDF Attachments].
-(Attachments are just as valid for HTML as for PDF output.)
-
-There is an advantage to this internal representation of files,
-for distributing Metanorma documents: if you generate an HTML document, you can
-send it somewhere else as a single file, without needing to take care of the separate media files or file attachments it invokes.
-After all, you already do so for Word documents and for PDFs.
-
-There is a disadvantage to doing this, if the media file becomes so big that software starts having trouble
-with processig those URIs. Browsers think nothing of a URI 100 KB or 1 MB large; but by the time the URI
-needs to represent a video file 100 MB or 1 GB in size, as we have found, bad things start happening.
-
-To prevent bad things happen, we have put the following safeguards in place:
-
-* First of all, the default to represent media files as Data URIs can be turned off, by setting the document attribute
-`:data-uri-image: false`. If you do so, then the media files in your document are referenced, in the Metanorma XML files and the HTML output,
-as links to those external files, rather than bundling them inside the file. In that case, it is the Word and PDF
-outputs that need to convert the media files into internally bundled representations. And you will need to take care
-to include those media files when you upload the generated HTML file anywhere.
-
-* You can do the same with file attachments, through `:data-uri-attachments: false`. In that case, again, any file attachments
-will be referenced as links, rather than bundling them inside the file, and you will need to handle them the same way you handle
-attachemnts. The catch is that, unlike media files, HTML cannot make sense of Data URI encoding for an arbitrary attachment,
-so you will have to distribute the HTML file with its attachments as separate files anyway: `:data-uri-attachments: false`
-only shortens the XML files, it does not make the HTML any different. (In the case of HTML rendering, any attachments
-bundled with the file are exported to a folder called `_{document-name}_attachments`.)
-
-* In order to prevent users inadvertently generating Data URIs too big for a browser to handle, we set the maximum allowed
-Data URI size by default to 14 MB (corresponding to a 10 MB media file). If the Data URI needed to represent a media file is
-bigger than that, we now abort execution, with a warning that you need to change file configuration, to make sure you know what
-you are doing. You can deal with this warning in one of three ways:
-** Set `:data-uri-attachments: false`
-** Set `data-uri-maxsize` to a byte size big enough to capture your file. (Remember that Data URI encodings are one third larger
-than the binary files they encode). So if you have a 1 GB media file, you will need to set `data-uri-maxsize: 1400000000`,
-to prevent aborting.
-** Set `data-uri-maxsize: 0`, if you want to throw caution to the winds, and have no maximum Data URI size for your document.
-In which case, we admire your courage...
+== Introduction
+
+Metanorma supports two types of XML output. The first is a single combined
+Metanorma XML output, where only one XML file is generated as the compilation
+artifact. The second is a file tree, composed of an XML file whose links point
+to the original included files. The single combined format is useful for
+distribution, as it encapsulates all data within one file. The file tree format
+keeps references to external files, which can make it easier to manage and
+update individual components.
+
+This post discusses the benefits of a single combined Metanorma XML output and
+its use of Data URIs to represent media files within the document. It also
+covers when that encoding needs to be disabled.
+
+
+== Unified Metanorma XML output
+
+Metanorma can generate a single combined XML output, rather than a tree of
+files that link to each other. This behavior is controlled by the
+`:data-uri-image:` document attribute, which is `true` by default.
+
+Unlike Microsoft Word, which stores all media files inside an archive, Metanorma
+does not currently specify a compressed archive format.
+
+Instead, it uses Data URIs to combine all the data files into a single Metanorma
+XML file.
+
+This approach is convenient for distribution. Because everything is in a single
+file, there are no separate parts that can go missing and leave broken links.
+
+
+== Encoding as Data URI (default)
+
+When `:data-uri-image:` is `true` (the default), Metanorma encodes images,
+audio files, and video files as
+https://en.wikipedia.org/wiki/Data_URI_scheme[inline Data URIs]. Rather than
+referencing an external file, the generated documents represent the file inside
+the document itself, as a (very long) URI.
+
+As a result, the Metanorma Semantic XML output embeds all media files within the
+single XML file as Data URIs.
+
+This internal representation of files has two advantages when distributing
+Metanorma documents:
+
+* The generated HTML can be sent anywhere as a single file, without needing to
+take care of the separate media files it invokes.
+
+* When distributing the authoritative Semantic XML file, you do not need to
+worry about the media files being lost or misplaced, as they are all bundled
+within the XML file.
+
+In Word and PDF output, media files are always converted into internally
+bundled representations.
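+
+.Enabling Data URI encoding explicitly
+[example]
+====
+Data URI encoding of media files is on by default, so this attribute does not
+need to be set. Setting it explicitly, as follows, should have the same effect
+as omitting it. It also records the intended behavior in the document source.
+
+[source,asciidoc]
+----
+:data-uri-image: true
+----
+====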
+
+File attachments, which Alex Dyuzhev recently wrote about in
+link:/blog/2024-08-20/pdf-attachments/[PDF Attachments], are bundled into the
+Metanorma XML in the same way, although as an XML element rather than as a URI.
+
+NOTE: Attachments are just as valid for HTML as for PDF output.
+
+
+== Limitations of using Data URIs
+
+Using Data URIs has limitations, however:
+
+* If media files are too large, software may have trouble processing the
+resulting URIs.
+
+* Browsers routinely handle URIs of 100 KB or 1 MB without issues. Much larger
+URIs, such as those representing video files of 100 MB or 1 GB, have been found
+to cause problems.
+
+* Performance and stability issues may arise when dealing with excessively large
+Data URIs.
+
+
+== Disabling Data URI encoding
+
+=== General
+
+There are valid reasons to disable Data URI encoding, such as when media files
+are too large.
+
+Metanorma produces several types of output, including PDF, HTML, and Word. It is
+therefore important to consider how disabling Data URI encoding affects each of
+them.
+
+When Data URI encoding is disabled, media files are referenced as links to
+external files in the Metanorma XML files and the HTML output, rather than being
+bundled inside the file.
+
+The Word and PDF outputs then convert the media files into internally bundled
+representations themselves.
+
+
+=== Disabling Data URI encoding for media files
+
+To disable Data URI encoding of media files, set the document attribute
+`:data-uri-image: false`.
+
+All media files in the document are then referenced, in the Metanorma XML files
+and the HTML output, as links to those external files, rather than being
+bundled inside the file.
+
+The implications of this approach are:
+
+* XML: When distributing the authoritative Semantic XML file, you will need to
+include those media files with it, so that the external file links still
+resolve.
+
+* HTML: When distributing the generated HTML file, you will need to include
+those media files with it, so that the external file links still resolve.
+
+* Word and PDF: Unaffected by this setting, as the media files are bundled
+internally in these outputs.
+
+=== Disabling Data URI encoding for attachments
+
+To disable Data URI encoding of file attachments, set the document attribute
+`:data-uri-attachments: false`.
+
+Any file attachments are then referenced as links, rather than being bundled
+inside the file. You will need to handle them the same way as external media
+files.
+
+The catch is that, unlike media files, HTML cannot make sense of Data URI
+encoding for an arbitrary attachment. You will therefore have to distribute the
+HTML file with its attachments as separate files anyway. Setting
+`:data-uri-attachments: false` only shortens the XML files; it does not change
+the HTML output.
+
+The implications of this approach are:
+
+* XML: When distributing the authoritative Semantic XML file, you will need to
+include those file attachments with it, so that the external file links still
+resolve.
+
+* HTML: When distributing the generated HTML file, you will need to include the
+file attachments with it. This is the case whatever the setting. In HTML output,
+any attachments bundled with the document are exported to a folder called
+`_{document-name}_attachments`.
+
+* Word and PDF: Unaffected by this setting, as the file attachments are bundled
+internally in these outputs.
+
+
+=== Extending the Data URI size limit
+
+Metanorma sets the maximum allowed Data URI size by default to 14 MB
+(corresponding to a 10 MB media file). This prevents users from inadvertently
+generating Data URIs too big for a browser to handle.
+
+If the Data URI needed to represent a media file is bigger than that, Metanorma
+aborts execution, with a warning that you need to change the document
+configuration. This makes sure that you know what you are doing.
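+
+As a rough guide to how file size translates into Data URI size, the following
+figures assume base64 encoding, which is what the "one third larger" rule of
+thumb implies. Base64 turns every 3 bytes of input into 4 characters. This is a
+sketch under stated assumptions: it counts only the base64 payload, ignoring
+the short `data:...;base64,` prefix, and uses decimal units
+(1 MB = 10^6^ bytes). The exact size check that Metanorma performs may differ.
+
+[latexmath]
+++++
+\begin{aligned}
+\mathrm{size}(n) &= 4 \left\lceil \frac{n}{3} \right\rceil \\
+\mathrm{size}(10^{7}) &= 4 \times 3\,333\,334 = 13\,333\,336 \text{ bytes} \\
+\mathrm{size}(10^{9}) &= 4 \times 333\,333\,334 = 1\,333\,333\,336 \text{ bytes}
+\end{aligned}
+++++
+
+On these assumptions, a 10 MB file fits within the 14 MB default, and a 1 GB
+file fits within a limit of 1400000000 bytes. A 1 GiB (2^30^-byte) file,
+however, would need 1431655768 bytes, so it is worth leaving some headroom.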
+
+You can deal with this warning in one of three ways:
+
+* Disable Data URI encoding for the kind of file that exceeds the limit. Set
+`:data-uri-image: false` for media files, or `:data-uri-attachments: false` for
+attachments.
+
+* Set the `:data-uri-maxsize:` document attribute to a byte size large enough
+for your file. Remember that a Data URI is about one third larger than the
+binary file it encodes. For a 1 GB media file, you will need to set
+`:data-uri-maxsize: 1400000000` to prevent aborting.
+
+* Set `:data-uri-maxsize: 0` if you want to throw caution to the winds and have
+no maximum Data URI size for your document.
+
+== Conclusion
+
+Using Data URIs in Metanorma provides a streamlined way to distribute documents
+as a single file, avoiding issues with broken links. However, it is essential to
+be aware of the limitations, such as performance issues with large files. The
+settings should be configured appropriately when handling larger files.
+
+Two options help here: disabling Data URI encoding, and extending the Data URI
+size limit. By understanding and using them, users can make sure their
+documents are both efficient and reliable to distribute.