Pipeline optimizations #49

danopolan · 2024-05-03T07:11:19Z

With increasing development efforts, we are getting close to using all 3000 minutes of CI per month (last month we used almost 80%), so I would like to optimize the pipelines to save some minutes.

This is a brainstorming issue to collect ideas. Implementation is not urgent.

Ideas:

1. By default, build only PDF files for commits.
2. Build all other files (EPUB, HTML, DOCX) only if enabled in metadata.yml under separate options like docx-output: false, which default to false.
3. Build only files where changes were made, except on the main branch. So, e.g. if I do not touch the release notes, they will not be built.
4. Optimize execution times where possible (e.g. LibreOffice installation).
5. Employ more checks before the LaTeX build to prevent failures (MD syntax, references and links, Unicode chars) or suggest how to run some checks on a local machine.
6. Prepare a guide on running builds locally to prevent failed CI.

Witiko · 2024-06-19T09:09:39Z

> By default build only PDF files for commits
>
> Build all other files (EPUB, HTML, DOCX) if enabled in metadata.yml under separate options like docx-output: false and default to be false.

DOCX should not be a bottleneck, as the conversion to DOCX finishes in a couple of seconds, whereas the other steps can take minutes.

We currently don't build EPUB or HTML for ⟨document⟩.tex if a file named ⟨document⟩/NO_HTML exists in the repository. However, this is an opt-out mechanism, which is also quite obscure and unknown to most people except myself. Having an opt-in metadata field seems a better and more visible solution.

> Optimize execution times where possible (e.g. LibreOffice installation)

It might make sense to create pre-built Docker images in this repository, which would include LibreOffice and would then be downloaded during CI. This image can also be significantly smaller than the image that we currently use.

> Employ more checks before LaTeX build to prevent failures (MD syntax, references and links, Unicode chars) or suggest how to run some checks on a local machine.
>
> Prepare a guide on running builds locally to prevent failed CI

There is a limit to how complex the code in the GitHub Actions YAML file can be before it becomes difficult to maintain. Extracting the CI code into scripts should raise this limit considerably and allow us to 1) perform more advanced checks on the source code, 2) react to values in metadata.yml from the CI (such as docx-output: false), and 3) run builds locally.

Ad 1) As discussed in https://github.com/istqborg/istqb_shared_documents/issues/65, few tools enable the static analysis of Markdown files. However, I can write scripts that would collect all Markdown documents used in ⟨document⟩.tex, convert them to abstract syntax trees with Pandoc, and then ask questions such as:

If <#section:⟨identifier⟩> or [⟨link text⟩](#section:⟨identifier⟩) appears in a document, is there a corresponding section with attribute #⟨identifier⟩ in any document?

I can then skip the compilation if I find issues with the document.
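A minimal sketch of such a check, assuming each Markdown document has already been converted to Pandoc's JSON AST (for example with `pandoc -t json`) and loaded as a Python dict. The function names are hypothetical, and the sketch only looks at regular links whose target starts with `#section:`; whether the `<#section:⟨identifier⟩>` shorthand shows up as a link in the Pandoc AST is an assumption that would need to be verified.

```python
from typing import Any, Iterator

SECTION_PREFIX = "#section:"  # link-target prefix used in the question above


def walk(node: Any) -> Iterator[dict]:
    """Yield every Pandoc AST element (a dict with a "t" key) under node."""
    if isinstance(node, dict):
        if "t" in node:
            yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def section_identifiers(ast: dict) -> set[str]:
    """Collect the non-empty identifiers of all headers in one document."""
    return {
        element["c"][1][0]  # Header content is [level, [id, classes, attrs], inlines]
        for element in walk(ast.get("blocks", []))
        if element["t"] == "Header" and element["c"][1][0]
    }


def section_references(ast: dict) -> set[str]:
    """Collect identifiers referenced by links that target #section:<identifier>."""
    referenced = set()
    for element in walk(ast.get("blocks", [])):
        if element["t"] == "Link":
            target = element["c"][2][0]  # Link content is [attr, inlines, [url, title]]
            if target.startswith(SECTION_PREFIX):
                referenced.add(target[len(SECTION_PREFIX):])
    return referenced


def dangling_references(asts: list[dict]) -> set[str]:
    """Return referenced identifiers that no document defines as a section."""
    defined = set().union(*(section_identifiers(ast) for ast in asts))
    referenced = set().union(*(section_references(ast) for ast in asts))
    return referenced - defined
```

If `dangling_references` returns a non-empty set for the documents used by ⟨document⟩.tex, the script could report the identifiers and skip the compilation.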

Witiko · 2024-06-19T09:29:30Z

> Build all other files (EPUB, HTML, DOCX) if enabled in metadata.yml under separate options like docx-output: false and default to be false.

While having many metadata fields like docx-output, epub-output, html-output, and line-numbers (from #54) makes it easy to configure the build for authors, it may still make sense to have sensible defaults based on whether the document is under review or released, as discussed in #54 (comment). For example:

If version: release, then set the following defaults:

docx-output: false
epub-output: true
html-output: true
line-numbers: false

Otherwise, set the following defaults:

docx-output: true
epub-output: false
html-output: false
line-numbers: true
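The defaults above could be resolved roughly as follows. This is only a sketch: it assumes metadata.yml has already been parsed into a Python dict, and the names `RELEASE_DEFAULTS`, `REVIEW_DEFAULTS`, and `resolve_build_options` are hypothetical. Values that an author sets explicitly take precedence over the defaults.

```python
# Defaults for released documents, as listed above.
RELEASE_DEFAULTS = {
    "docx-output": False,
    "epub-output": True,
    "html-output": True,
    "line-numbers": False,
}

# Defaults for everything else (e.g. documents under review).
REVIEW_DEFAULTS = {
    "docx-output": True,
    "epub-output": False,
    "html-output": False,
    "line-numbers": True,
}


def resolve_build_options(metadata: dict) -> dict:
    """Pick defaults based on `version` and let explicit fields override them."""
    if metadata.get("version") == "release":
        defaults = RELEASE_DEFAULTS
    else:
        defaults = REVIEW_DEFAULTS
    return {key: metadata.get(key, default) for key, default in defaults.items()}
```

For example, a released document whose metadata also sets docx-output: true would still get a DOCX file, while EPUB and HTML would follow the release defaults.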

As an aside, we may want to add an extra section to the documentation that would describe the supported metadata fields, how they should be used, and how they impact the document. The schema keeps growing and no longer seems intuitive. In the long term, we may also want to describe the other types of YAML documents, such as language and question definitions.

danopolan · 2024-06-19T09:57:04Z

Regarding 1) and 2)
You are right that DOCX is not a bottleneck, but it is not needed for regular output. Still, we can keep building it all the time for now.
Adding some abstraction above output formats and line numbers is cleaner but less intuitive for users. This project is not reflected in ISTQB working processes yet, so I would like to keep full control over the users and add the abstraction later after we decide on detailed processes.

One more thing: it would help if we could skip building files that we do not touch (e.g. Body of Knowledge, Accreditation Guidelines, Sample Exam) within the branch. In the TA, we have split the Syllabus and Sample Exam into two separate branches and PRs, so we need only the syllabus to be built in the syllabus branch and only the exam in the exam branch. But currently, we are building it all, since the templates and the repos created from them contain it all.

Regarding 3)
A Docker image in this repo is a good idea.

Regarding 4) and 5)
Refactoring CI into scripts is a good idea with many benefits.
Static analysis should be discussed in greater scope since we want to add specific checks for ISTQB rules before building. We should agree on a solution that would allow this as well.

Witiko · 2024-06-19T10:39:00Z

> One more thing is that if we could skip building of files, we do not touch (e.g. Body of Knowledge, Accreditation Guidelines, Sample Exam) within the branch.

Since building all files still seems useful for the main branch, perhaps we can use a different logic for CI triggered from a pull request and only build documents that have changed in the PR.
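A rough sketch of this logic, under the assumption that we already know which files each TeX document depends on and which files a PR has changed (how both are obtained is left out here). The function name and parameters are hypothetical.

```python
def documents_to_build(
    dependencies: dict[str, set[str]],
    changed_files: set[str],
    is_pull_request: bool,
) -> list[str]:
    """Return the TeX documents that should be compiled.

    `dependencies` maps each TeX document to the files it includes.
    Outside of pull requests (e.g. on the main branch), everything is built.
    """
    if not is_pull_request:
        return sorted(dependencies)
    return sorted(
        document
        for document, files in dependencies.items()
        # Build if the document itself or any of its dependencies changed.
        if document in changed_files or files & changed_files
    )
```

With a syllabus and a sample exam in the same repository, a PR that only touches the syllabus chapters would then compile only the syllabus, while a push to the main branch would still compile both.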

Witiko · 2024-07-11T08:53:23Z

Here is a rough outline of my tasks for today and tomorrow:

- ~~Finish template.py.~~
  - ~~Rewrite all steps in .github/workflows/compile.yml with template.py.~~
  - ~~Style- and type-check template.py.~~
- ~~Periodically build Docker image for repository istqborg/istqb_product_base.~~
  - ~~This Docker image will be smaller than witiko/markdown and will include LibreOffice.~~
  - ~~Blocked by Add file DEPENDS.txt Witiko/markdown#462, finish that first.~~
  - ~~Add file DEPENDS.txt with all packages that we depend on in addition to Markdown.~~
  - ~~Bake the repository istqborg/istqb_product_base in the Docker image for the convenience of users but update it to the current one in compile.yml.~~
  - ~~Only build the Docker image on the main branch.~~
- ~~Use Docker image for .github/workflows/compile.yml instead of witiko/markdown.~~
  - ~~This fixes point 4 from the original post of this ticket:~~

    ~~4. Optimize execution times where possible (e.g. LibreOffice installation)~~
- ~~Document using template.py to compile documents locally.~~
  - ~~This fixes point 6 from the original post of this ticket:~~

    ~~6. Prepare a guide on running builds locally to prevent failed CI~~
- ~~Specify whether documents are compiled to PDF, EPUB, and HTML, as discussed in #49 (comment).~~
  - ~~This fixes points 1 and 2 from the original post of this ticket:~~

    ~~1. By default build only PDF files for commits~~
    ~~2. Build all other files (EPUB, HTML, DOCX) if enabled in metadata.yml under separate options like docx-ou[tp]ut: false and default to be false.~~
- ~~Determine dependencies between files and whether a TEX document should be compiled based on whether it or any of its dependencies have been changed in a PR:~~
  - ~~This fixes point 3 from the original post of this ticket:~~

    ~~3. Build only files where changes were made except main branch. So, e.g. if I do not touch the release notes, it will not be built.~~

The work-in-progress implementation is currently in the PR-less branch feat/scripts. The finished implementation should close this ticket with the exception of point 5 from the original post of this ticket:

5. Employ more checks before LaTeX build to prevent failures (MD syntax, references and links, Unicode chars) or suggest how to run some checks on a local machine.

As discussed in #49 (comment) and #49 (comment), we may want to reschedule the point as a separate ticket after we have agreed on the type of additional checks we may want to employ.

danopolan · 2024-07-11T09:23:23Z

This is looking great.

I agree that point 5 from the original post should be done separately and incrementally, based on priorities and needs. So, I confirm that point 5 will not be implemented, but the resolution of this task will prepare for it.

danopolan assigned Witiko Jun 19, 2024

Witiko mentioned this issue Jun 25, 2024

Hotfix pipeline speed #79

Merged

danopolan mentioned this issue Jul 11, 2024

Add support for importing .md, .yml, and .bib documents from .docx documents #20

Closed

Witiko added a commit that referenced this issue Jul 13, 2024

Determine whether documents should be compiled to PDF, EPUB, and HTML

ab71df8

As discussed in <#49 (comment)>.

Witiko mentioned this issue Jul 13, 2024

Create a single DOCX file for every TeX document and optimize pipeline #84

Merged

Witiko closed this as completed in #84 Jul 14, 2024

Witiko mentioned this issue Jul 14, 2024

Enable a more aggressive caching of conversion results #85

Merged
