Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline optimizations #49

Closed
danopolan opened this issue May 3, 2024 · 6 comments · Fixed by #84
Closed

Pipeline optimizations #49

danopolan opened this issue May 3, 2024 · 6 comments · Fixed by #84
Assignees

Comments

@danopolan
Copy link
Contributor

danopolan commented May 3, 2024

With increasing development efforts, we are starting to use all 3000 minutes of CI per month (last month we've used almost 80%), so I would like to optimize the pipelines to save some minutes.

This is a brainstorming issue to collect the ideas. It's not urgent for implementation.

Ideas:

  1. By default build only PDF files for commits
  2. Build all other files (EPUB, HTML, DOCX) if enabled in metadata.yml under separate options like docx-ouptut: false and default to be false.
  3. Build only files where changes were made except main branch. So, e.g. if I do not touch the release notes, it will not be built.
  4. Optimize execution times where possible (e.g. LibreOffice installation)
  5. Employ more checks before LaTeX build to prevent failures (MD syntax, references and links, Unicode chars) or suggest how to run some checks on a local machine.
  6. Prepare a guide on running builds locally to prevent failed CI
@Witiko
Copy link
Contributor

Witiko commented Jun 19, 2024

  • By default build only PDF files for commits
  • Build all other files (EPUB, HTML, DOCX) if enabled in metadata.yml under separate options like docx-output: false and default to be false.

DOCX should not be a bottleneck, as the conversion to DOCX finishes in a couple seconds as opposed to the other steps, which can take minutes.

We currently don't build EPUB or HTML for ⟨document⟩.tex if a file named ⟨document⟩/NO_HTML exists in the repository. However, this is an opt-out mechanism, which is also quite obscure and unknown to most people except myself. Having an opt-in metadata field seems a better and more visible solution.

  • Optimize execution times where possible (e.g. LibreOffice installation)

It might make sense to create pre-built Docker images in this repository, which would include LibreOffice and would then be downloaded during CI. This image can also be significantly smaller than the image that we currently use.

  • Employ more checks before LaTeX build to prevent failures (MD syntax, references and links, Unicode chars) or suggest how to run some checks on a local machine.
  • Prepare a guide on running builds locally to prevent failed CI

There is a limit to how complex the code into the GitHub Actions YAML file can be before it becomes difficult to maintain. Extracting the CI code into scripts should make this limit much higher and allow us to both 1) perform more advanced checks on the source code, 2) react to values in metadata.yml from the CI (such as docx-output: false), and also 3) run builds locally.

Ad 1) As discussed in https://github.com/istqborg/istqb_shared_documents/issues/65, few tools enable the static analysis of Markdown files. However, I can write scripts that would collect all Markdown documents used in ⟨document.tex, convert them to abstract syntax trees with Pandoc, and then ask questions such as:

If <#section:⟨identifier⟩> or [⟨link text⟩](#section:⟨identifier⟩) appears in a document, is there a corresponding section with attribute #⟨identifier⟩ in any document?

I can then skip the compilation if I find issues with the document.

@Witiko
Copy link
Contributor

Witiko commented Jun 19, 2024

Build all other files (EPUB, HTML, DOCX) if enabled in metadata.yml under separate options like docx-output: false and default to be false.

While having many metadata fields like docx-output, epub-output, html-output, and line-numbers (from #54) makes it easy to configure the build for authors, it may still make sense to have sensible defaults based on whether the document is under review or released, as discussed in #54 (comment). For example:

  • If version: release, then set the following defaults:
    docx-output: false
    epub-output: true
    html-output: true
    line-numbers: false
  • Otherwise, set the following defaults:
    docx-output: true
    epub-output: false
    html-output: false
    line-numbers: true

As an aside, we may want to add an extra section to the documentation that would describe the supported metadata fields, how they should be used, and how they impact the document. The schema keeps growing and it no longer seems intuitive. In the long-term, we may also want to describe the other types of YAML documents such as language and question definitions.

@danopolan
Copy link
Contributor Author

danopolan commented Jun 19, 2024

Regarding 1) and 2)
You are right, that DOCX is not a bottleneck, but it is not needed for regular output. But we can keep it building all the time for now.
Adding some abstraction above output formats and line numbers is cleaner but less intuitive for users. This project is not reflected in ISTQB working processes yet, so I would like to keep full control over the users and add the abstraction later after we decide on detailed processes.

One more thing is that if we could skip building of files, we do not touch (e.g. Body of Knowledge, Accreditation Guidelines, Sample Exam) within the branch. In the the TA, we have split the Syllabus and Sample Exam into two separate branches and PRs, so we need only the syllabus to be built in the syllabus branch and only the exam in the exam branch. But currently, we are building it all, since templates and repos created out of it have it all.

Regarding 3)
Docker image in this repo is a good idea.

Regarding 4) and 5)
Refactoring CI into scripts is a good idea with many benefits.
Static analysis should be discussed in greater scope since we want to add specific checks for ISTQB rules before building. We should agree on a solution that would allow this as well.

@Witiko
Copy link
Contributor

Witiko commented Jun 19, 2024

One more thing is that if we could skip building of files, we do not touch (e.g. Body of Knowledge, Accreditation Guidelines, Sample Exam) within the branch.

Since building all files still seems useful for the main branch, perhaps we can use a different logic for CI triggered from a pull request and only build documents that have changed in the PR.

@Witiko
Copy link
Contributor

Witiko commented Jul 11, 2024

Here is a rough outline of my tasks for today and tomorrow:

  • Finish template.py.
    • Rewrite all steps in .github/workflows/compile.yml with template.py.
    • Style- and type-check template.py.
  • Periodically build Docker image for repository istqborg/istqb_product_base.
    • This Docker image will be smaller than witiko/markdown and will include LibreOffice.
    • Blocked by Add file DEPENDS.txt Witiko/markdown#462, finish that first.
    • Add file DEPENDS.txt with all packages that we depend on in addition to Markdown.
    • Bake the repository istqborg/istqb_product_base in the Docker image for the convenience of users but update it to the current one in compile.yml.
    • Only build the Docker image on the main branch.
  • Use Docker image for .github/workflows/compile.yml instead of witiko/markdown.
    • This fixes point 4 from the original post of this ticket:

      4. Optimize execution times where possible (e.g. LibreOffice installation)

  • Document using template.py to compile documents locally.
    • This fixes point 6 from the original post of this ticket:

      6. Prepare a guide on running builds locally to prevent failed CI

  • Specify whether documents are compiled to PDF, EPUB, and HTML, as discussed in #49 (comment).
    • This fixes points 1 and 2 from the original post of this ticket:

      1. By default build only PDF files for commits
      2. Build all other files (EPUB, HTML, DOCX) if enabled in metadata.yml under separate options like docx-ou[tp]ut: false and default to be false.

  • Determine dependencies between files and whether a TEX document should be compiled based on whether it or any of its dependencies have been changed in a PR:
    • This fixes point 3 from the original post of this ticket:

      3. Build only files where changes were made except main branch. So, e.g. if I do not touch the release notes, it will not be built.

The work-in-progress implementation is currently in the PR-less branch feat/scripts. The finished implementation should close this ticket with the exception of point 5 from the original post of this ticket:

5. Employ more checks before LaTeX build to prevent failures (MD syntax, references and links, Unicode chars) or suggest how to run some checks on a local machine.

As discussed in #49 (comment) and #49 (comment), we may want to reschedule the point as a separate ticket after we have agreed on the type of additional checks we may want to employ.

@danopolan
Copy link
Contributor Author

This is looking great.

I agree, that point 5 from the original post should be done separately and incrementally based on priorities and needs. So, I confirm that point 5 will not be implemented, but the resolution of this task will prepare for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants