From b46d9c1ab685335fec6d2fcd912bd968552db44f Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Thu, 29 Feb 2024 13:27:06 +0300
Subject: [PATCH 01/17] feat(#56):first version of blog about cache
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 234 ++++++++++++++++++
images/AssembleMojo.svg | 49 ++++
images/EO.svg | 195 +++++++++++++++
images/buildmvn.svg | 82 ++++++
images/ccache.svg | 83 +++++++
images/defaultPhaseMaven.svg | 67 +++++
6 files changed, 710 insertions(+)
create mode 100644 _posts/2024/2024-02-06-about-caching-in-eo.md
create mode 100644 images/AssembleMojo.svg
create mode 100644 images/EO.svg
create mode 100644 images/buildmvn.svg
create mode 100644 images/ccache.svg
create mode 100644 images/defaultPhaseMaven.svg
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
new file mode 100644
index 0000000..5befaee
--- /dev/null
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -0,0 +1,234 @@
+---
+layout: post
+date: 2024-02-06
+title: "Build cache in EO and other build systems"
+author: Alekseeva Yana
+---
+
+
+
+## Introduction
+Wasting a lot of time on building a project is a programming problem. At the moment a programmer starts an
+assembly, he loses focus on a task and spends valuable working time. Different build systems use many tools,
+helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.
+The subject of this article is caching, because completed tasks caching allows not to spend resources again.
+So in [EO](https://github.com/objectionary/eo) caching is used for speeding up programs work.
+While developing [EO](https://github.com/objectionary/eo) we found caching errors in `eo-maven-plugin`
+for EO version `0.34.0`. The error occurred, because using a file name and comparing equality of
+compilation time and caching time is not the most reliable verification. Unit tests were written showing that
+cache does not work correctly. Also reading a file was necessary for getting a programme name
+that slowed down an assembly.
+That we came to conclusion that we need caching with a reliable verification which does not require reading a file
+from disk. And using cache should save us enough time for building a project.
+
+The goal of this article is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
+and to create effective caching in [EO](https://github.com/objectionary/eo).
+
+## Build caching of existing build systems
+
+### ccache/sccache
+In compiled programming languages, building a project takes a long time.
+The reason of long compilation is time is spent on preparing, optimizing and checking the code, and so on.
+To speed up the assembly of compiled languages, ccache and sccache are used.
+Let's look at the compilation scheme using C++ as an example,
+to imagine the build process in compiled languages:
+
+
+![Picture 1](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/ccach.svg)
+
+1) First, preprocessor gets the input files. Input files are code files and header files.
+The preprocessor removes comments from the code and converts the code into in accordance
+with macros and executes other directives, starting with the “#” symbol
+(such as #include, #define, various directives like #pragma).
+The result is a single edited file with human-readable code that can be submitted to the compiler.
+
+
+2) The compiler receives the preprocessed file and converts it into machine code, stored in an object file.
+At the compilation stage, the code is parsed to check that it follows
+the rules of the specific programming language. Next, the code is translated into machine code.
+At the end of its work, the compiler optimizes the resulting machine code and produces an object file.
+To speed up compilation, different files of the same project are compiled in parallel,
+that is, we receive several object files at once.
+
+3) After all received project object files are passed to the linker.
+Linker is a program that combines program components, written in assembly language or a high-level programming language,
+to an executable file or library. The result of the linker is an executable .exe file.
+
+
+As a result, in compiled languages, multiple files are simultaneously and independently converted into machine code at the compilation stage.
+This machine code is then combined into one executable file.
+
+
+`ccache` has two main caching methods они:
+1) `Direct mode` - hashcode is generated based on the source code.
+2) `Preprocessor mode` - hashcode is generated based on the result of preprocessor.
+
+The hashcode includes information: file contents, directory, compiler information, compilation time, extensions
+used by the compiler. A compressed machine code file is placed in the cache using the received key.
+
+`Direct mode` compiles the program faster, since the preprocessor step is skipped.
+But header files are not checked for changes, so the wrong project may be built.
+`Preprocessor mode` is slower than `direct mode`, but right project is built always.
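The difference between the two modes described above can be illustrated with a small Java sketch. It is not how `ccache` is implemented: the class `CacheKeySketch`, its methods and the use of SHA-256 are assumptions made for illustration, and the preprocessed text is passed in by the caller instead of being produced by a real preprocessor.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: hypothetical cache keys for the two modes. */
final class CacheKeySketch {

    /** Direct mode: the key depends on the raw source text and the compiler identity. */
    static String directKey(final String source, final String compiler) {
        return sha256(compiler + '\n' + source);
    }

    /** Preprocessor mode: the key depends on the preprocessor output instead. */
    static String preprocessedKey(final String preprocessed, final String compiler) {
        return sha256(compiler + '\n' + preprocessed);
    }

    /** Hex-encoded SHA-256 of the given text. */
    private static String sha256(final String text) {
        try {
            final MessageDigest digest = MessageDigest.getInstance("SHA-256");
            final StringBuilder hex = new StringBuilder();
            for (final byte b : digest.digest(text.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 is not available", ex);
        }
    }
}
```

With keys like these, editing only a comment changes the direct-mode key, while the preprocessor-mode key stays the same as long as the preprocessor output is unchanged.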
+
+Sccache, unlike ccache, allows to store the cache not only locally but also in the cloud,
+and it also has fixed some bugs (for example, there is a check of header files, which makes direct mode more accurate).
+
+
+### Maven
+`Maven` automates and manages Java-projects build. Building a project in `Maven` is completed in three
+maven [LifeCycles Maven](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
+which consist of `phases`. `Phases` in turn consist of sets of `goals`.
+
+`Maven` has default `phases` and `goals` which build any projects.
+
+
+![Picture 2](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/defaultPhaseMaven.svg)
+
+
+In `Maven` all phases and goals are executed strictly in order, linearly.
+But in `Maven` there is no build-time caching as such.
+`Maven` suggests rebuilding only changed project modules to speed up the build process.
+
+### Gradle
+`Gradle`, like `Maven`, builds a project in
+[LifeCycles Gradle](https://docs.gradle.org/current/userguide/build_lifecycle.html), which consists of phases.
+But unlike `Maven`, `Gradle` builds projects using a task graph -
+[Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph),
+in which some tasks can be executed in parallel.
+To speed up project builds, `Gradle` uses incremental builds
+[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work).
+For an incremental build to work, the tasks that are used to build the project must have
+source and output files must be specified.
+```
+task myTask {
+ inputs.dir 'src/main/java/MyTask.somebody' // Specify the input directory
+ outputs.dir 'build/classes/java/main/MyTask.somebody' // Specify the output directory
+
+ doLast {
+ // Task actions go here
+ // This code will only be executed if the inputs or outputs have changed
+ }
+}
+```
+Every time before executing a task, `Gradle` makes a fingerprint of the path
+and contents of the source files and saves it.
+If the task completes successfully, then `Gradle` also makes a fingerprint from the resulting files.
+To avoid re-fingerprinting the original files, `Gradle` checks the last modification time and the size of the original
+files before reassembling. Thus, when the project is rebuilt, some or all of the tasks may be
+not completed, but to use the results already obtained.
+`Gradle` also stores fingerprints of previous builds so that projects can be built quickly, for example when switching
+from one branch to another - `Build Cache`.
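The idea of fingerprinting can be sketched in Java as follows. This is not `Gradle` code: the class `FingerprintSketch`, its methods and the choice of SHA-256 are assumptions made for illustration. The fingerprint covers the path and the contents of a file, and a cheaper check of size and last modification time decides whether re-fingerprinting is needed at all.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: a hypothetical fingerprint of a single input file. */
final class FingerprintSketch {

    /** Fingerprint built from the path and the contents of the file. */
    static String fingerprint(final Path file) throws IOException {
        try {
            final MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(file.toString().getBytes(StandardCharsets.UTF_8));
            digest.update(Files.readAllBytes(file));
            final StringBuilder hex = new StringBuilder();
            for (final byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 is not available", ex);
        }
    }

    /** Cheap pre-check: same size and same modification time as recorded earlier. */
    static boolean looksUnchanged(final Path file, final long size, final long modifiedMillis)
        throws IOException {
        return Files.size(file) == size
            && Files.getLastModifiedTime(file).toMillis() == modifiedMillis;
    }
}
```

If `looksUnchanged` returns `true`, the saved fingerprint can be reused; otherwise the file is fingerprinted again and compared with the saved value.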
+
+
+
+
+### EO build cache
+
+EO code is compiled using the `Maven` build system.
+For this purpose, the `eo-maven-plugin` plugin was written,
+which contains the goals necessary for working with EO code.
+As was written above, the assembly of projects in `Maven` occurs in a certain order of phases.
+In the diagram you can see the main phases and their goals for the EO version of the compiler (specify version):
+
+![Picture 3](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/EO.svg)
+
+In [Picture 3](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/EO.svg) the goals from the `eo-maven-plugin`
+are highlighted in green.
+
+
+But the actual work with EO code takes place in `AssembleMojo`.
+`AssembleMojo` is the goal consisting of other goals that work with the EO file
+[Picture 4](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/AssembleMojo.svg).
+
+![Picture 4](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/AssembleMojo.svg)
+
+Each goal in `AssembleMojo` is a specific compilation step for EO code, and we need to use
+caching at each step to speed up the assembly of the EO program.
+
+In EO version `0.34.0`,
+caching in different `Mojo` classes was implemented through two unrelated interfaces, `Footprint` and `Optimization`,
+which used mostly the same methods.
+The difference between the interfaces is that `Footprint` also checks the EO version of the compiler,
+while the rest of the checks are exactly the same.
+
+
+Now the goals in which caching can be applied, `ParseMojo`, `OptimizeMojo` and `ShakeMojo`,
+each have a results directory and a cache directory.
+
+
+The disadvantages of the initial caching in EO:
+* the compilation time and the time of saving to the cache must be equal.
+The problem with this check is that the moment of compilation and the moment of saving to the cache do not have to coincide.
+* verification data is read from a file on disk. This is a long and expensive operation.
+* each goal uses its own classes and interfaces for data caching.
+This makes the code hard to extend and to read.
+
+
+Therefore, our target is to create a single class responsible for caching data
+and loading the necessary data from the cache, which can be used for any `Mojo` from the `eo-maven-plugin`.
+
+
+How we want to fix these disadvantages:
+1) Create a new class `Cache` that will be responsible for data verification, saving to cache and loading from cache.
+
+```
+public class Cache {
+
+ private List<CacheValidation> validations;
+
+ public Cache(final List<CacheValidation> cv) {
+ this.validations = cv;
+ }
+
+ public Optional load(final Path source, final Path cache) {...};
+
+ public void save(final Path cache, final Scalar program, final Path relative) {...};
+}
+```
+
+
+`List<CacheValidation>` is a list of validations that implement the `CacheValidation` interface.
+Different validations can be applied for different `Mojo`.
+
+
+```
+public interface CacheValidation {
+ boolean validate(final Path source, final Path cache) throws IOException;
+}
+```
+
+2) To avoid reading file contents from disk, we will work with file paths (`Path`).
+The classes `Path` and `Files` have methods to obtain the necessary information.
+
+
+3) The cached data is considered up to date if the last modification time of the source file
+is earlier than or equal to the time saved in the cache.
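Items 1 and 3 can be sketched together in Java. The `CacheValidation` interface below repeats the one from this post; the classes `SourceNotNewer` and `Validations` are hypothetical names used only for illustration and are not taken from `eo-maven-plugin`. The sketch assumes that the time "saved in the cache" is the modification time of the cached file, and it reads only file metadata through `Files`, never the file contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** The validation contract, as declared in this post. */
interface CacheValidation {
    boolean validate(Path source, Path cache) throws IOException;
}

/** Hypothetical validation: the source must not be newer than the cached file. */
final class SourceNotNewer implements CacheValidation {
    @Override
    public boolean validate(final Path source, final Path cache) throws IOException {
        return Files.exists(cache)
            && Files.getLastModifiedTime(source)
                .compareTo(Files.getLastModifiedTime(cache)) <= 0;
    }
}

/** Hypothetical helper: a cached file is usable only if every validation passes. */
final class Validations {
    static boolean allPass(final List<CacheValidation> validations,
        final Path source, final Path cache) throws IOException {
        for (final CacheValidation validation : validations) {
            if (!validation.validate(source, cache)) {
                return false;
            }
        }
        return true;
    }
}
```

A `load` method could call `Validations.allPass` first and return an empty `Optional` when it yields `false`, so that the `Mojo` recompiles the source and overwrites the cache.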
+
+These solutions will speed up compilation in the build system `Maven`.
+
+
+### Conclusion
+Suppose there is an EO program `program.eo` that is compiled for the first time.
+At each `Mojo` stage, the execution results will be saved to the cache of the current `Mojo`.
+If this program is built again, these `Mojo` will receive data from the cache,
+without wasting time and computer resources on recompilation.
+If we change something in the `program.eo` file, the program will have to be recompiled,
+since the last modification time of the source file will be later than the one stored in the cache.
+As a result of the `Mojo` run, the cache will be overwritten.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/images/AssembleMojo.svg b/images/AssembleMojo.svg
new file mode 100644
index 0000000..f99b37b
--- /dev/null
+++ b/images/AssembleMojo.svg
@@ -0,0 +1,49 @@
+
+
+
+
\ No newline at end of file
diff --git a/images/EO.svg b/images/EO.svg
new file mode 100644
index 0000000..cde1db8
--- /dev/null
+++ b/images/EO.svg
@@ -0,0 +1,195 @@
+
+
+
+
\ No newline at end of file
diff --git a/images/buildmvn.svg b/images/buildmvn.svg
new file mode 100644
index 0000000..399a6da
--- /dev/null
+++ b/images/buildmvn.svg
@@ -0,0 +1,82 @@
+
+
+
+
\ No newline at end of file
diff --git a/images/ccache.svg b/images/ccache.svg
new file mode 100644
index 0000000..1beaa08
--- /dev/null
+++ b/images/ccache.svg
@@ -0,0 +1,83 @@
+
+
+
+
+
diff --git a/images/defaultPhaseMaven.svg b/images/defaultPhaseMaven.svg
new file mode 100644
index 0000000..6f20610
--- /dev/null
+++ b/images/defaultPhaseMaven.svg
@@ -0,0 +1,67 @@
+
+
+
+
\ No newline at end of file
From 805d79c4bc551be050e94df73c7500234439de65 Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Tue, 5 Mar 2024 13:54:12 +0300
Subject: [PATCH 02/17] fix(#2790):fix images
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index 5befaee..bae884f 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -34,7 +34,7 @@ Let's look at the compilation scheme using C++ as an example,
to imagine the build process in compiled languages:
-![Picture 1](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/ccach.svg)
+![Picture 1](/images/ccach.svg)
1) First, preprocessor gets the input files. Input files are code files and header files.
The preprocessor removes comments from the code and converts the code into in accordance
@@ -130,17 +130,17 @@ which contains the goals necessary for working with EO code.
As was written above, the assembly of projects in `Maven` occurs in a certain order of phases.
In the diagram you can see the main phases and their goals for the EO version of the compiler (specify version):
-![Picture 3](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/EO.svg)
+![Picture 3](/images/EO.svg)
-In [Picture 3](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/EO.svg) the goals from the `eo-maven-plugin`
+In [Picture 3](/images/EO.svg) the goals from the `eo-maven-plugin`
are highlighted in green.
But the actual work with EO code takes place in `AssembleMojo`.
`AssembleMojo` is the goal consisting of other goals that work with the EO file
-[Picture 4](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/AssembleMojo.svg).
+[Picture 4](/images/AssembleMojo.svg).
-![Picture 4](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/AssembleMojo.svg)
+![Picture 4](/images/AssembleMojo.svg)
Each goal in `AssembleMojo` is a specific compilation step for EO code, and we need to use
caching at each step to speed up the assembly of the EO program.
From ca11fee7bb13538f742d5d783e8cc92eb594eaca Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Tue, 5 Mar 2024 14:34:30 +0300
Subject: [PATCH 03/17] feat(#56):fix images
This reverts commit 805d79c4bc551be050e94df73c7500234439de65.
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index bae884f..5befaee 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -34,7 +34,7 @@ Let's look at the compilation scheme using C++ as an example,
to imagine the build process in compiled languages:
-![Picture 1](/images/ccach.svg)
+![Picture 1](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/ccach.svg)
1) First, preprocessor gets the input files. Input files are code files and header files.
The preprocessor removes comments from the code and converts the code into in accordance
@@ -130,17 +130,17 @@ which contains the goals necessary for working with EO code.
As was written above, the assembly of projects in `Maven` occurs in a certain order of phases.
In the diagram you can see the main phases and their goals for the EO version of the compiler (specify version):
-![Picture 3](/images/EO.svg)
+![Picture 3](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/EO.svg)
-In [Picture 3](/images/EO.svg) the goals from the `eo-maven-plugin`
+In [Picture 3](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/EO.svg) the goals from the `eo-maven-plugin`
are highlighted in green.
But the actual work with EO code takes place in `AssembleMojo`.
`AssembleMojo` is the goal consisting of other goals that work with the EO file
-[Picture 4](/images/AssembleMojo.svg).
+[Picture 4](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/AssembleMojo.svg).
-![Picture 4](/images/AssembleMojo.svg)
+![Picture 4](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/AssembleMojo.svg)
Each goal in `AssembleMojo` is a specific compilation step for EO code, and we need to use
caching at each step to speed up the assembly of the EO program.
From b8789f7d87d94efec3ebe45a2c73efc85e53cbcb Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Tue, 5 Mar 2024 15:05:29 +0300
Subject: [PATCH 04/17] feat(#56):fix images
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index 5befaee..c3c8614 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -5,7 +5,6 @@ title: "Build cache in EO and other build systems"
author: Alekseeva Yana
---
-
## Introduction
Wasting a lot of time on building a project is a programming problem. At the moment a programmer starts an
@@ -24,6 +23,8 @@ from disk. And using cache should save us enough time for building a project.
The goal of this article is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
and to create effective caching in [EO](https://github.com/objectionary/eo).
+
+
## Build caching of existing build systems
### ccache/sccache
@@ -33,8 +34,7 @@ To speed up the assembly of compiled languages, ccache and sccache are used.
Let's look at the compilation scheme using C++ as an example,
to imagine the build process in compiled languages:
-
-![Picture 1](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/ccach.svg)
+![Picture 1](/images/ccach.svg)
1) First, preprocessor gets the input files. Input files are code files and header files.
The preprocessor removes comments from the code and converts the code into in accordance
@@ -82,7 +82,7 @@ which consist of `phases`. `Phases` in turn consist of sets of `goals`.
`Maven` has default `phases` and `goals` which build any projects.
-![Picture 2](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/defaultPhaseMaven.svg)
+![Picture 2](/images/defaultPhaseMaven.svg)
In `Maven` all phases and goals are executed strictly in order, linearly.
@@ -130,17 +130,17 @@ which contains the goals necessary for working with EO code.
As was written above, the assembly of projects in `Maven` occurs in a certain order of phases.
In the diagram you can see the main phases and their goals for the EO version of the compiler (specify version):
-![Picture 3](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/EO.svg)
+![Picture 3](/images/EO.svg)
-In [Picture 3](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/EO.svg) the goals from the `eo-maven-plugin`
+In [Picture 3](/images/EO.svg) the goals from the `eo-maven-plugin`
are highlighted in green.
But the actual work with EO code takes place in `AssembleMojo`.
`AssembleMojo` is the goal consisting of other goals that work with the EO file
-[Picture 4](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/AssembleMojo.svg).
+[Picture 4](/images/AssembleMojo.svg).
-![Picture 4](/Users/yanaalekseeva/IdeaProjects/news.eolang.org/images/AssembleMojo.svg)
+![Picture 4](/images/AssembleMojo.svg)
Each goal in `AssembleMojo` is a specific compilation step for EO code, and we need to use
caching at each step to speed up the assembly of the EO program.
From 198cb965f41a43171f740e6171dda1788e243569 Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Tue, 5 Mar 2024 15:10:46 +0300
Subject: [PATCH 05/17] feat(#56):delete image and fix
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 4 +-
images/buildmvn.svg | 82 -------------------
2 files changed, 2 insertions(+), 84 deletions(-)
delete mode 100644 images/buildmvn.svg
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index c3c8614..e6b7638 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -34,7 +34,7 @@ To speed up the assembly of compiled languages, ccache and sccache are used.
Let's look at the compilation scheme using C++ as an example,
to imagine the build process in compiled languages:
-![Picture 1](/images/ccach.svg)
+![Picture 1](/images/ccache.svg)
1) First, preprocessor gets the input files. Input files are code files and header files.
The preprocessor removes comments from the code and converts the code into in accordance
@@ -79,7 +79,7 @@ and it also has fixed some bugs (for example, there is a check of header files,
maven [LifeCycles Maven](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
which consist of `phases`. `Phases` in turn consist of sets of `goals`.
-`Maven` has default `phases` and `goals` which build any projects.
+`Maven` has default `phases` and `goals` which build any projects:
![Picture 2](/images/defaultPhaseMaven.svg)
diff --git a/images/buildmvn.svg b/images/buildmvn.svg
deleted file mode 100644
index 399a6da..0000000
--- a/images/buildmvn.svg
+++ /dev/null
@@ -1,82 +0,0 @@
-
-
-
-
\ No newline at end of file
From 96e9f05a91df6602ce876ba0fc52a5ccb05e1b34 Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Tue, 5 Mar 2024 15:41:41 +0300
Subject: [PATCH 06/17] feat(#56):aligning images
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index e6b7638..d5fa140 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -34,7 +34,9 @@ To speed up the assembly of compiled languages, ccache and sccache are used.
Let's look at the compilation scheme using C++ as an example,
to imagine the build process in compiled languages:
-![Picture 1](/images/ccache.svg)
+
+
+
1) First, preprocessor gets the input files. Input files are code files and header files.
The preprocessor removes comments from the code and converts the code into in accordance
@@ -81,9 +83,9 @@ which consist of `phases`. `Phases` in turn consist of sets of `goals`.
`Maven` has default `phases` and `goals` which build any projects:
-
-![Picture 2](/images/defaultPhaseMaven.svg)
-
+
+
+
In `Maven` all phases and goals are executed strictly in order, linearly.
But in `Maven` there is no build-time caching as such.
@@ -130,7 +132,9 @@ which contains the goals necessary for working with EO code.
As was written above, the assembly of projects in `Maven` occurs in a certain order of phases.
In the diagram you can see the main phases and their goals for the EO version of the compiler (specify version):
-![Picture 3](/images/EO.svg)
+
+
+
In [Picture 3](/images/EO.svg) the goals from the `eo-maven-plugin`
are highlighted in green.
@@ -140,7 +144,10 @@ But the actual work with EO code takes place in `AssembleMojo`.
`AssembleMojo` is the goal consisting of other goals that work with the EO file
[Picture 4](/images/AssembleMojo.svg).
-![Picture 4](/images/AssembleMojo.svg)
+
+
+
+
Each goal in `AssembleMojo` is a specific compilation step for EO code, and we need to use
caching at each step to speed up the assembly of the EO program.
From 4d30d65b466f5a0eb908c0ef223c01c1678e2794 Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Wed, 13 Mar 2024 14:40:21 +0300
Subject: [PATCH 07/17] feat(#56):fix introduction and grammar
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 108 +++++++++---------
1 file changed, 52 insertions(+), 56 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index d5fa140..1d19460 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -6,21 +6,21 @@ author: Alekseeva Yana
---
-## Introduction
-Wasting a lot of time on building a project is a programming problem. At the moment a programmer starts an
-assembly, he loses focus on a task and spends valuable working time. Different build systems use many tools,
-helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.
-The subject of this article is caching, because completed tasks caching allows not to spend resources again.
-So in [EO](https://github.com/objectionary/eo) caching is used for speeding up programs work.
-While developing [EO](https://github.com/objectionary/eo) we found caching errors in `eo-maven-plugin`
-for EO version `0.34.0`. The error occurred, because using a file name and comparing equality of
-compilation time and caching time is not the most reliable verification. Unit tests were written showing that
-cache does not work correctly. Also reading a file was necessary for getting a programme name
-that slowed down an assembly.
-That we came to conclusion that we need caching with a reliable verification which does not require reading a file
-from disk. And using cache should save us enough time for building a project.
-
-The goal of this article is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
+## Introduction
+In [EO](https://github.com/objectionary/eo), caching is used to speed up program execution.
+While developing [EO](https://github.com/objectionary/eo) we found a caching
+[error](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
+for EO version `0.34.0`. The error occurred because the cache was searched for the needed file using
+a comparison of compilation time and caching time.
+This is not the most reliable verification method,
+because caching time does not have to be equal to compilation time.
+[Unit tests](https://github.com/objectionary/eo/pull/2749) were written to show that the
+cache does not work correctly. Additionally, a file had to be read to obtain the program name,
+which slowed down the build process.
+Thus, we came to the conclusion that we need caching with a reliable verification method
+that does not require reading a file from disk. Using a cache should save us time when building a project.
+
+The goal of this blog post is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
and to create effective caching in [EO](https://github.com/objectionary/eo).
@@ -29,17 +29,18 @@ and to create effective caching in [EO](https://github.com/objectionary/eo).
### ccache/sccache
In compiled programming languages, building a project takes a long time.
-The reason of long compilation is time is spent on preparing, optimizing and checking the code, and so on.
+The reason for the lengthy compilation time is that time is spent on preparing,
+optimizing, checking the code, and so on.
To speed up the assembly of compiled languages, ccache and sccache are used.
-Let's look at the compilation scheme using C++ as an example,
+Let's look at the compilation scheme using C++ as an example
to imagine the build process in compiled languages:
-1) First, preprocessor gets the input files. Input files are code files and header files.
-The preprocessor removes comments from the code and converts the code into in accordance
+1) First, the preprocessor gets the input files. The input files are source files and header files.
+The preprocessor removes comments from the code and converts the code in accordance
with macros and executes other directives, starting with the “#” symbol
(such as #include, #define, various directives like #pragma).
The result is a single edited file with human-readable code that can be submitted to the compiler.
@@ -54,14 +55,15 @@ that is, we receive several object files at once.
3) After all received project object files are passed to the linker.
Linker is a program that combines program components, written in assembly language or a high-level programming language,
-to an executable file or library. The result of the linker is an executable .exe file.
+into an executable file or library. The result of the linker is an executable file, for example an .exe file on Windows.
-As a result, in compiled languages, multiple files are simultaneously and independently converted into machine code at the compilation stage.
+As a result, in compiled languages, multiple files are simultaneously and independently converted
+into machine code at the compilation stage.
This machine code is then combined into one executable file.
-`ccache` has two main caching methods они:
+`ccache` has two main caching methods:
1) `Direct mode` - hashcode is generated based on the source code.
2) `Preprocessor mode` - hashcode is generated based on the result of preprocessor.
@@ -69,19 +71,19 @@ The hashcode includes information: file contents, directory, compiler informatio
used by the compiler. A compressed machine code file is placed in the cache using the received key.
`Direct mode` compiles the program faster, since the preprocessor step is skipped.
-But header files are not checked for changes, so the wrong project may be built.
-`Preprocessor mode` is slower than `direct mode`, but right project is built always.
+However, in some rare cases changes in header files are not detected, so the wrong project may be built.
+`Preprocessor mode` is slower than `direct mode`, but the right project is always built.
-Sccache, unlike ccache, allows to store the cache not only locally but also in the cloud,
+Sccache, unlike ccache, allows the cache to be stored not only locally but also in the cloud,
and it also has fixed some bugs (for example, there is a check of header files, which makes direct mode more accurate).
### Maven
-`Maven` automates and manages Java-projects build. Building a project in `Maven` is completed in three
+`Maven` automates and manages Java-project builds. Building a project in `Maven` is completed in three
maven [LifeCycles Maven](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
which consist of `phases`. `Phases` in turn consist of sets of `goals`.
-`Maven` has default `phases` and `goals` which build any projects:
+`Maven` has default `phases` and `goals` for building any project:
@@ -92,15 +94,13 @@ But in `Maven` there is no build-time caching as such.
`Maven` suggests rebuilding only changed project modules to speed up the build process.
### Gradle
-`Gradle`, like `Maven`, builds a project in
-[LifeCycles Gradle](https://docs.gradle.org/current/userguide/build_lifecycle.html), which consists of phases.
But unlike `Maven`, `Gradle` builds projects using a task graph -
[Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph),
in which some tasks can be executed in parallel.
-To speed up project builds, `Gradle` uses incremental builds
+To speed up project builds, `Gradle` employs incremental builds
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work).
-For an incremental build to work, the tasks that are used to build the project must have
-source and output files must be specified.
+For an incremental build to work, the tasks used to build the project must have specified
+source and output files.
```
task myTask {
inputs.dir 'src/main/java/MyTask.somebody' // Specify the input directory
@@ -112,14 +112,13 @@ task myTask {
}
}
```
-Every time before executing a task, `Gradle` makes a fingerprint of the path
+Before executing a task, `Gradle` makes a fingerprint of the path
and contents of the source files and saves it.
-If the task completes successfully, then `Gradle` also makes a fingerprint from the resulting files.
+If the task completes successfully, `Gradle` also makes a fingerprint from the resulting files.
To avoid re-fingerprinting the original files, `Gradle` checks the last modification time and the size of the original
-files before reassembling. Thus, when the project is rebuilt, some or all of the tasks may be
-not completed, but to use the results already obtained.
-`Gradle` also stores fingerprints of previous builds so that projects can be built quickly, for example when switching
-from one branch to another - `Build Cache`.
+files before reassembling. This allows `Gradle` to use the results already obtained when the project is rebuilt.
+Additionally, `Gradle` stores fingerprints of previous builds enabling quick project builds,
+for example when switching from one branch to another - known as the - `Build Cache`.
@@ -127,10 +126,10 @@ from one branch to another - `Build Cache`.
### EO build cache
EO code is compiled using the `Maven` build system.
-For this purpose, the `eo-maven-plugin` plugin was written,
-which contains the goals necessary for working with EO code.
-As was written above, the assembly of projects in `Maven` occurs in a certain order of phases.
-In the diagram you can see the main phases and their goals for the EO version of the compiler (specify version):
+For this purpose, the `eo-maven-plugin` plugin was created,
+which contains the necessary goals for working with EO code.
+As mentioned earlier, the assembly of projects in `Maven` occurs in a specific order of phases.
+In the diagram you can observe the main phases and their goals for the EO last version of the compiler:
@@ -140,8 +139,8 @@ In [Picture 3](/images/EO.svg) the goals from the `eo-maven-plugin`
are highlighted in green.
-But the actual work with EO code takes place in `AssembleMojo`.
-`AssembleMojo` is the goal consisting of other goals that work with the EO file
+However, the actual work with EO code takes place in `AssembleMojo`.
+`AssembleMojo` is the goal consisting of other goals that work with the EO file, as shown in
[Picture 4](/images/AssembleMojo.svg).
@@ -153,30 +152,27 @@ Each goal in `AssembleMojo` is a specific compilation step for EO code, and we n
caching at each step to speed up the assembly of the EO program.
In EO version `0.34.0`,
-caching for different `Mojo` was done using unrelated different `Footprint` and `Optimization` interfaces,
+caching for different `Mojo` was done using unrelated `Footprint` and `Optimization` interfaces,
within which mostly the same methods were used.
The difference between interfaces is that in `Footprint` the EO version of the compiler is checked,
while the rest of the checks are exactly the same.
-Now goals are `ParseMojo`, `OptimazeMojo` и `ShakeMojo` , in which caching can be applied,
+Now the goals `ParseMojo`, `OptimizeMojo` and `ShakeMojo`, in which caching can be applied,
have directory of results and directory of cache.
-The disadvantages of initial caching in EO:
-* the compilation time and the time of saving to the cache must be equal.
-The problem with this verification is that the moment of compilation and the moment of saving to the cache must coincide.
-* verification data is read from a file on disk. This is a long and expensive operation.
-* each purpose uses its own classes and interfaces for data caching.
-This makes the code difficult to extensibility and readability.
+The disadvantages of initial caching in EO include:
+* The compilation time and the time of saving to the cache must be equal, which can be challenging to verify.
+* Verification data is read from a file on disk, which is a long and expensive operation.
+* Each purpose uses its own classes and interfaces for data caching, making the code difficult to extend and read.
-Therefore, our target is to create a single class responsible for caching data
-and loading the necessary data from the cache, which can be used for any `Mojo` from the `eo-maven-plugin`.
+To address these disadvantages, the following solutions are proposed:
-How do we want to fix this disadvantages:
-1) Create a new class `Cache` that will be responsible for data verification, saving to cache and loading from cache.
+
+1) Create a new class `Cache` responsible for data verification, saving to cache and loading from cache.
```
public class Cache {
From 3cdae01168de2bd22973b336b875f4907b96942f Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Thu, 14 Mar 2024 12:02:55 +0300
Subject: [PATCH 08/17] feat(#56):fix passive voice
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 32 +++++++++----------
1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index 1d19460..3309368 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -7,18 +7,15 @@ author: Alekseeva Yana
## Introduction
-In [EO](https://github.com/objectionary/eo) a caching is used to speed up program execution.
-While developing [EO](https://github.com/objectionary/eo) we found a caching
-[error](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
-for EO version `0.34.0`. The error occurred because the cache was searched for the needed file using
-a comparison of compilation time and caching time.
+In [EO](https://github.com/objectionary/eo), caching is used to speed up program compilation.
+Recently we found a caching
+[bug](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
+for EO version `0.34.0`. The error occurred because the algorithm compared
+the compilation time and caching time to search for the needed file.
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
-[Unit tests](https://github.com/objectionary/eo/pull/2749) were written to show that the
-cache does not work correctly. Additionally, reading a file was necessary to obtain a program name
-that slowed down the build process.
That we came to the conclusion that we need caching with a reliable verification method
-that does not require reading a file system. Using a cache should save us enough time for building a project.
+that does not require reading a file system.
The goal of this blog is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
and to create effective caching in [EO](https://github.com/objectionary/eo).
@@ -32,7 +29,7 @@ In compiled programming languages, building a project takes a long time.
The reason for the lengthy compilation time is that time is spent on preparing,
optimizing, checking the code, and so on.
To speed up the assembly of compiled languages, ccache and sccache are used.
-Let's look at the compilation scheme using C++ as an example
+Let's look at the assembly scheme using C++ as an example
to imagine the build process in compiled languages:
@@ -43,12 +40,13 @@ to imagine the build process in compiled languages:
The preprocessor removes comments from the code and converts the code in accordance
with macros and executes other directives, starting with the “#” symbol
(such as #include, #define, various directives like #pragma).
-The result is a single edited file with human-readable code that can be submitted to the compiler.
+The result is a single edited file with human-readable code that the compiler will get.
2) The compiler receives the finished code file and converts it into machine code, presented in an object file.
At the compilation stage, parsing occurs, which checks whether the code matches
-rules of a specific programming language. Next, the code is parsed into machine code according to the rules.
+rules of a specific programming language. Next, the parsing occurs preprocessor code into machine code
+according to the rules.
At the end of its work, the compiler optimizes the resulting machine code and produces an object file.
To speed up compilation, different files of the same project are compiled in parallel,
that is, we receive several object files at once.
@@ -125,7 +123,7 @@ for example when switching from one branch to another - known as the - `Build Ca
### EO build cache
-EO code is compiled using the `Maven` build system.
+EO code uses the `Maven` build system to assembly.
For this purpose, the `eo-maven-plugin` plugin was created,
which contains the necessary goals for working with EO code.
As mentioned earlier, the assembly of projects in `Maven` occurs in a specific order of phases.
@@ -152,9 +150,9 @@ Each goal in `AssembleMojo` is a specific compilation step for EO code, and we n
caching at each step to speed up the assembly of the EO program.
In EO version `0.34.0`,
-caching for different `Mojo` was done using unrelated `Footprint` and `Optimization` interfaces,
+caching used unrelated `Footprint` and `Optimization` interfaces for different `Mojo`,
within which mostly the same methods were used.
-The difference between interfaces is that in `Footprint` the EO version of the compiler is checked,
+The difference between interfaces is that `Footprint` checks the EO version of the compiler,
while the rest of the checks are exactly the same.
@@ -190,8 +188,8 @@ public class Cache {
```
-`List` is a list of validations that are implemented from the `CacheValidation` interface.
-Different validations can be applied for different `Mojo`.
+`List` is a list of validations. Validations implemented from the `CacheValidation` interface.
+Different `Mojo` can use different validations.
```
From ed510bc8e347a09c2d18bad9bee2064b8ab2dc0c Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Fri, 15 Mar 2024 12:43:20 +0300
Subject: [PATCH 09/17] feat(#56):fix text and change conclusion
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 112 +++++++++---------
images/{ccache.svg => defaultCPhase.svg} | 0
2 files changed, 58 insertions(+), 54 deletions(-)
rename images/{ccache.svg => defaultCPhase.svg} (100%)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index 3309368..20a7469 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -10,50 +10,45 @@ author: Alekseeva Yana
In [EO](https://github.com/objectionary/eo), caching is used to speed up program compilation.
Recently we found a caching
[bug](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
-for EO version `0.34.0`. The error occurred because the algorithm compared
-the compilation time and caching time to search for the needed file.
+for EO version `0.34.0`. The bug occurred because the old verification method
+compared the compilation time with the caching time to find the cached file.
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
-That we came to the conclusion that we need caching with a reliable verification method
-that does not require reading a file system.
+We came to the conclusion that we need caching with a reliable verification method.
+And this verification method should not use the information that the cached file contains.
The goal of this blog is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
-and to create effective caching in [EO](https://github.com/objectionary/eo).
+and to implement effective caching in [EO](https://github.com/objectionary/eo).
## Build caching of existing build systems
### ccache/sccache
-In compiled programming languages, building a project takes a long time.
-The reason for the lengthy compilation time is that time is spent on preparing,
-optimizing, checking the code, and so on.
-To speed up the assembly of compiled languages, ccache and sccache are used.
+In compiled programming languages, building a project containing many source code files takes a long time.
+This time is spent on loading of libraries, preparing, optimizing, checking the code, and so on.
+To speed up the assembly of compiled languages, [ccache](https://ccache.dev)
+and [sccache](https://github.com/mozilla/sccache) are used.
Let's look at the assembly scheme using C++ as an example
to imagine the build process in compiled languages:
-
+
-1) First, preprocessor gets the input files. The input files are code files and header files.
-The preprocessor removes comments from the code and converts the code in accordance
-with macros and executes other directives, starting with the “#” symbol
-(such as #include, #define, various directives like #pragma).
+1) First, preprocessor gets the input files. The input files are source files (.cpp) and header files (.h).
The result is a single edited file with human-readable code that the compiler will get.
2) The compiler receives the finished code file and converts it into machine code, presented in an object file.
At the compilation stage, parsing occurs, which checks whether the code matches
-rules of a specific programming language. Next, the parsing occurs preprocessor code into machine code
-according to the rules.
-At the end of its work, the compiler optimizes the resulting machine code and produces an object file.
-To speed up compilation, different files of the same project are compiled in parallel,
-that is, we receive several object files at once.
+rules of a specific programming language.
+At the end, the compiler optimizes the resulting machine code and produces an object file.
+To speed up compilation, different files of the same project are compiled in parallel.
-3) After all received project object files are passed to the linker.
-Linker is a program that combines program components, written in assembly language or a high-level programming language,
-into an executable file or library. The result of the linker is an executable .exe file.
+3) Then, the linker gets object files.
+Linker is a program that combines object files into an executable file or library.
+The result of the linker is an executable .exe file.
As a result, in compiled languages, multiple files are simultaneously and independently converted
@@ -61,25 +56,27 @@ into machine code at the compilation stage.
This machine code is then combined into one executable file.
-`ccache` has two main caching methods:
-1) `Direct mode` - hashcode is generated based on the source code.
-2) `Preprocessor mode` - hashcode is generated based on the result of preprocessor.
-
-The hashcode includes information: file contents, directory, compiler information, compilation time, extensions
+`ccache` uses hashcode to find cached files. The hashcode includes information:
+file contents, directory, compiler information, compilation time, extensions
used by the compiler. A compressed machine code file is placed in the cache using the received key.
-`Direct mode` compiles the program faster, since the preprocessor step is skipped.
-BuHowever,the header files are not checked for changes, so the wrong project may be built.
+
+`ccache` has two main caching methods:
+1) `Direct mode` - hashcode is generated based on the source code.
+`Direct mode` compiles the program faster, since the preprocessor step is skipped.
+However,the header files are not checked for changes, so the wrong project may be built.
+2) `Preprocessor mode` - hashcode is generated based on the result of preprocessor.
`Preprocessor mode` is slower than `direct mode`, but the right project is built always.
-Sccache, unlike ccache, allows the cache to be stored not only locally but also in the cloud,
-and it also has fixed some bugs (for example, there is a check of header files, which makes direct mode more accurate).
+`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in the cloud.
+And it also has fixed some bugs (for example, there is a check of header files, which makes direct mode more accurate).
### Maven
-`Maven` automates and manages Java-project builds. Building a project in `Maven` is completed in three
+[Maven](https://maven.apache.org) automates and manages Java-project builds.
+Building a project in `Maven` is completed in three
maven [LifeCycles Maven](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
-which consist of `phases`. `Phases` in turn consist of sets of `goals`.
+which consist of `phases`. `Phases` consist of sets of `goals`.
`Maven` has default `phases` and `goals` for building any projects:
@@ -92,7 +89,7 @@ But in `Maven` there is no build-time caching as such.
`Maven` suggests rebuilding only changed project modules to speed up the build process.
### Gradle
-But unlike `Maven`, `Gradle` builds projects using a task graph -
+Unlike `Maven`, [Gradle](https://gradle.org) builds projects using a task graph, a
[Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph),
in which some tasks can be executed synchronously.
To speed up project builds, `Gradle` employs incremental builds
@@ -116,17 +113,18 @@ If the task completes successfully, `Gradle` also makes a fingerprint from the r
To avoid re-fingerprinting the original files, `Gradle` checks the last modification time and the size of the original
files before reassembling. This allows `Gradle` to use the results already obtained when the project is rebuilt.
Additionally, `Gradle` stores fingerprints of previous builds enabling quick project builds,
-for example when switching from one branch to another - known as the - `Build Cache`.
+for example when switching from one branch to another. This is known as the
+[Build Cache](https://docs.gradle.org/current/userguide/build_cache.html).
### EO build cache
-EO code uses the `Maven` build system to assembly.
+EO code is built with the `Maven` build system.
For this purpose, the `eo-maven-plugin` plugin was created,
which contains the necessary goals for working with EO code.
-As mentioned earlier, the assembly of projects in `Maven` occurs in a specific order of phases.
+As mentioned earlier, the build of projects in `Maven` occurs in a specific order of phases.
In the diagram you can observe the main phases and their goals for the EO last version of the compiler:
@@ -147,23 +145,20 @@ However, the actual work with EO code takes place in `AssembleMojo`.
Each goal in `AssembleMojo` is a specific compilation step for EO code, and we need to use
-caching at each step to speed up the assembly of the EO program.
+caching at each step to speed up the build of the EO program.
+
In EO version `0.34.0`,
caching used unrelated `Footprint` and `Optimization` interfaces for different `Mojo`,
-within which mostly the same methods were used.
+which used the same methods.
The difference between interfaces is that `Footprint` checks the EO version of the compiler,
while the rest of the checks are exactly the same.
-Now the goals `ParseMojo`, `OptimizeMojo` and `ShakeMojo`, in which caching can be applied,
-have directory of results and directory of cache.
-
-
The disadvantages of initial caching in EO include:
-* The compilation time and the time of saving to the cache must be equal, which can be challenging to verify.
-* Verification data is read from a file on disk, which is a long and expensive operation.
-* Each purpose uses its own classes and interfaces for data caching, making the code difficult to extend and read.
+* The cached file is considered up to date only if the compilation time and the time of saving to the cache are equal.
+* Verification data is read from a file on the file system.
+* Each goal uses its own classes and interfaces for data caching, making the code difficult to extend and read.
@@ -188,7 +183,7 @@ public class Cache {
```
-`List` is a list of validations. Validations implemented from the `CacheValidation` interface.
+`List` is a list of validations. Each validation implements the `CacheValidation` interface.
Different `Mojo` can use different validations.
@@ -202,22 +197,31 @@ public interface CacheValidation {
The classes `Path` and `Files` have methods to obtain the necessary information.
-3) The relevance of the cached data will be checked by the condition
-that the time of the last modification of the source file must be earlier than or equal to that saved in the cache.
+3) Searching for cached data will use the following conditions:
+ * The source file and the cached file should have the same file name;
+ * Each saving cached file `Mojo` should have a cache directory and a result directory.
+ * The last modification time of the source file should be earlier than or equal to that of the cached file.
-These solutions will speed up compilation in the build system `Maven`.
-
-### Conclusion
There is an EO program `program.eo`, which is launched for the first time.
-At each `Mojo` stage, the execution results will be saved to the cache of the current `Mojo`.
+The cache of each `Mojo` will save the execution results.
If this program is run again, these `Mojo` will receive data from the cache,
without wasting time and computer resources on recompilation.
If we change something in the `program.eo` file, the program will have to be recompiled,
-since the last modification time the original file will be later than those stored in the cache.
+since the last modification time the source file will be later than the cached file.
As a result of `Mojo` work, the cache was overwritten.
+### Conclusion
+In this blog, we showed that `Maven` builds the EO code using the goals of the `eo-maven-plugin`.
+Since the Maven goals work in a strict order and linearly,
+we only need to check that the last modification time of the source files is not later than that of the cached files.
+The cached file and the source file should have the same name
+(but not the same file format, for example `name.eo` and `name.xml`).
+This condition is necessary so that you can quickly find the cached file in the file system.
+Each Mojo participating in caching should have its own cache directory.
+
+
diff --git a/images/ccache.svg b/images/defaultCPhase.svg
similarity index 100%
rename from images/ccache.svg
rename to images/defaultCPhase.svg
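
The lookup rule stated in the Conclusion of this patch (same file name with a different extension, a per-`Mojo` cache directory, and a source file that is not newer than the cached file) can be sketched roughly as follows. This is only an illustration under assumptions: the class `CacheLookup` and its methods are hypothetical names and are not taken from `eo-maven-plugin`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Optional;

// Hypothetical helper for illustration; not part of eo-maven-plugin.
final class CacheLookup {
    private CacheLookup() {
    }

    // Replaces the extension of a file name, e.g. "program.eo" with "xml" gives "program.xml".
    static String cachedName(final String source, final String extension) {
        final int dot = source.lastIndexOf('.');
        final String base = dot > 0 ? source.substring(0, dot) : source;
        return base + "." + extension;
    }

    // Returns the cached file only if it exists in the cache directory
    // and the source file was not modified after it.
    static Optional<Path> find(final Path source, final Path cacheDir, final String extension)
        throws IOException {
        final Path cached = cacheDir.resolve(
            cachedName(source.getFileName().toString(), extension)
        );
        if (!Files.exists(cached)) {
            return Optional.empty();
        }
        final FileTime sourceTime = Files.getLastModifiedTime(source);
        final FileTime cachedTime = Files.getLastModifiedTime(cached);
        return sourceTime.compareTo(cachedTime) <= 0 ? Optional.of(cached) : Optional.empty();
    }
}
```

Only file metadata is consulted here (name and last modification time), so the content of the cached file is never read during the check.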
From 5c2fa3ea03237e7cb26dfb9ef96b9213cde3ed03 Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Mon, 18 Mar 2024 15:53:40 +0300
Subject: [PATCH 10/17] feat(#56):fix gradle and EO
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 102 +++++++++++-------
1 file changed, 63 insertions(+), 39 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index 20a7469..d0066b6 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -36,19 +36,18 @@ to imagine the build process in compiled languages:
-1) First, preprocessor gets the input files. The input files are source files (.cpp) and header files (.h).
-The result is a single edited file with human-readable code that the compiler will get.
+1) First, preprocessor gets the input files. The input files are source files `.cpp` and header files `.h`.
+The result is a single edited file `.cpp` with human-readable code that the compiler will get.
-2) The compiler receives the finished code file and converts it into machine code, presented in an object file.
+2) The compiler receives the edited code file `.cpp` and converts it into machine code, presented in an object file.
At the compilation stage, parsing occurs, which checks whether the code matches
rules of a specific programming language.
At the end, the compiler optimizes the resulting machine code and produces an object file.
To speed up compilation, different files of the same project are compiled in parallel.
-3) Then, the linker gets object files.
-Linker is a program that combines object files into an executable file or library.
-The result of the linker is an executable .exe file.
+3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) gets object files.
+The result of the linker is an executable `.exe` file.
As a result, in compiled languages, multiple files are simultaneously and independently converted
@@ -56,26 +55,37 @@ into machine code at the compilation stage.
This machine code is then combined into one executable file.
-`ccache` uses hashcode to find cached files. The hashcode includes information:
-file contents, directory, compiler information, compilation time, extensions
-used by the compiler. A compressed machine code file is placed in the cache using the received key.
+`ccache` uses a hash algorithm to hash information so that cached files can be found quickly.
+The [`ccache` hash](https://ccache.dev/manual/4.8.2.html#_common_hashed_information)
+includes information:
+* the file contents
+* the current directory of the file
+* the name of the compiler
+* the compiler’s size and modification time
+* extensions used by the compiler.
+
+
+A compressed machine code file is placed in the cache using the received key.
`ccache` has two main caching methods:
-1) `Direct mode` - hashcode is generated based on the source code.
+1) `Direct mode` - hash is generated based on the source code.
`Direct mode` compiles the program faster, since the preprocessor step is skipped.
-However,the header files are not checked for changes, so the wrong project may be built.
-2) `Preprocessor mode` - hashcode is generated based on the result of preprocessor.
-`Preprocessor mode` is slower than `direct mode`, but the right project is built always.
+However, the header files are not checked for changes, so the project may be built with unverified header files.
+2) `Preprocessor mode` - hash is generated based on the result of the preprocessor.
+`Preprocessor mode` is slower than `direct mode`, but the project is built with verified header files.
-`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in the cloud.
-And it also has fixed some bugs (for example, there is a check of header files, which makes direct mode more accurate).
+`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in cloud storage.
+In addition, `sccache` supports caching the compilation of C/C++ code,
+[Rust](https://github.com/mozilla/sccache/blob/main/docs/Rust.md), as well as NVIDIA's CUDA using
+[nvcc](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html),
+and [clang](https://llvm.org/docs/CompileCudaWithLLVM.html), while `ccache` works with C and C++ code.
### Maven
[Maven](https://maven.apache.org) automates and manages Java-project builds.
Building a project in `Maven` is completed in three
-maven [LifeCycles Maven](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
+[Maven LifeCycles](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
which consist of `phases`. `Phases` consist of sets of `goals`.
`Maven` has default `phases` and `goals` for building any projects:
@@ -107,25 +117,33 @@ task myTask {
}
}
```
-Before executing a task, `Gradle` makes a fingerprint of the path
-and contents of the source files and saves it.
-If the task completes successfully, `Gradle` also makes a fingerprint from the resulting files.
-To avoid re-fingerprinting the original files, `Gradle` checks the last modification time and the size of the original
-files before reassembling. This allows `Gradle` to use the results already obtained when the project is rebuilt.
+`Gradle` uses this information to determine whether a task is up to date or needs to perform any work.
+
+How `Incremental build` works:
+1) Before executing a task, `Gradle` makes a [fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
+ of the path and contents of the source files and saves it.
+2) `Gradle` executes the task and saves a fingerprint of the path
+ and contents of the output files.
+3) Before each re-execution of the task, `Gradle` makes a new fingerprint of the source files
+ and compares it with the saved fingerprint. To avoid re-fingerprinting, the saved fingerprint is reused
+ if the last modification time and the size of the source files have not changed.
+ If none of the inputs or outputs have changed, `Gradle` can skip that task.
+
+
+
Additionally, `Gradle` stores fingerprints of previous builds enabling quick project builds,
for example when switching from one branch to another. This is known as the
[Build Cache](https://docs.gradle.org/current/userguide/build_cache.html).
-
### EO build cache
EO code is built with the `Maven` build system.
For this purpose, the `eo-maven-plugin` plugin was created,
which contains the necessary goals for working with EO code.
As mentioned earlier, the build of projects in `Maven` occurs in a specific order of phases.
-In the diagram you can observe the main phases and their goals for the EO last version of the compiler:
+In the diagram you can observe the main phases and their goals for EO:
@@ -154,6 +172,10 @@ which used the same methods.
The difference between interfaces is that `Footprint` checks the EO version of the compiler,
while the rest of the checks are exactly the same.
+In this chapter, `the source file` is a file that a `Mojo` receives,
+`the cached file` is a file with the result of executing a `Mojo`,
+and `the program file` is the initial EO program file (`program.eo`).
+
The disadvantages of initial caching in EO include:
* The cached file is considered up to date only if the compilation time and the time of saving to the cache are equal.
@@ -161,11 +183,12 @@ The disadvantages of initial caching in EO include:
* Each goal uses its own classes and interfaces for data caching, making the code difficult to extend and read.
-
To address these disadvantages, the following solutions are proposed:
-
-
1) Create a new class `Cache` responsible for data verification, saving to cache and loading from cache.
+Since each `Mojo` that involves caching has directories for saving and caching results,
+we just need to create a class responsible for saving and loading their cache data.
+Each `Mojo` will have its own `Cache` with its own list of validations.
+If all `Mojos` have the same validations, then one `Cache` is enough.
```
public class Cache {
@@ -182,11 +205,8 @@ public class Cache {
}
```
-
`List` is a list of validations. Each validation implements the `CacheValidation` interface.
-Different `Mojo` can use different validations.
-
-
+The `CacheValidation` interface has a single method, which must contain one test condition.
```
public interface CacheValidation {
boolean validate(final Path source, final Path cache) throws IOException;
@@ -194,22 +214,26 @@ public interface CacheValidation {
```
2) To avoid reading from disk, we will use file paths `Path`.
-The classes `Path` and `Files` have methods to obtain the necessary information.
+The classes `Path` and `Files` have methods to obtain the necessary information: the file name
+and the time of the last modification. The file name is necessary to find the cached file in the directory.
+The time of the last modification is necessary to check that the source file is not newer than the cached file.
+These conditions should be enough for us, since the build of projects in Maven is linear.
3) Searching for cached data will use the following conditions:
* The source file and the cached file should have the same file name;
- * Each saving cached file `Mojo` should have a cache directory and a result directory.
+ * Each `Mojo` involved in caching should have a cache directory and a directory of result files.
+ The directory of result files is the directory of source files for the next `Mojo`.
* The last modification time of the source file should be earlier than or equal to that of the cached file.
-There is an EO program `program.eo`, which is launched for the first time.
-The cache of each `Mojo` will save the execution results.
-If this program is run again, these `Mojo` will receive data from the cache,
-without wasting time and computer resources on recompilation.
-If we change something in the `program.eo` file, the program will have to be recompiled,
-since the last modification time the source file will be later than the cached file.
-As a result of `Mojo` work, the cache was overwritten.
+Example: there is an EO program `program.eo`, which is launched for the first time;
+the cache of each `Mojo` will save the execution results in cache directory and result directory;
+when this program is run again, these `Mojo` will receive data from the cache,
+without executing of task and rewriting of result.
+If we change something in the `program.eo` file or the source files,
+the program or will have to be executed again or the execution result of `Mojo` was overwritten.
+This way the program will be protected from artificial changes during the build process.
### Conclusion
From ad6e8eb0a85d742e67315650e8dfdff319d9fa56 Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Tue, 19 Mar 2024 00:20:53 +0300
Subject: [PATCH 11/17] feat(#56):fix about ccache
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 54 ++++++++-----------
1 file changed, 21 insertions(+), 33 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index d0066b6..55a3821 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -29,33 +29,24 @@ In compiled programming languages, building a project containing many source cod
This time is spent on loading of libraries, preparing, optimizing, checking the code, and so on.
To speed up the assembly of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
-Let's look at the assembly scheme using C++ as an example
-to imagine the build process in compiled languages:
+Let's look at the assembly scheme using C++ as an example:
-1) First, preprocessor gets the input files. The input files are source files `.cpp` and header files `.h`.
-The result is a single edited file `.cpp` with human-readable code that the compiler will get.
-
-
-2) The compiler receives the edited code file `.cpp` and converts it into machine code, presented in an object file.
-At the compilation stage, parsing occurs, which checks whether the code matches
-rules of a specific programming language.
+1) First, preprocessor retrieves the source code files,
+which consist of both source files `.cpp` and header files `.h`.
+The result is a single file `.cpp` with human-readable code that the compiler will get.
+2) The compiler receives the edited code file `.cpp` and converts it into object file - `.obj`.
+At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
At the end, the compiler optimizes the resulting machine code and produces an object file.
-To speed up compilation, different files of the same project are compiled in parallel.
-
+To speed up compilation, different files of the same project might be compiled in parallel.
3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) gets object files.
The result of the linker is an executable `.exe` file.
-
-As a result, in compiled languages, multiple files are simultaneously and independently converted
-into machine code at the compilation stage.
-This machine code is then combined into one executable file.
-
-
-`ccache` hash algorithm, for the hashing of information to find cached files fast.
+
+`ccache` has hash algorithm, for the hashing of information to find cached files fast.
The [`ccache` hash](https://ccache.dev/manual/4.8.2.html#_common_hashed_information)
includes information:
* the file contents
@@ -70,16 +61,13 @@ A compressed machine code file is placed in the cache using the received key.
`ccache` has two main caching methods:
1) `Direct mode` - hash is generated based on the source code.
-`Direct mode` compiles the program faster, since the preprocessor step is skipped.
-However,the header files are not checked for changes, so the project may be built with not verified header files.
-2) `Preprocessor mode` - hash is generated based on the result of preprocessor.
+`Direct mode` allows to build the program faster, since the preprocessor step is skipped.
+However, header files are not checked for changes, so the project may be built with not verified header files.
+2) `Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
`Preprocessor mode` is slower than `direct mode`, but the project is built with verified header files.
`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in a cloud data storage.
-And `sccache` includes support for caching the compilation of C/C++ code,
-[Rust](https://github.com/mozilla/sccache/blob/main/docs/Rust.md), as well as NVIDIA's CUDA using
-[nvcc](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html),
-and [clang](https://llvm.org/docs/CompileCudaWithLLVM.html), while `ccache` works with C and C++ code.
+And `sccache` supports a wider range of languages, while ccache focuses on caching C and C++ compiler.
### Maven
@@ -95,14 +83,14 @@ which consist of `phases`. `Phases` consist of sets of `goals`.
In `Maven` all phases and goals are executed strictly in order, linearly.
-But in `Maven` there is no build-time caching as such.
+`Maven` uses added extensions from Gradle for caching.
`Maven` suggests rebuilding only changed project modules to speed up the build process.
### Gradle
But unlike `Maven`, [Gradle](https://gradle.org) builds projects using a task graph -
[Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph),
in which some tasks can be executed synchronously.
-To speed up project builds, `Gradle` employs incremental builds
+To speed up project builds, `Gradle` employs
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work).
For an incremental build to work, the tasks used to build the project must have specified
source and output files.
@@ -120,13 +108,13 @@ task myTask {
`Gradle` uses this information to determine if a task is up-to-date and needs to perform any work.
How work `Incremental build`:
-1) Before executing a task, `Gradle` makes a [fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
+1) Before executing a task, `Gradle` creates a [fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
of the path and contents of the source files and saves it.
-2) `Gradle` executes a task and saves a fingerprint of the path
- and contents of the output files.
-3) Before each rebuilding of task, `Gradle` makes a fingerprint of the source files
- and compares with a current fingerprint. A fingerprint is a current fingerprint,
- if the last modification time and the size of the source files was not changed.
+2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
+3) Before each rebuilding of the task, `Gradle` generates a fingerprint of the source files
+ and compares it with the current fingerprint.
+ The fingerprint is considered current if the last modification time
+ and the size of the source files have not changed.
If none of the inputs or outputs have changed, Gradle can skip that task.
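The fingerprint comparison described in the steps above can be sketched in Java. This is only an illustration and not how Gradle itself is implemented: the class name `Fingerprint`, its methods, and the choice of SHA-256 are our assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Gradle: fingerprints a list of files.
final class Fingerprint {
    private Fingerprint() {
    }

    // Hashes the path and the contents of every file, in the given order.
    static byte[] of(final List<Path> files) throws IOException {
        final MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
        for (final Path file : files) {
            digest.update(file.toString().getBytes(StandardCharsets.UTF_8));
            digest.update(Files.readAllBytes(file));
        }
        return digest.digest();
    }

    // The task can be skipped when the saved and the fresh fingerprints match.
    static boolean upToDate(final byte[] saved, final List<Path> files) throws IOException {
        return Arrays.equals(saved, Fingerprint.of(files));
    }
}
```

A build tool would save the result of `Fingerprint.of` after running a task and call `upToDate` before running it again.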
From 9e6a73624414b68d4b2dc856bab6a984ee39b0bb Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Mon, 25 Mar 2024 14:39:09 +0300
Subject: [PATCH 12/17] feat(#56):fix text mvn, gradle and eo
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 238 ++++++++++--------
1 file changed, 136 insertions(+), 102 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index 55a3821..7fab3d9 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -11,24 +11,23 @@ In [EO](https://github.com/objectionary/eo), caching is used to speed up program
Recently we found a caching
[bug](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
for EO version `0.34.0`. The bug occurred because the old verification method
-contains a comparison of the compilation time and caching time to search for the cached file.
+used compilation time and caching time to search for a cached file.
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
We came to the conclusion that we need caching with a reliable verification method.
-And this verification method should not use the information that the cached file contains.
+Furthermore, this verification method should refrain from reading the file content.
-The goal of this blog is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
-and to implement effective caching in [EO](https://github.com/objectionary/eo).
+The goal is to implement effective caching in EO.
+To achieve the goal, we will briefly look at how frequently used build systems (such as ccache, Maven, Gradle)
+in order to gain a deeper understanding of the caching concepts employed within them and to development caching in EO.
## Build caching of existing build systems
### ccache/sccache
-In compiled programming languages, building a project containing many source code files takes a long time.
+In compiled programming languages, building a project with many source code files takes a long time.
This time is spent on loading of libraries, preparing, optimizing, checking the code, and so on.
-To speed up the assembly of compiled languages, [ccache](https://ccache.dev)
-and [sccache](https://github.com/mozilla/sccache) are used.
Let's look at the assembly scheme using C++ as an example:
@@ -38,62 +37,49 @@ Let's look at the assembly scheme using C++ as an example:
1) First, preprocessor retrieves the source code files,
which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
-2) The compiler receives the edited code file `.cpp` and converts it into object file - `.obj`.
+2) The compiler receives the file `.cpp` from the preprocessor and converts it into object file - `.obj`.
At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
At the end, the compiler optimizes the resulting machine code and produces an object file.
To speed up compilation, different files of the same project might be compiled in parallel.
3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) gets object files.
The result of the linker is an executable `.exe` file.
-
-`ccache` has hash algorithm, for the hashing of information to find cached files fast.
-The [`ccache` hash](https://ccache.dev/manual/4.8.2.html#_common_hashed_information)
-includes information:
+
+To speed up the build of compiled languages, [ccache](https://ccache.dev)
+and [sccache](https://github.com/mozilla/sccache) are used.
+`ccache` uses the hash algorithm for the hashing of code at certain stages of the build.
+`ccache` uses the hash to save a code in the cache.
+The [`ccache` hash](https://ccache.dev/manual/4.8.2.html#_common_hashed_information) is
+based on:
* the file contents
* the current directory of the file
* the name of the compiler
* the compiler’s size and modification time
-* extensions used by the compiler.
-
+* extensions used by the compiler.
-A compressed machine code file is placed in the cache using the received key.
-
-
-`ccache` has two main caching methods:
-1) `Direct mode` - hash is generated based on the source code.
-`Direct mode` allows to build the program faster, since the preprocessor step is skipped.
-However, header files are not checked for changes, so the project may be built with not verified header files.
+Moreover, `ccache` has two types of the hashing:
+1) `Direct mode` - the hash is generated based on the source code only.
+This mode allows to build the program faster, since the preprocessor step is skipped.
+When using this mode, the user must be sure that the external libraries, using in a project, have not changed.
+Otherwise, the project will build with errors.
2) `Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
-`Preprocessor mode` is slower than `direct mode`, but the project is built with verified header files.
+`Preprocessor mode` is slower than `direct mode`, but the project is built without errors.
`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in a cloud data storage.
-And `sccache` supports a wider range of languages, while ccache focuses on caching C and C++ compiler.
-
-
-### Maven
-[Maven](https://maven.apache.org) automates and manages Java-project builds.
-Building a project in `Maven` is completed in three
-[Maven LifeCycles](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
-which consist of `phases`. `Phases` consist of sets of `goals`.
-
-`Maven` has default `phases` and `goals` for building any projects:
-
-
-
-
+And `sccache` supports a wider range of languages, while `ccache` focuses on caching C and C++ compiler.
-In `Maven` all phases and goals are executed strictly in order, linearly.
-`Maven` uses added extensions from Gradle for caching.
-`Maven` suggests rebuilding only changed project modules to speed up the build process.
### Gradle
-But unlike `Maven`, [Gradle](https://gradle.org) builds projects using a task graph -
-[Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph),
-in which some tasks can be executed synchronously.
-To speed up project builds, `Gradle` employs
-[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work).
+[Gradle](https://gradle.org) builds projects using a
+[task graph](https://docs.gradle.org/current/userguide/build_lifecycle.html) that allows for synchronous execution
+of certain tasks.
+`Gradle` employs
+[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work),
+to speed up project builds.
For an incremental build to work, the tasks used to build the project must have specified
source and output files.
+The provided code snippet demonstrates the implementation of a custom task in Gradle,
+showcasing how inputs and outputs are specified to enable `Incremental build`:
```
task myTask {
inputs.dir 'src/main/java/MyTask.somebody' // Specify the input directory
@@ -107,38 +93,76 @@ task myTask {
```
`Gradle` uses this information to determine if a task is up-to-date and needs to perform any work.
-How work `Incremental build`:
-1) Before executing a task, `Gradle` creates a [fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
+To understand how `Incremental build` works, consider the following steps:
+1) Before executing a task for the first time, `Gradle` takes a
+ [fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
of the path and contents of the source files and saves it.
2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
-3) Before each rebuilding of the task, `Gradle` generates a fingerprint of the source files
+3) Before each rebuilding of the task, `Gradle` generates a new fingerprint of the source files
and compares it with the current fingerprint.
The fingerprint is considered current if the last modification time
and the size of the source files have not changed.
If none of the inputs or outputs have changed, Gradle can skip that task.
+In addition to `Incremental build`, `Gradle` also stores fingerprints of previous builds, enabling quick project builds,
+for example when switching from one branch to another. This feature is known as
+the [Build Cache](https://docs.gradle.org/current/userguide/build_cache.html).
+
+
+### Maven
+[Maven](https://maven.apache.org) automates and manages Java-project builds.
+`Maven` is based on the concept of
+[Maven LifeCycles](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
+which include default, clean, and site lifecycles.
+Each lifecycle consists of `phases` and these `phases` consist of sets of `goals`.
+
+In Maven, there are default phases and goals for building any project:
-Additionally, `Gradle` stores fingerprints of previous builds enabling quick project builds,
-for example when switching from one branch to another - known as the -
-[Build Cache](https://docs.gradle.org/current/userguide/build_cache.html).
+
+
+
+By default, the `phases` in Maven are inherently connected within the build lifecycle.
+Each `phase` represents a specific task, and the execution order of `goals` within `phases` is determined
+by the default Maven lifecycle bindings. This means that while each `phase` operates as a series of individual tasks,
+they are part of a cohesive build lifecycle, and their execution order is predefined by Maven.
+
+
+`Maven` supports `Incremental build` through the `takari-lifecycle-plugin` and
+the `maven-build-cache-extension`.
+The [takari-lifecycle-plugin](http://takari.io/book/40-lifecycle.html) is an alternative to the default Maven lifecycle
+(building JAR files). Its distinctive feature is the use of a single universal plugin with the same functionality
+as five separate plugins for the standard lifecycle, but with significantly fewer dependencies. As a result,
+it provides a much faster startup, more optimal operation, and lower resource consumption.
+This leads to a significant increase in performance when compiling complex projects with a large number of modules.
+
+The [maven-build-cache-extension](https://maven.apache.org/extensions/maven-build-cache-extension/)
+is used for large Maven projects that have a significant number of small modules.
+This extension computes a key for each project module; the key encapsulates the essential aspects of the module,
+including the source code and the configuration of the plugins used within it.
+Projects with the same key are considered current (unchanged) and can be efficiently restored from the cache.
+Conversely, projects that generate different keys are deemed outdated (changed),
+prompting the cache to initiate a complete rebuild for them. In the event of a cache miss,
+where an outdated project requires a complete rebuild,
+the cache seamlessly delegates the build work to the standard Maven core,
+without interfering with the build execution logic.
+This ensures that only the changed modules within the project are rebuilt,
+minimizing unnecessary overhead and optimizing the build process.
### EO build cache
-EO code uses the `Maven` build system to build.
-For this purpose, the `eo-maven-plugin` plugin was created,
-which contains the necessary goals for working with EO code.
-As mentioned earlier, the build of projects in `Maven` occurs in a specific order of phases.
-In the diagram you can observe the main phases and their goals for the EO:
+EO uses `Maven` for building projects.
+For this purpose, there is the `eo-maven-plugin` containing the essential goals for working with EO code.
+As previously mentioned, the build of projects in Maven follows a specific order of phases.
+Below is a diagram illustrating the main phases and their corresponding goals for EO:
-In [Picture 3](/images/EO.svg) the goals from the `eo-maven-plugin`
-are highlighted in green.
+In [Picture 3](/images/EO.svg) the goals of the `eo-maven-plugin` are highlighted in green.
However, the actual work with EO code takes place in `AssembleMojo`.
@@ -150,33 +174,42 @@ However, the actual work with EO code takes place in `AssembleMojo`.
-Each goal in `AssembleMojo` is a specific compilation step for EO code, and we need to use
-caching at each step to speed up the build of the EO program.
-
+Each goal within `AssembleMojo` is a distinct compilation step for EO code.
+These tasks happen one after the other, and each task relies on the output of the one before it.
+To speed up the EO program rebuild process, it is helpful to save the results of each goal.
+This avoids repeating actions and makes the compilation more efficient.
+Using caching methods significantly speeds up the build process.
-In EO version `0.34.0`,
-caching used unrelated `Footprint` and `Optimization` interfaces for different `Mojo`,
-which used the same methods.
-The difference between interfaces is that `Footprint` checks the EO version of the compiler,
-while the rest of the checks are exactly the same.
-In this chapter ` the source file` - is a file, which `Mojo` receives,
-`the cached file` - is a file with the result of executing `Mojo`,
-`the program file` - is initial EO program file (`program.eo`).
+In this chapter, we introduce the keywords:
+* `the source file`: This file serves as the input for goal operations.
+* `the cached file`: This file contains the result of the goal's execution.
-The disadvantages of initial caching in EO include:
-* The cached file is actual if the compilation time and the time of saving to the cache are equal.
-* Verification data is read from a file on file system.
-* Each goal uses own classes and interfaces for data caching, making the code difficult to extend and read.
+The previous caching mechanism in EO made use of distinct interfaces, specifically `Footprint` and `Optimization`,
+both of which derive from the `SafeMojo` class.
+These caching interfaces shared similar logic, but with minor differences.
+For instance, `Footprint` verifies the EO version of the compiler, whereas the remaining checks are identical.
+Additionally, the conditions for searching data in the cache had errors.
+The cached file was considered valid if the end time of the goal's execution
+and the time of saving the goal's result to the cache were equal.
+Due to this issue, the program behaved incorrectly, because saving the goal's result to the cache is not instantaneous.
+After conducting an in-depth analysis of the project's incorrect operation,
+several disadvantages of the previous caching mechanism in EO were brought to light:
+* Incorrect search conditions for data in the cache.
+* The verification method requires reading the file content, which results in inefficiencies.
+* The presence of multiple caching mechanisms creates challenges in identifying and rectifying caching errors.
+* Employing multiple caching mechanisms for similar entities is a suboptimal practice,
+leading to redundancy and complicating the caching infrastructure.
To address these disadvantages, the following solutions are proposed:
-1) Create a new class `Cache` responsible for data verification, saving to cache and loading from cache.
-Since each `Mojo` that involves caching has directories for saving and caching results,
-we just need to create a class responsible for saving and loading their cache data.
-Each `Mojo` will have the own class `Cache` with own a list of validations.
-If all `Mojos` have the same validations, then one `Cache` is enough.
+1) Creating a unified caching mechanism for all goals involved in EO code compilation.
+This mechanism, represented by the `Cache` class, will assume responsibility for data validation,
+cache storage, and retrieval.
+To improve the flexibility for different data verification conditions,
+the constructor of the `Cache` class will accept a list of validations.
+Here's the corresponding code:
```
public class Cache {
@@ -193,47 +226,48 @@ public class Cache {
}
```
-`List` is a list of validations. Validations are implemented from the `CacheValidation` interface.
-The `CacheValidation` interface has the only method, that must contain one test condition.
+The `List` holds the validations, each of which implements the `CacheValidation` interface.
+This interface defines the structure for validations within the `Cache` class.
+The `CacheValidation` interface has a single method, so each validation contains one specific test condition.
+
```
public interface CacheValidation {
boolean validate(final Path source, final Path cache) throws IOException;
}
```
-2) To avoid reading from disk, we will use file paths `Path`.
-The classes `Path` and `Files` have methods to obtain the necessary information - the file name
-and the time of the last modification. The file name is necessary to find the cached file in the directory.
-The time of the last modification is necessary to check the source file is older(or equal) than the cached file.
-These conditions should be enough for us, since the build of projects in Maven is linear.
+2) In order to minimize disk access, we will utilize file paths represented by the `Path` class.
+By leveraging methods provided by the `Path` and `Files` classes,
+we can obtain essential information such as the file name and the time of the last modification.
+The file name plays a crucial role in locating the cached file within the directory,
+while the time of the last modification enables us to determine whether
+the source file is not newer than the cached file.
+Given that the project build process in Maven is linear,
+these conditions are deemed sufficient for our caching mechanism.
3) Searching for a cached data will use the following conditions:
- * The source file and cached file should have same file name;
- * Each `Mojo`, involved caching, should have a cache directory and a directory of result files.
- The directory of result files is directory of source files for next `Mojo`.
+ * `The source file` and `the cached file` should have the same file name;
+ * Each goal involved in caching should have both a cache directory and a directory of result files.
+ The directory of result files corresponds to the directory of source files for the subsequent goal.
* The time of the last modification of the source file should be earlier or equal than cached file.
-Example: there is an EO program `program.eo`, which is launched for the first time;
-the cache of each `Mojo` will save the execution results in cache directory and result directory;
-when this program is run again, these `Mojo` will receive data from the cache,
+Example: Let's consider an EO program named `program.eo`, which is executed for the first time.
+The cache of each goal will save the execution results in the cache directory and the result directory.
+When this program is run again without changes, these goals will receive data from the cache,
without executing of task and rewriting of result.
-If we change something in the `program.eo` file or the source files,
-the program or will have to be executed again or the execution result of `Mojo` was overwritten.
-This way the program will be protected from artificial changes during the build process.
+However, if we make changes to the `program.eo` file or `the source files` of the goals and execute again,
+the execution results of the goals will be overwritten in the directory of result files and the cache directory.
+This approach effectively protects the program from artificial changes during the build process.
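The search conditions above can be sketched in Java as follows. This is a minimal illustration under our own assumptions, not the code of `eo-maven-plugin`: the names `SketchCache` and `NotOlderThanSource` are hypothetical, and the cached file is assumed to keep exactly the same file name as the source file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// One test condition per validation, as in the interface shown earlier.
interface CacheValidation {
    boolean validate(Path source, Path cache) throws IOException;
}

// Hypothetical validation: the cached file exists and the source file is not newer.
final class NotOlderThanSource implements CacheValidation {
    @Override
    public boolean validate(final Path source, final Path cache) throws IOException {
        return Files.exists(cache)
            && Files.getLastModifiedTime(source)
                .compareTo(Files.getLastModifiedTime(cache)) <= 0;
    }
}

// Hypothetical cache of one goal: finds the cached file by the source file name.
final class SketchCache {
    private final Path dir;
    private final List<CacheValidation> validations;

    SketchCache(final Path dir, final List<CacheValidation> validations) {
        this.dir = dir;
        this.validations = validations;
    }

    // True if the cached file with the same name passes every validation.
    boolean contains(final Path source) throws IOException {
        final Path cached = this.dir.resolve(source.getFileName());
        for (final CacheValidation validation : this.validations) {
            if (!validation.validate(source, cached)) {
                return false;
            }
        }
        return true;
    }

    // Copies the result of the goal into the cache under the source file name.
    void save(final Path source, final Path result) throws IOException {
        Files.createDirectories(this.dir);
        Files.copy(
            result,
            this.dir.resolve(source.getFileName()),
            StandardCopyOption.REPLACE_EXISTING
        );
    }
}
```

Only file metadata is consulted in `contains`, so the check does not read the content of either file.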
### Conclusion
-In this blog, we showed that `Maven` builds the EO code using the goals of the `eo-maven-plugin`.
-Since the Maven goals work in a strict order and linearly,
-we only need to check that the last modification time of the source files is not younger than the cached files.
-The cached file and the source file should have the same name
-(but not the same file format, for example - name.eo and name.xml).
-This condition is necessary so that you can quickly find the cached file in the file system.
-Each Mojo participating in caching should have its own cache directory.
-
-
+In this article, we explored various build systems and their caching methods.
+We were motivated to find an efficient caching approach for EO due to issues discovered during bug investigation.
+The previous caching mechanism was flawed logically and architecturally, making it ineffective.
+As a result, we discussed the problems, suggested solutions, and outlined the criteria
+for implementing a new caching system in EO.
From daab2ce7a18872b9c36950e758bd08e61bf26a98 Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Thu, 28 Mar 2024 17:03:44 +0300
Subject: [PATCH 13/17] feat(#56):change maven caching and eo caching
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 172 +++++++-----------
1 file changed, 64 insertions(+), 108 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index 7fab3d9..2e14657 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -18,8 +18,8 @@ We came to the conclusion that we need caching with a reliable verification meth
Furthermore, this verification method should refrain from reading the file content.
The goal is to implement effective caching in EO.
-To achieve the goal, we will briefly look at how frequently used build systems (such as ccache, Maven, Gradle)
-in order to gain a deeper understanding of the caching concepts employed within them and to development caching in EO.
+To achieve the goal, we will briefly look at well-known build systems (such as ccache, Maven, Gradle)
+in order to gain a deeper understanding of the caching concepts employed within them.
@@ -37,7 +37,7 @@ Let's look at the assembly scheme using C++ as an example:
1) First, preprocessor retrieves the source code files,
which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
-2) The compiler receives the file `.cpp` from the preprocessor and converts it into object file - `.obj`.
+2) The compiler receives the file `.cpp` from the preprocessor and compiles it into an object file - `.obj`.
At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
At the end, the compiler optimizes the resulting machine code and produces an object file.
To speed up compilation, different files of the same project might be compiled in parallel.
@@ -48,7 +48,11 @@ The result of the linker is an executable `.exe` file.
To speed up the build of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
`ccache` uses the hash algorithm for the hashing of code at certain stages of the build.
-`ccache` uses the hash to save a code in the cache.
+`ccache` uses the hash as a key to store the compiled code in the cache.
+When compiling a file, its hash is calculated.
+If the file is already present in the registry of compiled files, the file will not be compiled again.
+Instead, the previously compiled binary file will be utilized.
+This approach can significantly accelerate the build process of certain packages, reducing build times by 5-10 times.
The [`ccache` hash](https://ccache.dev/manual/4.8.2.html#_common_hashed_information) is
based on:
* the file contents
@@ -58,14 +62,14 @@ based on:
* extensions used by the compiler.
Moreover, `ccache` has two types of the hashing:
-1) `Direct mode` - the hash is generated based on the source code only.
-This mode allows to build the program faster, since the preprocessor step is skipped.
-When using this mode, the user must be sure that the external libraries, using in a project, have not changed.
-Otherwise, the project will build with errors.
+1) `Direct mode` - the hash is generated based on the source code only.
+When using this mode, the user must ensure that the external libraries used in a project have not changed.
+Otherwise, the project will fail to build, resulting in errors.
2) `Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
-`Preprocessor mode` is slower than `direct mode`, but the project is built without errors.
-`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in a cloud data storage.
+
+`Sccache` is similar in purpose to `ccache` but provides more functionality.
+`Sccache` allows storing cached files not only locally, but also in cloud storage.
And `sccache` supports a wider range of languages, while `ccache` focuses on caching C and C++ compiler.
@@ -77,32 +81,32 @@ of certain tasks.
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work),
to speed up project builds.
For an incremental build to work, the tasks used to build the project must have specified
-source and output files.
+input and output files.
The provided code snippet demonstrates the implementation of a custom task in Gradle,
showcasing how inputs and outputs are specified to enable `Incremental build`:
```
task myTask {
- inputs.dir 'src/main/java/MyTask.somebody' // Specify the input directory
- outputs.dir 'build/classes/java/main/MyTask.somebody' // Specify the output directory
-
+ inputs.file 'src/main/java/MyTask.somebody' // Specify the input file
+ outputs.file 'build/classes/java/main/MyTask.somebody' // Specify the output file
+
doLast {
// Task actions go here
// This code will only be executed if the inputs or outputs have changed
}
}
```
-`Gradle` uses this information to determine if a task is up-to-date and needs to perform any work.
+
To understand how `Incremental build` works, consider the following steps:
-1) Before executing a task for the first time, `Gradle` takes a
+1) Before executing a task, `Gradle` takes a
[fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
- of the path and contents of the source files and saves it.
+ of the path and contents of the inputs files and saves it.
2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
-3) Before each rebuilding of the task, `Gradle` generates a new fingerprint of the source files
- and compares it with the current fingerprint.
+3) Then, when Gradle starts a project build again, it generates a new fingerprint for the same files.
+ If the new fingerprint has not changed, Gradle can safely skip this task.
+ In the opposite case, the task needs to perform an action and to rewrite outputs.
The fingerprint is considered current if the last modification time
- and the size of the source files have not changed.
- If none of the inputs or outputs have changed, Gradle can skip that task.
+ and the size of the source files have not changed.
In addition to `Incremental build`, `Gradle` also stores fingerprints of previous builds, enabling quick project builds,
@@ -125,30 +129,27 @@ In Maven, there are default phases and goals for building any projects:
By default, the `phases` in Maven are inherently connected within the build lifecycle.
Each `phase` represents a specific task, and the execution order of `goals` within `phases` is determined
-by the default Maven lifecycle bindings. This means that while each `phase` operates as a series of individual tasks,
-they are part of a cohesive build lifecycle, and their execution order is predefined by Maven.
+by the default Maven lifecycle bindings. This means that each `phase` operates as a series of individual tasks,
+and their execution order is predefined by Maven.
-`Maven` supports `Incremental build` through plugins the `takari-lifecycle-plugin` and
-`maven-build-cache-extension`.
-The [takari-lifecycle-plugin](http://takari.io/book/40-lifecycle.html) is an alternative to the default Maven lifecycle
+`Maven` utilizes caching mechanisms through the `takari-lifecycle-plugin` and `maven-build-cache-extension`:
+* The [takari-lifecycle-plugin](http://takari.io/book/40-lifecycle.html) is an alternative to the default Maven lifecycle
(building JAR files). Its distinctive feature is the use of a single universal plugin with the same functionality
as five separate plugins for the standard lifecycle, but with significantly fewer dependencies. As a result,
it provides a much faster startup, more optimal operation, and lower resource consumption.
This leads to a significant increase in performance when compiling complex projects with a large number of modules.
-The [maven-build-cache-extension](https://maven.apache.org/extensions/maven-build-cache-extension/)
-is used for large Maven projects that have a significant number of small modules.
-This plugin takes a key for a project module, it encapsulates the essential aspects of the module,
+* The [maven-build-cache-extension](https://maven.apache.org/extensions/maven-build-cache-extension/)
+is used for large Maven projects that have a significant number of small `modules`.
+A `module` refers to a subproject within a larger project.
+Each `module` has its own `pom.xml` file, and there is an aggregator `pom.xml` that consolidates all the `modules`.
+This plugin computes a key for a `module`; the key encapsulates the essential aspects of the `module`,
including the source code and the configuration of the plugins used within it.
-Projects with the same key are considered current (unchanged) and can be efficiently restored from the cache.
-Conversely, projects that generate different keys are deemed outdated (changed),
-prompting the cache to initiate a complete rebuild for them. In the event of a cache miss,
-where an outdated project requires a complete rebuild,
-the cache seamlessly delegates the build work to the standard Maven core,
+`Modules` with the same key are current or unchanged and the cache can efficiently restore them.
+For changed `modules`, the cache seamlessly delegates the build work to the standard Maven core,
without interfering with the build execution logic.
-This ensures that only the changed modules within the project are rebuilt,
-minimizing unnecessary overhead and optimizing the build process.
+`maven-build-cache-extension` ensures that only the changed `modules` within the project are rebuilt.
### EO build cache
@@ -176,18 +177,11 @@ However, the actual work with EO code takes place in `AssembleMojo`.
Each goal within `AssembleMojo` is a distinct compilation step for EO code.
These tasks happen one after the other, and each task relies on the output of the one before it.
-To speed up the EO program rebuild process, it is helpful to save the results of each goal.
-This avoids repeating actions and makes the compilation more efficient.
-Using caching methods significantly speeds up the build process.
+Each task has directories for input and output data, as well as a directory for storing cached data.
+Using the program name, each task can receive and store data.
-In this chapter, we introduce the keywords:
-* `the source file`: This file serves as the input for goal operations.
-* `the cached file`: This file contains the results of goal's execution.
-
-
-The previous caching mechanism in EO made use of distinct interfaces, specifically `Footprint` and `Optimization`,
-both of which derive from the `SafeMojo` class.
+The previous caching mechanism in EO made use of distinct interfaces, specifically `Footprint` and `Optimization`.
These caching interfaces shared similar logic, but with minor differences.
For instance, `Footprint` verifies the EO version of the compiler, whereas the remaining checks are identical.
Additionally, the conditions for searching data in the cache had errors.
@@ -203,71 +197,33 @@ several disadvantages of the previous caching mechanism in EO were brought to li
leading to redundancy and complicating the caching infrastructure.
-To address these disadvantages, the following solutions are proposed:
-1) Creating a unified caching mechanism for all goals associated in EO code compilation.
-This mechanism, represented by the `Cache` class, will assume responsibility for data validation,
-cache storage, and retrieval.
-To improve the flexibility for different data verification conditions,
-the constructor of the `Cache` class will accept a list of validations.
-Here's the corresponding code:
-
-```
-public class Cache {
-
- private List validations;
-
- public Cache(final List cv) {
- this.validations = cv;
- }
-
- public Optional load(final Path source, final Path cache) {...};
-
- public void save(final Path cache, final Scalar program, final Path relative) {...};
-}
-```
-
-The `List` represents a list of validations implemented from the `CacheValidation` interface.
-This interface defines the structure for validations within the `Cache` class.
-The `CacheValidation` interface has the only method ensuring that each validation contains a specific test condition.
-
-```
-public interface CacheValidation {
- boolean validate(final Path source, final Path cache) throws IOException;
-}
-```
-
-2) In order to minimize disk access, we will utilize file paths represented by the `Path` class.
-By leveraging methods provided by the `Path` and `Files` classes,
-we can obtain essential information such as the file name and the time of the last modification.
-The file name plays a crucial role in locating the cached file within the directory,
-while the time of the last modification enables us to determine whether
-the source file is older or equal in age to the cached file.
-Given that the project build process in Maven is linear,
-these conditions are deemed sufficient for our caching mechanism.
-
-
-3) Searching for a cached data will use the following conditions:
- * `The source file` and `the cached file` should have same file name;
- * Each goal involved caching should have both a cache directory and a directory of result files.
- The directory of result files corresponds to the directory of source files for the subsequent goal.
- * The time of the last modification of the source file should be earlier or equal than cached file.
-
-
-Example: Let's consider an EO program named `program.eo`, which is executed for the first time.
-The cache of each goal will save the execution results in the cache directory and the result directory.
-When this program is run again without changes, these goal will receive data from the cache,
-without executing of task and rewriting of result.
-However, if we make changes to the `program.eo` file or `the source files` of the goals and execute again,
-the execution result of goals was overwritten in the directory of result files and the cache directory.
-This approach effectively protects the program from artificial changes during the build process.
+To address caching challenges in EO, we closely examined existing caching systems.
+Maven's caching mechanisms operate at the level of `phases` and individual project modules.
+However, we require a caching mechanism at the level of `goals`.
+Consequently, the existing caching systems in Maven do not align with our requirements for resolving present issues.
+Furthermore, we cannot use the `ccache` as the basis for creating caching in EO because `ccache` is a high-level tool
+and cannot work with individual compilation tasks.
+The concept of `Gradle Incremental build` bears resemblance to a tool that is essential for our purposes.
+It has the capability to manage separate compilation tasks based on inputs and outputs.
+However, an incremental build in Gradle may be redundant for the EO.
+In contrast to other programming languages, EO currently lacks pre-existing libraries that can be integrated
+into the project.
+Consequently, there is no need to generate a fingerprint for each task's data.
+Instead, it suffices to verify the last modification time of the files involved in EO compilation.
+The modification time of the preceding task must not exceed that of the subsequent one.
+As each task has directories for input and output data, accessing the desired file
+via an absolute path enables retrieval of essential information, such as the file name and last modification time,
+from the file attributes without reading the file contents.
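
The check described above can be sketched in Java as follows. This is a minimal illustration, not the actual `eo-maven-plugin` code: the `CacheCheck` class and its method names are hypothetical, and the rule "the source must not be newer than the cached file" is our reading of the condition stated in the text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical helper: decides whether a cached result can be reused.
final class CacheCheck {
    private CacheCheck() {
    }

    // Locates the cached file by the name of the source file only.
    static Path cachedPath(final Path source, final Path cacheDir) {
        return cacheDir.resolve(source.getFileName());
    }

    // The cached file is reusable if it exists and is not older than the source.
    // Only file attributes are read; the file contents are never opened.
    static boolean reusable(final Path source, final Path cached) throws IOException {
        if (!Files.exists(cached)) {
            return false;
        }
        final FileTime src = Files.getLastModifiedTime(source);
        final FileTime hit = Files.getLastModifiedTime(cached);
        return src.compareTo(hit) <= 0;
    }
}
```

A goal would call `reusable` before doing any work and copy the cached file to its output directory on a hit; on a miss it would run as usual and overwrite both the output and the cached file.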
### Conclusion
-In this article, we explored various build systems and their caching methods.
-We were motivated to find an efficient caching approach for EO due to issues discovered during bug investigation.
-The previous caching mechanism was flawed logically and architecturally, making it ineffective.
-As a result, we discussed the problems, suggest solutions, and outline the criteria
-for implementing a new caching system in EO.
+Summarizing the work completed, we can outline the following:
+1) We highlighted the main problems of the current caching mechanism in EO.
+The problems stem from both the logic of the code and its architecture.
+2) We examined existing project build systems and realized that the EO language is much simpler
+than existing programming languages.
+This realization led us to conclude that existing caching methods are redundant for EO.
+3) We proposed ideas to solve problems in the current caching implementation.
From 25396af6900edc20b82de7a6296c374fe5d8df2c Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Tue, 2 Apr 2024 17:36:50 +0300
Subject: [PATCH 14/17] feat(#56):added conclusion after each build system,
deleted conclusion
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 78 +++++++++----------
1 file changed, 35 insertions(+), 43 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index 2e14657..8ff6800 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -9,7 +9,7 @@ author: Alekseeva Yana
## Introduction
In [EO](https://github.com/objectionary/eo), caching is used to speed up program compilation.
Recently we found a caching
-[bug](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
+[bug](https://github.com/objectionary/eo/issues/2790) between goals in `eo-maven-plugin`
for EO version `0.34.0`. The bug occurred because the old verification method
used compilation time and caching time to search for a cached file.
This is not the most reliable verification method,
@@ -23,7 +23,7 @@ in order to gain a deeper understanding of the caching concepts employed within
-## Build caching of existing build systems
+## Caching in Other Build Systems
### ccache/sccache
In compiled programming languages, building a project with many source code files takes a long time.
@@ -41,14 +41,13 @@ The result is a single file `.cpp` with human-readable code that the compiler wi
At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
At the end, the compiler optimizes the resulting machine code and produces an object file.
To speed up compilation, different files of the same project might be compiled in parallel.
-3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) gets object files.
-The result of the linker is an executable `.exe` file.
+3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) combines object files
+into an executable `.exe` file.
To speed up the build of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
`ccache` uses the hash algorithm for the hashing of code at certain stages of the build.
-`ccache` uses the hash to save a code in the cache.
When compiling a file, its hash is calculated.
If the file is already present in the registry of compiled files, the file will not be compiled again.
Instead, the previously compiled binary file will be utilized.
@@ -64,7 +63,7 @@ based on:
Moreover, `ccache` has two types of the hashing:
1) `Direct mode` - the hash is generated based on the source code only.
When using this mode, the user must ensure that the external libraries used in a project have not changed.
-Otherwise, the project will fail to build, resulting in errors.
+Otherwise, the project might fail to build, resulting in errors.
2) `Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
@@ -73,6 +72,11 @@ Otherwise, the project will fail to build, resulting in errors.
And `sccache` supports a wider range of languages, while `ccache` focuses on caching C and C++ compiler.
+`ccache` is a high-level tool and cannot work with individual compilation tasks,
+therefore `ccache` is not suitable for solving our problems.
+However, the concept of non-local data storage could potentially be incorporated during the development of the EO.
+
+
### Gradle
[Gradle](https://gradle.org) builds projects using a
[task graph](https://docs.gradle.org/current/userguide/build_lifecycle.html) that allows for synchronous execution
@@ -98,22 +102,27 @@ task myTask {
To understand how `Incremental build` works, consider the following steps:
-1) Before executing a task, `Gradle` takes a
- [fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
- of the path and contents of the inputs files and saves it.
-2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
-3) Then, when Gradle starts a project build again, it generates a new fingerprint for the same files.
- If the new fingerprint has not changed, Gradle can safely skip this task.
- In the opposite case, the task needs to perform an action and to rewrite outputs.
- The fingerprint is considered current if the last modification time
+1) Before executing a task, `Gradle` takes a hash of the path and contents of the inputs files and saves it.
+ The hash is considered current if the last modification time
and the size of the source files have not changed.
+2) Then `Gradle` executes the task and saves a hash of the path and contents of the output files.
+3) Then, when Gradle starts a project build again, it generates a new hash for the same files.
+ If the new hash is current, Gradle can safely skip this task.
+ In the opposite case, the task performs an action again and rewrites outputs.
-In addition to `Incremental build`, `Gradle` also stores fingerprints of previous builds, enabling quick project builds,
-for example when switching from one branch to another. This feature is known as
+In addition to `Incremental build`, `Gradle` also stores hashes of previous builds, enabling quick project builds,
+for example when switching from one git branch to another. This feature is known as
the [Build Cache](https://docs.gradle.org/current/userguide/build_cache.html).
+The concept of `Gradle Incremental build` bears resemblance to a tool that is essential for our purposes.
+It has the capability to manage separate compilation tasks based on inputs and outputs.
+However, an incremental build in Gradle may be redundant for the EO.
+In contrast to other programming languages, EO currently lacks pre-existing libraries that can be integrated
+into the project. Consequently, there is no need to generate a fingerprint for each task's data.
+
+
### Maven
[Maven](https://maven.apache.org) automates and manages Java-project builds.
`Maven` is based on the concept of
@@ -144,14 +153,17 @@ This leads to a significant increase in performance when compiling complex proje
is used for large Maven projects that have a significant number of small `modules`.
A `module` refers to a subproject within a larger project.
Each `module` has its own `pom.xml` file, and there is an aggregator `pom.xml` that consolidates all the `modules`.
-This plugin takes a key for a `module`, it encapsulates the essential aspects of the `module`,
+This plugin computes a hash for a `module`; the hash encapsulates the essential aspects of the `module`,
including the source code and the configuration of the plugins used within it.
-`Modules` with the same key are current or unchanged and the cache can efficiently restore them.
-Conversely, the cache seamlessly delegates the build work to the standard Maven core,
+`Modules` with the same hash are current or unchanged and the cache can efficiently restore them.
+In the opposite case, the cache seamlessly delegates the build work to the standard Maven core,
without interfering with the build execution logic.
`maven-build-cache-extension` ensures that only the changed `modules` within the project will rebuild.
+Maven's caching mechanisms operate at the level of `phases` and individual project modules.
+Therefore, existing caching systems in Maven do not align with our requirements for resolving present issues.
+
### EO build cache
The EO code uses the `Maven` for building projects.
@@ -184,9 +196,7 @@ Using the program name, each task can receive and store data.
The previous caching mechanism in EO made use of distinct interfaces, specifically `Footprint` and `Optimization`.
These caching interfaces shared similar logic, but with minor differences.
For instance, `Footprint` verifies the EO version of the compiler, whereas the remaining checks are identical.
-Additionally, the conditions for searching data in the cache had errors.
-The cached file is considered valid if the end time of goal's execution
-and the time of saving goal's result to the cache are equal.
+Additionally, the conditions for searching data in the cache had errors.
Due to this issue, the program behaved incorrectly, because saving the goal's result to the cache is not instantaneous.
After conducting an in-depth analysis of the project's incorrect operation,
several disadvantages of the previous caching mechanism in EO were brought to light:
@@ -197,18 +207,9 @@ several disadvantages of the previous caching mechanism in EO were brought to li
leading to redundancy and complicating the caching infrastructure.
-To address caching challenges in EO, we closely examined existing caching systems.
-Maven's caching mechanisms operate at the level of `phases` and individual project modules.
-However, we require a caching mechanism at the level of `goals`.
-Consequently, the existing caching systems in Maven do not align with our requirements for resolving present issues.
-Furthermore, we cannot use the `ccache` as the basis for creating caching in EO because `ccache` is a high-level tool
-and cannot work with individual compilation tasks.
-The concept of `Gradle Incremental build` bears resemblance to a tool that is essential for our purposes.
-It has the capability to manage separate compilation tasks based on inputs and outputs.
-However, an incremental build in Gradle may be redundant for the EO.
-In contrast to other programming languages, EO currently lacks pre-existing libraries that can be integrated
-into the project.
-Consequently, there is no need to generate a fingerprint for each task's data.
+To address caching challenges in EO, we closely examined existing caching systems. However, we cannot use them.
+We require a caching mechanism at the level of `goals`.
+In fact, we don't need to invent a new caching mechanism for EO.
Instead, it suffices to verify the last modification time of the files involved in EO compilation.
The modification time of the preceding task must not exceed that of the subsequent one.
As each task possesses directories for input and output data, accessing the desired file
@@ -216,15 +217,6 @@ via an absolute path enables retrieval of essential information, as file name an
from the file attributes without reading the file contents.
-### Conclusion
-Summarizing the work completed, we can outline the following:
-1) We highlighted the main problems of the current caching mechanism in EO.
-The problems stem from both the logic of the code and its architecture.
-2) We examined existing project build systems and realized that the EO language is much simpler
-than existing programming languages.
-This realization led us to conclude that existing caching methods are redundant for EO.
-3) We proposed ideas to solve problems in the current caching implementation.
-
From 5c065a595fb8f21e61e70c944f1318c4240f6a59 Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Thu, 18 Apr 2024 17:51:23 +0300
Subject: [PATCH 15/17] feat(#56):fix maven and added diagrams about EO cache
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 136 ++++++++++++------
images/RewritingInCacheEO1.svg | 63 ++++++++
images/RewritingInCacheEO2.svg | 63 ++++++++
images/SavingInCacheEO.svg | 63 ++++++++
4 files changed, 282 insertions(+), 43 deletions(-)
create mode 100644 images/RewritingInCacheEO1.svg
create mode 100644 images/RewritingInCacheEO2.svg
create mode 100644 images/SavingInCacheEO.svg
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index 8ff6800..cce578f 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -28,7 +28,7 @@ in order to gain a deeper understanding of the caching concepts employed within
### ccache/sccache
In compiled programming languages, building a project with many source code files takes a long time.
This time is spent on loading of libraries, preparing, optimizing, checking the code, and so on.
-Let's look at the assembly scheme using C++ as an example:
+Let's look at the assembly scheme using C++ as an example [Picture 1](/images/defaultCPhase.svg):
@@ -72,22 +72,21 @@ Otherwise, the project might fail to build, resulting in errors.
And `sccache` supports a wider range of languages, while `ccache` focuses on caching C and C++ compiler.
-`ccache` is a high-level tool and cannot work with individual compilation tasks,
-therefore `ccache` is not suitable for solving our problems.
-However, the concept of non-local data storage could potentially be incorporated during the development of the EO.
+`ccache` cannot work with individual compilation tasks (e.g. `Maven goal` or `Gradle task`).
+However, the hashing approach and the concept of non-local data storage could potentially
+be incorporated during the development of the EO caching mechanism.
### Gradle
[Gradle](https://gradle.org) builds projects using a
[task graph](https://docs.gradle.org/current/userguide/build_lifecycle.html) that allows for synchronous execution
-of certain tasks.
+of certain tasks. A task represents a unit of work in a `Gradle` project.
`Gradle` employs
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work),
to speed up project builds.
For an incremental build to work, the tasks used to build the project must have specified
input and output files.
-The provided code snippet demonstrates the implementation of a custom task in Gradle,
-showcasing how inputs and outputs are specified to enable `Incremental build`:
+The provided code snippet demonstrates the implementation of a task in Gradle:
```
task myTask {
inputs.file 'src/main/java/MyTask.somebody' // Specify the input file
@@ -102,25 +101,25 @@ task myTask {
To understand how `Incremental build` works, consider the following steps:
-1) Before executing a task, `Gradle` takes a hash of the path and contents of the inputs files and saves it.
- The hash is considered current if the last modification time
- and the size of the source files have not changed.
-2) Then `Gradle` executes the task and saves a hash of the path and contents of the output files.
+`Incremental build` uses a hash to detect changes in the inputs and the outputs.
+The single hash contains the paths and the contents of all the input files or output files.
+1) Before executing a task, `Gradle` takes a hash of the input files and saves it.
+ The hash is considered valid if the last modification time and the size of the source files have not changed.
+2) Then `Gradle` executes the task and saves a hash of the output files.
3) Then, when Gradle starts a project build again, it generates a new hash for the same files.
- If the new hash is current, Gradle can safely skip this task.
+ If the new hash is valid, Gradle can safely skip this task.
In the opposite case, the task performs an action again and rewrites outputs.
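
The steps above can be sketched in Java as follows. This is a simplified model and not Gradle's implementation: the `IncrementalTask` class and its methods are hypothetical, SHA-256 is an assumed hash, and only inputs are tracked, whereas Gradle also fingerprints outputs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an up-to-date check based on input hashes.
final class IncrementalTask {
    // Last known input hash per task name.
    private final Map<String, String> saved = new HashMap<>();

    // A single hash over the paths and the contents of all the files.
    static String fingerprint(final List<Path> files) throws IOException {
        final List<Path> sorted = new ArrayList<>(files);
        Collections.sort(sorted); // make the hash independent of list order
        try {
            final MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (final Path file : sorted) {
                digest.update(file.toString().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between path and contents
                digest.update(Files.readAllBytes(file));
            }
            return Base64.getEncoder().encodeToString(digest.digest());
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Runs the action only if the inputs changed since the last run.
    // Returns true if the action was executed, false if it was skipped.
    boolean runIfChanged(final String name, final List<Path> inputs,
        final Runnable action) throws IOException {
        final String fresh = IncrementalTask.fingerprint(inputs);
        if (fresh.equals(this.saved.get(name))) {
            return false;
        }
        action.run();
        this.saved.put(name, fresh);
        return true;
    }
}
```

Unlike this sketch, a real build tool persists the saved hashes between runs; here they live in memory only to keep the example short.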
-In addition to `Incremental build`, `Gradle` also stores hashes of previous builds, enabling quick project builds,
+In addition to `Incremental build`, `Gradle` also stores the hash of each previous build, enabling quick project builds,
for example when switching from one git branch to another. This feature is known as
the [Build Cache](https://docs.gradle.org/current/userguide/build_cache.html).
-The concept of `Gradle Incremental build` bears resemblance to a tool that is essential for our purposes.
-It has the capability to manage separate compilation tasks based on inputs and outputs.
-However, an incremental build in Gradle may be redundant for the EO.
-In contrast to other programming languages, EO currently lacks pre-existing libraries that can be integrated
-into the project. Consequently, there is no need to generate a fingerprint for each task's data.
+`Gradle Incremental build` can manage separate compilation tasks based on inputs and outputs.
+The EO compiler, in turn, consists of units of work in `Maven` (the last section contains a detailed description).
+Steps of the EO compiler can have input and output files.
+Building upon the concept of `Gradle Incremental Build`, we can use its principles to develop the EO caching mechanism.
### Maven
@@ -129,6 +128,7 @@ into the project. Consequently, there is no need to generate a fingerprint for e
[Maven LifeCycles](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
which include default, clean, and site lifecycles.
Each lifecycle consists of `phases` and these `phases` consist of sets of `goals`.
+One `phase` can consist of several `goals`.
In Maven, there are default phases and goals for building any projects:
@@ -136,33 +136,54 @@ In Maven, there are default phases and goals for building any projects:
-By default, the `phases` in Maven are inherently connected within the build lifecycle.
-Each `phase` represents a specific task, and the execution order of `goals` within `phases` is determined
-by the default Maven lifecycle bindings. This means that while each `phase` operates as a series of individual tasks
-and their execution order is predefined by Maven.
+In Maven, the `phases` are inherently interconnected within the build lifecycle.
+A `phase` represents a specific task, and the execution order of `phases` is determined by the default Maven
+lifecycle bindings. Each `phase` functions as a series of individual tasks known as `goals`.
+There are `goals` tied to the Maven lifecycle, as shown in [Picture 2](/images/defaultPhaseMaven.svg).
+It's also possible to add a new `goal` to a desired phase by modifying the `pom.xml` file.
+Additionally, Maven also supports `goals` that are not bound to any build phase
+and can be executed outside the build lifecycle, directly through the command line.
+The `goals` are executed in the following order:
+1) The `goals` tied to the Maven lifecycle are executed first.
+2) The `goals` added to the `pom.xml` file are executed second.
+3) The `goals` that are not tied to `phases` can be executed last.
-`Maven` utilizes caching mechanisms through the `takari-lifecycle-plugin` and `maven-build-cache-extension`:
+`Maven` can utilize caching mechanisms through the `takari-lifecycle-plugin` and `maven-build-cache-extension`:
+
* The [takari-lifecycle-plugin](http://takari.io/book/40-lifecycle.html) is an alternative to the default Maven lifecycle
-(building JAR files). Its distinctive feature is the use of a single universal plugin with the same functionality
-as five separate plugins for the standard lifecycle, but with significantly fewer dependencies. As a result,
-it provides a much faster startup, more optimal operation, and lower resource consumption.
-This leads to a significant increase in performance when compiling complex projects with a large number of modules.
+(building JAR files). Its distinctive feature is a single universal plugin that provides functionality equivalent
+to the plugins of the standard lifecycle, but with significantly fewer dependencies. This plugin leverages
+[The Takari Incremental API](https://github.com/takari/io.takari.incrementalbuild),
+which introduces the concept of `builders`. These `builders` are user-provided public non-abstract
+top-level classes that implement specific build actions as methods annotated with `@Builder`.
+They can produce various types of outputs, including generated/output files on the filesystem,
+build messages, and project model mutations. For each `@Builder` annotated method, a maven mojo,
+which represents a maven `goal`, is generated.
+When a `builder` is run for a given set of inputs, it always produces the same outputs and saves them to the specified directory.
+Any changes in the inputs result in the removal of outputs.
+
* The [maven-build-cache-extension](https://maven.apache.org/extensions/maven-build-cache-extension/)
-is used for large Maven projects that have a significant number of small `modules`.
+is utilized for large Maven projects that have a significant number of small `modules`.
A `module` refers to a subproject within a larger project.
Each `module` has its own `pom.xml` file, and there is an aggregator `pom.xml` that consolidates all the `modules`.
-This plugin takes a hash for a `module`, it encapsulates the essential aspects of the `module`,
-including the source code and the configuration of the plugins used within it.
-`Modules` with the same hash are current or unchanged and the cache can efficiently restore them.
+This extension computes a hash of the `module` inputs and stores the outputs in the cache.
+The cache restores unchanged `modules`.
In the opposite case, the cache seamlessly delegates the build work to the standard Maven core,
without interfering with the build execution logic.
-`maven-build-cache-extension` ensures that only the changed `modules` within the project will rebuild.
+
+
+Let's clarify upfront that the Maven Build Cache Extension is not suitable for caching EO compilation stages,
+as it is designed for caching at the module level within a project and not for individual tasks.
-Maven's caching mechanisms operate at the level of `phases` and individual project modules.
-Therefore, existing caching systems in Maven do not align with our requirements for resolving present issues.
+Special attention should be given to the Takari Incremental API.
+This API can be applied to cache EO compilation stages as it operates with `goals`.
+It does not use hashing algorithms, which can slow down project build times,
+and it does not have separate cache directories.
+Each `builder` has its own directories for the input and output data related to its work.
+The operational principle of the Takari Incremental API is similar to the operation of caching in EO.
### EO build cache
@@ -199,7 +220,7 @@ For instance, `Footprint` verifies the EO version of the compiler, whereas the r
Additionally, the conditions for searching data in the cache had errors.
Due to this issue, the program behaved incorrectly, because saving the goal's result to the cache is not instantaneous.
After conducting an in-depth analysis of the project's incorrect operation,
-several disadvantages of the previous caching mechanism in EO were brought to light:
+several disadvantages of the previous EO caching mechanism were brought to light:
* Incorrect search conditions for data in the cache.
* The verification method requires reading the file content, which results in inefficiencies.
* The presence of multiple caching mechanisms creates challenges in identifying and rectifying caching errors.
@@ -207,18 +228,47 @@ several disadvantages of the previous caching mechanism in EO were brought to li
leading to redundancy and complicating the caching infrastructure.
-To address caching challenges in EO, we closely examined existing caching systems. However, we cannot use them.
-We require a caching mechanism at the level of `goals`.
-In fact, we don't need to invent a new caching mechanism for EO.
-Instead, it suffices to verify the last modification time of the files involved in EO compilation.
-The modification time of the preceding task must not exceed that of the subsequent one.
-As each task possesses directories for input and output data, accessing the desired file
-via an absolute path enables retrieval of essential information, as file name and last modified time,
-from the file attributes without reading the file context.
+In tackling caching challenges within EO, we conducted a thorough evaluation of current caching systems.
+Most existing caching systems are not suitable for the EO project.
+However, one candidate emerged as a potential solution for caching EO compilation stages: the Takari Incremental API.
+The Takari Incremental API exhibits key similarities with the EO caching system,
+notably its use of input and output directories, the absence of a hash for data storage and retrieval,
+and compatibility with Maven goals.
+However, it diverges from the EO caching approach in one significant aspect – the absence of a distinct cache directory.
+
+We can try to use this API or implement our own caching approach, correcting the disadvantages found.
+The envisioned approach involves the creation of a single class responsible
+for storing and retrieving data from the cache.
+The logic for checking the relevance of cached data is presented below:
+1) We create an EO program named "example".
+ Intermediate files produced while compiling this program will have the same name but different extensions
+ (e.g. `example.eo`, `example.xml`).
+2) When the EO compiler compiles this program task, it saves intermediate files of compilation steps into cache.
+ Each compilation step has own caching directory.
+3) When the EO compiler starts a project build again, it will check if there is a file, named "example",
+ in the cache of each step. If such a file exists,
+ then it is enough to check that the last modification time of this file at the current step
+ is later than at the previous step. If this condition is true,
+ then the finished file can be retrieved from the cache.
+ Below is a diagram illustrating the EO compilation steps that have a caching directory in EO version `0.34.0`:
+
+
+
+
+4) If the EO program file [Picture 5](/images/RewritingInCacheEO1.svg)
+ or an intermediate file [Picture 6](/images/RewritingInCacheEO2.svg) have changed,
+ then the previously cached files becomes invalid.
+ In this case, the compilation step performs an action again and rewrites outputs.
+
+
+
+
+
+
diff --git a/images/RewritingInCacheEO1.svg b/images/RewritingInCacheEO1.svg
new file mode 100644
index 0000000..e1b2fc4
--- /dev/null
+++ b/images/RewritingInCacheEO1.svg
@@ -0,0 +1,63 @@
+
+
+
+
\ No newline at end of file
diff --git a/images/RewritingInCacheEO2.svg b/images/RewritingInCacheEO2.svg
new file mode 100644
index 0000000..f1a31d7
--- /dev/null
+++ b/images/RewritingInCacheEO2.svg
@@ -0,0 +1,63 @@
+
+
+
+
\ No newline at end of file
diff --git a/images/SavingInCacheEO.svg b/images/SavingInCacheEO.svg
new file mode 100644
index 0000000..31564aa
--- /dev/null
+++ b/images/SavingInCacheEO.svg
@@ -0,0 +1,63 @@
+
+
+
+
\ No newline at end of file
From 8f2736813b99e127d4dce384a3c2935ae956c17c Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Tue, 21 May 2024 17:29:06 +0300
Subject: [PATCH 16/17] feat(#56):fix grammar and text
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 58 ++++++-------------
1 file changed, 18 insertions(+), 40 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index cce578f..e3efb9f 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -14,7 +14,7 @@ for EO version `0.34.0`. The bug occurred because the old verification method
used compilation time and caching time to search for a cached file.
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
-We came to the conclusion that we need caching with a reliable verification method.
+We came to the conclusion that we need caching with a reliable verification method.
Furthermore, this verification method should refrain from reading the file content.
The goal is to implement effective caching in EO.
@@ -38,8 +38,7 @@ Let's look at the assembly scheme using C++ as an example [Picture 1](/images/de
which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the file `.cpp` from the preprocessor and compiles it into an object file - `.obj`.
-At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
-At the end, the compiler optimizes the resulting machine code and produces an object file.
+At the compilation stage, the parser checks whether the code matches the rules of a specific programming language.
To speed up compilation, different files of the same project might be compiled in parallel.
3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) combines object files
into an executable `.exe` file.
@@ -51,7 +50,7 @@ and [sccache](https://github.com/mozilla/sccache) are used.
When compiling a file, its hash is calculated.
If the file is already present in the registry of compiled files, the file will not be compiled again.
Instead, the previously compiled binary file will be utilized.
-This approach can significantly accelerate the build process of certain packages, reducing build times by 5-10 times.
+This approach can significantly accelerate the build process of certain packages.
The [`ccache` hash](https://ccache.dev/manual/4.8.2.html#_common_hashed_information) is
based on:
* the file contents
@@ -81,26 +80,12 @@ be incorporated during the development of the EO caching mechanism.
[Gradle](https://gradle.org) builds projects using a
[task graph](https://docs.gradle.org/current/userguide/build_lifecycle.html) that allows for synchronous execution
of certain tasks. A task represents a unit of work in `Gradle` project.
+
+
`Gradle` employs
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work),
to speed up project builds.
-For an incremental build to work, the tasks used to build the project must have specified
-input and output files.
-The provided code snippet demonstrates the implementation of a task in Gradle:
-```
-task myTask {
- inputs.file 'src/main/java/MyTask.somebody' // Specify the input file
- outputs.file 'build/classes/java/main/MyTask.somebody' // Specify the output file
-
- doLast {
- // Task actions go here
- // This code will only be executed if the inputs or outputs have changed
- }
-}
-```
-
-
-To understand how `Incremental build` works, consider the following steps:
+To enable an incremental build, the project tasks must specify their input and output files.
`Incremental build` uses a hash to detect changes in the inputs and the outputs.
The single hash contains the paths and the contents of all the input files or output files.
1) Before executing a task, `Gradle` takes a hash of the input files and saves it.
@@ -111,13 +96,13 @@ The single hash contains the paths and the contents of all the input files or ou
In the opposite case, the task performs an action again and rewrites outputs.
-In addition to `Incremental build`, `Gradle` also stores hash of previous each build, enabling quick project builds,
+In addition to `Incremental build`, `Gradle` also stores a hash of each previous build, enabling quick project builds,
for example when switching from one git branch to another. This feature is known as
the [Build Cache](https://docs.gradle.org/current/userguide/build_cache.html).
`Gradle Incremental build` can manage separate compilation tasks based on inputs and outputs.
-And the EO compiler consists from a unit of work in `Maven` (the last section contains a detailed description).
+The EO compiler, in turn, consists of units of work in `Maven` (the last section contains a detailed description).
Steps of the EO compiler can have input and output files.
Building upon the concept of `Gradle Incremental Build`, we can use its principles to develop the EO caching mechanism.
@@ -126,9 +111,7 @@ Building upon the concept of `Gradle Incremental Build`, we can use its principl
[Maven](https://maven.apache.org) automates and manages Java-project builds.
`Maven` is based on the concept of
[Maven LifeCycles](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
-which include default, clean, and site lifecycles.
-Each lifecycle consists of `phases` and these `phases` consist of sets of `goals`.
-One `phase` can consist of several `goals`.
+which include the default, clean, and site lifecycles.
In Maven, there are default phases and goals for building any projects:
@@ -143,10 +126,6 @@ There are `goals` tied to the Maven lifecycle, as shown in [Picture 2](/images/d
It's also possible to add a new `goal` to a desired phase by modifying the `pom.xml` file.
Additionally, Maven also supports `goals` that are not bound to any build phase
and can be executed outside the build lifecycle, directly through the command line.
-The sequence of achieving `goals` is as follows:
-1) The `goals` tied to the Maven lifecycle are executed first.
-2) The `goals` added to the `pom.xml` file are executed second.
-3) The `goals` that are not tied to `phases` can be executed last.
`Maven` can utilize caching mechanisms through the `takari-lifecycle-plugin` and `maven-build-cache-extension`:
@@ -156,9 +135,9 @@ The sequence of achieving `goals` is as follows:
functionality to plugins for the standard lifecycle, but with significantly fewer dependencies. This plugin leverages
[The Takari Incremental API](https://github.com/takari/io.takari.incrementalbuild),
which introduces the concept of `builders`. These `builders` are user-provided public non-abstract
-top-level classes that implement specific build actions, denoted as methods annotated `@Builder`.
+top-level classes that implement specific build actions.
They can produce various types of outputs, including generated/output files on the filesystem,
-build messages, and project model mutations. For each `@Builder` annotated method, a maven mojo,
+build messages, and project model mutations. For each method annotated with `@Builder`, a Maven mojo,
which represents a maven `goal`, is generated.
When a `builder` is run for a given set of inputs, it produces and saves to the specified directory the same outputs.
Any changes in the inputs result in the removal of outputs.
@@ -183,7 +162,6 @@ This API can be applied to cache EO compilation stages as it operates with `goal
It does not use hashing algorithms, which can slow down project build times,
and it does not have separate cache directories.
Each `builder` has own directories for input and output data related to their work.
-The operational principle of the Takari Incremental API is similar to the operation of caching in EO.
### EO build cache
@@ -209,9 +187,9 @@ However, the actual work with EO code takes place in `AssembleMojo`.
Each goal within `AssembleMojo` is a distinct compilation step for EO code.
-These tasks happen one after the other, and each task relies on the output of the one before it.
-Each task has directories for input and output data, as well as a directory for storing cached data.
-Using the program name, each task can receive and store data.
+These goals run one after another. Each goal has directories for input and output data,
+as well as a directory for storing cached data.
+Using the program name, each goal can receive and store data.
The previous caching mechanism in EO made use of distinct interfaces, specifically `Footprint` and `Optimization`.
@@ -243,10 +221,10 @@ The logic for checking the relevance of cached data is presented below:
1) We create EO program, named "example".
Intermediate files during compilation of this program will have the same name, but not the format
(e.g. `example.eo`, `example.xml`).
-2) When the EO compiler compiles this program task, it saves intermediate files of compilation steps into cache.
- Each compilation step has own caching directory.
+2) When the EO compiler compiles this program task, it saves files of compilation steps into cache.
+ Each compilation step has its own caching directory.
3) When the EO compiler starts a project build again, it will check if there is a file, named "example",
- in the cache of each step. If such a file exists,
+ in the cache of step. If such a file exists,
then it is enough to check that the last modification time of this file at the current step
is later than at the previous step. If this condition is true,
then the finished file can be retrieved from the cache.
@@ -258,7 +236,7 @@ The logic for checking the relevance of cached data is presented below:
4) If the EO program file [Picture 5](/images/RewritingInCacheEO1.svg)
or an intermediate file [Picture 6](/images/RewritingInCacheEO2.svg) have changed,
- then the previously cached files becomes invalid.
+ then the previously cached files become invalid.
In this case, the compilation step performs an action again and rewrites outputs.
From f25314b0491a8181bab853615d9ca1fed6669465 Mon Sep 17 00:00:00 2001
From: Alekseeva Yana
Date: Wed, 12 Jun 2024 21:53:14 +0300
Subject: [PATCH 17/17] feat(#56):delete extra lines and fix text (eo cache)
---
_posts/2024/2024-02-06-about-caching-in-eo.md | 46 ++++++-------------
1 file changed, 13 insertions(+), 33 deletions(-)
diff --git a/_posts/2024/2024-02-06-about-caching-in-eo.md b/_posts/2024/2024-02-06-about-caching-in-eo.md
index e3efb9f..2c62736 100644
--- a/_posts/2024/2024-02-06-about-caching-in-eo.md
+++ b/_posts/2024/2024-02-06-about-caching-in-eo.md
@@ -43,7 +43,6 @@ To speed up compilation, different files of the same project might be compiled i
3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) combines object files
into an executable `.exe` file.
-
To speed up the build of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
`ccache` uses the hash algorithm for the hashing of code at certain stages of the build.
@@ -65,23 +64,19 @@ When using this mode, the user must ensure that the external libraries used in a
Otherwise, the project might fail to build, resulting in errors.
2) `Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
-
`Sccache` is similar in purpose to `ccache` but provides more functionality.
`Sccache` allows to store cached files not only locally, but also in a cloud data storage.
And `sccache` supports a wider range of languages, while `ccache` focuses on caching C and C++ compiler.
-
`ccache` cannot work with individual compilation tasks (e.g. `Maven goal` or `Gradle task`).
However, the hashing approach and the concept of non-local data storage could potentially
be incorporated during the development of the EO caching mechanism.
-
### Gradle
[Gradle](https://gradle.org) builds projects using a
[task graph](https://docs.gradle.org/current/userguide/build_lifecycle.html) that allows for synchronous execution
of certain tasks. A task represents a unit of work in `Gradle` project.
-
`Gradle` employs
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work),
to speed up project builds.
@@ -95,18 +90,15 @@ The single hash contains the paths and the contents of all the input files or ou
If the new hash is valid, Gradle can safely skip this task.
In the opposite case, the task performs an action again and rewrites outputs.
-
In addition to `Incremental build`, `Gradle` also stores hash of each previous build, enabling quick project builds,
for example when switching from one git branch to another. This feature is known as
the [Build Cache](https://docs.gradle.org/current/userguide/build_cache.html).
-
`Gradle Incremental build` can manage separate compilation tasks based on inputs and outputs.
And the EO compiler consists of a unit of work in `Maven` (the last section contains a detailed description).
Steps of the EO compiler can have input and output files.
Building upon the concept of `Gradle Incremental Build`, we can use its principles to develop the EO caching mechanism.
-
### Maven
[Maven](https://maven.apache.org) automates and manages Java-project builds.
`Maven` is based on the concept of
@@ -127,22 +119,17 @@ It's also possible to add a new `goal` to a desired phase by modifying the `pom.
Additionally, Maven also supports `goals` that are not bound to any build phase
and can be executed outside the build lifecycle, directly through the command line.
-
`Maven` can utilize caching mechanisms through the `takari-lifecycle-plugin` and `maven-build-cache-extension`:
-
* The [takari-lifecycle-plugin](http://takari.io/book/40-lifecycle.html) is an alternative to the default Maven lifecycle
(building JAR files). Its distinct feature lies in the use of a single universal plugin with the equivalent
functionality to plugins for the standard lifecycle, but with significantly fewer dependencies. This plugin leverages
[The Takari Incremental API](https://github.com/takari/io.takari.incrementalbuild),
which introduces the concept of `builders`. These `builders` are user-provided public non-abstract
-top-level classes that implement specific build actions.
-They can produce various types of outputs, including generated/output files on the filesystem,
-build messages, and project model mutations. For each `builder` annotated method, a maven mojo,
-which represents a maven `goal`, is generated.
-When a `builder` is run for a given set of inputs, it produces and saves to the specified directory the same outputs.
+top-level classes that implement specific build actions. `Builders` work as part of the Maven lifecycle.
+When a `builder` is run for a given set of inputs, it always produces the same outputs and saves them to the specified directory.
Any changes in the inputs result in the removal of outputs.
-
-
+`Builders` can produce various types of outputs, including generated/output files on the filesystem,
+build messages, and project model mutations.
* The [maven-build-cache-extension](https://maven.apache.org/extensions/maven-build-cache-extension/)
is utilized for large Maven projects that have a significant number of small `modules`.
A `module` refers to a subproject within a larger project.
@@ -151,17 +138,15 @@ This plugin takes a hash from `module` inputs and stores outputs in the cache.
The cache restores unchanged `modules`.
In the opposite case, the cache seamlessly delegates the build work to the standard Maven core,
without interfering with the build execution logic.
-
Let's clarify upfront that the Maven Build Cache Extension is not suitable for caching EO compilation stages,
as it is designed for caching at the module level within a project and not for individual tasks.
-
Special attention should be given to the Takari Incremental API.
This API can be applied to cache EO compilation stages as it operates with `goals`.
It does not use hashing algorithms, which can slow down project build times,
-and it does not have separate cache directories.
-Each `builder` has own directories for input and output data related to their work.
+and it does not have separate cache directories. Instead of computing a hash,
+Takari checks the last modification time of the input files.
### EO build cache
@@ -176,12 +161,10 @@ Below is a diagram illustrating the main phases and their corresponding goals fo
In [Picture 3](/images/EO.svg) the goals of the `eo-maven-plugin` are highlighted in green.
-
However, the actual work with EO code takes place in `AssembleMojo`.
`AssembleMojo` is the goal consisting of other goals that work with the EO file, as shown in
[Picture 4](/images/AssembleMojo.svg).
-
@@ -191,7 +174,6 @@ These goals happen one after the other. Each goal has directories for input and
as well as a directory for storing cached data.
Using the program name, each goal can receive and store data.
-
The previous caching mechanism in EO made use of distinct interfaces, specifically `Footprint` and `Optimization`.
These caching interfaces shared similar logic, but with minor differences.
For instance, `Footprint` verifies the EO version of the compiler, whereas the remaining checks are identical.
@@ -205,7 +187,6 @@ several disadvantages of the previous EO caching mechanism were brought to light
* Employing multiple caching mechanisms for similar entities is a suboptimal practice,
leading to redundancy and complicating the caching infrastructure.
-
In tackling caching challenges within EO, we conducted a thorough evaluation of current caching systems.
Most existing caching systems are not suitable for the EO project.
However, one candidate emerged as a potential solution for caching EO compilation stages: the Takari Incremental API.
@@ -220,13 +201,13 @@ for storing and retrieving data from the cache.
The logic for checking the relevance of cached data is presented below:
1) We create an EO program named "example".
 Intermediate files produced while compiling this program will have the same name but different extensions
- (e.g. `example.eo`, `example.xml`).
-2) When the EO compiler compiles this program task, it saves files of compilation steps into cache.
- Each compilation step has its own caching directory.
-3) When the EO compiler starts a project build again, it will check if there is a file, named "example",
+ (e.g. `example.eo`, `example.xml`).
+ When the EO compiler compiles this program, it saves the files produced by each compilation step into the cache.
+ Each compilation step has its own caching directory and an input file directory.
+2) When the EO compiler starts a project build again, it checks whether a file with the same name as the input file, "example",
 exists in the cache of the step. If such a file exists,
- then it is enough to check that the last modification time of this file at the current step
- is later than at the previous step. If this condition is true,
+ then it is enough to check that the last modification time of the cached file at the current step
+ is later than that of the input file. If this condition is true,
then the finished file can be retrieved from the cache.
 Below is a diagram illustrating the EO compilation steps that have a caching directory in EO version `0.34.0`:
@@ -235,7 +216,7 @@ The logic for checking the relevance of cached data is presented below:
3) If the EO program file [Picture 5](/images/RewritingInCacheEO1.svg)
- or an intermediate file [Picture 6](/images/RewritingInCacheEO2.svg) have changed,
+ or any input file [Picture 6](/images/RewritingInCacheEO2.svg) has changed,
then the previously cached files become invalid.
In this case, the compilation step performs an action again and rewrites outputs.
@@ -243,7 +224,6 @@ The logic for checking the relevance of cached data is presented below:
-
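The freshness check described in the steps above can be sketched in Java. This is a minimal illustration, not the code of `eo-maven-plugin`: the class `CacheCheck` and the method `isFresh` are hypothetical names, and the sketch assumes that a step knows the paths of both its input file and its cached file. It queries only file attributes through `java.nio.file.Files.getLastModifiedTime`, so the file content is never read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/**
 * Hypothetical helper (not part of eo-maven-plugin): decides whether
 * a cached file may be reused by comparing last modification times.
 */
public final class CacheCheck {
    private CacheCheck() {
    }

    /**
     * Returns true when the cached file exists and was modified later
     * than the input file. Only file attributes are queried; the
     * content of neither file is read.
     */
    public static boolean isFresh(final Path input, final Path cached)
        throws IOException {
        if (!Files.exists(input) || !Files.exists(cached)) {
            // Nothing to compare, so the step has to run.
            return false;
        }
        final FileTime source = Files.getLastModifiedTime(input);
        final FileTime stored = Files.getLastModifiedTime(cached);
        // The cached result is valid only if it is strictly newer.
        return stored.compareTo(source) > 0;
    }
}
```

A compilation step could call `CacheCheck.isFresh(input, cached)` before doing its work and run the transformation only when the result is `false`. Timestamps may have coarse resolution on some file systems, which a real implementation would probably need to take into account.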