This is an experimental translator of C (ISO/IEC 9899:2018) programs to EO programs.
Assuming you are on Ubuntu 22.04+:
$ apt update
$ apt install -y software-properties-common
$ apt-key adv --keyserver keyserver.ubuntu.com --recv-keys F7C91591CC543ECA
$ add-apt-repository 'deb http://c2eo.polystat.org/debian/ c2eo-rep non-free main contrib'
$ apt-get install -y clang
$ apt-get install -y c2eo
Then, just run:
$ c2eo <path-to-c-file-name> <eo-file-name>.eo
You can also use yegor256/c2eo image via Docker:
$ docker run -v $(pwd):/eo yegor256/c2eo:<tag> hello.c hello.eo
Assuming you have hello.c
in the current directory, the hello.eo
will be created next to it.
We do not support the utility for other distributions and operating systems yet. However, you can try to build the project from source at your own risk.
Again, we recommend Ubuntu 22.04+, and you will need wget 1.21+, tar 1.30+, git 2.32+, cmake 3.18+, gcc 11.2+, g++ 11.2+, ninja-build 1.10.1+, clang 14.0.0+, and python3 3.10.0+. You will also need the requirements of the EO project (Maven 3.3+ and Java 8+).
Then, you need to install GTest 1.12.1+:
$ apt install libgtest-dev googletest
$ cd /usr/src/googletest
$ cmake .
$ make
$ cd lib
$ cp *.a /usr/local/lib
After that, you need to install LLVM/Clang 12.0.1. Either build it from source as follows, or use the alternative way below this code:
$ wget https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-12.0.1.tar.gz
$ tar -xvf llvmorg-12.0.1.tar.gz
$ mv ./llvm-project-llvmorg-12.0.1 ./llvm-clang
$ cd llvm-clang
$ mkdir build && cd $_
$ cmake --no-warn-unused-cli -DBUILD_SHARED_LIBS:STRING=ON -DLLVM_TARGETS_TO_BUILD:STRING=X86 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE "-DLLVM_ENABLE_PROJECTS:STRING=clang;compiler-rt" -DCMAKE_BUILD_TYPE:STRING=Debug -DLLVM_OPTIMIZED_TABLEGEN:STRING=ON -DLLVM_USE_SPLIT_DWARF:STRING=ON -DLLVM_USE_LINKER:STRING=gold ../llvm -G Ninja
$ cmake --build . --config Debug --target all -j 10 -- -j1 -l 2
$ cd ../..
You may also try our own pre-packaged archive:
$ apt install megatools
$ megadl 'https://mega.nz/#!cZ9WQCqB!z713CuC-GNFQAXIxZwZxI05zOH4FAOpwYHEElgOZflA'
$ tar -xvf llvm-clang.tar.gz
It is assumed that the llvm-clang dir is located in the c2eo dir. If your llvm-clang is in a different place, set the path in that line.
Formally speaking, this completes the preparation. However, to work with the project fully, including testing and executing the translated code, you need to study the EO compiler project and fulfill its requirements. After that, you can proceed with further steps.
All source files of the transpiler are located in project/src/transpiler. The transpiler's work begins with the code in the source file project/src/transpiler/main.cpp. After making changes in these files, you need to rebuild the executable file c2eo. To do this, go to the project dir. The first time, create the build folder:
$ mkdir build
Then go to the build folder and run the following commands:
$ cmake ..
$ make
As you may have noticed, the project is built in the project/build folder. The result of this build is the c2eo file in project/bin. Now you have a transpiler and can convert programs from C to EO. Just run:
$ ./c2eo <path-to-c-file-name> <eo-file-name>.eo
# ./c2eo ../some_dir/example.c example.eo
Your PR must pass the following checks, so before creating a PR, run them locally to make sure everything is OK:
$ clang-format project/src/transpiler/*.(cpp|h) -i
$ cpplint --filter=-runtime/references,-runtime/string,-build/c++11 project/src/transpiler/**
$ cd project/scripts
$ python3 clang_tidy.py
$ cd project/scripts
$ python3 transpile.py <your_path_to_the_folder>/gcc.c-torture -s gcc -n
$ cd project/scripts
$ python3 test.py -p <your_path_to_the_folder>/c-testcuite -s testcuite -n
- test
$ cd project/scripts
$ python3 test.py -s test
- unit-tests
$ cd project/scripts
$ python3 build_c2eo.py
$ cd ../bin
$ ./unit_tests --gtest_filter=*
From project/scripts/
directory:
$ python3 update-release.py -h
usage: update-release.py [-h] [--branch BRANCH] [--version VERSION]
Release maker
optional arguments:
-h, --help show this help message and exit
--version VERSION specify the new version
Example
$ python3 update-release.py --version=0.1.1
To use this script, make sure you have the following packages installed:
$ pip3 install git_config pgpy s3cmd
$ apt install md5deep reprepro gcc cmake dpkg wget tar s3cmd -y
# for the latest version of the cmake package, try:
$ pip3 install cmake
Notes:
- Use . as a version delimiter.
- This script uses the current date, time, and time zone. Make sure they are configured correctly.
- This script extracts your name and email from git config. Make sure you have them.
This script will write automatically generated merges to the changelog file. You can view an approximate list of changes by running the following command in the terminal:
$ git log $(git describe --tags --abbrev=0)..HEAD --merges --oneline --format=" * %h %s by %an <%aE>"
- Build the executable file.
- Create a deb file (basic: HABR)
- Create a repository (basic: UNIXFORUM)
- Upload a repository tree into the bucket's virtual 'directory'.
The following files will be generated
$ tree
.
├── c2eo-X.X.X
│ ├── DEBIAN
│ │ ├── changelog
│ │ ├── control
│ │ ├── copyright
│ │ └── md5sums
│ └── usr
│ ├── bin
│ │ └── c2eo
│ └── lib
│ ├── libclangAnalysis.so
│ ├── libclangAnalysis.so.12
│ ├── ...
│ └── libLLVMTransformUtils.so.12
├── c2eo-X.X.X.deb
├── readme.md
├── repository
│ ├── conf
│ │ └── distributions
│ ├── db
│ │ ├── checksums.db
│ │ ├── contents.cache.db
│ │ ├── packages.db
│ │ ├── references.db
│ │ ├── release.caches.db
│ │ └── version
│ ├── dists
│ │ └── c2eo-rep
│ │ ├── contrib
│ │ │ ├── binary-amd64
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── debian-installer
│ │ │ │ ├── binary-amd64
│ │ │ │ │ ├── Packages
│ │ │ │ │ └── Packages.gz
│ │ │ │ └── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ └── Packages.gz
│ │ │ └── source
│ │ │ ├── Release
│ │ │ └── Sources.gz
│ │ ├── InRelease
│ │ ├── main
│ │ │ ├── binary-amd64
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── debian-installer
│ │ │ │ ├── binary-amd64
│ │ │ │ │ ├── Packages
│ │ │ │ │ └── Packages.gz
│ │ │ │ └── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ └── Packages.gz
│ │ │ └── source
│ │ │ ├── Release
│ │ │ └── Sources.gz
│ │ ├── non-free
│ │ │ ├── binary-amd64
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── debian-installer
│ │ │ │ ├── binary-amd64
│ │ │ │ │ ├── Packages
│ │ │ │ │ └── Packages.gz
│ │ │ │ └── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ └── Packages.gz
│ │ │ └── source
│ │ │ ├── Release
│ │ │ └── Sources.gz
│ │ ├── Release
│ │ └── Release.gpg
│ └── pool
│ └── main
│ └── c
│ └── c2eo
│ └── c2eo_X.X.X_all.deb
├── todo.sh
└── update-release.py
35 directories, 120 files
Then you have to upload ./repository/dists and ./repository/pool to c2eo.polystat.org/debian/.
C is a system-level procedural programming language with direct access to the underlying hardware architecture elements, such as memory and registers. EO, on the other hand, is a high-level object-oriented language. There are a number of non-trivial mechanisms for translating constructs from the former to the latter, which are explained below:
✔️ Implemented:
- basic data types: double, int, bool, char, string
- const
- arrays
- structures
- unions
- functions
- function call operators
- multiple return
- pointers
- external links
- if-else
- ternary operator
- while
- do-while
- for
- break
- continue
- switch case default
- operators
Let's take the following C code as an example:
double z = 3.14;
In EO, we represent the global memory space as a copy of the ram object, which we call global. Thus, the variable z would be accessed as a block of 8 bytes inside ram at the very beginning, since it's the first variable seen. For example, to change the value of z, we write 8 bytes to the 0th position of global:
ram > global
global.write 0 (3.14.as-bytes)
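As a rough C analogy (not the transpiler's actual code), the global object can be pictured as a flat byte buffer into which values are copied at fixed offsets. The names global_ram, ram_write_double and ram_read_double, as well as the 1024-byte size, are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the EO `global` object: a flat byte buffer. */
enum { RAM_SIZE = 1024 };
static unsigned char global_ram[RAM_SIZE];

/* Copy the bytes of a double (8 on common platforms) into the buffer
   at the given byte offset. */
void ram_write_double(size_t offset, double value) {
    assert(offset + sizeof value <= RAM_SIZE);
    memcpy(&global_ram[offset], &value, sizeof value);
}

/* Copy the bytes back out from the given offset and return them as a double. */
double ram_read_double(size_t offset) {
    double value;
    assert(offset + sizeof value <= RAM_SIZE);
    memcpy(&value, &global_ram[offset], sizeof value);
    return value;
}
```

With these helpers, ram_write_double(0, 3.14) plays the role of the EO write to position 0, and ram_read_double(0) reads z back.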
We translate a const variable like an ordinary variable.
const int a = 3;
if (a == 10) {
...
}
a.write-as-int32 3 // only once
if
a.read-as-int32.eq 10
seq
...
True
We can work with enumerated types in the same way as with constants, substituting numeric values for the names.
enum State {Working = 1, Failed = 0};
if (10 == Working) {
...
}
if
10.eq 1
seq
...
True
seq
True
Fixed-size arrays, whatever their number of dimensions, can be handled as one-dimensional arrays: for any element, we calculate its offset (bias) from the start using its indices and the dimensions. In this example, we use a special object, address, which makes it more convenient to read and write memory from a certain position.
int a[2] = { 5, 6 };
╭─────┬─────╮
| 5 │ 6 │
├─────┼─────┤
| 0th │ 4th │
╰─────┴─────╯
address global-ram 0 > a
a.write (4.mul 0) (5.as-bytes)
a.write (4.mul 1) (6.as-bytes)
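The offset calculation for a multidimensional array can be sketched in C as follows. This is only an illustration of the row-major arithmetic; the function name element_offset_2d is hypothetical and does not come from the transpiler.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element [i][j] in an array declared as T a[rows][cols],
   where elem_size is sizeof(T). C stores arrays in row-major order, so a
   two-dimensional index collapses into a single linear one. */
size_t element_offset_2d(size_t i, size_t j, size_t cols, size_t elem_size) {
    assert(j < cols);
    return (i * cols + j) * elem_size;
}
```

For the one-dimensional int a[2] above, the same formula with a single row gives the offsets 0 and 4 when int is 4 bytes long.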
Since we know the size of structures, we generate additional objects that store the offsets (biases) of the structure's fields and allow access to them. For nested structures and other types, we can likewise calculate the offsets and generate the corresponding objects.
struct Rectangle {int x; int y;} rect;
rect.x = 5;
╭───────┬───────╮
| int x │ int y │
├───────┼───────┤
| 0th │ 4th │
╰───────┴───────╯
address global-ram 0 > rect
0 > x
4 > y
(rect.add x).write 5
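The numbers stored in the generated x and y objects are the same offsets a C compiler computes for the fields. A small sketch using the standard offsetof macro shows where they come from; the two helper function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct Rectangle { int x; int y; };

/* Field offsets as computed by the C compiler. For two int fields there is
   no padding on common platforms, so y starts right after x. */
size_t rectangle_offset_x(void) { return offsetof(struct Rectangle, x); }
size_t rectangle_offset_y(void) { return offsetof(struct Rectangle, y); }
```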
The size of the union is determined by the nested object with the maximum size. The main feature is that all internal objects start at the same address. We do the same with nested structures.
union { int a; int b; } u;
u.a = 5;
╭───────┬───────╮
| int a │ int b │
├───────┼───────┤
| 0th │ 0th │
╰───────┴───────╯
address global-ram 0 > u
0 > a
0 > b
(u.add a).write 5
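The same properties can be checked directly in C. The union Number and the helper names below are made up for this sketch; note that a compiler may add padding, so the size is at least, not necessarily exactly, the size of the largest member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical union with members of different sizes. */
union Number { int a; double b; char c; };

/* Every member of a union starts at offset 0. */
size_t number_offset_a(void) { return offsetof(union Number, a); }
size_t number_offset_b(void) { return offsetof(union Number, b); }
size_t number_offset_c(void) { return offsetof(union Number, c); }

/* The union is at least as large as its largest member. */
size_t number_size(void) { return sizeof(union Number); }
```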
We deal with function calls in a similar way: for each call, we calculate the space needed in global for the arguments (param-start and param-size) and for the local variables. The variable r will be "pushed" to global and accessible by the code inside the function circle at the 0th position with local offset. The local variable x will also be pushed to global and will be accessible at the 4th position with local offset, because the length of int is four.
We also use a separate copy of ram named return for storing the function's return result. Here, we are trying to simulate the behaviour of a typical C compiler. The declaration of circle and its execution may look like this:
double pi = 3.14;
double circle(int r) {
double x = 2 * pi * r;
return x;
}
circle(10);
╭───────────┬───────┬──────────╮
| double pi │ int r │ double x │ // variables in global
├───────────┼───────┼──────────┤
| 0th │ 8th │ 12th │ // start position in global
╰───────────┴───────┴──────────╯
address global-ram 0 > pi
[param-start param-size] > circle
global.read param-start > r
global.read (add param-start 4) > x
seq > @
x.write (2.mul (pi.mul r))
return.write x
seq
pi.write 0 3.14
global.write 8 10 // write 10 to circle arguments stack
circle 8 4 // arguments stack start from 8th byte and have 4 bytes for r
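The frame layout above can be imitated in plain C with two byte buffers. This is a simplified sketch under stated assumptions: the buffer names and sizes and the wrapper call_circle are hypothetical, and it assumes an 8-byte double and a 4-byte int as in the diagram.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned char global_ram[64];             /* hypothetical `global` */
static unsigned char return_ram[sizeof(double)]; /* hypothetical `return` */

/* Mirrors circle(): r lives at param_start, x right after the parameters. */
void circle(size_t param_start, size_t param_size) {
    double pi, x;
    int r;
    assert(param_start + param_size + sizeof x <= sizeof global_ram);
    memcpy(&pi, &global_ram[0], sizeof pi);          /* global pi at offset 0 */
    memcpy(&r, &global_ram[param_start], sizeof r);  /* argument r */
    x = 2 * pi * r;
    memcpy(&global_ram[param_start + param_size], &x, sizeof x); /* local x */
    memcpy(return_ram, &x, sizeof x);                /* store the result */
}

/* Caller side: initialise pi, push the argument, call, read the result. */
double call_circle(int r) {
    double pi = 3.14, result;
    memcpy(&global_ram[0], &pi, sizeof pi);
    memcpy(&global_ram[8], &r, sizeof r);  /* argument goes to offset 8 */
    circle(8, sizeof r);                   /* frame starts at the 8th byte */
    memcpy(&result, return_ram, sizeof result);
    return result;
}
```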
The function has input variables and local variables. To determine the amount of memory for input variables, we use two parameters in the function description. For convenient access to local variables, we use the offset (bias) local-start of the local area. To indicate the first free position, we use empty-local-position. We split a nested function call into several consecutive calls, each passing its result to the next.
long long func1(long long x) {
return x - 111;
}
long long func2(long long x) {
return x - 10;
}
void main() {
long long a;
a = func1(func2(5));
printf("%lld\n", a);
}
[param-start param-size] > func1
add param-start param-size > local-start
add local-start 0 > empty-local-position
address global-ram (param-start.add 0) > x
seq > @
return.write (x.sub 111)
TRUE
[param-start param-size] > func2
add param-start param-size > local-start
add local-start 0 > empty-local-position
address global-ram (param-start.add 0) > x
seq > @
return.write (x.sub 10)
TRUE
[] > main
seq > @
a.write // write func1 return in a
seq
write // write func2 return in temp place
address global-ram (add empty-local-position 0)
seq
write // write 5 to func2 arguments stack
address global-ram (add empty-local-position 0)
5
^.func2 empty-local-position 8
return
^.func1 empty-local-position 8
return
printf "%d\n" a
We write the result to a separate ram memory object, from which other functions can then read it. To solve the multiple return problem, we can use the goto object in EO. By wrapping the entire function body in such an object, we can interrupt its execution at any time: we just generate a g.forward call for each return.
function {
...
return <result_1>;
...
return <result_2>;
...
return <result_3>;
}
[] > function
goto > @
[g]
seq > @
...
return.write <result_1>
g.forward TRUE
...
return.write <result_2>
g.forward TRUE
...
return.write <result_3>
g.forward TRUE
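A C analogy of this scheme is a function with a single exit label: every return becomes "store the result, then jump forward to the end". The names return_slot, sign_of and call_sign_of are hypothetical and only illustrate the idea.

```c
#include <assert.h>

static int return_slot;  /* hypothetical stand-in for the `return` object */

/* Each early return is replaced by a write to return_slot and a forward jump,
   which is what return.write followed by g.forward does in the EO code. */
void sign_of(int x) {
    if (x < 0) { return_slot = -1; goto end; }
    if (x == 0) { return_slot = 0; goto end; }
    return_slot = 1;
end:
    return;
}

/* The caller reads the result from the shared slot after the call. */
int call_sign_of(int x) {
    sign_of(x);
    return return_slot;
}
```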
C code may get an address of a variable, which is either in stack or in global memory:
int f = 7;
void bar() {
int t = 42;
int* p = &t; // local scope
*p = 500; // write from local scope to local
p = &f; // global scope
*p = 500; // write from local scope to global
}
╭───────┬───────┬────────╮
| int f │ int t │ int* p │ // variables in global
├───────┼───────┼────────┤
| 0th │ 4th │ 8th │ // start position in global
╰───────┴───────┴────────╯
As in C, however, our variables are located in global and have absolute addresses. The object param-start, provided as an argument to the EO object bar, is a calculated offset in global that addresses the beginning of the frame for the function call. Thus, &t would return param-start + 0, while &f would be just 0:
[param-start] > bar
global.write
8 // int* p
param-start // &t -> function offset position in global space
global.write
8
0 // &f -> address of f in global
seq > @
bar 4
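To make the idea of "a pointer is an absolute offset in global" concrete, here is a minimal C sketch. It is not the generated code: write_int, read_int and the buffer size are assumptions for illustration, and int is assumed to be 4 bytes so that the frame of bar starts at offset 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned char global_ram[32];  /* hypothetical `global` */

void write_int(size_t addr, int value) {
    assert(addr + sizeof value <= sizeof global_ram);
    memcpy(&global_ram[addr], &value, sizeof value);
}

int read_int(size_t addr) {
    int value;
    assert(addr + sizeof value <= sizeof global_ram);
    memcpy(&value, &global_ram[addr], sizeof value);
    return value;
}

/* bar() from the example: f sits at offset 0, t at param_start + 0.
   The pointer p is kept as a plain offset into global_ram. */
void bar(size_t param_start) {
    size_t p;
    write_int(param_start, 42);  /* int t = 42 */
    p = param_start;             /* p = &t */
    write_int(p, 500);           /* *p = 500 writes to t */
    p = 0;                       /* p = &f */
    write_int(p, 500);           /* *p = 500 writes to f */
}
```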
To compile files with any external links, we use the following solution:
In the file where the external call is used, we generate the following alias:
#include <string.h>
strncpy(str2, str1, 8);
+alias c2eo.external.strncpy
strncpy str2 str1 8
Then we create a file with the same name as the alias, containing an empty implementation:
+package c2eo.external
[args...] > strncpy
TRUE > @
In EO, we have analogues of if-else and if objects, so we just convert without any significant changes.
if (condition) {
...
}
else {
...
}
// -----------------
if (condition) {
...
}
if-else
condition
seq
...
TRUE
seq // else
...
TRUE
// -----------------
if
condition
seq
...
TRUE
We can turn the ternary operator into the same if-else, except that seq must not end with TRUE, because its return value will be used.
condition ? a : b
if-else
condition
seq
a
seq // else
b
We can generate an analog of the C while loop in EO by using goto, a conditional operator, and analogs for break and continue.
while (condition) {
...
}
goto
[while-loop-label]
while-loop-label.backward > continue
while-loop-label.forward TRUE > break
if > @
condition
seq
body
continue
TRUE
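The same control flow can be written in C with labels instead of a while statement, which may help when reading the EO version: the backward jump re-evaluates the condition, and the forward jump leaves the loop. The function sum_until_limit is a hypothetical example, not part of the project.

```c
#include <assert.h>

/* Adds 1, 2, ..., n but stops early once the sum would exceed limit.
   Written with jumps only, to mirror the goto-based translation. */
int sum_until_limit(int n, int limit) {
    int i = 1, sum = 0;
loop_start:                      /* target of the backward jump */
    if (i <= n) {                /* the loop condition */
        if (sum + i > limit)
            goto loop_end;       /* break: jump forward, out of the loop */
        sum += i;
        i++;
        goto loop_start;         /* next iteration: jump backward */
    }
loop_end:
    return sum;
}
```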
We can generate an analog of the C do-while loop in EO by using a nested goto, followed by a check with a conditional operator, and analogs for break and continue.
do {
body
} while (condition);
goto
[do-while-loop-label-1]
do-while-loop-label-1.forward TRUE > break
seq > @
goto
[do-while-loop-label-2]
do-while-loop-label-2.forward TRUE > continue
body > @
if
condition
do-while-loop-label-1.backward
TRUE
We can generate an analog of the C for loop in EO by using a nested goto, so that loop-expression is executed after the body of the loop, together with a conditional operator and analogs for break and continue.
for(init;condition;loop-expression) {
body
}
init
goto
[for-loop-label-1]
for-loop-label-1.forward TRUE > break
if > @
condition
seq
goto
[for-loop-label-2]
for-loop-label-2.forward TRUE > continue
body > @
loop-expression
for-loop-label-1.backward
TRUE
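The reason for the nested goto is that continue in a for loop must still run loop-expression before the condition is checked again. A label-based C sketch of that behaviour follows; sum_odd_goto is a hypothetical example.

```c
#include <assert.h>

/* Sum of the odd numbers in [0, n), equivalent to:
   for (i = 0; i < n; i++) { if (i % 2 == 0) continue; sum += i; } */
int sum_odd_goto(int n) {
    int sum = 0;
    int i = 0;                   /* init */
loop_check:
    if (i < n) {                 /* condition */
        if (i % 2 == 0)
            goto loop_step;      /* continue: skip the rest of the body */
        sum += i;
loop_step:
        i++;                     /* loop-expression always runs */
        goto loop_check;         /* backward jump to the condition */
    }
    return sum;
}
```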
With the goto object, we can transform any number of break statements in a loop into g.forward TRUE calls.
while (condition) {
...
break;
...
}
goto
[while-loop-label]
while-loop-label.backward > continue
while-loop-label.forward TRUE > break
if > @
condition
seq
...
break
...
TRUE
With the goto object, we can transform any number of continue statements in a loop into g.backward calls.
while (condition) {
...
continue;
...
}
goto
[while-loop-label]
while-loop-label.backward > continue
while-loop-label.forward TRUE > break
if > @
condition
seq
...
continue
...
TRUE
We can convert a simple switch statement like this one into a goto object.
switch (x) {
case 1:
op1;
break;
case 2:
case 3:
op2;
break;
case 4:
op3;
case 5:
op4;
break;
case 6:
default:
op6;
break;
}
memory > flag
goto > @
[end]
seq > @
write flag 0
if
or (eq x 1) flag
seq
write flag 1
op1
end.forward TRUE
TRUE
if
or (eq x 2) flag
seq
write flag 1
TRUE
if
or (eq x 3) flag
seq
write flag 1
op2
end.forward TRUE
TRUE
if
or (eq x 4) flag
seq
write flag 1
op3
TRUE
if
or (eq x 5) flag
seq
write flag 1
op4
end.forward TRUE
TRUE
if
or (eq x 6) flag
seq
write flag 1
TRUE
op6
end.forward TRUE
TRUE
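To check that the flag-based scheme preserves fall-through, the same switch can be written both ways in C and compared. This is a sketch for illustration only: the functions are hypothetical, and each op is modelled as setting one bit in a result mask (op1 = 1, op2 = 2, op3 = 4, op4 = 8, op6 = 32).

```c
#include <assert.h>

/* Reference: the original switch. */
int switch_reference(int x) {
    int ran = 0;
    switch (x) {
    case 1: ran |= 1; break;
    case 2:
    case 3: ran |= 2; break;
    case 4: ran |= 4; /* falls through */
    case 5: ran |= 8; break;
    case 6:
    default: ran |= 32; break;
    }
    return ran;
}

/* Translation: a chain of ifs sharing a flag. Once one case matches, the
   flag keeps all following cases "open" until a break jumps to the end. */
int switch_flagged(int x) {
    int ran = 0;
    int flag = 0;
    if (x == 1 || flag) { flag = 1; ran |= 1; goto end; }
    if (x == 2 || flag) { flag = 1; }
    if (x == 3 || flag) { flag = 1; ran |= 2; goto end; }
    if (x == 4 || flag) { flag = 1; ran |= 4; }
    if (x == 5 || flag) { flag = 1; ran |= 8; goto end; }
    if (x == 6 || flag) { flag = 1; }
    ran |= 32;  /* default body runs whenever control reaches this point */
end:
    return ran;
}
```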
The table of C operators and their corresponding EO objects:
C | EO |
---|---|
+ | plus |
- | minus |
* | times |
* | write|read-as-<type> |
/ | div |
= | write-as-<type> |
% | mod |
+x | pos |
-x | neg |
++x | pre-inc-<type> |
x++ | post-inc-<type> |
--x | pre-dec-<type> |
x-- | post-dec-<type> |
== | eq |
!= | neq |
< | lt |
<= | lte |
> | gt |
>= | gte |
&& | and |
|| | or |
! | not |
& | bit-and |
& | addr-of |
| | bit-or |
^ | bit-xor |
~ | bit-not |
<< | shift-left |
>> | shift-right |
(type casting) | as-<type> |
x += 10;
For compound assignment operations, we generate the following construct:
x.write (x.add 10)
EO stores floating-point numbers in at least 8 bytes. At the moment, full support for narrower floating-point types is not possible, so we also use 8 bytes to store them.
float b = 5.0; // 4 bytes
write-as-float32 b 5.0 // 8 bytes
At the moment, the largest integer type in EO is int64. There is no support for uint64 numbers: a value beyond the int64 range causes an error at the compilation stage. Our current implementation supports numbers in the uint63 range (the first bit is always 0 for correct translation to int).
unsigned long long int c = 18446744073709551615ULL; // max uint64 value
unsigned long long int d = 9223372036854775807; // max int64 value
// [COMPILATION EXCEPTION] the number is too high
write-as-uint64 c 18446744073709551615
// correct
write-as-uint64 d 9223372036854775807
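Whether an unsigned 64-bit value falls into the supported uint63 range can be checked with a single comparison. The helper name fits_in_uint63 is hypothetical; the sketch only illustrates the range rule described above.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when the top bit of value is clear, that is, when the
   value can also be represented as a non-negative int64. */
int fits_in_uint63(uint64_t value) {
    return value <= (uint64_t)INT64_MAX;
}
```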
Source: https://stackoverflow.com/questions/840501/how-do-function-pointers-in-c-work
Let's start with a basic function which we will be pointing to:
int addInt(int n, int m) {
return n + m;
}
First thing, let's define a pointer to a function which receives 2 ints and returns an int:
int (*functionPtr)(int, int);
Now we can safely point to our function:
functionPtr = &addInt;
In EO, we generate a special object call with an array that stores all function calls:
[index param-start param-size] > call
at. > @
*
<function_name_1> param-start param-size
addInt param-start param-size // our function has an index of 1
...
<function_name_n_n> param-start param-size
index
Now, if we want to assign the function to a pointer, we replace this expression with the index of this function in our array:
write-as-ptr functionPtr 1
Now that we have a pointer to the function, let's use it:
int sum = (*functionPtr)(2, 3); // sum == 5
... // before calling the function, we place its arguments in memory
write-as-int32
sum
call
param-start
param-size
read-as-ptr functionPtr // return 1
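The index-based dispatch can be mirrored in C with a table of functions, where the "pointer" is just an index into the table. This is a simplified sketch: subInt is a made-up second function, function_table is a hypothetical name, and the arguments are passed directly instead of through memory as in the EO version.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*binary_fn)(int, int);

int subInt(int n, int m) { return n - m; }  /* hypothetical, index 0 */
int addInt(int n, int m) { return n + m; }  /* index 1, as in the text */

/* All functions that may be called indirectly, in a fixed order. */
static const binary_fn function_table[] = { subInt, addInt };

/* Dispatch by index, like the EO `call` object selecting an array element. */
int call(size_t index, int a, int b) {
    assert(index < sizeof function_table / sizeof function_table[0]);
    return function_table[index](a, b);
}
```

Storing 1 in a variable and passing it to call then behaves like functionPtr = &addInt followed by (*functionPtr)(2, 3).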
The following cases are currently under development.
Passing the pointer to another function is basically the same:
int add2to3(int (*functionPtr)(int, int)) {
return (*functionPtr)(2, 3);
}
We can use function pointers in return values as well (try to keep up, it gets messy):
// this is a function called functionFactory which receives parameter n
// and returns a pointer to another function which receives two ints
// and it returns another int
int (*functionFactory(int n))(int, int) {
printf("Got parameter %d", n);
int (*functionPtr)(int, int) = &addInt;
return functionPtr;
}
But it's much nicer to use a typedef:
typedef int (*myFuncDef)(int, int);
// note that the typedef name is indeed myFuncDef
myFuncDef functionFactory(int n) {
printf("Got parameter %d", n);
myFuncDef functionPtr = &addInt;
return functionPtr;
}
The EO language has a goto object that supports transitions similar to the continue and break statements. We also use it to implement multiple return statements to terminate a function.
The goto statement in the C language allows a variety of jumps to the appropriate labels, violating the principles of structured programming. In many situations, it cannot be implemented using the goto object that EO has. Replacing goto in C with other statements requires additional effort and is very difficult to implement based on AST analysis alone. For example:
if (a) {
A();
goto L3;
}
B();
L1:
if (b) {
L2:
C();
L3:
D();
goto L1;
}
else if (c) {
E();
goto L2;
}
F();
stateDiagram-v2
state "if (a)" as if_1
state "if (b)" as if_2
state "else if (c)" as if_3
state "L1:" as L1
state "L2:" as L2
state "L3:" as L3
state "A();" as A
state "B();" as B
state "C();" as C
state "D();" as D
state "E();" as E
state "F();" as F
[*] --> if_1
if_1 --> A: True
A --> L3
if_1 --> B: False
B --> L1
L1 --> if_2
if_2 --> L2: True
L2 --> C
C --> L3
L3 --> D
D --> L1
if_2 --> if_3: False
if_3 --> E: True
E --> L2
if_3 --> F: False
F --> [*]
To solve this problem in the future, we could implement goto-statement and goto-label objects in EO that directly implement the goto semantics of the C language and are used when transforming from C to EO. The use of these objects in direct EO programming can be disabled. In addition, these constructs can be replaced when performing static analysis of EO programs.
Modern C compilers do not directly support functions with a variable number of arguments. This is due to the use of new standards for passing arguments through registers. Therefore, an additional library is used for the implementation, and its interface is exposed via the stdarg.h header file. The library describes such constructs as va_list, va_start, va_arg, va_end and others that are added to a C program at link time. Therefore, implementing these functions during transformation is impossible. For example:
#include <stdarg.h>
#include <stdio.h>
double average(int num,...) {
va_list valist;
double sum = 0.0;
int i;
/* initialize valist for num number of arguments */
va_start(valist, num);
/* access all the arguments assigned to valist */
for (i = 0; i < num; i++) {
sum += va_arg(valist, int);
}
/* clean memory reserved for valist */
va_end(valist);
return sum / num;
}
int main() {
printf("Average of 1, 2, 3, 4 = %f\n", average(4, 1, 2, 3, 4));
printf("Average of 1, 2, 3 = %f\n", average(3, 1, 2, 3));
}
In the further development of the project, one option is to implement a linker that combines EO library packages with the compilation units obtained from transforming C programs into a single program. Another possible but labor-intensive option is additional parsing of functions that take a variable number of parameters. This option adds capabilities but leads to deviations from standard C compiler behavior.
In the C language, bit fields can be declared as structure members. They provide access to individual bits of signed and unsigned numbers. Since EO does not currently support work at the bit level, implementing bit fields requires a lot of time and manipulation with logical operations at the byte level. This task was not a priority during the current development.
// memory-optimized date storage structure
struct date {
unsigned int day: 5; // the maximum value of days is 31, so we need 5 bits for this
unsigned int month: 4; // the maximum value of months is 12, so we need 4 bits for this
unsigned int year;
};
struct date d = {15, 7, 2022};
printf("Date size is: %zu bytes\n", sizeof(d)); // 8 bytes instead of 12
printf("Date is %d.%d.%d", d.day, d.month, d.year); // 15.7.2022
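The byte-level manipulation that a bit-field translation would need can be sketched with shifts and masks. This is only an illustration: the bit positions (day in bits 0 to 4, month in bits 5 to 8) are an assumed layout, since the C standard leaves the actual layout to the implementation, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: day occupies bits 0-4, month occupies bits 5-8. */
enum { DAY_BITS = 5, MONTH_BITS = 4, MONTH_SHIFT = 5 };

/* Pack both fields into one 16-bit word. */
uint16_t pack_day_month(unsigned day, unsigned month) {
    assert(day < (1u << DAY_BITS) && month < (1u << MONTH_BITS));
    return (uint16_t)(day | (month << MONTH_SHIFT));
}

/* Extract a field by masking off the bits that belong to other fields. */
unsigned unpack_day(uint16_t packed) {
    return packed & ((1u << DAY_BITS) - 1u);
}

unsigned unpack_month(uint16_t packed) {
    return (packed >> MONTH_SHIFT) & ((1u << MONTH_BITS) - 1u);
}
```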