Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Kalrish Bäakjen
committed
Dec 9, 2017
0 parents
commit 8080d86
Showing
95 changed files
with
3,759 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
# json2lua | ||
|
||
This program converts JSON to Lua efficiently. Instead of loading and parsing the entire JSON input at once, which would require memory proportional to the size of the input, it reads and processes the input in chunks. | ||
|
||
Currently, the project comprises two components: | ||
|
||
- `libjson2lua`, a C++ library implementing a flexible, pull-style JSON-to-Lua parser and converter | ||
- `json2lua`, a C++ program that exposes the library's functionality on the command line | ||
|
||
The library is kept in the project because it is developed for the program, although it might also be useful elsewhere. | ||
|
||
In the future, the program might be expanded to support more encodings and, perhaps, to write bytecode directly. | ||
|
||
|
||
## Build | ||
|
||
The Tup build system manages the process, which comprises three steps: configuration, build and testing. | ||
|
||
### Configuration | ||
Parameters are specified in the usual way, in a `tup.config` file. Tup variants may be useful if you want to keep several configurations side by side. | ||
|
||
The steps are as follows: | ||
|
||
1. Choose an appropriate toolchain (`CONFIG_TOOLCHAIN`). Toolchains are defined in [src/toolchains.tup](src/toolchains.tup). | ||
2. Set the config items required or supported by the chosen toolchain (e.g. `CONFIG_CXXFLAGS` for the GNU toolchain). | ||
3. Choose appropriate values for toolchain-independent variables (like `CONFIG_BUFFER_SIZE`). | ||
|
||
[The provided tup.config](tup.config) contains the explanation of toolchain-independent variables. | ||
|
||
Example configurations, which may or may not suit your build environment, are provided in [configs.tup](configs.tup). You can try one of them for the build step after adding your settings for the toolchain-independent parameters. | ||
|
||
### Build | ||
The build step follows the standard procedure: | ||
|
||
$ tup | ||
|
||
#### Requirements | ||
|
||
- the [Tup build system](http://gittup.org/tup/) | ||
- a C++ toolchain | ||
- [guardcheader](https://github.com/kalrish/guardcheader) | ||
- the standard Lua interpreter | ||
|
||
### Testing | ||
Tests are part of the build as well, and thus managed by Tup. | ||
|
||
Tests have been implemented for the command-line program. Because they run the compiled executable, they can be carried out only if the program was built for the same architecture as the machine it was built on. In that case they run automatically, and Tup reports an error if any of them fails. | ||
|
||
|
||
## Usage | ||
|
||
The accepted command-line syntax depends on the version. | ||
|
||
All versions accept the GNU-standard `--version` option, which reports the version: | ||
|
||
$ json2lua --version | ||
|
||
This prints `json2lua `_`something VERSION`_, where _VERSION_ is the last token and represents the version. | ||
|
||
### Current version | ||
|
||
The current version handles UTF-8-encoded JSON and outputs ASCII-encoded Lua 5. | ||
|
||
Invoke the program as follows: | ||
|
||
$ json2lua INPUT OUTPUT | ||
|
||
_INPUT_ and _OUTPUT_ are the names of the files to read and write, respectively. |
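The chunked processing described at the top of the README can be pictured with a minimal C++ sketch. It is not the code from this commit: `for_each_chunk`, `chunk_size` and the consumer callback are hypothetical names, and the chunk size is deliberately tiny. Unlike `cbc::read`, the sketch makes no attempt to keep multi-byte UTF-8 sequences whole.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical chunk size, kept tiny so that the chunking is easy to observe
constexpr std::size_t chunk_size = 4;

// Reads `input` in chunks of at most `chunk_size` bytes and hands each chunk
// to `consume(pointer, length)`. Returns false if a read error occurred.
template<typename Consumer>
bool for_each_chunk(std::FILE * const input, Consumer && consume)
{
    unsigned char buffer[chunk_size];

    for ( ;; )
    {
        const std::size_t read = std::fread(buffer, 1, sizeof(buffer), input);

        if ( read > 0 )
            consume(buffer, read);

        // A short read means either end of file or an error
        if ( read < sizeof(buffer) )
            return std::ferror(input) == 0;
    }
}
```

Whatever the size of the input, only `chunk_size` bytes are held in memory at a time, which is the property the README refers to.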
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
tup.creategitignore() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
.gitignore |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
CONFIG_TOOLCHAIN=msvc | ||
CONFIG_CL=cl.exe | ||
#CONFIG_CLFLAGS=/nologo /Od /Gs /GF /RTCsu /MT /EHsc | ||
CONFIG_CLFLAGS=/nologo /std:c++17 /DNDEBUG /Ox /Gs- /GF /GL /Gw /Gy /MT /EHsc | ||
#CONFIG_LINKFLAGS=/nologo /Od /Gs /GF /RTCcsu /MT | ||
CONFIG_LINKFLAGS=/nologo /Ox /Gs- /GF /GL /Gw /Gy /MT /link /LTCG /OPT:REF,ICF |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
CONFIG_TOOLCHAIN=gnu | ||
CONFIG_CXX=g++ | ||
CONFIG_CXXFLAGS=-Wall -Wextra -Wpedantic -Wno-unused-variable -Wno-unused-const-variable -Wno-unused-parameter -Wno-unused-function -Werror -std=c++17 -fno-common -march=native | ||
CONFIG_LDFLAGS=-fno-common -march=native |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
CONFIG_TOOLCHAIN=gnu | ||
CONFIG_CXX=g++ | ||
CONFIG_CXXFLAGS=-Wall -std=c++17 -DNDEBUG -flto -fomit-frame-pointer -fno-common -fdata-sections -ffunction-sections -O3 | ||
CONFIG_LDFLAGS=-flto -fuse-linker-plugin -fomit-frame-pointer -fno-common -O3 -march=native -static -s -Xlinker --as-needed -Xlinker -O1 -Xlinker --gc-sections -Xlinker --sort-common -Xlinker --strip-all -Xlinker --relax |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
include toolchains.tup/@(TOOLCHAIN).tup | ||
|
||
ifdef TARGET | ||
TARGET=@(TARGET) | ||
else | ||
TARGET=@(TUP_PLATFORM) | ||
endif | ||
|
||
ifeq ($(TARGET),win32) | ||
PROGRAM_SUFFIX=.exe | ||
endif |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
include_rules | ||
|
||
: foreach cbc.cpp libinst.cpp json2lua.cpp | ../lib/<headers> config/buffer_size.hpp |> !cxx |> {objs} | ||
: foreach log.cpp main.cpp utf8.cpp | ../lib/<headers> |> !cxx |> {objs} | ||
: {objs} |> !ld |> json2lua$(PROGRAM_SUFFIX) | <cmd> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,162 @@ | ||
#include "config.hpp" | ||
|
||
#include "cbc.hpp" | ||
|
||
#include "log.hpp" | ||
#include "utf8.hpp" | ||
|
||
|
||
json2lua::pointer_pair<char32_t> | ||
cbc::read | ||
( ) | ||
{ | ||
unsigned char buffer[buffer_size]; | ||
|
||
const auto input = this->input; | ||
const auto code_points = this->buffer; | ||
|
||
auto read = std::fread(buffer, 1, sizeof(buffer)-3, input); | ||
|
||
if ( std::ferror(input) == 0 ) | ||
{ | ||
if ( read > 0 ) | ||
{ | ||
const auto last_byte = buffer[read-1]; | ||
|
||
std::size_t additional = 0; | ||
|
||
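/* a lead byte at the very end of the chunk: its continuation bytes still have to be read */ | ||
if ( (last_byte & 0b11100000u) == 0b11000000u ) /* 110xxxxx: 2-byte sequence */ | ||
additional = 1; | ||
else if ( (last_byte & 0b11110000u) == 0b11100000u ) /* 1110xxxx: 3-byte sequence */ | ||
additional = 2; | ||
else if ( (last_byte & 0b11111000u) == 0b11110000u ) /* 11110xxx: 4-byte sequence */ | ||
additional = 3; | ||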
|
||
if ( additional > 0 ) | ||
{ | ||
log_debug("must read more"); | ||
|
||
const auto second_read = std::fread(buffer+read, 1, additional, input); | ||
if ( std::ferror(input) == 0 ) | ||
{ | ||
if ( second_read == additional ) | ||
{ | ||
read += second_read; | ||
} | ||
else | ||
{ | ||
err("incomplete UTF-8 sequence"); | ||
|
||
throw 4; | ||
} | ||
} | ||
else | ||
{ | ||
err("couldn't read input file"); | ||
|
||
throw 2; | ||
} | ||
} | ||
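else if ( (last_byte & 0b11000000u) == 0b10000000u ) | ||
{ | ||
/* it's a continuation byte, so the last sequence may be unfinished; read the continuation bytes that follow, if any (valid UTF-8 has at most 2 more, and the buffer has 3 spare bytes) */ | ||
auto buffer_p = buffer+read; | ||
const auto buffer_end = buffer+sizeof(buffer); | ||
|
||
read_another_character: | ||
const auto c = std::fgetc(input); | ||
if ( c != EOF ) | ||
{ | ||
if ( (c & 0b11000000) == 0b10000000 && buffer_p != buffer_end ) | ||
{ | ||
*buffer_p = static_cast<unsigned char>(c); | ||
++buffer_p; | ||
++read; | ||
goto read_another_character; | ||
} | ||
|
||
/* this byte starts the next character (or no room is left): put it back so that the next read picks it up */ | ||
if ( std::ungetc(c, input) == EOF ) | ||
{ | ||
err("couldn't read input file"); | ||
|
||
throw 2; | ||
} | ||
} | ||
else | ||
{ | ||
if ( std::feof(input) == 0 ) | ||
{ | ||
err("couldn't read input file"); | ||
} | ||
else | ||
{ | ||
err("incomplete UTF-8 sequence"); | ||
} | ||
|
||
throw 2; | ||
} | ||
} | ||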
|
||
json2lua::pointer_pair<char32_t> rv; | ||
rv.beg = code_points; | ||
rv.end = decode_utf8(buffer, read, code_points); | ||
|
||
return rv; | ||
} | ||
else | ||
{ | ||
if ( std::feof(input) == 0 ) | ||
{ | ||
err("couldn't read anything"); | ||
} | ||
else | ||
{ | ||
err("incomplete JSON"); | ||
} | ||
|
||
throw 3; | ||
} | ||
} | ||
else | ||
{ | ||
err("couldn't read input file"); | ||
|
||
throw 2; | ||
} | ||
} | ||
|
||
void | ||
cbc::write | ||
( | ||
const unsigned char c | ||
) | ||
{ | ||
if ( std::fputc(c, this->output) == EOF ) | ||
{ | ||
err("couldn't write to the output file"); | ||
|
||
throw 1; | ||
} | ||
} | ||
|
||
void | ||
cbc::write | ||
( | ||
const unsigned char * const buffer, | ||
std::size_t size | ||
) | ||
{ | ||
if ( std::fwrite(buffer, 1, size, this->output) != size ) | ||
{ | ||
err("couldn't write to the output file"); | ||
|
||
throw 1; | ||
} | ||
} | ||
|
||
json2lua::pointer_pair<unsigned char> | ||
cbc::encode_string_code_point | ||
( | ||
const char32_t code_point | ||
) | ||
{ | ||
json2lua::pointer_pair<unsigned char> rv; | ||
|
||
rv.beg = this->string_code_units; | ||
rv.end = rv.beg + encode_utf8(code_point, this->string_code_units); | ||
|
||
return rv; | ||
} |
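The lead-byte tests in `cbc::read` rely on the UTF-8 bit patterns: each mask has to cover the marker bits plus the first zero bit after them, because `110xxxxx`, `1110xxxx` and `11110xxx` all share the prefix `11`. The following is a minimal sketch of that classification as standalone functions. The names `utf8_sequence_length` and `utf8_missing_after_lead` are hypothetical and do not appear in the commit.

```cpp
#include <cstddef>

// Hypothetical helper: length in bytes of the UTF-8 sequence introduced by
// `lead`, judged by bit pattern only, or 0 if `lead` cannot start a sequence.
constexpr std::size_t utf8_sequence_length(const unsigned char lead)
{
    if ( (lead & 0b10000000u) == 0b00000000u ) return 1; // 0xxxxxxx: ASCII
    if ( (lead & 0b11100000u) == 0b11000000u ) return 2; // 110xxxxx
    if ( (lead & 0b11110000u) == 0b11100000u ) return 3; // 1110xxxx
    if ( (lead & 0b11111000u) == 0b11110000u ) return 4; // 11110xxx
    return 0; // 10xxxxxx (continuation byte) or 11111xxx (never valid)
}

// Hypothetical helper: number of bytes still missing when a chunk ends
// right after `lead`.
constexpr std::size_t utf8_missing_after_lead(const unsigned char lead)
{
    return utf8_sequence_length(lead) > 1 ? utf8_sequence_length(lead) - 1 : 0;
}
```

For instance, a chunk ending in `0xE2` (the first byte of a 3-byte sequence) lacks two bytes, which is why `cbc::read` keeps three spare bytes at the end of its buffer: the worst case is a 4-byte sequence cut after its lead byte.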
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
#include <cstdio> | ||
|
||
#include "config/buffer_size.hpp" | ||
#include "../lib/pointer_pair.hpp" | ||
|
||
|
||
class cbc final | ||
{ | ||
private: | ||
char32_t buffer[buffer_size]; | ||
unsigned char string_code_units[4]; | ||
|
||
public: | ||
std::FILE * input; | ||
std::FILE * output; | ||
|
||
json2lua::pointer_pair<char32_t> | ||
read | ||
( ); | ||
|
||
void | ||
write | ||
( | ||
const unsigned char | ||
); | ||
|
||
void | ||
write | ||
( | ||
const unsigned char * const, | ||
const std::size_t | ||
); | ||
|
||
json2lua::pointer_pair<unsigned char> | ||
encode_string_code_point | ||
( | ||
const char32_t | ||
); | ||
}; |
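`string_code_units` holds four bytes because a single code point needs at most four UTF-8 code units. The `encode_utf8` function that `encode_string_code_point` calls lives in `utf8.hpp`, which is not shown here. The sketch below only assumes, from the call site, that it takes a code point and an output pointer and returns the number of bytes written. `encode_utf8_sketch` is a hypothetical stand-in, and returning 0 for values that cannot be encoded is an assumption of the sketch, not documented behaviour.

```cpp
#include <cstddef>

// Sketch only, not the commit's encode_utf8. Writes the UTF-8 encoding of
// `code_point` to `out` (room for 4 bytes assumed) and returns the number of
// bytes written, or 0 if the value is a surrogate or lies above U+10FFFF.
inline std::size_t encode_utf8_sketch(const char32_t code_point, unsigned char * const out)
{
    if ( code_point < 0x80 )
    {
        out[0] = static_cast<unsigned char>(code_point);
        return 1;
    }
    if ( code_point < 0x800 )
    {
        out[0] = static_cast<unsigned char>(0xC0 | (code_point >> 6));
        out[1] = static_cast<unsigned char>(0x80 | (code_point & 0x3F));
        return 2;
    }
    if ( code_point < 0x10000 )
    {
        if ( code_point >= 0xD800 && code_point <= 0xDFFF )
            return 0; // surrogates are not Unicode scalar values

        out[0] = static_cast<unsigned char>(0xE0 | (code_point >> 12));
        out[1] = static_cast<unsigned char>(0x80 | ((code_point >> 6) & 0x3F));
        out[2] = static_cast<unsigned char>(0x80 | (code_point & 0x3F));
        return 3;
    }
    if ( code_point <= 0x10FFFF )
    {
        out[0] = static_cast<unsigned char>(0xF0 | (code_point >> 18));
        out[1] = static_cast<unsigned char>(0x80 | ((code_point >> 12) & 0x3F));
        out[2] = static_cast<unsigned char>(0x80 | ((code_point >> 6) & 0x3F));
        out[3] = static_cast<unsigned char>(0x80 | (code_point & 0x3F));
        return 4;
    }
    return 0; // beyond the Unicode range
}
```

With such a function, `rv.beg + encode_utf8_sketch(code_point, rv.beg)` yields the one-past-the-end pointer in the same way `encode_string_code_point` computes `rv.end`.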
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
#if __STDC_HOSTED__ != 1 | ||
#error C++ implementation is not hosted | ||
#endif |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
include_rules | ||
|
||
: buffer_size.hpp.in |> tup varsed %f %o |> %B |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
constexpr decltype(sizeof(int)) buffer_size = @BUFFER_SIZE@; |
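`cbc::read` asks `std::fread` for `sizeof(buffer)-3` bytes, reserving three bytes for the tail of a truncated multi-byte sequence, so the configured size has to be greater than 3. Because the type is unsigned, a smaller value would wrap around to a huge request. A compile-time guard could look like the sketch below, which assumes a hypothetical value of 4096 in place of `@BUFFER_SIZE@`. The names `reserved_bytes` and `max_first_read` are illustrative and not part of the commit.

```cpp
#include <cstddef>

// Hypothetical value; in the real build, `tup varsed` substitutes @BUFFER_SIZE@
constexpr std::size_t buffer_size = 4096;

// Bytes kept free for completing a UTF-8 sequence cut off at the chunk boundary
constexpr std::size_t reserved_bytes = 3;

// Rejects configurations for which buffer_size - reserved_bytes would be
// zero or wrap around
static_assert(buffer_size > reserved_bytes, "BUFFER_SIZE must be greater than 3");

// Largest request the first std::fread of a chunk would make
constexpr std::size_t max_first_read = buffer_size - reserved_bytes;
```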
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
#undef DEBUG |