-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON import/export: added base64 support for [ubyte] array and nested FBS #4585
Conversation
I can't confirm by reading the code, but I added the test case for input and wrote the expected output in padded strict JSON. This passes the flatcc json test suite including verifying the nested buffer (which is kind of lucky since the monster_test.fbs are not exactly identical). Note that the test fails if flatcc compiles with UQ off (unquoted), meaning the test case fails when it expects strict json, but that is to be expected. I have released my implementation in v0.5.0 earlier today so the above is not in the release. |
Note that I did not add runtime flags to control printing with or without padding - mostly because it was messy in the input, as you said. But it could be added later. I think what we have is fine for interop. |
BTW: the TEST macro in my test takes two string arguments, the first is the input of a monster_test.fbs json file and the second argument is the expected json printed output without spaces if the parse is successful. |
I tried to test the schema, but some semantics have diverged a bit on namespaces and language specific attributes. Removing these from the schema and the schema parses cleanly with base64, base64url attributes. |
ab24e2d
to
9c10590
Compare
include/flatbuffers/util_base64.h
Outdated
return static_cast<Base64Mode>(static_cast<int>(a) | static_cast<int>(b)); | ||
} | ||
|
||
inline bool Base64ModeIsStrict(Base64Mode m) { return (m & kBase64Strict); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove ()
include/flatbuffers/util_base64.h
Outdated
|
||
// upper bound estimate of a size of the base64 decoded sequence | ||
static inline size_t Base64EstimateDecodedSize(size_t src_size) { | ||
return (src_size / 4) * 3 + (src_size % 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments as to where these constants come from, and/or make them into named constants.
samples/monster_generated.h
Outdated
@@ -603,120 +603,6 @@ inline void EquipmentUnion::Reset() { | |||
type = Equipment_NONE; | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please run sh generate_code.sh
, this will ensure the generated code in this commit will have the same settings as before.
src/idl_gen_text.cpp
Outdated
@@ -46,9 +48,12 @@ void OutputIdentifier(const std::string &name, const IDLOptions &opts, | |||
// Print (and its template specialization below for pointers) generate text | |||
// for a single FlatBuffer value into JSON format. | |||
// The general case for scalars: | |||
template<typename T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where does this formatting come from? unless done by clang-format, please don't reformat things that are unrelated to your PR.
src/idl_gen_text.cpp
Outdated
@@ -69,10 +74,61 @@ bool Print(T val, Type type, int /*indent*/, Type * /*union_type*/, | |||
return true; | |||
} | |||
|
|||
// Print a vector as bas64-encoded string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing e
src/idl_gen_text.cpp
Outdated
// reserve text-space before temporary allocation | ||
text.reserve(text.size() + req_size + 64); | ||
std::vector<char> buf(req_size); // temporary buffer | ||
size_t b64_len = Base64Encode(b64mode, src, src_size, flatbuffers::vector_data(buf), buf.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this function encode into a vector<char>
and not a string
? If it took a string
, we could pass it text
, and it could append straight where it is needed. See similarly how EscapeString
is called.
src/util_base64.cpp
Outdated
const size_t dst_size, | ||
size_t *const error_position = nullptr) { | ||
// strict base64 table (RFC 4648) | ||
B64_DATA_ALIGNAS(64) static const uint8_t strict_table[256] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this alignment needed? if this is to align it with cache lines for performance, please remove. prefer simple code over optimizations that have not been shown to be a clear bottleneck.
src/util_base64.cpp
Outdated
size_t N4 = src_size / 4; | ||
size_t R4 = src_size % 4; | ||
// deferr last char[4] block to remained block | ||
if ((R4 == 0) && (N4 > 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more comments / better identifier names please. This code is pretty cryptic.
src/util_base64.cpp
Outdated
} | ||
} | ||
|
||
#define TEST_EQ(exp, val) TestEq(exp, val, #exp, __FILE__, __LINE__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move testing code to test.cpp
tests/monster_test.fbs
Outdated
@@ -44,6 +44,13 @@ table Stat { | |||
count:ushort; | |||
} | |||
|
|||
table TestBase64 | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you in-line this table? reduces the amount of new files generated.
At commit 79b80f8 the method ReserveElements was removed from FlatBufferBuilder class. |
@aardappel reservation in some form is necessary, otherwise the base64 decoder becomes very complex. The output size can be estimated conservatively within a small margin allowing the decoder to run without having to check output boundaries and manage flushing. |
Resolved. |
I just realized that this also needs to be tested with union strings and string vectors, and union vectors with strings. |
Can you present examples of these cases? |
I got confused - I forgot it was only ubyte and [ubyte] cannot be part of unions. I was thinking of a use case like loading multiple PEM certificates where each cert type would have different union type so the following could be handled http://how2ssl.com/articles/working_with_pem_files/ This would currently require each cert ot be loaded into a separate table, but may that is not a bad thing. |
I\m not sure exactly what the latest changes do, but it appears to enforcing padding. flatcc isn't very configurable right now, but I believe it accepts unpadded input and in the future will be allowed to generate unpadded. There are some JSON base64 formats floating around that define base64 to be unpadded, so I don't think it is wise to reject it, given that the string is fully delimited. |
This is protocol buffers base64: "JSON value will be the data encoded as a string using standard base64 encoding with paddings. Either standard or URL-safe base64 encoding with/without paddings are accepted." |
@mikkelfj, |
d1de374
to
1491ffd
Compare
@vglavnyy note there is one new suggestion: that base64 decodes also base64url and that base64url also decodes base64 unless explicitly required not to by some optional configuration Currently I'm not sure if flatcc does this, but I know that the implementation is prepared for it and I'd be happy to make this update later. |
The (base64url) decoder implicitly can decode sequence form (base64) encoder, but versa does not. |
@aardappel , Second:
Use appeding of characters to a string (like EscapeString) is impossible for current implementation of base64 encoder. Base64 encoder use direct pointer for writing to destination memory. |
@aardappel , could you review latest changes in the base-PR code? |
Any ETA on when this PR is finalized and merged? I'd like to use it for some projects of mine (among which, https://github.com/KageKirin/flatGLTF) |
include/flatbuffers/util_base64.h
Outdated
// [BASE64]{https://tools.ietf.org/html/rfc4648} | ||
|
||
// a Field doesn't have base64/base64url attribute | ||
static const int kBase64Undefined = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change these into an enum?
include/flatbuffers/util_base64.h
Outdated
static const int kBase64CancelPadding = 4; | ||
|
||
// Helper: FDT is <idl.h>::FieldDef type | ||
template<typename FDT> inline int field_base64_mode(const FDT *fd) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make it rely on FieldDef
directly rather than this template trick. You can include idl.h
since this this file is included in places where idl.h
was already included anyway.
include/flatbuffers/util_base64.h
Outdated
return mode; | ||
} | ||
|
||
// calculate number of bytes required to save Decoded result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments are sentences: start with a capital and end with a .
src/idl_gen_text.cpp
Outdated
(void)type; | ||
(void)indent; | ||
auto b64mode = field_base64_mode(fd); | ||
// leave if the Filed doesn't have base64 attribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
field
src/idl_gen_text.cpp
Outdated
// type T can by any, not only uint8_t | ||
const auto *src = reinterpret_cast<const uint8_t *>(v.data()); | ||
const auto src_size = v.size(); | ||
const auto req_size = Base64EncodedSize(b64mode, src, src_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please refactor these functions. Base64EncodedSize
is not needed, since the destination is a vector (or string), so we can simply push_back
. There's no need for an intermediate vector either. Just pass _text
to Base64Encode
.
I'm going to stop my review here since it is clear my comments from last time haven't been addressed. Please confirm you have addressed all my comments (from both reviews) thruout the code before asking me to review again.
33a95b8
to
a1bca88
Compare
@aardappel, Thank you for your review. First part is an interface between the Flatbuffers and a base64 routines. The current interface declared in the “base64_utils.h” and used in the "idl_gen_text.cpp", "idl_parse.cpp" and "test.cpp". The enum “Base64Mode” define working modes for decode/encode routines. The second part is an implementation of base64 routines. There are a lot of implementations of base64 encode/decode routines and anyone can find or write own. |
I'm not sure what all this splitting up buys us. My exact problem with the current PR is that there's an mismatch: the core routines have C-style interfaces that require copying by the user. I proposed the opposite: that the core routines directly work with the types already used by the parser/generator, such that there is less translation steps. So if anything, less splitting up, more integration. |
Ok. I understand your proposal. |
I don't think I understand. Why not just fix this PR with my suggestions? |
I don't know how to implement your suggestions and I need advice.
Proposed:
Future implementation of PrintVector from idl_gen_text.cpp
How to add other types not only UCHAR? |
Yes, you'll likely need templates. Break what encapsulation? There's no need for encapsulation of anything here. |
25fd95a
to
1307c99
Compare
Supported strict RFC4648 Base64 and Base64 with URL and Filename Safe Alphabet. Added two new attributes (base64) and (base64url). Added new option IDLOptions.output_base64_cancel_padding.
@aardappel, could you review this PR again? |
// Takes into account the "opts.base64_cancel_padding" option if a field has | ||
// (base64url) attribute and cancel trailing padding for the output string. | ||
template<typename T> | ||
static inline bool PrintBase64Vector(const Vector<T> &v, Type type, int indent, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just make this an if
in the caller rather than template specialization.
const FieldDef *fd); | ||
|
||
// Internal tests for decode/encode routines. | ||
int UtilBase64InerTest(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to test.cpp
const IDLOptions &opts, std::string *_text, | ||
const FieldDef *fd = nullptr) { | ||
// Try to print an array as base64 string. | ||
if (PrintBase64Vector(v, type, indent, opts, _text, fd)) return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't pass it opts
etc, only call this function if base64 is requested.
val.constant = NumToString(off); | ||
} else if ((token_ == kTokenStringConstant) && !attribute_.empty()) { | ||
uoffset_t off; | ||
auto decoded = ParseBase64Vector(val.type.VectorType(), opts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check here if base64 is requested, not inside the function
// Encodes the input byte sequence into the base64 encoded string. | ||
static void Base64Encode(Base64Mode base64_mode, const uint8_t *src, | ||
size_t src_size, std::string *_text) { | ||
const auto req_size = B64EncodeImpl(base64_mode, src, src_size, nullptr, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all the complexity in this function is unnecessary. a std::string
can have variable amount of characters added. So remove this code for computing the size, and inline this function with B64EncodeImpl
and simplify. Same for decode. This is the third time I am asking. I will stop my review here. Please do not ask me to review without addressing comments.
I apologize for wasting your time. |
Why don't you simply address my code review suggestions? From your other code you are clearly capable of doing so. The biggest waste of time is creating a feature and then not merging it because you don't want to make changes to your code. |
The problem was with me and my hidden goal. Could we once again return to the beginning of PR?
I believe that these routines should be implemented inside util_base64.cpp without dependencies from flatbuffers. In this case, one can easily replace the encoder/decoder if it is needed. |
So, I know Wouters position is different here. But in my view, the "reserve" facility in STL i exactly designed for the case where you feed in a known quantity of data or need to process that data in-memory. So while I can that in most cases messing with reserve is just code obfuscation, I think that does not apply here. Especially considering that a base64 string could be in the size of many megabytes. |
Now I agree with Wouter. The code should be simple and clean and optimization doesn't help to make the code simple. In the proposed solution, only CreateUninitializedVector can't be avoided. Update: |
All my suggestions are in my past code-review comments. |
Issue #4393
Supported strict RFC4648 Base64 and Base64 with URL and Filename Safe Alphabet.
Added two new attributes (base64) and (base64url).
Added new option IDLOptions.output_base64_cancel_padding.
The Monster schema extended and regenerated for testing purposes.
Problem:
I don't know what to do with Base64SelftTest function.
https://github.com/vglavnyy/flatbuffers/blob/base64-PR/src/util_base64.cpp#L337-L448
This is the test of internal implementations of base64 encoder and decoder.
I can remove it from PR.
@mikkelfj can you review this PR for compatibility with flatcc?