-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add scalar WKB reader/writer #42
feat: add scalar WKB reader/writer #42
Conversation
src/s2geography/wkb_test.cc
Outdated
void wkbRoundTrip(const std::string wktIn, | ||
const std::basic_string<uint8_t> &wkbIn) { | ||
WKTWriter wkt_writer(2); | ||
WKBReader reader; | ||
WKBWriter writer; | ||
|
||
auto geog = reader.read_feature(wkbIn); | ||
auto wkbOut = writer.write_feature(*geog); | ||
auto geogOut = reader.read_feature(wkbOut); | ||
std::string wktOut = wkt_writer.write_feature(*geogOut); | ||
EXPECT_EQ(wktOut, wktIn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit a mess, but due to precision issues(?) I can't exactly compare roundtripped WKB bytes .. So reading the roundtripped WKB in again and then writing to WKT to ensure we have the correct WKT after the 1.5 WKB roundtrip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that strategy is OK (that should be sufficient to check that everything was passed through successfully). We don't quite have a Geography::Equals()
, but in theory a future approach might be to read both the WKT and the WKB and check that the result is equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we shouldn't template it for WKT/WKB
I think that can be a battle for another day unless you feel strongly about it! Reducing repetition for code that is likely to change is good (I don't forsee this code changing much or at all).
As a nit, variables should mostly be snake_case
. I started writing this before generally adopting Arrow C++'s general pattern of methods/functions being UpperCammelCase()
and there is still a little lowerCamelCase
because I wrote some of this not to long after writing a lot of Java code (but I'm trying to phase it out).
src/s2geography/wkb.cc
Outdated
return std::move(out_[0]); | ||
} | ||
|
||
std::unique_ptr<Geography> WKBReader::read_feature(const std::basic_string<uint8_t>& bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since writing std::basic_string<uint8_t>
in GeoArrow tests, I've since realized that this will fail to compile without warnings in modern clang
(something about char_traits and uint8_t
not being a character). Probably std::string_view
is best here (or const uint8_t* data, size_t len
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Your test data can be std::vector<uint8_t>
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I am also running into the issue that std::basic_string<uint8_t>
doesn't really work easily with binding that in Python ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we fine with already using C++17 features? (I don't know what the R package requires?) (for std::string_view
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S2 now requires C++17 and so we're free to use it! The R package has been updated to use the latest version of S2 and the version of R that required C++11 is now off the support matrix (as of April).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I made the input take a basic_string_view<uint8_t>
, but then for writing I am still using std::basic_string<uint8_t>
for the return type (given we can't use a view type as return value, AFAIU?)
Now, I don't know how important it is to use uint8
for the binary data, or if we could also use char
and so essentially then std::string_view
/std::string
for input/return values (AFAIU both approaches are used in the wild to represent binary data?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I made the input take a
basic_string_view<uint8_t>
This will be rejected by future clang ( llvm/llvm-project@e30a148 ), if you're curious. We don't have very tight warning flags, which I think is why this is not failing on MacOS CI.
we could also use char
I think this is what you want here (really what you want is std::span<uint8_t>()
, but that's C++20 and we're not there yet). You could return std::vector<uint8_t>
since you don't need a view there and we're making a copy anyway.
given we can't use a view type as return value
In theory you could (store the WKB value as a member of the WKBWriter and return a view of it), but in this case I don't think it's any ore efficient (because the underlying GeoArrow object doesn't expose a view of the memory it reserved for all of this, so there's a copy coming anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I would then just use char
, then isn't the easiest to just return std::string
and have the caller know it not is actually a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, removed any usage of combination of basic_string(_view)
with the non-char type uint8
src/s2geography/wkb_test.cc
Outdated
void wkbRoundTrip(const std::string wktIn, | ||
const std::basic_string<uint8_t> &wkbIn) { | ||
WKTWriter wkt_writer(2); | ||
WKBReader reader; | ||
WKBWriter writer; | ||
|
||
auto geog = reader.read_feature(wkbIn); | ||
auto wkbOut = writer.write_feature(*geog); | ||
auto geogOut = reader.read_feature(wkbOut); | ||
std::string wktOut = wkt_writer.write_feature(*geogOut); | ||
EXPECT_EQ(wktOut, wktIn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that strategy is OK (that should be sufficient to check that everything was passed through successfully). We don't quite have a Geography::Equals()
, but in theory a future approach might be to read both the WKT and the WKB and check that the result is equal.
@paleolimbot good to go? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I lost that thread. Thanks!
This is adding a similar wrapper around the GeoArrow reader/writer as we have for WKT.
(it's actually sufficiently similar code that I am wondering if we shouldn't template it for WKT/WKB ..)