Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add scalar WKB reader/writer #42

Merged

Conversation

jorisvandenbossche
Copy link
Collaborator

This is adding a similar wrapper around the GeoArrow reader/writer as we have for WKT.
(it's actually sufficiently similar code that I am wondering if we shouldn't template it for WKT/WKB ..)

Comment on lines 8 to 18
void wkbRoundTrip(const std::string wktIn,
const std::basic_string<uint8_t> &wkbIn) {
WKTWriter wkt_writer(2);
WKBReader reader;
WKBWriter writer;

auto geog = reader.read_feature(wkbIn);
auto wkbOut = writer.write_feature(*geog);
auto geogOut = reader.read_feature(wkbOut);
std::string wktOut = wkt_writer.write_feature(*geogOut);
EXPECT_EQ(wktOut, wktIn);
Copy link
Collaborator Author

@jorisvandenbossche jorisvandenbossche Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit a mess, but due to precision issues(?) I can't exactly compare roundtripped WKB bytes .. So reading the roundtripped WKB in again and then writing to WKT to ensure we have the correct WKT after the 1.5 WKB roundtrip

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that strategy is OK (that should be sufficient to check that everything was passed through successfully). We don't quite have a Geography::Equals(), but in theory a future approach might be to read both the WKT and the WKB and check that the result is equal.

Copy link
Owner

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we shouldn't template it for WKT/WKB

I think that can be a battle for another day unless you feel strongly about it! Reducing repetition for code that is likely to change is good (I don't forsee this code changing much or at all).

As a nit, variables should mostly be snake_case. I started writing this before generally adopting Arrow C++'s general pattern of methods/functions being UpperCammelCase() and there is still a little lowerCamelCase because I wrote some of this not to long after writing a lot of Java code (but I'm trying to phase it out).

return std::move(out_[0]);
}

std::unique_ptr<Geography> WKBReader::read_feature(const std::basic_string<uint8_t>& bytes) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since writing std::basic_string<uint8_t> in GeoArrow tests, I've since realized that this will fail to compile without warnings in modern clang (something about char_traits and uint8_t not being a character). Probably std::string_view is best here (or const uint8_t* data, size_t len).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Your test data can be std::vector<uint8_t>)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I am also running into the issue that std::basic_string<uint8_t> doesn't really work easily with binding that in Python ..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we fine with already using C++17 features? (I don't know what the R package requires?) (for std::string_view)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S2 now requires C++17 and so we're free to use it! The R package has been updated to use the latest version of S2 and the version of R that required C++11 is now off the support matrix (as of April).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I made the input take a basic_string_view<uint8_t>, but then for writing I am still using std::basic_string<uint8_t> for the return type (given we can't use a view type as return value, AFAIU?)

Now, I don't know how important it is to use uint8 for the binary data, or if we could also use char and so essentially then std::string_view/std::string for input/return values (AFAIU both approaches are used in the wild to represent binary data?)

Copy link
Owner

@paleolimbot paleolimbot Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I made the input take a basic_string_view<uint8_t>

This will be rejected by future clang ( llvm/llvm-project@e30a148 ), if you're curious. We don't have very tight warning flags, which I think is why this is not failing on MacOS CI.

we could also use char

I think this is what you want here (really what you want is std::span<uint8_t>(), but that's C++20 and we're not there yet). You could return std::vector<uint8_t> since you don't need a view there and we're making a copy anyway.

given we can't use a view type as return value

In theory you could (store the WKB value as a member of the WKBWriter and return a view of it), but in this case I don't think it's any ore efficient (because the underlying GeoArrow object doesn't expose a view of the memory it reserved for all of this, so there's a copy coming anyway).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I would then just use char, then isn't the easiest to just return std::string and have the caller know it not is actually a string?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, removed any usage of combination of basic_string(_view) with the non-char type uint8

Comment on lines 8 to 18
void wkbRoundTrip(const std::string wktIn,
const std::basic_string<uint8_t> &wkbIn) {
WKTWriter wkt_writer(2);
WKBReader reader;
WKBWriter writer;

auto geog = reader.read_feature(wkbIn);
auto wkbOut = writer.write_feature(*geog);
auto geogOut = reader.read_feature(wkbOut);
std::string wktOut = wkt_writer.write_feature(*geogOut);
EXPECT_EQ(wktOut, wktIn);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that strategy is OK (that should be sufficient to check that everything was passed through successfully). We don't quite have a Geography::Equals(), but in theory a future approach might be to read both the WKT and the WKB and check that the result is equal.

src/s2geography/wkb_test.cc Show resolved Hide resolved
@jorisvandenbossche
Copy link
Collaborator Author

@paleolimbot good to go?

Copy link
Owner

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I lost that thread. Thanks!

@jorisvandenbossche jorisvandenbossche merged commit 4495eb8 into paleolimbot:main Nov 16, 2024
4 checks passed
@jorisvandenbossche jorisvandenbossche deleted the wkb-reader-writer branch November 16, 2024 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants