-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic C API for PluralRules #552
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Just some assorted comments.
typedef enum { | ||
PluralRuleType_Cardinal, | ||
PluralRuleType_Ordinal | ||
} PluralRuleType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typedef enum { | |
PluralRuleType_Cardinal, | |
PluralRuleType_Ordinal | |
} PluralRuleType; | |
typedef enum { | |
ICU4X_PluralRuleType_Cardinal, | |
ICU4X_PluralRuleType_Ordinal | |
} ICU4X_PluralRuleType; |
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention for c enums that I'm used to is all caps, but that can be hard on the eyes. Maybe Icu4x_ as a prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe Icu4x_ as a prefix?
Mmmm, I don't really like the aesthetics of mixed-case on "ICU4X", especially since the first character "I" is confusable with lowercase letter "l". I'd rather it be all uppercase or all lowercase.
|
||
|
||
CreatePluralRulesResult plural_rules_create(Locale* locale, ErasedDataProvider* provider, PluralRuleType ty); | ||
PluralCategory plural_rules_select(const PluralRules* rules, PluralOperands op); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've written PluralOperands
as a data bag (which I agree with, at least for now). We definitely want it in the header file so it can be directly constructed from C, but in the function signature, do we want to pass by value, or should we pass by pointer, just like object types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think pointer, to me this seems like idiomatic C:
PluralOperands p;
PluralCategory cat = plural_rules_select(rules, &p);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it doesn't really matter all that much, and I typically try to pass Copy types by value unless they're huge (this one isn't huge)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's that I don't think of PluralOperands as a value type; I think of it as a bag of values, and I expect a bag of values to be passed by reference/pointer.
We probably don't need PluralOperands to implement Copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using it as a Copy type within the bindings though I could call .clone()
. It's just that .clone()
is typically considered to be a heavy operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, good point. PluralOperands can implement Copy but still be passed by reference, right?
I guess it comes down to a personal preference, unless there are performance impacts (i.e., I would guess that passing a 64-bit pointer is faster than Copy
ing a struct that is a bit larger than that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are passed by reference
components/capi/src/pluralrules.rs
Outdated
pub extern "C" fn plural_rules_create( | ||
locale: &ICULocale, | ||
provider: &mut ErasedDataProvider, | ||
ty: PluralRuleType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of where we should think about the implications for data slicing. There are two data keys for PluralRules: CARDINAL and ORDINAL. In Rust, you can pass in the type, and code slicing will remove the unused data key for you. However, in C, you get the whole function, so both data keys are compiled, even if you need only one.
We could:
- Manually make custom entrypoints for each Rust API that have one function per data key, like
icu4x_plural_rules_open_cardinal()
andicu4x_plural_rules_open_ordinal()
- Do something more along the lines of what I suggested in wrapper_layer.md, where the C functions wrap Rust one layer below the ergonomic Rust API.
- Ignore the problem 😢.
- Other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think (1.) is fine. I wouldn't include it in the API just yet but we can as needed I think.
A problem will be if we wish to support non-erased data providers, we'll have multiple _create
functions from the angle of the data providers. So I'm wary of a multiplicative blowup in constructors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally thinking that we would do something more along the lines of:
- The entrypoint APIs like
icu4x_plural_rules_create()
take a data struct, not a data provider - The code to resolve from data provider to data struct is written in the wrapper layer
Since the data provider may be async, this approach is cleaner since it doesn't require that icu4x_plural_rules_create()
also be async.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about if we did something along the lines of
ICU4X_Provider_PluralRulesV1* data = icu4x_provider_load_plural_rules_cardinal(provider, ...);
ICU4X_PluralRules* rules = icu4x_plural_rules_create(data, ...);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... this does make it easier for async and stuff.
Do we have any cases where the ICU4X object (like PluralRules) would need to care about the data provider after it has been constructed?
I think we should land the current code as is and file a followup for improving the data provider FFI story, because first off we'd have to add internal APIs for this everywhere.
components/capi/src/pluralrules.rs
Outdated
#[repr(C)] | ||
pub struct PluralOperands { | ||
pub i: u64, | ||
pub v: usize, | ||
pub w: usize, | ||
pub f: u64, | ||
pub t: u64, | ||
pub c: usize, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you duplicating the enums only temporarily, or do you intend that we should always duplicate them instead of adding #[repr(C)]
to the main types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer permanent duplication, the conversions below will stop compiling if you make incompatible changes, and this allows us to potentially change the Rust side while keeping the C side compatible.
My design goal here is that everything relevant to FFI is in this crate itself so people outside this crate don't need to care about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Longer-term, do you envision auto-generating the FFI enums from Rust enums, or is it by design that you need to update it in both places (to make FFI changes more explicit)?
How do we deal with API docs if we have duplicated structs? If the Rust struct has API docs, what happens with the FFI struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Longer-term, do you envision auto-generating the FFI enums from Rust enums, or is it by design that you need to update it in both places (to make FFI changes more explicit)?
By design, IMO. I talk about this in the "bridge blocks are self-contained" section of the diplomat
doc. cxx
has this restriction and I think it's a good thing, requiring your tooling to understand your crate graph can be fraught and subject to things moving out from underneath you.
The API docs are a good point. I don't know what to do here, we could have manual comments asking people to update things cross file, but that's not great. We could potentially teach tooling to scrape docs (and only docs) from other parts of the codebase, but I'm super wary of scraping anything other than docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should come up with a story for docs soon, but we don't need it right now. Lack of documentation is a key pain point for ICU and ICU wrappers like PyICU.
Maybe something along the lines of #[doc(include = "...")]
could help?
https://doc.rust-lang.org/unstable-book/language-features/external-doc.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should at least have a hyperlink back to the main Rust type from the docstring.
Really, the place where we need the docs is in the .h
files, and we can't use rustdoc
there anyway. We'll need to come up with a solution for documenting the C header files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I would just hyperlink back tbh
I think some of the automated tools can copy docs over as comments (perhaps in ways pandoc will like?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little nervous about letting the C representation drift too far from the Rust structures. There is a Rust library in Firefox with a heavyweight FFI layer that maps between internal Rust types and FFI Rust types that are exposed to C. That has been difficult to test properly, and I think we've had more bugs in that FFI layer than in the Rust or C++ code on either side of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't drift unless intentionally; due to the From
implementation we are essentially checking that they always have identical fields. Any change will cause a compile error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the intentional drift that I was worried about :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good!
typedef enum { | ||
PluralRuleType_Cardinal, | ||
PluralRuleType_Ordinal | ||
} PluralRuleType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention for c enums that I'm used to is all caps, but that can be hard on the eyes. Maybe Icu4x_ as a prefix?
uint64_t f; | ||
uint64_t t; | ||
size_t c; | ||
} PluralOperands; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the number of operands we have I think this is a good approach. The only other thing that comes to mind would be an enum for the possible operand types, and then a struct with the operand type and a union (yucky...) to hold possible values. This would allow callers to pass in an array of values. That might be something to consider in the future if we expose an API with a large number of options.
|
||
typedef struct { | ||
PluralRules* rules; | ||
bool success; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in this case, a nullptr for rules could indicate failure, so success is redundant. If results are always going to be ok or error, we could just use the pointer to indicate this. If not, then maybe we should have an enum of possible results, which could just be Ok/Error for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of adding a proper error enum to this later, the success
is just a placeholder.
components/capi/ctest/Makefile
Outdated
@@ -0,0 +1,25 @@ | |||
# This file is part of ICU4X. For terms of use, please see the file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're eventually going to have a lot of examples for the C API, I'm wondering if using make is going to cause us unpleasant experiences in the future. I realize it is generic and widely available, but it isn't that nice to work with. On the other hand, if we choose something else, then it is an extra dependency, and something we need to get working in CI. Offhand, I was thinking of something like meson as an alternative build tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like make very much either, but it is the de-facto C build system so I figured we should use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use CMake so that the MSVC build files are auto-generated for Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not using Make is an option we're willing to consider, we should probably have a discussion on C build systems before we chose one off hand. I've used CMake in the past and while it is definitely nicer to work with than Make, it still has some oddities. Other tools, like Meson, can also generate msvc build files, so it would be nice to determine our requirements and see what is out there that would be usable. I'd like to see us pick something as user friendly as possible for example code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great; but I don't think we necessarily need to block on this just yet -- this is just a quick and dirty test for the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure, don't block on this.
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
I renamed:
I decided not to use underscores in the type name just to use consistent style with Rust, since C doesn't quite have any one well-accepted style. |
Pull Request Test Coverage Report for Build 409f2aa5703e52f3c693b9d50215dc0c8c2baaf7-PR-552
💛 - Coveralls |
} | ||
} | ||
|
||
pub fn from_boxed(x: Box<dyn ErasedDataProvider<'static>>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why you did this mem::transmute(x)
thing for the ErasedDataProvider trait object instead of Box::into_raw
like you did for PluralRules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Box::into_raw
is T: Sized
😄
there is currently no stable "raw" representation of trait objects. There's a pointer metadata API in the works that will give us this, in the meantime this is safe because we only care about getting the layout right, and mem::transmute
errors if you get the layout wrong.
Codecov Report
@@ Coverage Diff @@
## main #552 +/- ##
=======================================
Coverage ? 72.57%
=======================================
Files ? 123
Lines ? 6644
Branches ? 0
=======================================
Hits ? 4822
Misses ? 1822
Partials ? 0 Continue to review full report at Codecov.
|
Added lots of comments |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm! I think it makes sense to land this now and we can iterate on this in follow up work.
That said, I'd like to see #pragma once
replaced with include guards before this lands unless someone has a strong reason to keep them.
I can make that change |
pragmas removed! |
Thanks! Going to file some followups |
Followups:
|
This PR adds an
icu_capi
crate that contains a C API, as well as a simple C test file that can be built and run that is able to callPluralRules::select
with an fs data provider.This does not:
cfg()
stuff to make it possible to pull in selective APIsThese are all things we can add later as we explore writing this C API.
Some things I still need to do:
But it's ready for preliminary review.